Blackpink Data’s documentation!

How to Build

Set up your machine

Python

Make sure you have installed:

Python 3.8
pip

Please note that the spotify.py module, which is based on the spotipy library, does not seem to work well on Windows, so I suggest using Linux or WSL on Windows. All the following commands assume that you are in a Linux-like environment.

Clone the repository

Run: git clone https://github.com/marco97pa/Blackpink-Data.git

For more info see this guide

Then cd to the new directory

Install dependencies

Run pip3 install -r requirements.txt to install all the required libraries

Set API keys as environment variables

The project is composed of different modules such as instagram.py, youtube.py and more. Each module gets data from a different source, and each source requires its own API keys.

Twitter API keys

Go to the Twitter Developers page, log in, go to Dashboard and create a new app with read and write permissions.

Then copy the generated keys and set them as environment variables by running these lines (replace the placeholders with your actual key values):

export TWITTER_CONSUMER_KEY='xxxx'
export TWITTER_CONSUMER_SECRET='xxxx'
export TWITTER_ACCESS_KEY='xxxx'
export TWITTER_ACCESS_SECRET='xxxx'

YouTube API key

Go to Google Developers and follow their instructions on how to get an API key for YouTube.

Then copy the generated key and set it as an environment variable by running this line (replace the placeholder with your actual key value):

export YOUTUBE_API_KEY='xxxx'

Spotify API key

Go to Spotify Developer Dashboard, create a new app and get the API keys. Then set them as environment variables, by running these lines:

export SPOTIPY_CLIENT_ID='xxxx'
export SPOTIPY_CLIENT_SECRET='xxxx'

Instagram USERNAME and PASSWORD

You can set your username and password like this:

export INSTAGRAM_ACCOUNT_USERNAME='xxxxxx'
export INSTAGRAM_ACCOUNT_PASSWORD='xxxxxx'
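
Before the first run it can help to confirm that every variable named above is actually set. The following is a minimal sketch and not part of the project: the helper missing_env_vars is hypothetical, and the list of names is simply copied from the sections above.

```python
import os

# Variable names copied from the sections above.
REQUIRED_VARS = [
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_KEY",
    "TWITTER_ACCESS_SECRET",
    "YOUTUBE_API_KEY",
    "SPOTIPY_CLIENT_ID",
    "SPOTIPY_CLIENT_SECRET",
    "INSTAGRAM_ACCOUNT_USERNAME",
    "INSTAGRAM_ACCOUNT_PASSWORD",
]


def missing_env_vars(environ=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty (hypothetical helper)."""
    if environ is None:
        environ = os.environ
    return [name for name in required if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All API keys are set.")
```

If you plan to disable a source (for example with -no-instagram), its variables may not be needed, so treat the output as a checklist and not as a hard requirement.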

Fork

By editing the data.yaml file you can make the script work with a different artist group.

For example, you could make a BTS Data Bot by editing the provided sample_data.yaml file and saving it as data.yaml

Fill in data.yaml with all the data you know. Leave fields empty or write fake data if you don’t know some details: they will be overwritten with the real ones at the first launch of the script.

With minimal or no code edits, the script could work even for single artists and not only groups.

Run

First run

Assuming that you have a valid data.yaml file in the same directory as the script, run:

python3 main.py -no-tweet

For the first run it is important that you use the -no-tweet option to prevent a flood of tweets on your timeline. You should also check that everything is fine by looking at the command line output and the data.yaml file.

Standard run

After the first run, you can just run: python3 main.py

It will tweet any changes in the dataset.

Parameters

By passing one or more parameters, you can disable individual source modules. The currently supported parameters are:

-no-instagram: disables Instagram source
-no-youtube: disables YouTube source
-no-spotify: disables Spotify source
-no-birthday: disables birthdays events source
-no-twitter: disables Twitter source (used for reposting)

Remember that -no-twitter is different from -no-tweet:

-no-tweet actually prevents the bot from tweeting any update from the enabled sources. The output will still be visible on the console. This is really useful for testing.

Schedule the bot

If you want the bot to run 24/7, you should set the script to run (for example) every 5 minutes to check for updates. Look at How to schedule tasks on Linux using crontab to get an idea on how to do it.

Modules

Main script

main.check_args()

Checks the arguments passed by the command line

By passing one or more parameters, you can disable individual source modules.

The currently supported parameters are:

-no-instagram: disables Instagram source
-no-youtube: disables YouTube source
-no-spotify: disables Spotify source
-no-birthday: disables birthdays events source
-no-twitter: disables Twitter source (used for reposting)

Remember that -no-twitter is different than -no-tweet:

-no-tweet actually prevents the bot from tweeting any update from the enabled sources. The output will still be visible on the console. This is really useful for testing.

Returns:

a dictionary that contains all the sources and their state (enabled or disabled, True or False)
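
The mapping from flags to source states can be sketched as follows. This is an illustration under assumptions, not the code of main.check_args(): the function name parse_flags, the dictionary keys, and the separate tweeting return value are all hypothetical.

```python
import sys

# Flags listed above, mapped to the source each one disables.
# The key names on the right are assumptions made for this sketch.
FLAGS = {
    "-no-instagram": "instagram",
    "-no-youtube": "youtube",
    "-no-spotify": "spotify",
    "-no-birthday": "birthday",
    "-no-twitter": "twitter",
}


def parse_flags(argv):
    """Return (sources, tweeting): the state of each source and whether tweeting is on."""
    sources = {name: True for name in FLAGS.values()}
    tweeting = True
    for arg in argv:
        if arg in FLAGS:
            sources[FLAGS[arg]] = False
        elif arg == "-no-tweet":
            # -no-tweet leaves every source enabled; it only mutes posting.
            tweeting = False
    return sources, tweeting


if __name__ == "__main__":
    print(parse_flags(sys.argv[1:]))
```

Keeping -no-tweet outside the sources dictionary mirrors the distinction described above: it does not disable a source, it only stops updates from being posted.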

main.load_group()

Reads the data.yaml YAML file

Data about a group is stored inside the data.yaml file in the same directory as the script

Returns:

a dictionary that contains all the information about the group

main.write_group(group)

Writes the data.yaml YAML file

Data about a group is stored inside the data.yaml file in the same directory as the script

Args:

group: dictionary that contains all the information about the group

Tweet

tweet.check_duplicates(message)

Checks the tweet message against the 3 latest tweets of the bot to avoid duplicate posts

Args:

message: a string containing the message to be posted

Returns:

True if a duplicate is found, False otherwise
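
The duplicate check can be pictured with plain strings. This is a minimal sketch, assuming the recent tweets are already available as a list of texts; the name is_duplicate and the exact-match comparison are assumptions, and the real function may compare differently.

```python
def is_duplicate(message, recent_texts):
    """Return True if message equals one of the recent tweet texts.

    Leading and trailing whitespace is ignored on both sides.
    """
    normalized = message.strip()
    return any(normalized == text.strip() for text in recent_texts)
```

For example, is_duplicate("Hello", ["Hello ", "Other"]) is True, while an empty list of recent tweets never reports a duplicate.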

tweet.edit_image(filename, text, text_size=200, crop=False)

Edits an image by adding text to it (uses the Pillow module)

Args:

filename: filename of the image to be modified
text: text to be added
text_size (optional): size of the text (default: 200)
crop (optional): if enabled removes black bars from a video thumbnail (16:9 over 4:3)

tweet.remove_URLs(text)

Removes URLs from a text string

Args:

text: any text containing URL(s)

Returns:

the same text without URL(s)
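
One common way to strip URLs is a regular expression. The sketch below is an assumption about the approach, not the project's code: the name strip_urls and the pattern are illustrative, and only http and https links are handled.

```python
import re

# Matches http:// or https:// followed by any run of non-space characters.
URL_PATTERN = re.compile(r"https?://\S+")


def strip_urls(text):
    """Remove http(s) URLs, then collapse the whitespace left behind.

    Note that collapsing whitespace also turns newlines into single spaces.
    """
    without_urls = URL_PATTERN.sub("", text)
    return " ".join(without_urls.split())
```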

tweet.retrieve_own_tweets(num=3)

Retrieves recent tweets made by the bot.

Args:

num: an integer with the number of tweets to retrieve

Returns:

a list of tweet objects

tweet.set_test_mode()

Enables the test mode

Prevents tweets from being posted. They are still printed in the console. This is really useful for debugging purposes

tweet.twitter_post(message)

Post a message on Twitter (uses the Tweepy module)

Args:

message: a string containing the message to be posted

tweet.twitter_post_image(message, filename, text, text_size=200, crop=False)

Post a photo with message on Twitter (uses the Tweepy module)

Args:

message: a string containing the message to be posted
filename: filename of the image to be posted

tweet.twitter_repost(artist)

Retweets latest tweets of a given account

Args:

artist: a dictionary with all the details of the artist

Returns:

a dictionary containing all the updated data of the artist

Utils

utils.convert_num(mode, num)

Converts a number to a given number scale

Example: convert_num("100K", 600000) returns 6

Args:

mode: (string) the scale for the conversion ("100K", "M", "10M", "100M", "B")
num: the number to be converted

Returns:

the converted number
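
The documented example (600000 on the "100K" scale gives 6) is consistent with an integer division by the size of the scale. The sketch below assumes exactly that; the name convert_num_sketch and the SCALES table are illustrative, and the real function may round differently.

```python
# Size of one unit for each documented scale name.
SCALES = {
    "100K": 100_000,
    "M": 1_000_000,
    "10M": 10_000_000,
    "100M": 100_000_000,
    "B": 1_000_000_000,
}


def convert_num_sketch(mode, num):
    """Return how many whole units of the given scale fit in num."""
    return num // SCALES[mode]
```

Because the result only changes when a full unit is crossed, comparing the converted old and new values is a simple way to notice that a counter has passed a new milestone.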

utils.display_num(num, short=False, decimal=False)

Converts a number to a readable format

Args:

num: the number to be converted
short (optional): flag to get a long or short literal (“Mln” vs “million”)
decimal (optional): flag to print also the first decimal digit (19.1 vs 19)

Returns:

a string with a number in a readable format
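
A rough sketch of such a formatter follows. Only the "Mln" / "million" labels and the 19.1 vs 19 behaviour come from the description above; the name display_num_sketch, the billion labels, the spacing, and the choice to truncate instead of round are assumptions.

```python
# (unit size, long label, short label); the billion labels are assumed.
UNITS = [
    (1_000_000_000, "billion", "Bln"),
    (1_000_000, "million", "Mln"),
]


def display_num_sketch(num, short=False, decimal=False):
    """Format num with a unit label, optionally keeping one decimal digit."""
    for size, long_label, short_label in UNITS:
        if num >= size:
            whole = num // size
            label = short_label if short else long_label
            if decimal:
                # Truncate (do not round) so the text never overstates the count.
                tenth = (num % size) * 10 // size
                return f"{whole}.{tenth} {label}"
            return f"{whole} {label}"
    # Numbers below one million are returned unchanged.
    return str(num)
```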

utils.download(url, filename)

Downloads a file, given a URL and a filename

Args:

url: source from which to download the file
filename: name of the file to save

utils.download_image(url)

Downloads an image, given a URL

The image is saved in the download.jpg file

Args:

url: source from which to download the image

Birthdays

birthdays.check_birthdays(group)

Checks if today is the birthday of a member of the group

It tweets if it is the birthday of someone

Args:

group: a dictionary with all the details of the group

Returns:

a dictionary containing all the updated data of the group
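
The date comparison behind a birthday check can be sketched like this. The layout of the member data is an assumption: each member is taken to be a dictionary with hypothetical "name" and "birthday" keys, the latter an ISO date string, which may not match the real data.yaml.

```python
from datetime import date


def members_with_birthday(members, today=None):
    """Return the names of members whose birthday falls on today.

    Only month and day are compared; the birth year is ignored.
    """
    if today is None:
        today = date.today()
    names = []
    for member in members:
        born = date.fromisoformat(member["birthday"])
        if (born.month, born.day) == (today.month, today.day):
            names.append(member["name"])
    return names
```

Passing today explicitly makes the check easy to try out for any date without waiting for a real birthday.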

YouTube

youtube.youtube_check_channel_change(old_channel, new_channel, hashtags)

Checks if there is any change in the number of subscribers or total views of the channel

It compares the old channel data with the new (already fetched) data.

Args:

old_channel: dictionary that contains all the old data of the channel
new_channel: dictionary that contains all the updated data of the channel
hashtags: hashtags to add to the Tweet

Returns:

a dictionary with updated data of the channel

youtube.youtube_check_videos_change(name, old_videos, new_videos, hashtags)

Checks if there is any new video

It compares the old videos list of the artist with the new (already fetched) videos list. It tweets if there is a new release or if a video reaches a new views goal.

Args:

name: name of the channel
old_videos: list that contains all the old videos
new_videos: list that contains all the updated videos
hashtags: hashtags to append to the Tweet

Returns:

new_videos
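
The comparison between the old and new video lists can be sketched as below. This is an illustration only: the "id" and "views" keys, the fixed step of 100 million views, and the name find_video_updates are assumptions, and no tweeting is done here.

```python
def find_video_updates(old_videos, new_videos, step=100_000_000):
    """Return (new_releases, milestones) found by comparing two video lists.

    new_releases holds IDs missing from old_videos; milestones holds
    (id, goal) pairs for videos that crossed a multiple of step.
    """
    old_views = {video["id"]: video["views"] for video in old_videos}
    new_releases = []
    milestones = []
    for video in new_videos:
        if video["id"] not in old_views:
            new_releases.append(video["id"])
        elif video["views"] // step > old_views[video["id"]] // step:
            goal = video["views"] // step * step
            milestones.append((video["id"], goal))
    return new_releases, milestones
```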

youtube.youtube_data(group)

Runs all the YouTube related tasks

It scrapes data from YouTube for the whole group and the single artists

Args:

group: dictionary with the data of the group to scrape

Returns:

the same group dictionary with updated data

youtube.youtube_get_channel(api, channel_id)

Gets details about a channel

Args:

api: The YouTube instance
channel_id: the ID of that channel on YouTube

Returns:

a dictionary containing all the scraped data of that channel

youtube.youtube_get_videos(api, playlist_id, name)

Gets videos from a playlist

Args:

api: The YouTube instance
playlist_id: the ID of the playlist on YouTube
name: name of the channel owner of the playlist

Returns:

a list of videos

Instagram

instagram.clean_caption(caption)

Removes unnecessary parts of an Instagram post caption

It removes all the hashtags and converts tags to plain text (@marco97pa –> marco97pa)

Args:

caption: a text

Returns:

the same caption without hashtags and tags
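
Both steps can be expressed with regular expressions. The following is a sketch under assumptions, not the project's implementation: the name clean_caption_sketch and both patterns are illustrative.

```python
import re


def clean_caption_sketch(caption):
    """Drop #hashtags and turn @mentions into plain names."""
    # Remove every hashtag (a '#' followed by word characters).
    no_hashtags = re.sub(r"#\w+", "", caption)
    # Keep the account name but drop the leading '@'.
    no_mentions = re.sub(r"@(\w[\w.]*)", r"\1", no_hashtags)
    # Collapse the whitespace left behind by the removed hashtags.
    return " ".join(no_mentions.split())
```

In Python 3 the \w class matches Unicode letters, so hashtags written in Korean are removed as well.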

instagram.download_profile_pic(url)

Downloads an image, given a URL

The image is saved in the download.jpg file

Args:

url: source from which to download the image

instagram.instagram_data(group)

Runs all the Instagram related tasks

It scrapes data from Instagram for the whole group and the single artists

Args:

group: dictionary with the data of the group to scrape

Returns:

the same group dictionary with updated data

instagram.instagram_last_post(artist, user_id)

Gets the last post of a profile

It tweets if there is a new post, that is, if the timestamp of the latest stored post does not match the timestamp of the latest fetched post

Args:

user_id: a profile ID
artist: a dictionary with all the details of the artist

Returns:

a dictionary containing all the updated data of the artist

instagram.instagram_profile(artist)

Gets the details of an artist on Instagram

It tweets if the artist reaches a new followers goal

Args:

artist: a dictionary with all the details of the artist

Returns:

a dictionary containing all the updated data of the artist
a Profile ID

Spotify

spotify.check_new_songs(artist, collection, hashtags)

Checks if there is any new song

It compares the old discography of the artist with the new (already fetched) discography. It tweets if there is a new release by the artist or a release featuring the artist.

Args:

artist: dictionary that contains all the data about the single artist
collection: dictionary that contains all the updated discography of the artist
hashtags: hashtags to append to the Tweet

Returns:

an artist dictionary with updated discography details

spotify.get_artist(spotify, artist, hashtags)

Gets details about an artist

It tweets if the artist reaches a new goal of followers on Spotify

Args:

spotify: The Spotify instance
artist: dictionary that contains all the data about the single artist
hashtags: hashtags to append to the Tweet

Returns:

an artist dictionary with updated profile details

spotify.get_discography(spotify, artist)

Gets all the releases of an artist

A release is a single, EP, mini-album or album: Spotify simply calls them all “albums”

Example:

DDU-DU-DDU-DU of BLACKPINK is a single

SQUARE UP of BLACKPINK is a mini-album

THE ALBUM of BLACKPINK is (really) an album

It also gets releases where the artist is featured. Example:

Sour Candy is a song of Lady Gaga, but BLACKPINK are featured

Spotify also keeps many “clones” of the same album: there can be extended editions or albums that had tracks added later. Each of these shows up as a duplicate of the same album, so this function also tries to clean up the discography by removing duplicates.

Args:

spotify: The Spotify instance
artist: dictionary that contains all the data about the single artist

Returns:

a dictionary with updated discography details
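
One simple way to remove such duplicates is to keep only the first album seen for each name. This sketch assumes that matching on a case-insensitive name is enough, which may differ from what get_discography actually does; the name remove_duplicate_albums is hypothetical.

```python
def remove_duplicate_albums(albums):
    """Keep the first album for each case-insensitive name, preserving order."""
    seen = set()
    unique = []
    for album in albums:
        key = album["name"].strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(album)
    return unique
```

A name-based key would not catch editions whose titles differ, for example ones with a suffix appended, so a stricter or looser key may be needed in practice.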

spotify.link_album(album_id)

Generates a link to an album

Args:

album_id: ID of the album

Returns:

the link to that album on Spotify

spotify.link_artist(artist_id)

Generates a link to an artist

Args:

artist_id: ID of the artist

Returns:

the link to that artist on Spotify
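
Both link helpers amount to string formatting. The sketch below assumes the public open.spotify.com URL layout; the function names album_link and artist_link are illustrative and the project's functions may build the links differently.

```python
def album_link(album_id):
    """Return the public web link for a Spotify album ID."""
    return f"https://open.spotify.com/album/{album_id}"


def artist_link(artist_id):
    """Return the public web link for a Spotify artist ID."""
    return f"https://open.spotify.com/artist/{artist_id}"
```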

spotify.login()

Logs in to Spotify

Uses the client credentials authorization flow. The following API keys must be set as environment variables:

SPOTIPY_CLIENT_ID

SPOTIPY_CLIENT_SECRET

You can request API keys on the Spotify Developer Dashboard

See https://spotipy.readthedocs.io/en/2.16.1/#authorization-code-flow for more details
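
A client credentials login with spotipy can be sketched as follows. This is not the code of spotify.login(): the name spotify_login_sketch and the explicit check of the environment are assumptions added for illustration. spotipy can also read the two variables by itself when no arguments are passed.

```python
import os

REQUIRED = ("SPOTIPY_CLIENT_ID", "SPOTIPY_CLIENT_SECRET")


def spotify_login_sketch(environ=None):
    """Return a spotipy client authenticated with client credentials."""
    if environ is None:
        environ = os.environ
    missing = [name for name in REQUIRED if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))

    # Imported here so the check above works even if spotipy is not installed.
    import spotipy
    from spotipy.oauth2 import SpotifyClientCredentials

    auth_manager = SpotifyClientCredentials(
        client_id=environ["SPOTIPY_CLIENT_ID"],
        client_secret=environ["SPOTIPY_CLIENT_SECRET"],
    )
    return spotipy.Spotify(auth_manager=auth_manager)
```

Failing early with a clear message is a design choice of this sketch; it avoids a less obvious authentication error on the first API call.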

spotify.spotify_data(group)

Runs all the Spotify related tasks

It scrapes data from Spotify for the whole group and the single artists

Args:

group: dictionary with the data of the group to scrape

Returns:

the same group dictionary with updated data

Billboard Charts

billboard_charts.billboard_data(group)

Gets Billboard charts of a group

It starts all the tasks needed to get the latest data and tweet any updates

Data is updated once a day

Args:

group: dictionary that contains all the data about the group

Returns:

the same group dictionary with updated data

billboard_charts.get_artist_rank(artist, chart)

Gets the Billboard Hot 100 chart and tries to find an artist

Args:

artist: the artist to look for

Returns:

a string containing the list of songs found in the chart (it can be empty)
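
The search itself can be sketched without any chart library. In this illustration the chart is assumed to be a list of (rank, title, credited_artist) tuples, and the "rank. title" output format is made up; the real function works on the chart object it receives and may format its result differently.

```python
def songs_by_artist(entries, artist):
    """Return one 'rank. title' line per chart entry that credits the artist.

    The match is a case-insensitive substring test, so featured
    credits are found too. The result is empty if nothing matches.
    """
    wanted = artist.lower()
    lines = [
        f"{rank}. {title}"
        for rank, title, credited in entries
        if wanted in credited.lower()
    ]
    return "\n".join(lines)
```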