Blackpink Data’s documentation!
How to Build
Set up your machine
Python
Make sure you have installed:
Python 3.8
pip
Please note that the spotify.py module, which is based on the
spotipy library, does not seem to work well on Windows, so I suggest
using Linux or WSL on Windows. All the following commands assume that you
are in a Linux-like environment.
Clone the repository
Run: git clone https://github.com/marco97pa/Blackpink-Data.git
For more info see this guide
Then cd to the new directory
Install dependencies
Run pip3 install -r requirements.txt to install all the required libraries
Set API keys as environment variables
The project is composed of different modules, such as instagram.py,
youtube.py and more. Each module gets data from a different source,
and to get this data you need the corresponding API keys.
Twitter API keys
export TWITTER_CONSUMER_KEY='xxxx'
export TWITTER_CONSUMER_SECRET='xxxx'
export TWITTER_ACCESS_KEY='xxxx'
export TWITTER_ACCESS_SECRET='xxxx'
YouTube API key
export YOUTUBE_API_KEY='xxxx'
Spotify API key
Go to the Spotify Developer Dashboard, create a new app and get its API keys. Then set them as environment variables by running these lines:
export SPOTIPY_CLIENT_ID='xxxx'
export SPOTIPY_CLIENT_SECRET='xxxx'
Instagram USERNAME and PASSWORD
You can set your username and password like this:
export INSTAGRAM_ACCOUNT_USERNAME='xxxxxx'
export INSTAGRAM_ACCOUNT_PASSWORD='xxxxxx'
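Before the first run, it can help to check which of the variables above are actually set. The snippet below is a minimal sketch of a hypothetical helper (`missing_keys` and `REQUIRED_ENV` are not part of the project); it only reads the environment and reports, per source, the variable names that are still missing, so you can decide which sources to disable.

```python
import os

# Hypothetical mapping: the environment variables listed in this guide,
# grouped by the source (module) that needs them
REQUIRED_ENV = {
    "twitter": ["TWITTER_CONSUMER_KEY", "TWITTER_CONSUMER_SECRET",
                "TWITTER_ACCESS_KEY", "TWITTER_ACCESS_SECRET"],
    "youtube": ["YOUTUBE_API_KEY"],
    "spotify": ["SPOTIPY_CLIENT_ID", "SPOTIPY_CLIENT_SECRET"],
    "instagram": ["INSTAGRAM_ACCOUNT_USERNAME", "INSTAGRAM_ACCOUNT_PASSWORD"],
}


def missing_keys(env=None):
    """Return {source: [unset variable names]} for every source lacking keys."""
    env = os.environ if env is None else env
    missing = {}
    for source, names in REQUIRED_ENV.items():
        absent = [name for name in names if not env.get(name)]
        if absent:
            missing[source] = absent
    return missing


if __name__ == "__main__":
    for source, names in missing_keys().items():
        print(f"{source}: missing {', '.join(names)}")
```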
Fork
By editing the data.yaml file you can make the script work with a different artist group.
For example, you could make a BTS Data Bot by editing the provided sample_data.yaml file and saving it as data.yaml.
Fill in data.yaml with all the data you know. Leave fields empty or write placeholder data if you don’t know some details: they will be overwritten with the real ones the first time the script runs.
With minimal or no code changes, the script could also work for solo artists, not only for groups.
Run
First run
Once the data.yaml file is in the same directory as the script, run:
python3 main.py -no-tweet
For the first run, it is important to use the -no-tweet option
to prevent a flood of tweets on your timeline. You should also check
that everything works by looking at the command-line output and the
data.yaml file.
Standard run
python3 main.py
Parameters
By passing one or more parameters, you can disable individual module sources. The currently supported parameters are:
-no-instagram: disables Instagram source
-no-youtube: disables YouTube source
-no-spotify: disables Spotify source
-no-birthday: disables birthday events source
-no-twitter: disables Twitter source (used for reposting)
-no-twitter is different from -no-tweet: -no-tweet actually prevents the bot from tweeting any update from
the enabled sources. The output will still be visible on the console.
This is really useful for testing.
Schedule the bot
If you want the bot to run 24/7, you should schedule the script to run periodically (for example, every 5 minutes) to check for updates. See How to schedule tasks on Linux using crontab for an idea of how to do it.
Modules
Main script
- main.check_args()
Checks the arguments passed on the command line
By passing one or more parameters, you can disable individual module sources.
The currently supported parameters are:
-no-instagram: disables Instagram source
-no-youtube: disables YouTube source
-no-spotify: disables Spotify source
-no-birthday: disables birthdays events source
-no-twitter: disables Twitter source (used for reposting)
Remember that -no-twitter is different from -no-tweet:
-no-tweet actually prevents the bot from tweeting any update from the enabled sources. The output will still be visible on the console. This is really useful for testing.
- Returns:
A dictionary that contains all the sources and their state (enabled or disabled, True or False)
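The flag handling described for main.check_args() can be sketched as below. This is an illustration, not the project's code: the dictionary keys (the source names taken from the -no-<source> flags) are an assumption, and reading sys.argv directly instead of using argparse is a choice made to keep the single-dash flags exactly as documented.

```python
import sys

# Assumed source names, taken from the -no-<source> flags documented above
SOURCES = ("instagram", "youtube", "spotify", "birthday", "twitter")


def check_args(argv=None):
    """Return {source: True/False}; False when a -no-<source> flag was passed.

    -no-tweet is not a source, so it is not part of the result; in this
    sketch the caller checks for it separately to switch tweeting off.
    """
    argv = sys.argv[1:] if argv is None else argv
    return {name: f"-no-{name}" not in argv for name in SOURCES}
```

In this sketch, test mode would be detected separately, for example with `"-no-tweet" in sys.argv[1:]`.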
- main.load_group()
Reads the data.yaml YAML file
Data about a group is stored inside the data.yaml file in the same directory as the script
- Returns:
A dictionary that contains all the information about the group
- main.write_group(group)
Writes the data.yaml YAML file
Data about a group is stored inside the data.yaml file in the same directory as the script
- Args:
group: dictionary that contains all the information about the group
Tweet
- tweet.check_duplicates(message)
Checks the tweet message against the 3 latest tweets of the user to avoid duplicate posts
- Args:
message: a string containing the message to be posted
- Returns:
Boolean which signals True if a duplicate is found
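A minimal sketch of the duplicate check, assuming the recent tweets are available as plain strings (the real function works on the tweet objects returned by tweet.retrieve_own_tweets(), so the `recent_texts` parameter here is hypothetical). Whitespace differences are ignored so that reformatted copies of the same message still count as duplicates.

```python
def check_duplicates(message, recent_texts):
    """Return True if message equals one of the given recent tweet texts.

    recent_texts is a hypothetical list of strings standing in for the
    tweet objects returned by tweet.retrieve_own_tweets(num=3).
    """
    candidate = " ".join(message.split())  # ignore differences in whitespace
    return any(candidate == " ".join(text.split()) for text in recent_texts)
```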
- tweet.edit_image(filename, text, text_size=200, crop=False)
Edits an image by adding text to it (uses the Pillow module)
- Args:
filename: filename of the image to be modified
text: text to be added
text_size (optional): size of the text (default: 200)
crop (optional): if enabled, removes the black bars from a video thumbnail (a 16:9 frame inside a 4:3 image)
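The crop option boils down to a bit of arithmetic: a 16:9 frame inside a 4:3 image leaves black bars at the top and bottom, so the useful area is the centered band whose height is width × 9 / 16. The sketch below (hypothetical helper, not the project's code) computes that band as a (left, upper, right, lower) box, which is the format Pillow's `Image.crop()` accepts.

```python
def letterbox_crop_box(width, height):
    """Return the (left, upper, right, lower) box of the 16:9 area inside a letterboxed image."""
    content_height = width * 9 // 16
    if content_height >= height:
        # Already 16:9 (or wider): there are no horizontal black bars to remove
        return (0, 0, width, height)
    top = (height - content_height) // 2  # the bars are split evenly above and below
    return (0, top, width, top + content_height)
```

For a 480×360 thumbnail this gives (0, 45, 480, 315), that is, 45 pixels of bar removed at the top and at the bottom.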
- tweet.remove_URLs(text)
Removes URLs from a text string
- Args:
text: any text containing URL(s)
- Returns:
the same text without URL(s)
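A possible regex-based version of tweet.remove_URLs(). It is a sketch: it assumes URLs start with http:// or https:// and end at the next whitespace, and it also drops the whitespace right before each URL so no double spaces are left behind.

```python
import re

# Assumption: a URL starts with http:// or https:// and runs until whitespace
URL_PATTERN = re.compile(r"\s*https?://\S+")


def remove_URLs(text):
    """Return text without http(s) URLs, trimmed at both ends."""
    return URL_PATTERN.sub("", text).strip()
```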
- tweet.retrieve_own_tweets(num=3)
Retrieves recent tweets made by the bot.
- Args:
num: an integer with the number of tweets to retrieve.
- Returns:
a list of tweet objects
- tweet.set_test_mode()
Enables the test mode
Prevents tweets from being posted. They are still printed in the console. This is really useful for debugging purposes
- tweet.twitter_post(message)
Posts a message on Twitter (uses the Tweepy module)
- Args:
message: a string containing the message to be posted
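tweet.set_test_mode() and tweet.twitter_post() work together: messages are always printed, but only posted when test mode is off. One way to sketch this is with a module-level flag; here the actual Tweepy call is replaced by a hypothetical `send` callable so the example runs without credentials.

```python
_test_mode = False  # module-level flag, switched on by set_test_mode()


def set_test_mode():
    """Enable test mode: messages are printed but never posted."""
    global _test_mode
    _test_mode = True


def twitter_post(message, send=None):
    """Print message and, unless in test mode, post it with send(message).

    send is a hypothetical stand-in for the Tweepy call.
    Returns True only if the message was actually sent.
    """
    print(message)
    if _test_mode or send is None:
        return False
    send(message)
    return True
```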
- tweet.twitter_post_image(message, filename, text, text_size=200, crop=False)
Posts a photo with a message on Twitter (uses the Tweepy module)
- Args:
message: a string containing the message to be posted
filename: filename of the image to be posted
- tweet.twitter_repost(artist)
Retweets the latest tweets of a given account
- Args:
artist: a dictionary with all the details of the artist
- Returns:
a dictionary containing all the updated data of the artist
Utils
- utils.convert_num(mode, num)
Converts a number to the given number scale
Example: convert_num(“100K”, 600000) returns 6
- Args:
mode: (string) the scale for the conversion (“100K”, “M”, “10M”, “100M”, “B”)
num: the number to be converted
- Returns:
the converted number
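A minimal sketch of utils.convert_num(), assuming it divides by the size of the scale and discards the remainder (floor division), which matches the documented example; the real function may round differently.

```python
# Size of each supported scale, as listed in the Args above
SCALES = {
    "100K": 100_000,
    "M": 1_000_000,
    "10M": 10_000_000,
    "100M": 100_000_000,
    "B": 1_000_000_000,
}


def convert_num(mode, num):
    """Express num in units of the given scale, discarding the remainder."""
    return num // SCALES[mode]
```

With this behavior, convert_num("100K", 600000) returns 6, and convert_num("M", 2500000) returns 2.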
- utils.display_num(num, short=False, decimal=False)
Converts a number to a human-readable format
- Args:
num: the number to be converted
short (optional): flag to get a long or short literal (“Mln” vs “million”)
decimal (optional): flag to print also the first decimal digit (19.1 vs 19)
- Returns:
a string with a number in a readable format
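A sketch of how utils.display_num() could produce "19 million", "19.1 million" or "19.1 Mln". Only the "Mln"/"million" pair comes from this documentation; the "Bln"/"billion" pair and the thousands separator for smaller numbers are assumptions. Integer arithmetic keeps the first decimal digit exact and truncates rather than rounds.

```python
def display_num(num, short=False, decimal=False):
    """Format num as e.g. '19 million', '19.1 million' or '19.1 Mln'."""
    # (unit value, short literal, long literal); the billion entry is assumed
    units = [(1_000_000_000, "Bln", "billion"), (1_000_000, "Mln", "million")]
    for value, short_name, long_name in units:
        if num >= value:
            name = short_name if short else long_name
            tenths = num * 10 // value  # whole units and first decimal digit, truncated
            if decimal:
                return f"{tenths // 10}.{tenths % 10} {name}"
            return f"{tenths // 10} {name}"
    return f"{num:,}"  # assumed format below one million, e.g. 12,345
```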
- utils.download(url, filename)
Downloads a file, given a URL and a filename
- Args:
url: URL of the file to download
filename: name of the file to save
- utils.download_image(url)
Downloads an image, given a URL
The image is saved in the download.jpg file
- Args:
url: URL of the image to download
Birthdays
- birthdays.check_birthdays(group)
Checks if today is the birthday of a member of the group
It tweets if it is someone's birthday
- Args:
group: a dictionary with all the details of the group
- Returns:
a dictionary containing all the updated data of the group
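The core of the birthday check is comparing month and day while ignoring the birth year. The sketch below assumes, hypothetically, that each member is a dictionary with a "name" and a "birthday" stored as an ISO date string (YYYY-MM-DD); the real data.yaml schema may differ. The current date can be passed in to make the function easy to test.

```python
from datetime import date


def birthdays_today(members, today=None):
    """Return the names of members whose birthday falls on today's month and day.

    Assumes each member looks like {"name": ..., "birthday": "YYYY-MM-DD"}.
    """
    if today is None:
        today = date.today()
    names = []
    for member in members:
        born = date.fromisoformat(member["birthday"])
        if (born.month, born.day) == (today.month, today.day):
            names.append(member["name"])
    return names
```

Note that with this approach a member born on February 29 would only be matched in leap years.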
YouTube
- youtube.youtube_check_channel_change(old_channel, new_channel, hashtags)
Checks if there is any change in the number of subscribers or total views of the channel
It compares the old channel data with the new (already fetched) data.
- Args:
old_channel: dictionary that contains all the old data of the channel
new_channel: dictionary that contains all the updated data of the channel
hashtags: hashtags to add to the Tweet
- Returns:
a dictionary with updated data of the channel
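This documentation does not say how a "goal" is defined. A common approach, shown here only as an assumption, is to use fixed steps (for example every million subscribers or views) and to tweet when the new value crosses a step that the old value had not reached. The helper name `crossed_milestone` is hypothetical.

```python
def crossed_milestone(old_value, new_value, step):
    """Return the highest multiple of step reached by new_value but not old_value, else None."""
    if new_value // step > old_value // step:
        return new_value // step * step
    return None
```

For example, going from 9,950,000 to 10,020,000 subscribers with a step of 1,000,000 returns 10,000,000, while a later increase to 10,500,000 returns None.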
- youtube.youtube_check_videos_change(name, old_videos, new_videos, hashtags)
Checks if there is any new video
It compares the old videos list of the artist with the new (already fetched) videos list. It tweets if there is a new release or if a video reaches a new views goal.
- Args:
name: name of the channel
old_videos: list that contains all the old videos
new_videos: list that contains all the updated videos
hashtags: hashtags to append to the Tweet
- Returns:
new_videos
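Detecting new releases amounts to a set difference between the stored list and the freshly fetched one. A minimal sketch, assuming each video is a dictionary with a unique "id" key (a hypothetical field name; the project may store a different identifier):

```python
def find_new_videos(old_videos, new_videos):
    """Return the videos in new_videos whose ID does not appear in old_videos."""
    known_ids = {video["id"] for video in old_videos}  # set for O(1) lookups
    return [video for video in new_videos if video["id"] not in known_ids]
```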
- youtube.youtube_data(group)
Runs all the YouTube related tasks
It scrapes data from YouTube for the whole group and the single artists
- Args:
group: dictionary with the data of the group to scrape
- Returns:
the same group dictionary with updated data
- youtube.youtube_get_channel(api, channel_id)
Gets details about a channel
- Args:
api: The YouTube instance
channel_id: the ID of that channel on YouTube
- Returns:
a dictionary containing all the scraped data of that channel
- youtube.youtube_get_videos(api, playlist_id, name)
Gets videos from a playlist
- Args:
api: The YouTube instance
playlist_id: the ID of the playlist on YouTube
name: name of the channel owner of the playlist
- Returns:
a list of videos
Instagram
- instagram.clean_caption(caption)
Removes unnecessary parts of an Instagram post caption
It removes all the hashtags and converts tags to plain text (@marco97pa -> marco97pa)
- Args:
caption: a text
- Returns:
the same caption without hashtags and tags
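The caption cleanup can be sketched with two regular expressions. This is an illustration, not the project's code; it assumes Instagram usernames consist of letters, digits, underscores and periods, keeps line breaks, and drops lines left empty (for example a line that only held hashtags).

```python
import re


def clean_caption(caption):
    """Remove #hashtags, turn @tags into plain names and tidy whitespace."""
    caption = re.sub(r"#\w+", "", caption)          # remove hashtags entirely
    caption = re.sub(r"@([\w.]+)", r"\1", caption)  # @marco97pa -> marco97pa
    lines = (" ".join(line.split()) for line in caption.splitlines())
    return "\n".join(line for line in lines if line)
```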
- instagram.download_profile_pic(url)
Downloads an image, given a URL
The image is saved in the download.jpg file
- Args:
url: URL of the image to download
- instagram.instagram_data(group)
Runs all the Instagram related tasks
It scrapes data from Instagram for the whole group and the single artists
- Args:
group: dictionary with the data of the group to scrape
- Returns:
the same group dictionary with updated data
- instagram.instagram_last_post(artist, user_id)
Gets the last post of a profile
It tweets if there is a new post, that is, if the timestamp of the latest stored post does not match the timestamp of the latest fetched post
- Args:
artist: a dictionary with all the details of the artist
user_id: a profile ID
- Returns:
a dictionary containing all the updated data of the artist
- instagram.instagram_profile(artist)
Gets the details of an artist on Instagram
It tweets if the artist reaches a new followers goal
- Args:
artist: a dictionary with all the details of the artist
- Returns:
a dictionary containing all the updated data of the artist
a profile ID
Spotify
- spotify.check_new_songs(artist, collection, hashtags)
Checks if there is any new song
It compares the old discography of the artist with the new (already fetched) discography. It tweets if there is a new release or a new song featuring the artist.
- Args:
artist: dictionary that contains all the data about the single artist
collection: dictionary that contains all the updated discography of the artist
hashtags: hashtags to append to the Tweet
- Returns:
an artist dictionary with updated discography details
- spotify.get_artist(spotify, artist, hashtags)
Gets details about an artist
It tweets if the artist reaches a new followers goal on Spotify
- Args:
spotify: The Spotify instance
artist: dictionary that contains all the data about the single artist
hashtags: hashtags to append to the Tweet
- Returns:
an artist dictionary with updated profile details
- spotify.get_discography(spotify, artist)
Gets all the releases of an artist
A release is a single, EP, mini-album or album: Spotify simply calls them all “albums”
Example:
DDU-DU-DDU-DU of BLACKPINK is a single
SQUARE UP of BLACKPINK is a mini-album
THE ALBUM of BLACKPINK is (really) an album
It also gets releases where the artist is featured. Example:
Sour Candy is a song of Lady Gaga, but BLACKPINK are featured
Spotify also keeps many “clones” of the same album: there could be extended editions or albums that later gained extra tracks, and each of these is a duplicate of the same album. So this function also tries to clean up the discography by removing duplicates.
- Args:
spotify: The Spotify instance
artist: dictionary that contains all the data about the single artist
- Returns:
a dictionary with updated discography details
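One simple deduplication strategy, shown here as a sketch and not as the project's actual rule, is to treat albums with the same case-insensitive name as clones and keep the earliest release. It assumes album dictionaries with "name", "id" and "release_date" fields, as in the album objects returned by the Spotify Web API; since release_date is an ISO-style string, sorting it as text orders releases chronologically.

```python
def deduplicate_albums(albums):
    """Keep one album per case-insensitive name, preferring the earliest release."""
    unique = {}
    for album in sorted(albums, key=lambda a: a["release_date"]):
        # setdefault keeps the first (earliest) album seen for each name
        unique.setdefault(album["name"].casefold(), album)
    return list(unique.values())
```

This rule only merges clones with identical names; editions renamed with a suffix such as "(Deluxe)" would need extra normalization.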
- spotify.link_album(album_id)
Generates a link to an album
- Args:
album_id: ID of the album
- Returns:
The link to that album on Spotify
- spotify.link_artist(artist_id)
Generates a link to an artist
- Args:
artist_id: ID of the artist
- Returns:
The link to that artist on Spotify
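Both link helpers can be built from the public open.spotify.com URL format. A minimal sketch:

```python
def link_album(album_id):
    """Return the open.spotify.com link to an album."""
    return f"https://open.spotify.com/album/{album_id}"


def link_artist(artist_id):
    """Return the open.spotify.com link to an artist."""
    return f"https://open.spotify.com/artist/{artist_id}"
```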
- spotify.login()
Logs in to Spotify
Uses the Client Credentials authorization flow. The following API keys must be set as environment variables:
SPOTIPY_CLIENT_ID
SPOTIPY_CLIENT_SECRET
You can request API keys on the Spotify Developer Dashboard
See https://spotipy.readthedocs.io/en/2.16.1/#authorization-code-flow for more details
- spotify.spotify_data(group)
Runs all the Spotify related tasks
It scrapes data from Spotify for the whole group and the single artists
- Args:
group: dictionary with the data of the group to scrape
- Returns:
the same group dictionary with updated data
Billboard Charts
- billboard_charts.billboard_data(group)
Gets Billboard charts of a group
It starts all the tasks needed to get the latest data and, if needed, tweet updates
Data is updated once a day
- Args:
group: dictionary that contains all the data about the group
- Returns:
the same group dictionary with updated data
- billboard_charts.get_artist_rank(artist, chart)
Gets the Billboard Hot 100 chart and tries to find an artist
- Args:
artist: the artist to look for
- Returns:
a string containing the list of songs found in the chart (it can be empty)
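The search step of get_artist_rank() can be sketched as a case-insensitive substring match on the credited artist, so that featurings are found too. The sketch assumes chart entries expose rank, title and artist attributes (a local namedtuple stands in for them here); the exact output format of each line is hypothetical.

```python
from collections import namedtuple

# Stand-in for a chart entry; only these three fields are assumed
Entry = namedtuple("Entry", "rank title artist")


def get_artist_rank(artist, chart):
    """Return one line per chart entry credited (even as a featuring) to artist."""
    lines = []
    for entry in chart:
        if artist.lower() in entry.artist.lower():
            lines.append(f"#{entry.rank} {entry.title} ({entry.artist})")
    return "\n".join(lines)  # empty string when the artist is not in the chart
```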