Constructing the Twitch Bot

I want to give the IRC bot a post of its own so as not to distract too much from the machine learning post. This bot will be used to log IRC chat messages and poll for viewer counts on Twitch. There are a ton of different ways to implement this, so feel free to take what you see here and build a bot of your own. Here I’ll be using Python to construct the bot. All our data will be stored in a SQLite database, and we will spin off three threads for each Twitch streamer we plan to track. One thread will keep track of whether the streamer is online or not, another thread will log all the IRC chat activity, and the final thread will log the viewer count. Using SQLite is great because later we’ll be able to easily access our data via Pandas.

So first let’s define the main loop that will kick everything off:

import sys;
import time;
import socket;
import sqlite3;
import requests;
from threading import Thread, Lock;
from time import gmtime, strftime, sleep;

if __name__ == "__main__":

	password  = "oauth:YOUR_CODE_HERE";
	nickname  = "YOUR_USERNAME";
	client_id = "YOUR_CLIENT_ID";

	study_channels = ["drdisrespectlive", "lirik", "manvsgame", "cohhcarnage"]

	TwitchBot.db_con = sqlite3.connect("twitch.db", check_same_thread=False);
	TwitchBot.db_con.text_factory = str;

	bots = []
	for i in range(0, len(study_channels)):
		bots.append(TwitchBot(study_channels[i], nickname, password, client_id))

	running = True;

	while running:
		command = raw_input("");
		if command == "exit":
			running = False;
			for i in range(0, len(study_channels)):
				bots[i].running = False

	TwitchBot.db_con.close();

Let’s break this down a little:

  1. if __name__ == "__main__": checks to make sure we are running the script directly (as opposed to being imported by another module).
  2. Next we define a few variables which we will need in order to connect to the IRC chat room and poll for viewer counts. password is the OAuth password we will use when connecting to the chat rooms. If you don’t already have one, grab one from here. nickname is the username that you use when logging into Twitch (so you’ll need to sign up for Twitch in order to log into the IRC rooms). And finally we have client_id, which is the ID Twitch gives us when we register our application (see here). Once you have this information filled out you’ll be good to go!
  3. Now comes the fun part: which streamers are you going to follow? study_channels is a list of the channels we are going to log. When picking streamers you might want to consider when they stream (i.e. if you want the bot running 24/7, find 2 or 3 nighttime streamers and 2 or 3 daytime streamers) and how many threads you want to use for the bot (3 threads per streamer plus one main thread, so the four channels above need 13 threads).
  4. TwitchBot.db_con is a static variable for the SQLite database connection. (We will write the TwitchBot class itself soon!) TwitchBot.db_con.text_factory = str just ensures that every string we save to the SQLite database is in the standard string format we are expecting (some emotes are sent in non-ASCII format, stuff like “¯\_(ツ)_/¯”, which we will ignore). I should also point out that check_same_thread=False prevents SQLite from throwing errors when a single database connection is accessed from multiple threads. In any other scenario this would be a problem, but we will be using our own mutex to ensure that only one thread writes to the database at a time.
  5. Next we create an instance of the TwitchBot class for each streamer we plan to follow. We then “listen” for the user typing “exit”, at which point we shut down all the bots we created and close the database connection.
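The pairing of check_same_thread=False with a shared mutex from item 4 can be tried in isolation. The following is a minimal sketch under a few assumptions: it is written for Python 3 (the bot's listings use Python 2 idioms such as raw_input), it uses an in-memory database, and the table name events and the helper writer are made up for this illustration and are not part of the bot.

```python
import sqlite3
from threading import Thread, Lock

# One connection shared by every thread. check_same_thread=False turns off
# the sqlite3 module's check that a connection is only used by the thread
# that created it, so we must serialize access ourselves.
db_con = sqlite3.connect(":memory:", check_same_thread=False)
db_lock = Lock()

# Hypothetical table, used only for this demonstration.
db_con.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY AUTOINCREMENT, source TEXT NOT NULL)"
)
db_con.commit()


def writer(source, count):
    """Insert `count` rows, holding the lock around each write and commit."""
    for _ in range(count):
        with db_lock:
            db_con.execute("INSERT INTO events (source) VALUES (?)", (source,))
            db_con.commit()


# Three writer threads, mirroring the three threads per channel in the bot.
threads = [Thread(target=writer, args=("thread-%d" % i, 50)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

with db_lock:
    total_rows = db_con.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(total_rows)
```

Using the lock as a context manager (with db_lock:) releases it even if the insert raises, which the explicit acquire()/release() pairs would not do on their own.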

Next let’s define the constructor for our TwitchBot class. Here I will check to see whether our SQLite database already has the tables we need and create them if they don’t already exist. We will also spin off the three threads we’ll need for each channel we plan to log information for.

class TwitchBot:

	mutex  = Lock();
	db_con = None;

	def __init__(self, channel, nickname, password, client_id):

		self.net_data       = None;
		self.net_sock       = socket.socket();
		self.channel        = channel;
		self.nickname       = nickname;
		self.password       = password;
		self.client_id      = client_id;
		self.running        = True;
		self.is_online      = False;
		self.cur_stream_ind = None;

		TwitchBot.mutex.acquire();

		self.cur = TwitchBot.db_con.cursor();
		self.cur.execute("CREATE TABLE IF NOT EXISTS messages (id integer PRIMARY KEY AUTOINCREMENT, username text NOT NULL, message text NOT NULL, channel text NOT NULL, datetime_recv datetime NOT NULL);")
		self.cur.execute("CREATE TABLE IF NOT EXISTS viewers (id integer PRIMARY KEY AUTOINCREMENT, num_viewers integer NOT NULL, channel text NOT NULL, datetime_recv datetime NOT NULL);")
		self.cur.execute("CREATE TABLE IF NOT EXISTS streams (id integer PRIMARY KEY AUTOINCREMENT, start_time datetime NOT NULL, end_time datetime, channel text NOT NULL);")
		TwitchBot.db_con.commit();

		TwitchBot.mutex.release();

		thread_func = self.check_online_thread;
		thread = Thread(target=thread_func, args=(self,));
		thread.start();

		sleep(1)

		thread_func = self.irc_thread;
		thread = Thread(target=thread_func, args=(self,));
		thread.start();

		sleep(1)

		thread_func = self.viewer_thread;
		thread = Thread(target=thread_func, args=(self,));
		thread.start();

		sleep(1)

The first thing we do here is acquire the mutex and create the three tables we need in our SQLite database if they do not already exist. The first table we need is called “messages”:

In this table we have the IRC username of whoever sent the chat message, the message itself, the channel in which the message was sent, and the date + time at which we received the message. We also need a table for our viewer counts; let’s call this table “viewers”:

In this table we have the number of viewers, the channel, and the date + time at which we polled for the viewer count. Next we need a table that will allow us to differentiate the different streams given by the same streamer (this also allows us to stop polling the viewer count when they go offline). Let’s call this table “streams”:

Here we have the start and end time for each stream and the channel that went live. Setting things up this way will allow us to easily grab all the messages and viewer counts for each of the individual streams. If you would like to view your SQLite database similar to the images above you should take a look at DB Browser for SQLite. Next we spin off our three threads, the first of which checks to see whether the streamer is online:

def check_online_thread(self, data):
	while self.running:
		try:
			resp = requests.get(
				"https://api.twitch.tv/helix/streams?user_login="+self.channel,
				headers={"Client-ID": self.client_id}
			);

			resp = resp.json();

			if "data" in resp:
				if not self.is_online:
					print("\033[93mChecking to see if "+self.channel+" is online\033[0m.");

				if not resp["data"]:
					if self.is_online:

						TwitchBot.mutex.acquire();
						self.cur.execute("UPDATE streams SET end_time = ? WHERE id = ?;",
							(strftime("%Y-%m-%d %H:%M:%S", gmtime()), self.cur_stream_ind)
						);
						TwitchBot.db_con.commit();
						TwitchBot.mutex.release();

						print("\033[93m\""+self.channel+"\" is offline\033[0m.");

						self.is_online = False;
					else:
						print("\033[93m\""+self.channel+"\" is offline\033[0m.");

				else:
					if not self.is_online:

						TwitchBot.mutex.acquire();
						self.cur.execute("INSERT INTO streams (start_time, end_time, channel) VALUES (?, NULL, ?);",
							(strftime("%Y-%m-%d %H:%M:%S", gmtime()), self.channel)
						);
						TwitchBot.db_con.commit();
						self.cur_stream_ind = self.cur.lastrowid;
						TwitchBot.mutex.release();

						print("\033[93m\""+self.channel+"\" is online\033[0m.");

						self.is_online = True;

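		except Exception:
			# Ignore network/JSON errors and retry on the next pass.
			pass
		# Wait a minute between checks, including after a failed request,
		# so that an error does not turn into a tight retry loop.
		sleep(60);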
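The decision at the heart of the loop above can be isolated from the network and database calls. The following is a minimal sketch, written for Python 3; the function name next_action, the action labels "start" and "end", and the hand-written response lists are assumptions made for this illustration rather than part of the bot.

```python
def next_action(was_online, data):
    """Decide what the online-check loop should do for one poll.

    was_online: the bot's previous is_online flag.
    data: the "data" list from the streams response (empty when offline).
    Returns (action, is_online), where action is "start", "end", or None.
    """
    is_live = bool(data)
    if is_live and not was_online:
        return "start", True    # insert a new row into streams
    if not is_live and was_online:
        return "end", False     # fill in end_time for the current row
    return None, was_online     # no transition, nothing to write


# Hand-written sequence of polls: offline, live, live, offline.
responses = [[], [{"viewer_count": 10}], [{"viewer_count": 12}], []]

state = False
actions = []
for data in responses:
    action, state = next_action(state, data)
    actions.append(action)

print(actions)
```

Only the two transitions touch the streams table, so a stream that stays live across many polls still produces a single row.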

Here we are simply sending a GET request to Twitch using our application ID, and Twitch sends us back a JSON string which we can decode to check whether there is any data available for our streamer. If the “data” list in the response is empty then the streamer is offline. If they were previously online but are now offline, we update the end time in the database and clear the is_online flag, which the other threads read. On the other hand, if the stream was previously offline but is now online, we insert a new row into the “streams” table and set the start time for the stream. The end time is set to NULL until the stream ends. This allows us to ignore streams for which we do not yet have complete data because they are still in progress. Next let’s take a look at the thread which interfaces with the IRC chat room:

#--------------------------------------------------------------------------
def parse(self, line):
	prefix = "";
	trailing = [];
	if line[0] == ":":
		prefix, line = line[1:].split(" ", 1);
	if line.find(" :") != -1:
		line, trailing = line.split(" :", 1);
		args = line.split();
		args.append(trailing);
	else:
		args = line.split();
	command = args.pop(0);	
	return prefix, command, args

#--------------------------------------------------------------------------
def process(self, prefix, command, args):
	if command == "PING":
		self.net_sock.send(self.net_data.replace("PING", "PONG"));
	elif command == "376":
		self.net_sock.send("JOIN #"+self.channel+"\r\n");
	elif command == "PRIVMSG":
		if self.is_online:
			user_name = prefix.split("!")[0];
			user_message = args[1];
			print("\033[91m"+user_name+"\033[0m: "+args[1]);
			self.save_msg(user_name, user_message);

#--------------------------------------------------------------------------
def save_msg(self, user_name, user_message):
	TwitchBot.mutex.acquire();

	self.cur.execute("INSERT INTO messages (username, message, channel, datetime_recv) VALUES (?, ?, ?, ?);",
		(user_name, user_message, self.channel, strftime("%Y-%m-%d %H:%M:%S", gmtime()))
	);
	TwitchBot.db_con.commit();

	TwitchBot.mutex.release();

#--------------------------------------------------------------------------
def irc_thread(self, data):
	self.net_sock.connect(("irc.twitch.tv", 6667));

	self.net_sock.send("PASS "+self.password+"\r\n");
	self.net_sock.send("NICK "+self.nickname+"\r\n");

	while self.running:
		try:
			self.net_data = self.net_sock.recv(1024);
			if not self.net_data: break;

			# Split the buffer on CRLF and drop empty fragments.
			lines = [l for l in self.net_data.split("\r\n") if l];

			for line in lines:
				prefix, command, args = self.parse(line);
				self.process(prefix, command, args);
		except:
			continue;

	self.net_sock.close();
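To make the parsing step concrete, here is a standalone copy of the parse logic from the listing above, applied to two sample lines. It is a sketch written for Python 3, and the sample lines (including the user name some_user and the host names) are hand-written for illustration rather than captured from Twitch.

```python
def parse(line):
    """Split one IRC line into (prefix, command, args)."""
    prefix = ""
    # An optional prefix starts with ":" and runs up to the first space.
    if line.startswith(":"):
        prefix, line = line[1:].split(" ", 1)
    # Everything after " :" is a single trailing argument that may contain spaces.
    if " :" in line:
        line, trailing = line.split(" :", 1)
        args = line.split()
        args.append(trailing)
    else:
        args = line.split()
    command = args.pop(0)
    return prefix, command, args


# Hypothetical sample lines in the general IRC message format.
chat_line = ":some_user!some_user@example.host PRIVMSG #lirik :hello chat"
ping_line = "PING :example.server"

prefix, command, args = parse(chat_line)
user_name = prefix.split("!")[0]   # same extraction as in process()
print(user_name, command, args)
print(parse(ping_line))
```

For the chat line, args[0] is the channel and args[1] is the message text, which is why process() reads args[1] when it handles “PRIVMSG”.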

This thread is a bit more complicated. We first open a network socket and connect to “irc.twitch.tv” on port 6667, which is where all the IRC chat rooms are hosted. We send our password and nickname in order to log into the chat server, then continuously check for data coming from the Twitch IRC server. When we do receive data we split it into lines and pass each one to self.parse(line), which extracts the prefix, command, and arguments from the server’s response. We only care about a subset of all the possible responses we might receive from the IRC server, and process(self, prefix, command, args) handles this for us. Command “376” informs us that the login handshake has been finalized and we may now join an IRC channel. Command “PING” is the server checking whether we are still listening. We have to reply with a “PONG” or the server will close our socket connection because it assumes we disconnected or timed out. The “PRIVMSG” command tells us we got a chat message, which we save to the SQLite database (only while the streamer is online). Saving occurs in the save_msg(self, user_name, user_message) method and is relatively straightforward. Next we’ll need a thread for polling the viewer count. Polling the viewer count is very similar to checking whether the streamer is online:

#--------------------------------------------------------------------------
def grab_num_viewers(self):
	resp = requests.get(
		"https://api.twitch.tv/helix/streams?user_login="+self.channel,
		headers={"Client-ID": self.client_id}
	);

	try:
		resp = resp.json();
	except:
		return -1;

	if "data" in resp:
		if not resp["data"]:
			return -1;
		return resp["data"][0]["viewer_count"];
	else:
		return -1;

#--------------------------------------------------------------------------
def save_viewers(self, num_viewers):
	TwitchBot.mutex.acquire();

	self.cur.execute("INSERT INTO viewers (num_viewers, channel, datetime_recv) VALUES (?, ?, ?);",
		(num_viewers, self.channel, strftime("%Y-%m-%d %H:%M:%S", gmtime()))
	);
	TwitchBot.db_con.commit();
	
	TwitchBot.mutex.release();

#--------------------------------------------------------------------------
def viewer_thread(self, data):
	while self.running:
		try:
			if self.is_online:
				num_viewers = self.grab_num_viewers();
				if num_viewers != -1:
					print("\033[94mNumber of viewers\033[0m \033[93m(\""+self.channel+"\")\033[0m: "+str(num_viewers));
					self.save_viewers(num_viewers);
				sleep(10);
			else:
				sleep(3);
		except:
			continue;

Here we again send a GET request using the Twitch API, decode the JSON string, and check for the data list. This time, however, we actually make use of the information within the data list by reading the viewer_count of its first entry!
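Once data has been collected, the “streams” table makes it possible to pull out the viewer counts that belong to each finished stream. The following is a minimal sketch, written for Python 3 against an in-memory database that reuses the table definitions from the constructor; the rows are made up for the demonstration, and joining on the time window is one possible approach rather than the only one.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE viewers (id integer PRIMARY KEY AUTOINCREMENT, num_viewers integer NOT NULL, channel text NOT NULL, datetime_recv datetime NOT NULL);
CREATE TABLE streams (id integer PRIMARY KEY AUTOINCREMENT, start_time datetime NOT NULL, end_time datetime, channel text NOT NULL);
""")

# Made-up rows: one finished stream and one still in progress (end_time is NULL).
con.executemany(
    "INSERT INTO streams (start_time, end_time, channel) VALUES (?, ?, ?)",
    [
        ("2018-01-01 10:00:00", "2018-01-01 12:00:00", "lirik"),
        ("2018-01-02 10:00:00", None, "lirik"),
    ],
)
con.executemany(
    "INSERT INTO viewers (num_viewers, channel, datetime_recv) VALUES (?, ?, ?)",
    [
        (100, "lirik", "2018-01-01 10:00:10"),
        (150, "lirik", "2018-01-01 11:00:00"),
        (90, "lirik", "2018-01-02 10:00:10"),  # falls in the unfinished stream
    ],
)

# Timestamps stored as "YYYY-MM-DD HH:MM:SS" text sort chronologically,
# so BETWEEN selects the samples recorded during each stream.
rows = con.execute("""
    SELECT s.id, v.datetime_recv, v.num_viewers
    FROM streams AS s
    JOIN viewers AS v
      ON v.channel = s.channel
     AND v.datetime_recv BETWEEN s.start_time AND s.end_time
    WHERE s.end_time IS NOT NULL
    ORDER BY s.id, v.datetime_recv
""").fetchall()
print(rows)
con.close()
```

The WHERE s.end_time IS NOT NULL clause skips streams that are still in progress, and the same SELECT could later be passed to pandas.read_sql_query to load the result into a DataFrame.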

Using the data collected by this bot we can construct plots like the one below!

Do you see that big spike between the 10:00 and 12:00 marks? Well, if you check the VOD, Bikeman (another streamer) actually raids ManVsGame 6 hours and 48 minutes into the stream, and we were able to capture it using our bot!

You should make your own bot and give this a try! Let it run for a few days and collect plenty of data, because I am planning to make a follow-up post where we will apply a statistical classifier to the IRC messages and see how well we can predict which stream they came from!
