This is the story of my latest get rich quick scheme - OneTrickFan. This is part tragedy, part comedy as I recount the harrowing tale of dreams of grandeur, letters from my ISP, an overlooked API, and ultimately, defeat at the hands of the coronavirus. So strap in, gentle reader, for an epic tale of how so many ML projects go - up, then sideways, and then down.
Gitlab Repository: https://gitlab.com/amunchet/one-trick-fan/
Introduction
The story begins with dreams of grandeur. While contemplating the League of Legends Pro Circuit, I came to realize that I wasn't enthralled by any of the personalities or teams - I was more interested in the champions being played. The reason for that? I'm a One Trick Viktor player. And I'm sure there are many other one-trick players who feel the same - if their champion isn't being played, why bother watching?
The problem is that there isn't a good way to be alerted when your champion is being played (we'll get to that in a bit). My proposed solution? A daemon that watches the various professional leagues, then posts to a Discord channel when the champion came on screen. The League of Legends interface is pretty standard, and the champion portraits stay relatively the same throughout the years. It's a foolproof plan - maybe I can grow the community enough to generate ad revenue, sponsors. Dreams of dotcom riches danced in my head.
First Attempt: OpenCV
To start, I needed to be able to recognize when a particular champion was being shown on screen. This proved more difficult than first realized.
The first method I tried was pattern matching (also called template matching) via OpenCV.
Image courtesy of OpenCV Documentation
I downloaded all 160 odd champion icons and was all set. It became quickly apparent that simple point sampling wasn't going to cut it. On the LCS or any pro stream, the champion icons often had random text in front of it, different summoner spells, were in different places on the screen, etc. This was a job for a neural network.
Phase One: A Helpful, Simple Classifier
So the first step in any ML process is data. More data. More data. And then some more data.
We wanted to train our learner on what each champion looked like. To do that, I turned to leaguefographs.com, who very kindly had sorted out different pro games by champion played. From there, I used some BeautifulSoup
to scrape the youtube links and youtube-dl
to do the actual downloading.
Here's where the fun starts.
Once downloaded, I had to turn them into images to be suitable for training. Seems easy enough, but this is actually quite a bit of data - 148 champions, many pro games, 24 frames per second....adds up fast. If you'll recall the homelab post, I'm tending to run my homelab in dockers. I first tried to do the conversions with just the CPU - it did okay, and ffmpeg
did use as many cores as were available, but I still wasn't happy.
I ended up using a docker image of ffmpeg
, run under nvidia
runtime with proper CUDA acceleration. Oh boy, that made a difference - increased conversion from videos to images by about 20x. It was a pain to figure out, but incredibly worthwhile.
So now I had a solid source for the data (and a letter from my ISP complaining about my bandwidth usage). What now?
Pre Processing
Neural networks, while very neat, are pretty resource hungry. As I don't have the resources of Google, Amazon, or well...someone who spends money on silly projects, I needed to cut down on the size of my image. I realized there was a lot of "noise" in the center of the screen - where the action was going on. For this problem, I didn't care who won the teamfight - I need to see what's on the sides.
Starting Image
First stage pre-processed
So that's what I did. I basically removed the middle 80% of the image, then cut out a square (makes the pre-trained networks happier that way). This worked pretty well - there were only a few instances where pro matches had weird layouts or strange aspect ratios (Korea tended to be 4:3 instead of the normal resolution used by everyone else). Overall, any time you can give your graphics card less work to do, the better - in my case, the middle of the screen wasn't helping my overall goal.
I also made everything black and white. This...maybe was a good idea, maybe wasn't. I think in retrospect, using the different colour layers would have probably helped training, but may have made a more fragile model. I'm not sure.
A Helpful Classifier
So where are we? We have labelled data, courtesy of league of graphs, in the form of VODs that featured a particular champion. We've then turned these VODs into wonderful pieces of training data - images. We've cut out the middle and made them black and white.
But we have a pretty serious problem.
In the VODs, the champion photos that we are trying to train our classifier to recognize aren't always present - things like player interviews, replays, etc. In fact, this happens about 30% of the time. That's 30% noise that we're adding into our data - BAD. So what do we do?
Another neural network, of course! This one being a very simple classifier: does the image show the champions or not (examples where it would not are pauses, interviews, victory/defeat screens, etc). This ended up working brilliantly - it took very little time to train (I think about a couple hours hacking on it to get a usable model) and ended up sorting all the images very well (I think I found maybe one or two mistakes in the entire set of champions at 10 VODs a piece - that's a level of noise I can live with).
Phase Two: Training for Champions
I decided instead of doing a multi-class classifier, I would train many models to each identify one champion. This was a good idea, but had a hidden caveat that I run into later.
Armed with our large amount of data (I think it ended up near 1TB), we were ready to train! I automated the process, and it took about a week - it was nice since I just left it to do its thing.
Training of the champions went well (you can see the Jupyter notebook in the repo).
Phase Three: Production....Oh Production
So what was the problem? The main problem boils down to, when I trained the single champion model, I had not seen every other champion. That is to say, I trained to identify a particular champion, but I didn't train the model to NOT identify ALL other champions. Basically, I ended up with a bunch of false positives. I encountered the "Big Sadness" before solving this problem, but the way to solve it would have been to ensure that the negative label included all other champions.
Testing the model
Fortunately, there's a twitch stream called TrackingThePros which uses a proper spectator client, like real competition League does. This was a great place to test out on live data - except when the champion you were trying to test wasn't being played by any pro player around the world. This brought up a good number of problems and would have been invaluable to getting a production model, but alas....
The Big Sadness
The Big Sadness. This is why you do your research before diving in. I was so excited to find a viable, non-trivial ML problem, that I didn't bother (really didn't want) to see if a different solution existed. And it does.
The LolEsports site has, for each pro match, a dashboard that shows the champions. To properly make OneTrickFan, I should scrape that page and then post the results to the Discord Channels. Or, there is apparently an undocumented API for the Esports side of things that I could interface with.
Either way, no ML was really required. Tragic!
Final Thoughts
One of the things that I've noticed on ML projects that are larger than toys is the need to mix normal Python development style and Jupyter notebooks. Just doing one or the other doesn't seem great.
The Simple model did exactly what it was supposed to do. Probably the most useful Neural Network I've ever made!
Do check on the constraints of your problem before getting all excited.
ML takes a lot of bandwidth. I got a cross letter from my ISP.
Using Graphics Card really makes data processing lots more fun - ffmpeg case and point.
Deployment Plans
To briefly discuss how I would have released OneTrickFan (and might still do). I would make a Discord Bot and a server with rooms for each champion. Users would message the bot to sign up for notifications. OneTrickFan backend would monitor lolesports to see any live competitions, parse them, then send notifications to the rooms for which champions were being played.
Conclusion
Ultimately, as you, gentle reader know, coronavirus came and cancelled all the live events, LoL LCS included. With the format changing to online only, I really lost interest in the project - most of the fun is seeing the competition live, even if you aren't physically present to witness it. While LCS resumed in an online only format, I just don't have the motivation to continue the project. What was going to be a great ML victory instead is now a scraping/undocumented API exercise with Discord integration - really lost most of its luster there. Oh well, I'm sure another GRQ scheme will wander across my way soon enough!
Disclaimers
Disclaimer: I do not own any of the intellectual property associated with League of Legends or Riot Games. It is presented here only for clarification reasons and does not imply any relationship between myself and the trademark/copyright holders. All images are used under Fair Use for identification and critique. I am not a lawyer.