Daniel Waterworth

How I Built ratemygameidea.com (Part 1)

A deep dive

Motivation

Do you think you can guess the review count of a random game on Steam?

A month ago, Jonas Tyroller shared a video about a Chrome extension that turns Steam into a game where you guess a title’s review count. He called it Review Guesser.

If you haven't tried it, I highly recommend it. It's an interesting challenge to try to predict a game's measure of success based only on its store page; and, if you're an aspiring game developer, a sober reminder of how great a game can look and still be a commercial failure.

When I played this game, though, I kept wondering, "how well could a computer do at this task?"

This is a dangerous kind of question for an engineer. It's the kind of question that ends in half-baked prototypes and hard-learned lessons.

This time was different, however. It always is.

The Plan

With that, I could begin. The plan was simple:

  1. Scrape Steam,
  2. Train a neural network,
  3. ...
  4. Profit?

Scraping Steam is the easy part. I wrote a dirt-simple library that makes HTTP requests but memoizes (caches) the calls in a SQLite database. That way, even if I change the scraping logic, I never make duplicate requests for the same resource. I just have to make sure that I don't change the way the HTTP requests and responses are stored.
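The idea fits in a few lines. This is only a minimal sketch of that approach, not the actual library: the table layout, the `fetch` name, the `cache.sqlite` filename, and the use of `urllib` are my assumptions, and a real version would presumably also store status codes and headers.

```python
import sqlite3
from urllib.request import urlopen

# One row per URL; the body is remembered forever once fetched.
db = sqlite3.connect("cache.sqlite")
db.execute("CREATE TABLE IF NOT EXISTS responses (url TEXT PRIMARY KEY, body BLOB)")

def fetch(url: str) -> bytes:
    """Return the response body for `url`, hitting the network at most once per URL."""
    row = db.execute("SELECT body FROM responses WHERE url = ?", (url,)).fetchone()
    if row is not None:
        return row[0]           # cache hit: no HTTP request is made
    body = urlopen(url).read()  # cache miss: fetch once, then memoize
    db.execute("INSERT INTO responses (url, body) VALUES (?, ?)", (url, body))
    db.commit()
    return body
```

The key property is that restarting the script is cheap: any URL already in the database is served from disk, so only genuinely new requests ever reach Steam.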

With Steam's rate limits, it took around a week to fetch all of the data. I restarted the script and tweaked things many times along the way, but thanks to the memoization, starting over never meant redoing requests.

Training the neural network is a little bit trickier. What data should it be given and how should it be represented? What network architecture should I use? How can I tell if it's working? There are lots of unanswered questions at this point.

I decided early on that I wanted to pass in just the short descriptions. Perhaps I would add more data later, but, at least to begin with, the idea of a model that could predict a game's success from its description alone was very enticing.

That way, I could just fabricate a game's short description and get a score. That sounds like a fun toy to play with.

Now, there are a lot of Steam games: over 100,000. But that's a drop in the bucket if you're trying to learn an entire language, so training from scratch wouldn't be an option. Fortunately for me, embedding models exist.

What Are Embedding Models?

Text embedding models take an arbitrary amount of text and produce a fixed-length vector. The special property they have (or at least try to have) is that semantically similar sentences map to similar vectors. That makes these networks useful for search: if you store the embeddings of a collection of documents, you can use simple distance checks to find the documents most similar to a target document.
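To illustrate the distance-check idea, here is a toy version with made-up three-dimensional vectors. Real embedding models output hundreds of dimensions (all-MiniLM-L6-v2, for example, outputs 384), but the mechanics are the same: compare vectors with cosine similarity and take the nearest one.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pretend embeddings for three stored documents (vectors are invented).
docs = {
    "goat game": np.array([0.9, 0.1, 0.0]),
    "farm sim":  np.array([0.7, 0.3, 0.2]),
    "space rts": np.array([0.0, 0.1, 0.9]),
}

# A query embedding close in meaning to the first two documents.
query = np.array([0.85, 0.15, 0.05])
best = max(docs, key=lambda name: cosine_similarity(query, docs[name]))
```

With these numbers, `best` comes out as the goat game: the query vector points in nearly the same direction, while the space RTS scores close to zero.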

I wouldn't be doing search, however. I was only interested in embedding models for their ability to understand English.

I planned to train a little neural network that, instead of taking a game's description, would take the embedding of a game's description and use just that to try to predict the number of reviews that it got.
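To make that pipeline concrete, here is a toy version of it. Everything in this sketch is invented for illustration: the data is synthetic, the embedding dimension is shrunk to 16, and where the real model is a small neural network fed by embeddings, this uses the smallest possible stand-in, a single linear layer trained with gradient descent on mean squared error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: 200 "description embeddings" and targets that are
# a noisy linear function of them, playing the role of log review counts.
X = rng.normal(size=(200, 16))
true_w = rng.normal(size=16)
y = X @ true_w + rng.normal(scale=0.1, size=200)

# One linear layer (weights + bias), trained with plain gradient descent.
w = np.zeros(16)
b = 0.0
lr = 0.05
for _ in range(500):
    err = X @ w + b - y                 # prediction error on the whole set
    w -= lr * (X.T @ err) / len(y)      # gradient of MSE w.r.t. weights
    b -= lr * err.mean()                # gradient of MSE w.r.t. bias

mse = float(((X @ w + b - y) ** 2).mean())
```

After training, `mse` drops to roughly the noise floor of the synthetic data, which is all this sketch is meant to show: frozen embeddings in, a learned scalar prediction out.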

This is where the fun started.

The Execution

Attempt #1

For my first attempt, I picked "all-MiniLM-L6-v2" as the embedding network and tried to predict the log of the review count. Why the log and not the count directly? Well, the distribution of reviews is highly skewed: a few outliers have a huge number of reviews, but most games have practically none. The log compresses that range, which should help. It can also be undone afterwards: running the network's output through the exponential function recovers the raw review count.
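The transform and its inverse look like this. One detail is my assumption rather than something from the post: I use the `log1p`/`expm1` pair (log and exp shifted by one), a common variant that stays defined for games with zero reviews.

```python
import math

reviews = [3, 120, 55000]  # wildly skewed raw counts

# Compress for training: log1p(r) = log(1 + r), safe even when r == 0.
targets = [math.log1p(r) for r in reviews]

# Invert after prediction: expm1 undoes log1p; round back to whole reviews.
recovered = [round(math.expm1(t)) for t in targets]
```

The round trip returns the original counts, while the training targets span only a few units instead of five orders of magnitude.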

I used MSE (mean-squared error) as my loss function and, unfortunately, the network learned that the best strategy was just to output something close to zero no matter what the input was. I would have to do something smarter.

Attempt #2

I decided at this point to go from trying to predict the score directly (that is, regression), to trying to predict the range that the score is in given a few options (that is, classification).

I put the games into percentile-range buckets and tried to predict the bucket from the description embedding. This showed some promise: the network wasn't putting every game in the same bucket. It had actually learned something about what makes a good game description.
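Bucketing by percentile is straightforward to sketch. The scores below are synthetic, and the choice of ten buckets is my assumption (the post doesn't say how many were used); the second half shows how a predicted distribution over buckets can be collapsed back into a single expected percentile.

```python
import numpy as np

rng = np.random.default_rng(1)
log_scores = rng.exponential(scale=2.0, size=1000)  # fake skewed scores

# Cut the score range at the inner deciles, giving 10 equal-population
# buckets; np.digitize assigns each game a class label from 0 to 9.
edges = np.quantile(log_scores, np.linspace(0, 1, 11)[1:-1])
buckets = np.digitize(log_scores, edges)

# Given a classifier's (made-up) probabilities over the 10 buckets,
# the expected percentile is the probability-weighted bucket midpoint.
probs = np.array([0.05, 0.05, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.15, 0.15])
centers = (np.arange(10) + 0.5) * 10  # 5, 15, ..., 95
expected_percentile = float(probs @ centers)
```

Because the cut points are quantiles of the data itself, each bucket holds the same number of games, which keeps the classes balanced for training.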

I could also compute the expected percentile value and get a sensible result. Then, probing the network with all manner of inputs, I started noticing patterns: things the network liked or didn't like in a description. Very early on, I noticed that the network favored direct language. For instance, it scored:

You are a goat.

Much higher than:

You play as a goat.

These two examples quickly became my smoke tests for new models. This model, however, had a fatal flaw, which I'll discuss next time.

To be continued...