Speed with a side of danger

I just deployed a change that should allow Shmeppy to handle much more load.

It used to be that making changes to a very large map could take quite a long time to save to the server. You’d see “Still saving changes from X seconds ago…” appear in the network box fairly often.

But this should be a thing of the past. And most importantly, Shmeppy should feel all-around snappy even under much higher loads than I’ve seen so far.

What changed?

This is a fairly technical change (like many of my optimizations), but I’ll take a whack at explaining what I’ve done.

Shmeppy only stores operations in the database. So when you make a change to your map, and send an operation to the server, it needs to load up all of the existing operations, figure out the current state of the map, and then check to see whether the operation you provided is valid when applied to that state.
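To make that concrete, here’s a rough Python sketch of validating by replay. It isn’t Shmeppy’s actual code: the operation types (`FillCell`, `ClearCell`), the rule that you can only clear a filled cell, and the function names are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FillCell:
    """Hypothetical operation: paint one grid cell with a color."""
    x: int
    y: int
    color: str


@dataclass(frozen=True)
class ClearCell:
    """Hypothetical operation: erase a cell; only valid if the cell is filled."""
    x: int
    y: int


def apply(state: dict, op) -> dict:
    """Return the state after op, or raise ValueError if op is invalid for it."""
    new_state = dict(state)
    if isinstance(op, FillCell):
        new_state[(op.x, op.y)] = op.color
    elif isinstance(op, ClearCell):
        if (op.x, op.y) not in new_state:
            raise ValueError("cannot clear an empty cell")
        del new_state[(op.x, op.y)]
    else:
        raise ValueError(f"unknown operation: {op!r}")
    return new_state


def validate_by_replay(stored_ops: list, new_op) -> dict:
    """Rebuild the map from every stored operation, then try the new one."""
    state = {}
    for op in stored_ops:  # this loop is the slow part on a very large map
        state = apply(state, op)
    return apply(state, new_op)
```

The cost grows with the number of stored operations, which is why a very large map takes longer to save.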

This is slow. But it has the advantage of being very safe, which is nice. However, Shmeppy is starting to see loads where I think it’s prudent to sacrifice a little bit of safety for speed.

So now Shmeppy keeps a cache of the latest map state for many games in memory (basically the N most recent games users have accessed). It carefully synchronizes this in-memory cache with the database, and using this cache, I’m able to avoid requesting operations from the database in a huge number of common cases.
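A minimal sketch of that idea, assuming a least-recently-used cache keyed by game. Everything here is hypothetical: the `MapStateCache` class, the tuple-shaped operations, and the plain dict standing in for the database are illustration only, not how Shmeppy is really built.

```python
from collections import OrderedDict


def apply(state: dict, op: tuple) -> dict:
    """Hypothetical operations: ("fill", cell, color) or ("clear", cell)."""
    new_state = dict(state)
    if op[0] == "fill":
        new_state[op[1]] = op[2]
    elif op[0] == "clear":
        if op[1] not in new_state:
            raise ValueError("cannot clear an empty cell")
        del new_state[op[1]]
    else:
        raise ValueError(f"unknown operation: {op!r}")
    return new_state


class MapStateCache:
    """Keeps the current map state for the N most recently used games."""

    def __init__(self, database: dict, max_games: int):
        self.database = database     # stand-in store: game_id -> list of ops
        self.max_games = max_games
        self.states = OrderedDict()  # game_id -> state, least recent first
        self.replays = 0             # how often we took the slow path

    def _remember(self, game_id, state):
        self.states[game_id] = state
        self.states.move_to_end(game_id)
        while len(self.states) > self.max_games:
            self.states.popitem(last=False)  # evict the least recent game

    def _state_for(self, game_id) -> dict:
        if game_id in self.states:
            self.states.move_to_end(game_id)
            return self.states[game_id]
        # Cache miss: fall back to replaying every stored operation.
        self.replays += 1
        state = {}
        for op in self.database.get(game_id, []):
            state = apply(state, op)
        self._remember(game_id, state)
        return state

    def submit(self, game_id, op) -> dict:
        """Validate op against the cached state, then store it."""
        new_state = apply(self._state_for(game_id), op)  # raises if invalid
        self.database.setdefault(game_id, []).append(op)
        self._remember(game_id, new_state)  # keep cache and database in step
        return new_state
```

In this sketch only the first `submit` for a game (or the first after it has been evicted) replays the stored operations. Every later one validates against the cached state directly.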

It’s less safe because if the cache does manage to get out of sync with the database, then it’s possible for an operation to be entered into the database that shouldn’t be. If this happens, the game will be permanently in a bad state, and no one will be able to load it, unless I go and manually figure out what’s wrong and fix it. Not fun.
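Here’s a toy Python illustration of that failure. The operations and the validity rule are invented for the example, and the stale cache is simply set up by hand rather than produced by any real bug.

```python
def is_valid(state: set, op: tuple) -> bool:
    """Hypothetical rule: ("clear", cell) is only valid when the cell is filled."""
    kind, cell = op
    return kind == "fill" or (kind == "clear" and cell in state)


def first_invalid_index(ops: list):
    """Replay ops from scratch; return the index of the first invalid one, if any."""
    state = set()
    for i, op in enumerate(ops):
        if not is_valid(state, op):
            return i
        kind, cell = op
        if kind == "fill":
            state.add(cell)
        else:
            state.discard(cell)
    return None


# The database says cell (0, 0) was filled and then cleared, so it is empty now.
database_ops = [("fill", (0, 0)), ("clear", (0, 0))]
true_state = set()

# A cache that somehow missed the second operation still thinks it is filled.
stale_cached_state = {(0, 0)}

new_op = ("clear", (0, 0))
accepted_by_cache = is_valid(stale_cached_state, new_op)
valid_in_reality = is_valid(true_state, new_op)

if accepted_by_cache:
    database_ops.append(new_op)  # the bad operation is now stored for good

# Anyone loading the game from scratch now hits an invalid operation.
broken_at = first_invalid_index(database_ops)
```

The cache accepts the operation, the database stores it, and every later replay from scratch trips over it until someone repairs the stored operations by hand.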

Though don’t let me scare you too much. This is generally a very acceptable level of risk, and websites typically have many of these dangerously-used caches. The increase in risk is just very scary to me in particular because I’m very risk-averse and I’m the only one monitoring Shmeppy. I don’t want problems to happen because it falls on me to fix them.

Actually, after further review of some new performance charts I made, there are still some more cases to pin down here… I fixed the hardest one, and the one that was actually weighing the server down, but there are still some other reasons Shmeppy would appear to take a long time to save changes. For example, if one of your users has a slow connection, Shmeppy might end up waiting for them to receive some data before moving on to the next operation to save.