Speeding up Kotlin console apps 10x with coroutines and fast AWS servers | Centerfield Nine | The House Sometimes Wins

This is part 2 of a 2-part series. In Part 1, we sped up a Python-based Jupyter notebook. That article also provides the motivation and the project background: evaluating the House Advantage of a new blackjack game.

As we detailed in Part 1, our goal is to simulate playing our new blackjack game, Cheat At Blackjack®, hundreds of millions of times to accurately calculate the game's House Advantage. On each run, we adjust the rules, payouts, and player strategy to find the set of options that yields a House Advantage attractive to casino managers and a game that players will enjoy and find easy to learn.

Because building a game simulator is somewhat challenging, I built two models, one in Python and one in Kotlin, to check against each other and possibly find bugs or rule variations. Anytime the models didn't match up, I knew there was some issue that needed to be found and fixed. However, I wasn't satisfied with just checking my models against each other. I also wanted my simulator to include a player playing traditional blackjack, so the results of each run could be calibrated against a well-studied game whose House Advantage had already been calculated by multiple trusted sources.

Kotlin is compiled (to JVM bytecode, which the JVM then compiles to native code as it runs), while Python is interpreted. This translates to Kotlin being significantly faster to execute. Kotlin has quite a few features, like static typing and immutable variables, that make it a pleasure to code in, and which help to speed up execution as well as reduce errors. And because it executes on the Java Virtual Machine (JVM), Kotlin code can easily interact with the fabulous H2 Database, which also runs on the JVM. Recording results in a database is essential during the iteration process, as we need to review individual hands to ensure the simulation code is executing the correct decisions for each player's hand, correctly determining the winner, and making the correct payouts. We also need to review groups of similar hands in order to evaluate the optimal strategy for every combination of player cards and dealer cards.

As you may be realizing, the Kotlin model is much more robust than the Python notebook, as the Kotlin model can log hands to a database, can play multiple games (conventional blackjack and Cheat®), has more options for rule adjustments and player strategies, and runs a lot faster. You could do all of these things in Python as well, but as far as choosing one language for the "primary" model and one for a simpler model whose primary goal is just to verify the results of the first, Kotlin easily wins over Python.

As I stated, the Kotlin simulator was intended to have a built-in calibrator, a player who plays only traditional blackjack. This allows me to compare this player's House Advantage to known H/As, such as those published by the Wizard of Odds. When this player's H/A matches up, that lets me know that the basics of the simulator (the shuffling, the dealer's actions, the hand evaluations and payouts) are sound. Because of this, I chose to hard-code 3 players into this simulator. One plays traditional blackjack, one plays Cheat®, and one alternates between the two games on every hand. If speed were the only concern, I could build a model with 5 Cheat® players only, like I did in Python. Given Kotlin's built-in speed advantage, though, having the calibration players provides confidence in the results while still executing very quickly.

Initial speed measurements

One of the nice things about running a non-GUI app is the ability to execute it right from within an IDE, which in this case is IntelliJ. In our first test, we run 40 loops of 100,000 games each, or 4 million Cheat® hands, on my local PC, which completes in 48 seconds. So right away, without any parallelism, we're capable of about 84,000 hands per second, roughly 4x better than the best Python results on the same computer. But wait a second: the simulator actually plays 3 hands per game, since there are 3 players. It's just that only one of them is exclusively playing the game we're evaluating. Therefore, the simulator is really playing about 250,000 hands per second, before any speedups and before we utilize an AWS server.
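The throughput arithmetic above can be sketched in a few lines of Kotlin. The function name `handsPerSecond` is made up for this illustration and is not part of the simulator. Because the 48-second timing is rounded, the results land close to, not exactly on, the figures quoted above.

```kotlin
import kotlin.math.roundToLong

// Hypothetical helper (not from the simulator): hands completed per second of wall-clock time.
fun handsPerSecond(loops: Int, gamesPerLoop: Int, playersPerGame: Int, seconds: Double): Long =
    (loops.toLong() * gamesPerLoop * playersPerGame / seconds).roundToLong()

fun main() {
    // Run described above: 40 loops x 100,000 games in 48 seconds.
    println("Cheat hands/s:  ${handsPerSecond(40, 100_000, 1, 48.0)}") // only the dedicated Cheat player
    println("Total hands/s:  ${handsPerSecond(40, 100_000, 3, 48.0)}") // all three players at the table
}
```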

Incorporating coroutines

Kotlin's primary solution for handling concurrency or parallelism is coroutines. Coroutines are much more lightweight than threads, and the programming model tends to be easier than promises, although coroutines have their own learning curve. With that said, the amount of refactoring is pretty small. We can compare the initial simulation starting code:

fun runSimulationDirect(loops: Int, gamesPerLoop: Int, cardCountTest: Pair<Char, Int>) {

    // Each loop plays one table from start to finish, strictly one after another.
    IntRange(1, loops).forEach { tblId ->

        val players = Simulation.getPlayers()
        val (tommy, dave, albert) = players

        val dbStatements = DBStatements()
        // Standard table unless cardCountTest.first is a letter or digit; then a test shoe is built from it.
        val d1bjTable = if (!cardCountTest.first.isLetterOrDigit()) Simulation.createTable(tblId, dbStatements) else
            Simulation.createTable(tblId, dbStatements, buildTestShoe(tblId, cardCountTest.first, cardCountTest.second))

        // One game: place bets, deal, play out every hand, then pay.
        IntRange(1, gamesPerLoop).forEach { i ->
            makeBets(i, d1bjTable, tommy, dave, albert)
            d1bjTable.deal()
            Simulation.playHands(d1bjTable)
            d1bjTable.pay(i % 50 == 0)
        }

        Simulation.logLoopResults(players, tblId, d1bjTable.shoe.name, dbStatements)
    }
}

with the same function incorporating coroutines:

import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.withContext

suspend fun runSimulationParallel(loops: Int, gamesPerLoop: Int, cardCountTest: Pair<Char, Int>) {

    // Dispatchers.Default is the thread pool intended for CPU-bound work.
    // withContext does not return until every coroutine started inside it has completed.
    withContext(Dispatchers.Default) {

        IntRange(1, loops).forEach { tblId ->

            // One coroutine per loop; each builds its own players, table, and DB statements.
            async {
                val players = Simulation.getPlayers()
                val (tommy, dave, albert) = players

                val dbStatements = DBStatements()
                val d1bjTable = if (!cardCountTest.first.isLetterOrDigit()) Simulation.createTable(tblId, dbStatements) else
                    Simulation.createTable(tblId, dbStatements, buildTestShoe(tblId, cardCountTest.first, cardCountTest.second))

                players.forEach { it.reset() }

                IntRange(1, gamesPerLoop).forEach { i ->
                    makeBets(i, d1bjTable, tommy, dave, albert)
                    d1bjTable.deal()
                    Simulation.playHands(d1bjTable)
                    d1bjTable.pay(i % 50 == 0)
                }

                Simulation.logLoopResults(players, tblId, d1bjTable.shoe.name, dbStatements)
            }

        }
    }
}

As you can see, there is very little difference between the two functions. The parallel version is declared as a suspend function. Beyond that, we only wrap the outer loop in a withContext block, wrap each loop body in an async block, and reset the players at the start of each loop. The only other change is in our main(), where we decide which function to call:

if (!parallel) runSimulationDirect(loops, gamesPer, it)
else runBlocking {
    runSimulationParallel(loops, gamesPer, it)
    println("parallelling")
}

Here, we need to wrap the call to the suspend function in a runBlocking block, which acts as the bridge between regular and suspending functions.
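For readers who want to try the pattern in isolation, here is a minimal, self-contained sketch of the same fan-out. It assumes the kotlinx-coroutines-core library is on the classpath. `playLoop` and `runLoopsParallel` are hypothetical stand-ins, not the simulator's real code: `playLoop` just counts seeded coin flips. Unlike the function above, this sketch collects each coroutine's result with `awaitAll()`, which is one way to get values back out of the `async` blocks.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext
import kotlin.random.Random

// Hypothetical stand-in for one table's loop: counts "wins" from seeded coin flips.
fun playLoop(tblId: Int, gamesPerLoop: Int): Int {
    val rng = Random(tblId) // per-loop state; nothing is shared between coroutines
    return (1..gamesPerLoop).count { rng.nextBoolean() }
}

// Start one coroutine per loop on the CPU-bound dispatcher, then wait for all results.
suspend fun runLoopsParallel(loops: Int, gamesPerLoop: Int): List<Int> =
    withContext(Dispatchers.Default) {
        (1..loops).map { tblId -> async { playLoop(tblId, gamesPerLoop) } }.awaitAll()
    }

fun main() {
    // runBlocking bridges the regular main() and the suspending function.
    val wins = runBlocking { runLoopsParallel(loops = 8, gamesPerLoop = 100_000) }
    println("loops finished: ${wins.size}, total wins: ${wins.sum()}")
}
```

Giving each loop its own state, as the simulator does with its players, table, and DB statements, is what lets the loops run side by side without locks.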

When we switch to the coroutine version, our 4 million games execute in just 25.7 seconds. That is about 155,000 Cheat® hands per second, or 467,000 total hands per second. We haven't used any AWS resources; this is just the local PC. We are executing more than 20x as many hands as our Python simulator on the same PC, and running even faster than the Python simulator's best result on the 32-core AWS server. Compiled code for the win!

Utilizing AWS

As I mentioned above, it's very simple to run our application directly inside IntelliJ, using the built-in console window as our output or logging source. Once we want to run it somewhere else, however, we need a way to set some options, such as the number of games to simulate and the type of logging (every hand, or results only). The clikt library makes it simple to add such parameters, set defaults, add a help menu, and generally make your console app look polished and professional. JetBrains (the company behind the Kotlin language and IntelliJ) provides kotlinx.cli, but clikt covers just about every type of parameter or option.
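As a rough illustration of what such a clikt command can look like, here is a minimal sketch. The `--loops` and `--games` options match the launch command used later in this article. The class name, the `--log-every-hand` flag name, the help strings, and the default values are assumptions made for this example. It is written against clikt 4.x. In clikt 5.x, `main` became an extension function that needs `import com.github.ajalt.clikt.core.main`.

```kotlin
import com.github.ajalt.clikt.core.CliktCommand
import com.github.ajalt.clikt.parameters.options.default
import com.github.ajalt.clikt.parameters.options.flag
import com.github.ajalt.clikt.parameters.options.option
import com.github.ajalt.clikt.parameters.types.int

// Hypothetical command class; the real simulator's option set may differ.
class SimulatorCommand : CliktCommand(name = "d1bj") {
    val loops by option("--loops", help = "Number of loops to run").int().default(40)
    val games by option("--games", help = "Games per loop").int().default(100_000)
    // Flag name is an assumption; it defaults to false when absent.
    val logEveryHand by option("--log-every-hand", help = "Log every hand, not just results").flag()

    override fun run() {
        // A real entry point would start the simulation here; this sketch only echoes the options.
        println("loops=$loops games=$games logEveryHand=$logEveryHand")
    }
}

fun main(args: Array<String>) = SimulatorCommand().main(args)
```

Running the program with `--help` prints a generated usage screen listing the three options, which is most of the polish mentioned above.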

I already had a main() method, and when running inside the IDE, I would just edit the code directly to change the number of games or loops, or change logging options. For command-line execution, I found it easiest to create a second method called mainCommandLine() where all of those parameters were set (or used the defaults) via clikt. So once again, very little refactoring needed. When running in the IDE, I still use the initial main(), but mainCommandLine() is the entry point for any code I specifically build and distribute to any other server, like the AWS ones.

Quick side note: another option for enabling a flexible set of options is to use an external config file, and read it in on startup with a library like hoplite which I've written about before. However in this scenario, the vast majority of the time, the only options I'm changing are the number of loops and games, so clikt is simpler, allowing those options to be changed straight from the command line and not having to edit an external file.

To actually build an executable JAR file that's easy to deploy remotely, the shadowJar plugin for Gradle is a must. There are no worries about a classpath or uploading dependencies, because everything gets packaged into a single file. This also helps with versioning: you can always be sure that whatever version of a dependency is used during development is the same version packaged into the shadowJar file.
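For reference, a minimal build.gradle.kts along these lines might look like the sketch below. The plugin and library versions and the main class name `MainKt` are illustrative assumptions, not the project's actual configuration. Only the JAR name d1bj.jar comes from this article.

```kotlin
// build.gradle.kts (sketch; versions and main class name are illustrative assumptions)
plugins {
    kotlin("jvm") version "1.9.24"
    id("com.github.johnrengelman.shadow") version "8.1.1"
}

repositories {
    mavenCentral()
}

dependencies {
    implementation("org.jetbrains.kotlinx:kotlinx-coroutines-core:1.8.1")
    implementation("com.github.ajalt.clikt:clikt:4.4.0")
    implementation("com.h2database:h2:2.2.224")
}

tasks.shadowJar {
    // Single self-contained JAR, runnable with: java -jar d1bj.jar
    archiveFileName.set("d1bj.jar")
    manifest {
        attributes["Main-Class"] = "MainKt" // hypothetical: the class holding the command-line entry point
    }
}
```

With this in place, `./gradlew shadowJar` writes the JAR to build/libs, ready to copy to the server.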

The previous article provides a reasonably comprehensive walkthrough of setting up an AWS EC2 Launch Template and a local SSH client that can connect easily. There's no need to repeat it here, but it should help you get set up if you don't already use EC2. The only additional step is to make sure you have a recent JDK, as I believe the Anaconda distribution comes with 1.7. You likely want to install 11 or higher. AdoptOpenJDK (now Eclipse Adoptium) is a great source, and their installation directions are easy to follow.

Once we've launched an AWS server instance, we can open the Bitvise SSH client and log in automatically with our client key. Open an SFTP window, navigate to the project folder (or create one), and drag our compiled JAR file to the server. Open a console window and navigate to the project folder, and we can now run our code. In my case, with the command-line parameters for specifying the number of loops and games, the launch command is java -jar d1bj.jar --loops 40 --games 100000 | tee ./d1bj.log. The tee command writes our program's output to the screen and also to the specified log file. This is preferable to the simple > redirector, since we can still follow our program's progress as it runs.

With the power of a 32-core server, our 4 million games complete in just 6 seconds. To get better speed readings, we need to increase the number of games. 100 loops of 1 million games each take only 121 seconds (2 min 1 sec), and 256 loops of 1,953,125 games each (500 million) complete in 9.8 minutes. That's 850,000 Cheat® hands per second, but again, this simulator runs with 3 players at the table, only one of which is always playing Cheat®. In total, our Kotlin simulator is plowing ahead at 2.55 million hands per second, or about 9.18 billion hands per hour. Our Python simulator topped out at about 240,000 hands per second, so this compiled Kotlin version is about 10x faster when both are running on a 32-core EC2 server utilizing multi-core concurrency. Keep in mind that with Spot Instances, these servers cost about 52 cents per hour, so we could run about 17 billion hands for about one dollar. That's pretty impressive.
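The cost figure follows directly from the throughput and the hourly price. A small sketch of that arithmetic, using the numbers quoted above (`handsPerDollar` is a name made up for this illustration):

```kotlin
// Hypothetical helper: hands simulated per dollar at a sustained rate and hourly instance price.
fun handsPerDollar(handsPerSecond: Double, dollarsPerHour: Double): Double =
    handsPerSecond * 3600.0 / dollarsPerHour

fun main() {
    val cheatHandsPerSecond = 500_000_000 / (9.8 * 60) // 500 million Cheat hands in 9.8 minutes
    val totalHandsPerSecond = cheatHandsPerSecond * 3  // three players at the table
    val totalHandsPerHour = totalHandsPerSecond * 3600
    val perDollar = handsPerDollar(totalHandsPerSecond, dollarsPerHour = 0.52)

    println("Cheat hands per second: ${cheatHandsPerSecond.toLong()}")
    println("Total hands per second: ${totalHandsPerSecond.toLong()}")
    println("Total hands per hour:   ${totalHandsPerHour.toLong()}")
    println("Total hands per dollar: ${perDollar.toLong()}")
}
```

The last line works out to a little over 17 billion hands per dollar, matching the estimate above.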

Unfortunately, despite writing two simulators, getting matching results, and doing all of the work refining and speeding them up, I'm personally not a Registered Independent Testing Laboratory in Nevada or any other state. To get a new table game approved by state gambling regulators, the game's math must be certified by one of these companies. The good news is that because I had built these models, verified their results against each other, and calibrated the conventional blackjack players against known results, I had a great deal of confidence that the testing labs would agree with my calculations. The bad news is that, despite that, it's still mandatory to pay their hefty fee and wait almost 8 weeks to get certified results to submit to a gaming regulator. Of course, their team of Ph.D.s and computer scientists responsible for reporting on my game agreed with my models, down to two decimal places, matching a House Advantage of 0.67%.

One point to clarify: in both of these articles, we are maximizing speed by eliminating logging. Both programs write out their total hand counts, dollars won, and house advantage calculations, but neither records the hand-by-hand results. Generally, we log every hand early in the development cycle to verify proper gameplay and that the program is simulating the correct player decisions, and we can usually do that with just a few thousand hands. The Kotlin program has a logEveryHand option which writes every card of every hand (and every payout, dealer hand, etc.) to an H2 Database, while the Python program can write every hand to a log file (it could use a SQLite DB, I just never needed to do so). In the example runs in these articles, those options are turned off.