from Guide to Hacking on Jan 28, 2024

Designing the Tic-Tac-Toe from hell

Tic-Tac-Toe is definitely a time-tested puzzle, apparently having existed since ancient Egypt according to an ethnomathematician named Zaslavsky. For most of us, it's at least been an integral part of our childhood years.

However, there's one critical and famous downfall to Tic-Tac-Toe: Optimal players will always end in a standstill, with the game coming to a draw as both players fill the board without a single 3-in-a-row.

Goal: Let's make victory possible

Partly to address this issue, many Tic-Tac-Toe variants surfaced over the years, with various changes to the game's objective, board size, and even the movement mechanics. However, what if we didn't need more complexity? What if we could make a tiny tweak, so that Tic-Tac-Toe rewards the best players with victory, even against an optimal AI?

It turns out that we can: All we need to do, is to change the starting state, so that even against an optimal enemy, the first player could guarantee victory if they played cleverly.

"Tic-Tac-Toe evolved" is not new

There are actually a large number of twists on the traditional Tic-Tac-Toe concept. In short, these Tic-Tac-Toe variants increase the complexity. With orders of magnitude more states and actions, there are more possibilities in the game — and ideally, more fun in the game too.

  1. Make the board bigger: Quixo expands the board to a 5x5, Ultimate Tic-Tac-Toe expands the board to a 9x9 by composing nine tic-tac-toe grids in a 3x3 super-grid, Gomoku expands the board to a 15x15, and Unrestricted n-in-a-row1 expands the board to an infinite one.
  2. Change movement mechanism: In Ultimate Tic-Tac-Toe, each player's moves determines which smaller board the other player can play in2. In Quixo, players make moves by sliding entire rows or columns3. In Notakto, both players use the same mark — e.g., both players use "X".
  3. Modify the objective: Quixo and Amöba require a 5-in-a-row to win, Unrestricted n-in-a-row requires an n-in-a-row (as the name suggests), and Toe-Tac-Tic and Notakto inverts the objective so that players avoid making 3-in-a-row.

The above are all creative variants of Tic-Tac-Toe — they vastly change the dynamics of the game. However, the complexity of the games above are their downfall: The more complex, the harder it is to explain. I liked the simplicity of the original Tic-Tac-Toe, so let's honor that and restrict our attention to Tic-Tac-Toe modifications that are simple. This rules out any modifications that change movement mechanisms.

Since the alternative objectives and larger game boards are pretty simple to explain in a single sentence, we'll allow that. However, there's a different issue with these variants, and even the original Tic-Tac-Toe: These games assume a sub-optimal opponent to win.

In some of these scenarios, the game's setup alone determines the game's outcome, assuming your opponent is moving optimally. I'm interested in a different set of scenarios: where even if your opponent plays optimally, you could force a victory. This is where some of Tic-Tac-Toe's variants present a solution.

These variants, where the player can achieve victory despite an optimal opponent, are the kinds that interest me. The question is: Can I give Tic-Tac-Toe the same property? Without changing the board size, movement mechanisms, or objectives.

How many ways can you win?

Tic-Tac-Toe is well-known as a "solved" game. This means that it's trivial to explore all possible game states and make optimal algorithms play as either side. Fortunately, this means that I can actually enumerate all possible states.

Let's do just that, and understand the probability of winning from any possible Tic-Tac-Toe state in the game. Given a tree of all possible moves, here's a quick snippet that counts up the total number of terminal states and their outcomes.


length_to_moves = defaultdict(set)
def set_win_counts(node): """Mark any paths where one winner is guaranteed""" if node.winner > -1: node.counts[node.winner] += 1 for child in node.children.values(): set_win_counts(child) for i in range(len(node.counts)): node.counts[i] += child.counts[i]
if (node.counts[0] == 0 or node.counts[1] == 0) and node.counts[2] == 0 and sum(node.counts[:2]) > 0 and node.children: length_to_moves[len(node.moves)].add(node.moves) # add all cases that have definite victories #######

Using the script, we can now tally up the number of each possible outcome in a match of Tic-Tac-Toe. There are 255,168 possible outcomes5. Below, we use ‘O' to denote the first player:

===== Outcomes =====
 * Player 'O' win: 131184
 * Player 'X' win: 77904
 * Both tie: 46080

These numbers are corroborated by a separate article on "Amöba"6 that independently computed the number of states. This gives me confidence that my game tree is correct. Using these counts, let's look at the probability of winning or losing, from certain boards. Start with this board:

['X', ' ', ' ']
['O', 'O', ' ']
['O', 'X', ' ']

By inspection, we can see that O is definitely about to win. If X stops a middle-row victory by O, O can simply place a piece in the top-right — and vice versa. However, from this state, our game tree produces some un-intuitive outcome counts7.

===== Outcomes =====
 * Player 'O' win: 12
 * Player 'X' win: 2
 * Both tie: 4

This seems incorrect; we know the X player is definitely about to lose. Why is there any outcome where they tie, much less where X wins? The problem is that to compute all of the possible game states, we considered all possible moves. We even considered moves where O next places a piece in the bottom-right.

These aren't moves that any sane player in the right mind would take. So, let's recompute our game tree, with the two following assumptions of a sane player:

  1. If a 3-in-a-row is possible, take it. Don't miss an obvious opportunity to win.
  2. If the opponent can win in the next move, make a move that blocks their victory. Don't blindly let the opponent win.

Now, we recompute our game tree assuming that players will always follow the above two rules, and this is what we get.

===== Outcomes =====
 * Player 'O' win: 6
 * Player 'X' win: 0
 * Both tie: 0

This is looking much better. Now, we see that player ‘O' is guaranteed to win, just like our intuition says.

Hard Tic-Tac-Toe states exist

Using this game tree, we can now look for states where one player forces a victory. There are unsurprising states where this happens — but there are also more surprising states, like this one:

['O', ' ', 'X']
[' ', ' ', ' ']
[' ', ' ', 'O']

Let's try playing this game. Very quickly, we'll see how this leads to a definite victory.

# X *must* place a value in the center to stop O
['O', ' ', 'X']
[' ', 'X', ' ']
[' ', ' ', 'O']

# O *must* place a value in the bottom left to stop X
['O', ' ', 'X']
[' ', 'X', ' ']
['O', ' ', 'O']

Now, we see the definite victory state for O. If X stops a bottom-row victory by O, O can simply claim a left-column victory — and vice versa. Interestingly, it only took three moves to guarantee a definite victory for O. There's more than just one definite-victory state too.

Now, by picking this initial game state according to its guaranteed-victory probability we can control the difficulty of the puzzle.

The guaranteed-victory states

To share the 80 three-move states above, I'll need to share one way to represent the board state above, compactly. One approach is to first label all of the cells in the Tic-Tac-Toe grid 0 through 8, like this:

[0, 1, 2]
[3, 4, 5]
[6, 7, 8]

Then, we represent moves from both players with a sequence of numbers, such as 028. The odd-numbered numbers correspond to moves made by the first player, the ‘O'. The even-numbered numbers correspond to moves made by the second player, the ‘X'. Drawing this on a board, we get the state from above.

['O', ' ', 'X']
[' ', ' ', ' ']
[' ', ' ', 'O']

Using this, we can now represent any game state with a sequence of numbers. Using this mechanism, I can now write the 80 definite-victory states using the following notation:


These states are the shortest definite-victory states possible, using the simple agent rules above: (a) Complete 3-in-a-row when possible and (b) when the opponent has a 3-in-a-row opportunity, block it.

By picking this initial game state according to its definite-victory probability we can control the difficulty of the puzzle.

Without modifying the game objective, movement mechanics, or even the board size, we can now both (1) increase and control puzzle difficulty and (2) ensure victory is possible even against an optimal player. This accomplishes the simple Tic-Tac-Toe modification we set out for.


With that said, there are only 28 possible initial game states to pick from. This is even ignoring symmetries in the game, so we really only have a small pool of possibilities to take from. Fortunately, even subject to the same constraints above — same board size, same movement mechanics, and same objective — we could still fix this.

Tic-Tac-Toe has a large number of unexplored states that are "impossible" to reach, given the game's current rules8. This means there's a rich pool of possible game states that we can further explore, for some interesting Tic-Tac-Toe challenges.

This blog post has gotten fairly long, so we'll have to explore these un-explored states another time. For now, we now have a much more rewarding and challenging Tic-Tac-Toe, even against the best of puzzle-playing AI.

back to Guide to Hacking

  1. Unrestricted n-in-a-row is described in Wikipedia's entry on Tic-Tac-Toe variants and in more detail, with a corresponding analysis, in "Combinatorial Games: Tic-Tac-Toe Theory" by Jozsef Beck on page 59. 

  2. In Ultimate Tic-Tac-Toe, the player's move's relative location in the smaller board determines which of the smaller boards the opponent can play in. For example, if the first player places a mark in the bottom-left of any smaller board, the second player must now place a mark in the small board located in the bottom-left. 

  3. In Quixo, players make moves by removing pieces — blanks, or marked with their symbol — from the periphery. The piece is then inserted back in at the periphery, but with the player's symbol on it, which slides either a row or column of pieces to fill the original gap. Figure 4 in a Quixo rulebook explains it nicely. 

  4. Erde et al in "An n-in-a-row type game" published by the Electronic Journal of Combinatorics specifically cites another paper for proving this "known" result. However, I couldn't find the original text for the cited text, which is T. G. L. Zetters. 8 (or more) in a row. The American Mathematical Monthly, 87(7):575–576, 1980.. The only table of contents I could find, which includes the American Mathematical Monthly for Volume 87, Number 7, published in 1980 doesn't include a publication from Zetters. 

  5. To compute the number of each possible outcome in the standard Tic-Tac-Toe game, use python --agent any

  6. Amöba is the Hungarian name given to Tic-Tac-Toe. 

  7. To compute the number of each possible outcome from this initial state, use python --agent any --start 30674

  8. For example, any board with 3 O's and 1 X is impossible. Clearly, since players alternate, the number of O's and X's are at most one off from the other.