from Guide to Hacking on Jan 28, 2024

Designing the Tic-Tac-Toe from hell

Tic-Tac-Toe is definitely a time-tested puzzle, apparently having existed since ancient Egypt according to an ethnomathematician named Zaslavsky. For most of us, it's at least been an integral part of our childhood years.

However, there's one critical and famous downfall to Tic-Tac-Toe: Optimal players will always end in a standstill, with the game coming to a draw as both players fill the board without a single 3-in-a-row.

Goal: Let's make victory possible

Partly to address this issue, many Tic-Tac-Toe variants surfaced over the years, with various changes to the game's objective, board size, and even the movement mechanics. However, what if we didn't need more complexity? What if we could make a tiny tweak, so that Tic-Tac-Toe rewards the best players with victory, even against an optimal AI?

It turns out that we can: All we need to do, is to change the starting state, so that even against an optimal enemy, the first player could guarantee victory if they played cleverly.

"Tic-Tac-Toe evolved" is not new

There are actually a large number of twists on the traditional Tic-Tac-Toe concept. In short, these Tic-Tac-Toe variants increase the complexity. With orders of magnitude more states and actions, there are more possibilities in the game — and ideally, more fun in the game too.

Make the board bigger: Quixo expands the board to a 5x5, Ultimate Tic-Tac-Toe expands the board to a 9x9 by composing nine tic-tac-toe grids in a 3x3 super-grid, Gomoku expands the board to a 15x15, and Unrestricted n-in-a-row¹ expands the board to an infinite one.
Change movement mechanism: In Ultimate Tic-Tac-Toe, each player's moves determines which smaller board the other player can play in². In Quixo, players make moves by sliding entire rows or columns³. In Notakto, both players use the same mark — e.g., both players use "X".
Modify the objective: Quixo and Amöba require a 5-in-a-row to win, Unrestricted n-in-a-row requires an n-in-a-row (as the name suggests), and Toe-Tac-Tic and Notakto inverts the objective so that players avoid making 3-in-a-row.

The above are all creative variants of Tic-Tac-Toe — they vastly change the dynamics of the game. However, the complexity of the games above are their downfall: The more complex, the harder it is to explain. I liked the simplicity of the original Tic-Tac-Toe, so let's honor that and restrict our attention to Tic-Tac-Toe modifications that are simple. This rules out any modifications that change movement mechanisms.

Since the alternative objectives and larger game boards are pretty simple to explain in a single sentence, we'll allow that. However, there's a different issue with these variants, and even the original Tic-Tac-Toe: These games assume a sub-optimal opponent to win.

Tic-Tac-Toe itself famously always ends in a draw when both players are playing optimally. In fact, Wikipedia even describes the optimal strategy for both sides in its Tic-Tac-Toe article. In this way, a player with an optimal opponent can at best tie and at worst lose.
Notakto will always result in victory for the first player on an odd number of boards and for the second player on an even number of boards. Like with Tic-Tac-Toe, the Notakto Wikipedia article describes the optimal strategy. As a result, a player an optimal opponent and the ‘wrong' number of boards will definitely lose.
Unrestricted n-in-a-row, according to Erde et al⁴, can be forever stalled by the second player if n ≥ 8.

In some of these scenarios, the game's setup alone determines the game's outcome, assuming your opponent is moving optimally. I'm interested in a different set of scenarios: where even if your opponent plays optimally, you could force a victory. This is where some of Tic-Tac-Toe's variants present a solution.

For Notakto, even with an optimal opponent, you could force a victory with the ‘right' number of boards. The unfortunate part is that the optimal Notakto strategy is too simple.
For Unrestricted n-in-a-row, you can force a victory or the first player if n ≤ 4.

These variants, where the player can achieve victory despite an optimal opponent, are the kinds that interest me. The question is: Can I give Tic-Tac-Toe the same property? Without changing the board size, movement mechanisms, or objectives.

How many ways can you win?

Tic-Tac-Toe is well-known as a "solved" game. This means that it's trivial to explore all possible game states and make optimal algorithms play as either side. Fortunately, this means that I can actually enumerate all possible states.

Let's do just that, and understand the probability of winning from any possible Tic-Tac-Toe state in the game. Given a tree of all possible moves, here's a quick snippet that counts up the total number of terminal states and their outcomes.

tictactoe/puzzle.py



length_to_moves = defaultdict(set)


def set_win_counts(node):
    """Mark any paths where one winner is guaranteed"""
    if node.winner > -1:
        node.counts[node.winner] += 1
    for child in node.children.values():
        set_win_counts(child)
        for i in range(len(node.counts)):
            node.counts[i] += child.counts[i]
    if (node.counts[0] == 0 or node.counts[1] == 0) and node.counts[2] == 0 and sum(node.counts[:2]) > 0 and node.children:
        length_to_moves[len(node.moves)].add(node.moves)  # add all cases that have definite victories


#######

Using the script, we can now tally up the number of each possible outcome in a match of Tic-Tac-Toe. There are 255,168 possible outcomes⁵. Below, we use ‘O' to denote the first player:

===== Outcomes =====
 * Player 'O' win: 131184
 * Player 'X' win: 77904
 * Both tie: 46080

These numbers are corroborated by a separate article on "Amöba"⁶ that independently computed the number of states. This gives me confidence that my game tree is correct. Using these counts, let's look at the probability of winning or losing, from certain boards. Start with this board:

['X', ' ', ' ']
['O', 'O', ' ']
['O', 'X', ' ']

By inspection, we can see that O is definitely about to win. If X stops a middle-row victory by O, O can simply place a piece in the top-right — and vice versa. However, from this state, our game tree produces some un-intuitive outcome counts⁷.

===== Outcomes =====
 * Player 'O' win: 12
 * Player 'X' win: 2
 * Both tie: 4

This seems incorrect; we know the X player is definitely about to lose. Why is there any outcome where they tie, much less where X wins? The problem is that to compute all of the possible game states, we considered all possible moves. We even considered moves where O next places a piece in the bottom-right.

These aren't moves that any sane player in the right mind would take. So, let's recompute our game tree, with the two following assumptions of a sane player:

If a 3-in-a-row is possible, take it. Don't miss an obvious opportunity to win.
If the opponent can win in the next move, make a move that blocks their victory. Don't blindly let the opponent win.

Now, we recompute our game tree assuming that players will always follow the above two rules, and this is what we get.

===== Outcomes =====
 * Player 'O' win: 6
 * Player 'X' win: 0
 * Both tie: 0

This is looking much better. Now, we see that player ‘O' is guaranteed to win, just like our intuition says.

Hard Tic-Tac-Toe states exist

Using this game tree, we can now look for states where one player forces a victory. There are unsurprising states where this happens — but there are also more surprising states, like this one:

['O', ' ', 'X']
[' ', ' ', ' ']
[' ', ' ', 'O']

Let's try playing this game. Very quickly, we'll see how this leads to a definite victory.

# X *must* place a value in the center to stop O
['O', ' ', 'X']
[' ', 'X', ' ']
[' ', ' ', 'O']

# O *must* place a value in the bottom left to stop X
['O', ' ', 'X']
[' ', 'X', ' ']
['O', ' ', 'O']

Now, we see the definite victory state for O. If X stops a bottom-row victory by O, O can simply claim a left-column victory — and vice versa. Interestingly, it only took three moves to guarantee a definite victory for O. There's more than just one definite-victory state too.

Of the 9*8*7 = 504 possible three-move games, 80 of them guarantee definite victory for the first player. This is 15.8% of three-move states that guarantee victory.
Of the 9*8 = 72 possible two-move games, 44 of them could lead to a definite victory state for the first player, as long as the first player picked optimally. This is 61.1% of two-move games where the first player has a chance to guarantee victory.
After two moves, there are seven remaining places to pick from. From the 44 unique two-move states, probabilities of making a move that guarantees victory range from 14.3% (only 1 of 7 moves) to 42.9% (3 of 7 moves).

Now, by picking this initial game state according to its guaranteed-victory probability we can control the difficulty of the puzzle.

The guaranteed-victory states

To share the 80 three-move states above, I'll need to share one way to represent the board state above, compactly. One approach is to first label all of the cells in the Tic-Tac-Toe grid 0 through 8, like this:

[0, 1, 2]
[3, 4, 5]
[6, 7, 8]

Then, we represent moves from both players with a sequence of numbers, such as 028. The odd-numbered numbers correspond to moves made by the first player, the ‘O'. The even-numbered numbers correspond to moves made by the second player, the ‘X'. Drawing this on a board, we get the state from above.

['O', ' ', 'X']
[' ', ' ', ' ']
[' ', ' ', 'O']

Using this, we can now represent any game state with a sequence of numbers. Using this mechanism, I can now write the 80 definite-victory states using the following notation:

023,028,051,054,056,061,068,072,073,074,107,127,132,150,160,167,182,187,205,206,231,234,238,270,274,275,281,286,305,316,320,325,365,370,385,386,416,418,432,438,450,456,470,472,502,503,518,523,563,568,572,583,602,607,613,614,618,650,654,657,682,683,701,706,721,728,738,756,761,781,814,815,816,820,827,832,834,837,860,865

These states are the shortest definite-victory states possible, using the simple agent rules above: (a) Complete 3-in-a-row when possible and (b) when the opponent has a 3-in-a-row opportunity, block it.

By picking this initial game state according to its definite-victory probability we can control the difficulty of the puzzle.

Without modifying the game objective, movement mechanics, or even the board size, we can now both (1) increase and control puzzle difficulty and (2) ensure victory is possible even against an optimal player. This accomplishes the simple Tic-Tac-Toe modification we set out for.

Takeaways

With that said, there are only 28 possible initial game states to pick from. This is even ignoring symmetries in the game, so we really only have a small pool of possibilities to take from. Fortunately, even subject to the same constraints above — same board size, same movement mechanics, and same objective — we could still fix this.

Tic-Tac-Toe has a large number of unexplored states that are "impossible" to reach, given the game's current rules⁸. This means there's a rich pool of possible game states that we can further explore, for some interesting Tic-Tac-Toe challenges.

This blog post has gotten fairly long, so we'll have to explore these un-explored states another time. For now, we now have a much more rewarding and challenging Tic-Tac-Toe, even against the best of puzzle-playing AI.

←Prev Our AI overlords suck at 2048

← back to Guide to Hacking

Unrestricted n-in-a-row is described in Wikipedia's entry on Tic-Tac-Toe variants and in more detail, with a corresponding analysis, in "Combinatorial Games: Tic-Tac-Toe Theory" by Jozsef Beck on page 59. ↩
In Ultimate Tic-Tac-Toe, the player's move's relative location in the smaller board determines which of the smaller boards the opponent can play in. For example, if the first player places a mark in the bottom-left of any smaller board, the second player must now place a mark in the small board located in the bottom-left. ↩
In Quixo, players make moves by removing pieces — blanks, or marked with their symbol — from the periphery. The piece is then inserted back in at the periphery, but with the player's symbol on it, which slides either a row or column of pieces to fill the original gap. Figure 4 in a Quixo rulebook explains it nicely. ↩
Erde et al in "An n-in-a-row type game" published by the Electronic Journal of Combinatorics specifically cites another paper for proving this "known" result. However, I couldn't find the original text for the cited text, which is T. G. L. Zetters. 8 (or more) in a row. The American Mathematical Monthly, 87(7):575–576, 1980.. The only table of contents I could find, which includes the American Mathematical Monthly for Volume 87, Number 7, published in 1980 doesn't include a publication from Zetters. ↩
To compute the number of each possible outcome in the standard Tic-Tac-Toe game, use python puzzle.py --agent any. ↩
Amöba is the Hungarian name given to Tic-Tac-Toe. ↩
To compute the number of each possible outcome from this initial state, use python puzzle.py --agent any --start 30674. ↩
For example, any board with 3 O's and 1 X is impossible. Clearly, since players alternate, the number of O's and X's are at most one off from the other. ↩