In part I of this post, I described how to detect and locate a Sudoku grid in the image, and then finding all the grid locations where the numbers and gaps are.
This post is therefore about recognizing the numbers and gaps, solving the Sudoku (i.e. finding the numbers that should go in each gap), and finally project the solved numbers in the image gaps (aka augmented-reality style).
You can download the full code from GitHub, here I’ll just highlight the main parts to understand the algorithm. Please note that the proposed algorithm is not the optimal in a lot of senses, since it’s been designed as an academic tutorial that one can understand without a lot of knowledge about computer vision. This has left out some powerful but more advanced techniques.
1. Digit and gap recognition
Now that we have located the grid in the image and all its spots, let’s isolate every sub-image containing the digits to scan them using an OCR.
First of all, we isolate and crop the whole image of the Sudoku Grid.
Here how an original frame looks like in practice:
And here the automatically cropped image showing the Sudoku grid:
We now proced to binarize the image using local thresholding, which is robust to local shadings. We could apply the OCR directly to this image and let it decide the binarization, but we found that binarizing the image ourselves worked better.
Here the grid after binarization:
As we can see, the grid is not perfectly aligned. That’s why we reduced the size of the small boxes in previous steps. We can now cut the thresholded image on each of the small boxes detected in the previous post and here we have some results:
Now that we have the binarized sub-images of the grid, let’s run the OCR on them.
Here the results of the recognition of the numbers present in the image.
2. Solving the Sudoku
We now have all available digits recognized and we also now where the gaps are, so the next step is to solve the Sudoku.
To do so, we create a class Sudoku, that mainly contains all the possible digits that are still possible at each grid location and a set of constraints that define the rules of Sudoku (there cannot be repeated numbers in a row, column, etc.).
The first step is therefore to ingest all recognized digits into the Sudoku class, and then call to the solve function.
The global call to the code would therefore be:
The set_value() function reduces the possible values of that location to only one value, so that grid location is solved.
We will know that a Sudoku is solved, therefore, when all locations have only a single possible value, as the code below checks.
So, now that we know whether the Sudoku is solved at a given stage, and we have set all the initial recognized digits, we would proceed by scanning all constraints iteratively and removing all solved numbers from the rest of positions possible values. After some iterations, we will be reducing the possible values in each location, until only one possibility remains.
The global algorithm to solve the Sudoku would therefore be the following.
This solving strategy is the most basic possible, and will therefore only solve the easy puzzles. In other words, we could keep end up iterating through all constraints without being able to change the possible numbers on each grid.
We would therefore be left with some positions with more than one possible number.
A possible way to proceed would be to do branching, that is, to take a location with a few possible numbers and make a guess among them.
Then try to solve imposing that guess and if we are not able to solve keep trying with another guess.
Obviously, this branching can also be nested.
We leave this solution to the reader :).
3. Implementing constraints using Object-Oriented programming
Let’s implement the classes to impose the constraints in the Sudoku.
In generic, a constraint is a set of positions in the Sudoku grid, where there cannot be duplicate digits, for example a row or a column.
Let’s therefore create a generic class (subclass of Sudoku) that implements the algorithm to enforce this constraint.
We will then need to particularize the class to three types of constraints: rows, columns, and blocks.
They will be different only in how we define the coordinates covered by the constraints, but will share the code of how to impose them.
In Object-Oriented programming, we can implement this idea with a generic parent class that we will then specialize by class inheritance into three new classes.
Let’s delve into creating the generic parent class that implements a constraint.
As you can see, it’s mainly a list of coordinates and a function to impose this constraint given a specific state of the Sudoku grid (you can see the full implementation in the code).
Every time we impose this constraint, we will be removing all digits that are solved in a certain position from the rest of locations.
Now that we have the generic constraint, we need to provide easy ways of filling the coordinates of the constraints, to avoid having to manually define them all.
We exemplify it, for instance, with the row constraint RowConstraint.
We inherit from the generic class Constraint and we specialize the constructor to take the specific number of row and fill the coordinates accordingly.
We also initialize the identifier string for debugging.
Note the use of this-> to get access to the fields defined in the parent generic class.
With this class structure, the definition of the full set of constraints is reduced to the following constructor in the Sudoku class.
4. Agumented-reality result display
Once the Sudoku is solved, the only remaining step is to reproject the solved digits into the gaps of the original Sudoku.
You can find the code from this blog post in GitHub.
And here the link to the final result video. If you liked this post or used the code, why don’t you say hi in the comments? :)