from Guide to Hacking on May 21, 2023
How to level up your coding tutorial
Coding tutorials are difficult to follow. Where is the referenced code snippet in your large project? What changed in the code snippet, between the previous step and the current one? Even worse, there's no way to check your code — no "step 2" version of all the project's files.
Fortunately, there are simple solutions for these challenges. Some publications spend extensive, manual effort curating content to avoid these problems. However, this manual work is laborious, making coding tutorials[1] difficult to maintain. On the surface, there are several apparent issues with maintainability:
- Multiple sources of truth: Some tutorials may include snippets of the code in markdown, as well as fully-functional but separate source code. This redundancy may cause incongruous demos and tutorials in the best case; in the worst case, tutorial code may break and demo code may become brittle.
- Untestable "code": Other tutorials write untestable code directly in a markdown file. "Untestable" here means that we can't execute the code in the markdown file, at least not natively. Even if we naively extracted all code snippets, some snippets may represent code changes or be repeated, so the concatenated result is not a runnable program.
- Clunky annotation: To explain particular lines of code, tutorials reference either (1) line numbers or (2) specific variable names or control logic. Both are brittle, since they may change as the code evolves. You may see a large chunk of unannotated code, followed by bulleted line numbers and explanations, as in RealPython's "Create and Modify PDF Files".
- Snippets without context: Code snippets are difficult to understand or even indent without the proper context. A 3-line snippet may actually belong in a function, inside a for loop, perhaps inside an if condition nested in a while loop. However, repeating lines from the previous code snippet is redundant, so both leaving out context and adding it are poor options.
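To make the "untestable" and "without context" points concrete, here is a minimal sketch of the naive extraction approach. The helper names (extract_snippets, compiles) and the two-step tutorial text are made up for illustration, and the regex only handles simple, non-nested fences.

```python
import re

FENCE = "`" * 3  # a literal triple backtick, built this way so the sketch nests safely in markdown

# Matches a fenced block with an optional language tag and captures its body.
BLOCK_RE = re.compile(FENCE + r"[ \t]*(\w*)[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_snippets(markdown: str, language: str = "python") -> list[str]:
    """Return the bodies of all fenced blocks tagged with `language` (hypothetical helper)."""
    return [body for lang, body in BLOCK_RE.findall(markdown) if lang == language]


def compiles(source: str) -> bool:
    """Check only that the source parses; nothing is executed."""
    try:
        compile(source, "<tutorial>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


# An invented two-step tutorial: the second snippet is a fragment that
# belongs inside a loop the reader is expected to have written already.
tutorial = "\n".join([
    "Step 1: create the list.",
    FENCE + "python",
    "names = ['ada', 'bob']",
    FENCE,
    "Step 2: add this line inside the loop.",
    FENCE + "python",
    "    print(name)",
    FENCE,
])

snippets = extract_snippets(tutorial)
joined = "\n".join(snippets)
print(len(snippets), compiles(snippets[0]), compiles(joined))
```

The first snippet parses on its own, but gluing the snippets together does not, because the second one is an indented fragment with no enclosing loop. A syntax check like this is also the most the naive approach can offer: it says nothing about whether the assembled file behaves correctly.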
However, these are only symptoms of the root causes we'll detail below. In this post, we'll then cover several key ideas that can address these root causes, to make writing coding tutorials much less painful. This post isn't about a particular product. Instead, our goal is to establish a set of principles that can guide any tooling that aims to solve these issues.
If we dive deeper, we'll find a pair of root causes — namely, two critical facets of coding tutorials that are difficult to actually implement:
- Representing code changes: A coding tutorial ideally builds up in complexity, from a simple but functional snippet, to a complex and fully-featured one. However, representing code changes is difficult. How do we represent diffs manually? Or, do we compute diffs automatically and store multiple copies of redundant code?
- Representing code states: At each step in the tutorial, you may want or need to browse all the files at that particular step, possibly to test the project as a writer, or just to check your code as a reader. However, this is rarely possible today, and readers simply have to guess, restart, or work backwards from a final version of the code, if one is even provided.
Neither of these is easy to do out of the box with naive solutions. Instead, most coding tutorials work around these two issues with two very common approaches:
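As a rough illustration of the "compute diffs automatically" option, the sketch below stores two full copies of a file and lets Python's standard difflib produce the diff. The file name shapes.py, the step labels, and the step_diff helper are invented for this example.

```python
import difflib

# Two complete copies of the same (hypothetical) file, one per tutorial step.
step1 = """def area(w, h):
    return w * h
"""

step2 = """def area(w, h):
    if w < 0 or h < 0:
        raise ValueError("negative size")
    return w * h
"""


def step_diff(before: str, after: str, path: str) -> str:
    """Return a unified diff between two versions of one file."""
    lines = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"step1/{path}",
        tofile=f"step2/{path}",
    )
    return "".join(lines)


diff = step_diff(step1, step2, "shapes.py")
print(diff)
```

Computing the diff is the easy part. The cost is the one named above: every step needs its own full copy of the code, and an edit to an early step has to be repeated in every later copy unless something like version control carries it forward.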
- Build from start to finish: Since code changes are difficult to represent, a large number of tutorials simply build a project from start to finish, introducing code from line 1 to line N, such as RealPython's "Create a Flask Application with Google Login". You could, albeit with some indentation fixing, extract all the code snippets from start to finish and execute the code. This is acceptable for simple projects but less understandable for more complex ones. This is also a poor habit for readers to build.
- Download files at the end: Since the intermediate code states can't be represented, tutorials provide all the source code in a nice zip file at the end of the post. Then, the reader has to either (1) start over and retry the steps or (2) start from the final, working code and work backwards through the tutorial.
However, these workarounds exist only because the two challenges above are hard: (1) showing code changes and (2) making browsable code states.
There are two common approaches to writing coding tutorials, and each approach has its own set of workarounds for representing code changes and states.
Text in code: Use multi-line strings (a.k.a., heredocs) within a source code file. As we discussed above, these tutorials usually look like a completely finished file just with extra comments — in other words, the project does not build up in complexity and instead introduces the final version a line at a time, from the first to the last. There's no need for code states or changes because there's only one final form.
- Testable but no common renderers: The natural pro is that the code is now runnable and testable. However, there aren't commonly-available renderers to preview your tutorial. This makes the tutorial maintainable but less accessible to write.
- Versions in filenames: One common workaround for representing code states is to have multiple files, named with suffixes such as file_v2.py. However, this results in duplicated code: make edits in v1 and you now have to copy those edits manually n-1 times, once for each subsequent step.
Code in text: Use code snippets in markdown. As we explain below, this format can represent code changes fairly well. However, these tutorials typically don't reflect any version of the code except the final one, which is collated separately into a final zip file.
- Common renderers but not testable: The pro is that markdown renderers are easily accessible, and markdown is supported in all sorts of productivity note-taking tools such as Notion. However, the code isn't directly testable, without manually copying the code into a source file.
- Most professional tutorial websites such as DigitalOcean, Smashing Magazine, and RealPython use this approach, putting writing clarity over maintainability. This makes sense, as most of their value is delivered upfront, at time of publishing.
- Syntax highlighting as a validator: Effectively all markdown renderers, from Notion's built-in renderer to hackmd.io, have syntax highlighting. This is a standard that works for basic code validation. However, none of these solutions check that the source file, assembled across different code snippets, is a valid chunk of runnable code.
- Clear, manual diffs for code changes: DigitalOcean chooses to re-include all context and effectively represent changes with diffs, which you can see highlighted in green in DigitalOcean's "How to Make a Web Application using Flask". The code snippets even include file names and ellipses to denote indentation for neighboring lines of code. However, having written tutorials for DigitalOcean before, I know this process is very manual. The clarity comes at the cost of writers manually constructing diffs in a special markdown syntax.
- Re-publish instead of maintain: DigitalOcean's strategy here is particularly effective: Skip the maintainability issue by simply re-publishing for each version of the operating system, such as in their "How to Install Nginx on Debian 8". Notice in the dropdown at the start of the article, you can select which operating system you're working with. However, the effort of manually testing each tutorial is quite taxing on their editorial staff.
In short, both approaches have benefits and drawbacks, and neither fully solves both challenges.
To develop the intuition for a solution, we can look to an existing, effective solution for representing code changes and states very clearly: the popular version control system git.
Fortunately, git, and GitHub in particular, has largely figured out how to make browsing both time (commits) and "space" (files within a commit) understandable. Building on this, we can view a tutorial as simply a natural language description of how a git repository evolves. More specifically, tutorials are annotated, explained commit messages, visualized alongside the corresponding commit diffs.
This leads us to four principles for a solution:
- The reader should always understand where and why a code change is taking place.
- The coder should maintain a source of truth in code that is separated from text.
- The writer should tie explanations to commits and specific diffs in a commit.
- The system should interoperate with common, existing formats — both importing from and exporting to conventional formats.
Let's now move on to the proposed approach.
The canonical representation of your tutorial is text-first, with a markdown tutorial containing the tutorial's main text. However, there are a few critical changes:
- Code references: Rather than write code snippets directly in markdown, all snippets are actually references to code.
- Semantic references: Code references are not line numbers but "semantic" tags. Instead of specifying a line range, a reference names a construct in the file, such as camera.py#def:initialize. This way, code references are more robust to code changes.
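Here is one way a reference like camera.py#def:initialize could be resolved, sketched with Python's standard ast module. The resolve_reference function, the sources dictionary, and the contents of camera.py are all assumptions made for illustration; the post doesn't prescribe an implementation, and this sketch only understands the def: tag.

```python
import ast


def resolve_reference(reference: str, sources: dict[str, str]) -> str:
    """Resolve a 'path#def:name' reference to the function's source text.

    `sources` maps file paths to file contents; a real tool would read
    from disk or from a git commit instead (assumption).
    """
    path, _, tag = reference.partition("#")
    kind, _, name = tag.partition(":")
    if kind != "def":
        raise ValueError(f"unsupported tag kind: {kind!r}")
    source = sources[path]
    for node in ast.walk(ast.parse(source)):
        is_function = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_function and node.name == name:
            segment = ast.get_source_segment(source, node)
            if segment is not None:
                return segment
    raise LookupError(f"no function {name!r} in {path}")


# Invented file contents for the example.
camera_v1 = '''import time

class Camera:
    def initialize(self):
        self.ready = True

    def capture(self):
        return time.time()
'''

# Adding a header comment shifts every line number in the file by one.
camera_v2 = "# camera utilities\n" + camera_v1

ref = "camera.py#def:initialize"
before = resolve_reference(ref, {"camera.py": camera_v1})
after = resolve_reference(ref, {"camera.py": camera_v2})
print(before == after)
```

The same reference resolves to the same function in both versions even though its line numbers moved, which is the robustness a line-number reference lacks. The sketch returns the first function with a matching name, so a real scheme would need a way to tell apart same-named functions in different classes.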
The code itself is organized into a git repository. This structure allows us to reference diffs between versions, using camera.py#def:initialize;v1-v2. For the developer, there are a few key benefits:
- Testable: The project can be tested easily at every step, as there are explicit files to run.
- Maintainable: Changes can be propagated across versions of the codebase, and we leverage a suite of existing version control utilities.
- Bonus features: With automatically computed diffs, we can compute the "explanation coverage" for a tutorial. In other words, are all changed lines between versions explained or referenced in the tutorial?
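The "explanation coverage" idea can be sketched in a few lines. This version assumes the tutorial's references have already been resolved to a set of line numbers in the newer file (the explained argument), which is a simplification. The function names are hypothetical, and deleted lines are ignored.

```python
import difflib


def changed_lines(before: str, after: str) -> set[int]:
    """Return 1-based line numbers in `after` that were inserted or replaced."""
    a, b = before.splitlines(), after.splitlines()
    changed: set[int] = set()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for op, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if op in ("insert", "replace"):
            changed.update(range(j1 + 1, j2 + 1))
    return changed


def explanation_coverage(before: str, after: str, explained: set[int]) -> float:
    """Fraction of changed lines in `after` that the tutorial text references."""
    changed = changed_lines(before, after)
    if not changed:
        return 1.0  # nothing changed, so nothing is left unexplained
    return len(changed & explained) / len(changed)


# Invented versions of one file.
v1 = "def area(w, h):\n    return w * h\n"
v2 = (
    "def area(w, h):\n"
    "    if w < 0 or h < 0:\n"
    "        raise ValueError('negative size')\n"
    "    return w * h\n"
)

# Suppose the tutorial text only references line 2 of the new version.
coverage = explanation_coverage(v1, v2, explained={2})
print(changed_lines(v1, v2), coverage)
```

Here two lines changed and one is referenced, so half of the change is explained. A writer could use a number like this the way developers use test coverage: as a prompt to check what was skipped, not a target in itself.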
The tutorial renderer also does more work, with a few features that make browsing the tutorial easier:
- More context: Every code snippet can be expanded to include more context from the file it comes from. By default, selecting all and copying a snippet will only include the original snippet.
- Browse any state: You can pull up a popup of any snippet to browse all files for that step of the tutorial. This allows you to compare your code with the tutorial's reference code.
- Render diffs: Diffs between versions of a file can be rendered automatically, without writers manually constructing them.
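The "more context" behavior might look something like the sketch below: the renderer shows neighboring lines but remembers which lines were the original snippet, so copying yields only those. The expand_snippet function, its row format, and the sample loop are assumptions for illustration.

```python
def expand_snippet(
    source: str, start: int, end: int, context: int = 2
) -> list[tuple[int, str, bool]]:
    """Return (line_number, text, is_original) rows for lines start..end plus context.

    Line numbers are 1-based and inclusive; context is clipped at file edges.
    """
    lines = source.splitlines()
    first = max(1, start - context)
    last = min(len(lines), end + context)
    return [(n, lines[n - 1], start <= n <= end) for n in range(first, last + 1)]


# Invented file: the snippet of interest is the single line inside the while loop.
source = """for frame in frames:
    if frame.ok:
        while frame.busy():
            frame.wait()
        save(frame)
"""

rows = expand_snippet(source, start=4, end=4, context=2)
for number, text, is_original in rows:
    marker = ">" if is_original else " "
    print(f"{marker} {number:2d} {text}")

# Copying keeps only the lines that belong to the original snippet.
copied = "\n".join(text for _, text, is_original in rows if is_original)
```

Keeping the context in the display but out of the copied text is one way around the tradeoff described earlier, where leaving context out hurts understanding and repeating it creates redundancy.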
As a bonus, existing tutorial formats can also be converted fairly easily into this canonical format. Once converted, writers and readers can both reap the benefits.
[1] Note that I'm intentionally ignoring other types of code-related text, such as library documentation. Docstrings and doctests are fantastic ways to generate documentation and simultaneously keep code maintainable. I'm specifically interested in beginner-friendly, tutorial-style text.