from Guide to Hacking on Nov 5, 2023

How to be a "good" AI-powered debugger

One common question is: Will ChatGPT render coding obsolete? The truthful answer is — yeah, kinda. At the very least, it obsoletes coding as we know it. However, the need for code doesn't go away.

We still need coders, but all coders — even junior ones — will need to learn how to curate, edit, and review code instead of writing from scratch. In this post, we'll discuss the core elements of a good AI-powered coder and what to look out for as a code curator.

The flip side is true too: All coders can benefit from AI-generated code, spending less time typing and more time making higher-order decisions about the code. That is a significant productivity boost in and of itself.

How code-less coding works

In theory, asking ChatGPT to write some code is straightforward:

  1. Prompt ChatGPT with the problem setup. Ask it to generate a code sample with certain specifications.
  2. Copy the generated code back into your codebase, and run.

    1. If it succeeds, then we're done.
    2. If there's an error, copy and paste the stacktrace directly into ChatGPT. Repeat step 2 until there are no more errors.

At a surface level, it seems coding is "solved". Unfortunately, with this approach, your prompt lacks a lot of much-needed context. Even just 500 lines of omitted code can mislead ChatGPT into producing a suboptimal modification. Maybe an already-implemented function gets partially re-implemented elsewhere, or an abstraction barrier gets torn apart.
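
To make this concrete, here's a hypothetical illustration (all names are made up): suppose the codebase already has a helper for normalizing team names, but that helper never makes it into the prompt.

# Already in the codebase, but never included in the prompt.
def normalize_team(name: str) -> str:
    return name.strip().lower().replace(' ', '-')

# Without that context, generated code may quietly duplicate the same logic
# inline, bypassing the existing helper and its abstraction barrier.
def add_result(results: dict, name: str, score: int) -> None:
    key = name.strip().lower().replace(' ', '-')  # re-implements normalize_team
    results[key] = score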

Someone needs to have context — either you or the model. Issues like this one aren't insurmountable, but these are aspects of AI-powered coding that need human supervision and curation.

Below, we'll talk through a general coding principle, then discuss how to uphold that principle even when working with models such as ChatGPT. As the context issue above suggests, the predominant AI-powered workflow is extremely prone to poor coding practices.

Principle: Fix the cause, not the symptom

There are many ways to silence an error or exception in your codebase, and the worst of these ways can degrade — not enhance — your codebase. These "fixes" are what breed unwieldy, unpleasant code, and unfortunately, a ChatGPT prompt with limited context leads to exactly this. First, let's understand properties of desirable and undesirable bug fixes.

In short, bug fixes should come after a thorough understanding of what's causing the issue. If you don't know why it's broken, you won't know why a fix works — if at all. And it will always come back to bite you. Even if these fixes handle errors, successive monkey patches start to eat away at your codebase's scalability and maintainability.
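
As a small sketch of the difference, consider a "fix" that silences a type error where it surfaces versus one that addresses its origin; the function and names below are hypothetical stand-ins, not code from this post's example.

def add_value(totals: dict, key: str, value: int) -> None:
    # Symptom "fix": swallow the exception so the program stops crashing,
    # even though the bad value that caused it keeps flowing downstream.
    try:
        totals[key] = totals.get(key, 0) + value
    except TypeError:
        pass  # silently ignores the broken update

# The cause-oriented alternative is to trace where the invalid value comes
# from and correct it at the source, so no exception handling is needed here.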

We talk about debugging thoroughly and quickly in How to debug black boxes. In short, you can debug faster by isolating as many components as possible — for example, by bisecting commits to isolate changes or mocking objects to isolate parts of a library.
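
As one sketch of the mocking idea, you can stub out a network-dependent function so that only the parsing logic runs; the functions below are hypothetical stand-ins rather than code from this post's example.

import json
from unittest.mock import patch

def fetch_scores() -> str:
    # Pretend this hits a flaky remote API.
    raise RuntimeError("network unavailable")

def get_team_scores() -> dict:
    return json.loads(fetch_scores())

# Replace the network-dependent piece with a canned payload, so any failure
# can only come from the parsing logic we actually want to debug.
with patch(f"{__name__}.fetch_scores", return_value='{"arsenal": 2}'):
    assert get_team_scores() == {"arsenal": 2}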

Tip: Curate context for AI

Let's see this misdirection from AI in action. Below, we'll plug a program into ChatGPT and ask it to help us debug, first with a prompt that has insufficient context and then with one that has sufficient context. The idea is to get a rough sense of what context both you and the AI need to correctly diagnose and fix the issue.

Example. Say you're writing a command-line program to download soccer game results. After running the program, you get the following traceback.

Traceback (most recent call last):
  ...
  File "path/to/file.py", line XX, in add_score
    stats[team] += num_goals
TypeError: unsupported operand type(s) for +=: 'int' and 'NoneType'

You find that the relevant part of the program looks like the following.

def add_score(stats, team, num_goals):
    stats[team] = stats.get(team, 0)
    stats[team] += num_goals
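
To see how the crash happens, a call with a missing value reproduces the traceback (the team name here is just for illustration):

stats = {}
add_score(stats, 'arsenal', None)  # stats['arsenal'] becomes 0, then 0 += None fails
# TypeError: unsupported operand type(s) for +=: 'int' and 'NoneType'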

We clearly need to handle this incorrect type. We expected an integer and instead got a None type. There are two possible approaches:

Approach #1. One way is to handle the None behavior explicitly: if num_goals isn't an integer, don't do anything; early-exit from the function and call it a day. Alternatively, since we're incrementing subtotals by num_goals, we could set num_goals to 0 when it has an invalid datatype, without altering the function's behavior (sketched after the fix below). Here's an example fix by hand:

def add_score(stats, team, num_goals):
    if num_goals is None:
        return
    stats[team] = stats.get(team, 0)
    stats[team] += num_goals
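
For completeness, the alternative mentioned above, coercing an invalid num_goals to 0 instead of returning early, might look like the following sketch.

def add_score(stats, team, num_goals):
    if not isinstance(num_goals, int):
        num_goals = 0  # treat missing or invalid values as "no goals scored"
    stats[team] = stats.get(team, 0)
    stats[team] += num_goals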

Let's plug the original snippet into ChatGPT and see what bug fix it suggests. We prompt our model with the unmodified function and the traceback; the returned program, according to our chat log, is the following.

def add_score(stats, team, num_goals):
    if stats.get(team) is None:
        stats[team] = 0

    stats[team] += num_goals

ChatGPT assumed stats[team] is None, whereas our manual fix above assumed num_goals is None. Both sound reasonable at first glance, but the stats.get(team, 0) on the previous line already rules out stats[team], and no matter how hard I tried, I couldn't get ChatGPT to realize that num_goals could be the problem. In any case, ChatGPT doesn't properly debug the issue for us, so let's move on to fixing this with a better prompt.

Approach #2. If we address the root cause directly, our bug fix can potentially be much simpler. Digging upstream, it turns out num_goals comes from an optional command-line flag that is left as None when the user doesn't supply it. Set a default integer value for that optional flag, like below.

parser.add_argument('--num-goals', type=int, default=0)  # default to 0 so the flag stays optional but always yields an int
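
For reference, the surrounding command-line code isn't shown in full here, but it might look roughly like the following reconstruction; the --team flag and the overall script structure are assumptions for illustration.

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--team', default='arsenal')         # hypothetical flag
parser.add_argument('--num-goals', type=int, default=0)  # the fix shown above

args = parser.parse_args()

stats = {}
add_score(stats, args.team, args.num_goals)  # num_goals is now always an int
print(stats)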

We only knew this because we dug deeper to find the original definition of num_goals. Let's prompt ChatGPT again, this time with the additional context we used to make the more robust fix. According to the chat log, ChatGPT identified the same root cause (an optional argument wasn't supplied) but provided a different fix: enforce an int type and give up if the user doesn't supply a value.

parser = argparse.ArgumentParser()
parser.add_argument('--num-goals', type=int)  # Specify the type as integer
args = parser.parse_args()

# Check if args.num_goals is not None before passing it to add_score
if args.num_goals is not None:
    add_score({}, 'hello', args.num_goals)
else:
    print("Please provide a value for --num-goals.")

With that said, ChatGPT's approach effectively makes --num-goals a required argument, which the argument parser already supports directly. If --num-goals really should be mandatory, the snippet above should instead use argparse's built-in required-argument feature:

parser = argparse.ArgumentParser()
parser.add_argument('--num-goals', type=int, required=True)

So now, ChatGPT's debugging has really failed us in two ways rather than just one:

  1. In the example for Approach #1, ChatGPT provided a localized fix that addresses only the symptoms, leading to potentially redundant code down the line. This wasn't really the model's fault though, because we didn't provide sufficient context.
  2. In the example here for Approach #2, ChatGPT correctly identified the parser as the root cause but then proposed an overly verbose solution, re-implementing a feature already available in Python's built-in argparse library.

Ignoring the second failure for now, the trick is to provide context to the AI. Even if the fix isn't the cleanest, it can at least work, and it identifies a set of possible root causes to investigate. If you dig through the last ChatGPT chat log, you'll find that even with the right context, the model doesn't pin down the exact root cause on its own, but it at least points to argument parsing as the issue.

There's some finesse in deciding what context to provide to the AI when debugging. Too much context may simply be cumbersome to collect and prompt with, if it doesn't already exceed the model's limits on context size. Based on the example above, here are a few suggestions:

  1. Include the full traceback, not just the final error message.
  2. Include the function or snippet where the error is raised.
  3. Include the upstream code that produces the offending value (in our case, the argparse definition of --num-goals).

Naturally, the above tips are useful for you as a developer to find the bug yourself. However, as we saw above, they are also important context for the model to leverage when forming its debugging suggestions.

Takeaway

Above, we touched on a number of factors that influence code cleanliness, such as not re-implementing features from a standard library and addressing type errors as early as possible. They all share one characteristic: clean code reduces redundancy. There are limits to this that we'll discuss later, but in short, reduce redundancy by:

  1. Fixing the root cause once, rather than patching the same symptom at every call site.
  2. Reusing features your libraries already provide, like argparse's default and required options, instead of re-implementing them.
  3. Reusing helpers that already exist in your codebase, instead of letting generated code quietly duplicate them.

So in conclusion, working with AI can certainly produce working code that addresses the error you presented. However, it's not always perfect, and the true value of AI is its ability to provide suggestions and directions. In our examples above, AI-generated code fixes the immediate error but isn't perfect; it's our job to ensure AI-generated code reduces redundancy just as a quality coder would. More generally, it's your job to curate AI-generated code.

