from Guide to the PhD on Feb 12, 2023

How to write a paper

Paper writing is black magic: some combination of word-smithing, storytelling, and figure doodling. Despite going through the motions several times as a mentee, I still found paper writing daunting when it came time to lead a paper myself. This guide walks you through the process.

The start of a paper is not always clear cut. Maybe you have a half-baked set of experimental results, a partially-working method, and some idea of the problem statement. Is that enough? To know when you're paper-ready, check if you have the core ingredients — problem statement, insight, and minimum viable result — using the guide in Is my project paper-ready?

Say you have the core ingredients to start the paper writing process. Truth be told, the core arguments of a paper are not yet there, and our goal in this process is to uncover those arguments as soon as possible. This way, we can quickly iron out what additional experimental results, ablations, and related works we'll need to make a point.

Anatomy of a paper

Before discussing how, much less in what order, to write a paper, let's go over what a paper looks like. The main sections should sound familiar to avid paper readers.

Figures - Figures form the backbone of the paper, by making arguments that advance the paper's overarching argument. This is critical and has three implications:

Each figure should make an argument. Don't write figure captions with just a description, like "Classification accuracy on ImageNet". This doesn't tell the reader where to look or what to conclude.
Direct the reader's attention to some facet of the figure, such as "Lower computational cost (further left on the x axis) and higher accuracy (higher on the y axis) are preferred".
Tell the reader what to conclude, such as "Our model — the blue line — achieves lower computational cost and higher accuracy at all points along the curve".

These arguments serve two main purposes for your paper:

Figures should tell the entire story: illustrate the problem, the insight, the method, experimental results, and ablations. Reading the figures alone should be enough to summarize the paper. Making Convolutional Networks Shift-Invariant Again exemplifies this well².
Figures should explain or clarify an argument made in the text. For example, a figure could contain a concrete example of an abstract formulation in the text. The critical part is to anchor the figure to the relevant portion of the text.

Creating good figures is an entirely separate topic, but at a high level, a strong figure makes its argument clearly.

Abstract - The abstract should contain a summary of the paper's main argument, including the problem statement, insight, hypothesis, and the main conclusion. Make this paragraph as broadly accessible as possible, assuming little to no prior knowledge. Wrap up the abstract with a few quantitative examples of your paper's impact; it should not read like an experiments section paragraph, so don't add more than 3-4 statistics. See an example abstract for Neural-Backed Decision Trees.

Machine learning applications such as finance and medicine demand accurate and justifiable predictions, barring most deep learning methods from use. In response, previous work combines decision trees with deep learning, yielding models that:

sacrifice interpretability to maintain accuracy OR
sacrifice accuracy to maintain interpretability.

We forgo this dilemma by proposing Neural-Backed Decision Trees (NBDTs). NBDTs replace a neural network's final linear layer with a differentiable sequence of decisions and a surrogate loss. This forces the model to learn high-level concepts and lessens reliance on highly-uncertain decisions, yielding both:

improved accuracy: NBDTs match or outperform modern neural networks on CIFAR, ImageNet and better generalize to unseen classes by up to 16%. Furthermore, our surrogate loss improves the original model's accuracy by up to 2%.
improved interpretability: improving human trust by clearly identifying model mistakes and assisting in dataset debugging

Code and pretrained NBDTs are on Github.

Introduction - The introduction operates as an expanded version of the abstract, except with more detail and justification. Introduce any background needed to understand the paper's contributions in its entirety.

Re-motivate the paper with an expanded problem statement. Start broad and narrow down the focus to several technical problems. List 2-3 technical challenges, to explicitly address later.
Summarize the main insight that enables you to tackle all the aforementioned challenges. Explain how your insight solves the 2-3 technical challenges above.
Describe the method very simply, in just a few sentences. This should convey how you leverage the insight concretely. The description should start with "Using the insight, we…". Don't include any details. Include just enough to convey the gist of the method.

The introduction should then end with an explicit list of three contributions. Contributions summarize the takeaways of the paper. For example, the same paper above yields these three contributions:

Related Works - Cluster prior work into 2-3 groups, each with a unifying idea that separates that cluster conceptually from your own work.

State disadvantages. This conceptual difference should be directly linked to final performance in your method's favor. For example, you may cluster all prior works as "methods that utilize skip connections". Next, you state the disadvantage of using skip connections at inference time: increased memory consumption.
Or, not. Alternatively, you may simply cluster prior work and avoid articulating a disadvantage. Odds are, your paper will be matched with authors that you cite, so this is unfortunately the "safe" and recommended approach, to avoid irritating a reviewer.
Compare with the most similar. Regardless of which option you choose, make sure to pick the 1-2 most similar works and offer an intuitive explanation of the differences, as well as an explanation of why those differences matter. We discuss examples in Is my project paper-ready?

As a general rule of thumb, include at least 40-50 citations. Randomly including citations to reach this goal is unnecessary, but if you're proactively looking for highly-related work, you should generally hit this volume of papers.

Method - The method section should describe your architecture, losses, and/or training pipeline, depending on the actual contribution. Regardless of the contribution and focus, here are a few more general guidelines:

Introduce the intuition first. Convey the general gist of why your method is designed the way it is, then describe how you achieved that.
Keep descriptions general. Your methods description should be for the general case. Include examples only sparsely, and relegate examples to figures or the appendix.
Use math if needed. If you're introducing a new loss term, proposing a new algorithm with multiple steps, or designing a new operation, do your best to summarize with an equation.¹ Use an algorithm block as well if relevant.
Unify convention. Follow the same conventions throughout the entire section, and cross-reference as much as possible. For example, if your algorithm block uses \Pi_i, use that variable in the exact same way in your equations.
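
As a rough sketch of the last two guidelines, suppose a hypothetical method adds one new term to a standard classification loss. Everything below is an assumption made up for illustration, not taken from any particular paper: the weight \lambda, the per-sample loss \ell_{\text{new}}, and the reading of \Pi_i as some per-sample quantity your method computes.

```latex
% Hypothetical example (requires amsmath for \text):
% total loss = standard cross-entropy term + weighted new term.
\begin{equation}
  \mathcal{L}_{\text{total}}
    = \underbrace{-\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C}
        y_{i,c} \log \hat{y}_{i,c}}_{\mathcal{L}_{\text{CE}}}
    + \lambda \,
      \underbrace{\frac{1}{N} \sum_{i=1}^{N}
        \ell_{\text{new}}(\Pi_i)}_{\mathcal{L}_{\text{new}}}
\end{equation}
```

If an algorithm block elsewhere in the section also computes \Pi_i, it should mean the same per-sample quantity, indexed by the same i, as in this equation.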

For example, the same paper above introduces the motivation for using an oblique decision tree, as opposed to a standard decision tree:

Experiments - Just as in every other section of the paper, every paragraph in the experiments section should make an argument. For example, you may make the following arguments to show a better accuracy-latency tradeoff curve:

Our method achieves higher accuracy due to an increased number of parameters during train time.
Our method achieves lower inference-time latency due to a simpler test-time architecture.
Our method overall achieves better accuracy-latency tradeoffs at test-time.

These arguments then form the backbone of this section: Don't simply run experiments and slap them in. Consider the primary arguments you make in the paper and which ones need empirical evidence. This section contains the "core" results, which show your method outperforms other methods.

For number-focused conferences such as CVPR, make sure to include a table with quantitative comparisons.

Ablations - The ablation section should provide an in-depth analysis of the components of your approach. In this sense, ablations compare your method with itself, examining different variants to validate each component's necessity. By contrast, the experiments section compares your method with other methods.

Conclusion - The conclusion section should provide a summary of the paper's main argument, including the problem statement, insight, and main conclusions. The conclusion should generally move from a specific discussion of your contributions to a more general statement about your contributions' impact.

Timeline for paper writing

Your timeline should be designed around two important goals:

Accommodate a possibly-changing story. You'll begin by nailing down the biggest arguments, then add more and more detail to guide the more nuanced parts of your story. Said another way, we first write parts of the paper that will not change.
Guide the rest of your experiments. Assuming you're in the middle of running experiments, ironing out the story bit by bit will help clarify what remaining arguments you must make. In turn, that clarifies what experiments you need to make those arguments.

Given this, I usually follow the below ordering for writing different sections in the paper. This assumes you don't have your experiments fully planned yet or close to completion. If you've finished your experimentation, write in any order you feel most comfortable with.

Your top priority when you begin the paper-writing process is to put all of your thoughts down on paper. You don't even need very clear English. Write down an outline of all the arguments you intend to make. Your outline should follow the guidelines we discuss above. Critically, outline the arguments you make in every section:

- Figures
- Abstract
  - include motivation, challenge, insight
  - mention method and key results
  - include 2-3 quantitative results
- Introduction
  - 3 technical challenges
  - 1 insight to address all
  - How your insight addresses all 3 challenges
  - 3 contributions
- Related Works
  - 2-3 clusters of prior works
  - answer: what prev papers miss, how you fix that
  - ~50 citations appropriate
- Methods
  - motivate and describe method
- Experiments
  - 2-3 arguments
  - justify arguments with results
- Ablations
  - compare your method with itself
- Conclusion

Once your outline is completed, begin fleshing out your paper in the following order:

Figures. Make placeholders for your figures, to get a rough sense for size and shape. If your paper format is two-column, get a rough idea of which figures need one or two columns. Again, determine what arguments each figure is going to make and what the key steps for explaining your method are.
Methods. The methods section should not change much at this point, given you've already obtained your minimum viable result. Additionally, the methods section is the one that needs the most extra eyes. Remember that the method is crystal clear in your head and your head only. In your first draft, make sure to include the insight that motivated your method. Pass this around to your co-authors.
Related works. If you have done a very thorough literature review before the paper push, you can put this off until the end. However, if you haven't yet done a proper literature review, now is the time. Determine what prior works are doing, how to compare with them, and how you need to formulate an argument in response.
Introduction. Write your initial version of the story, and make sure you understand what the core, beneficial differentiator between your paper and prior works is. That differentiator will form the basis of your argument and your paper moving forward. Critically, ask yourself what arguments you need to make. These core arguments will form the three contributions for your paper.
Abstract. Simply summarize your introduction section, and compress it. Go through several rounds of "Compress to abstract" → "Expand to introduction" → "Re-compress to abstract" etc. This will help you identify extraneous sentences that don't contribute to your core argument. Begin to pass this abstract around to your friends and colleagues, to see if the general gist is conveyed properly.
Experiments. As you receive more and more experimental results, begin filling out the experiments section. This can occur concurrently with the previous sections. Even if this section isn't polished, ensure your experiments are grouped under the relevant arguments.
Ablations. This is mostly based on questions that your colleagues ask as they read your paper — or any thoughts you may have. Is component X needed? What if you replace Y with a baseline? Why is Z designed the way that it is? All of these are ablation studies.
Conclusion. This is the absolute dead last paragraph to write. This can happen 5 minutes before the deadline. I don't recommend writing this 5 minutes prior, but it is by far the least important section.

Your most important written sections, clarity-wise, are the abstract and methods section. Your top priority above all else is to ensure that your figures are clear, especially for computer vision conferences.

Note I've excluded actual dates from this timeline. This is mostly because I never follow a set timeline. As a result, instead of listing exact dates, I simply provided an ordering.

What to prioritize in the last moments

Given the proximity to the deadline, all of your work should be geared towards the following two principles:

Do I need this for the paper's main argument? Distinguish experiments that satisfy an intellectual curiosity from those that are necessary. Pick one core argument to make, and ensure you have fully explored that argument. Auxiliary arguments are great to make if you have time.
Does this help me solidify more of the story? You're now looking to close gaps and complete a story arc. If an experiment creates more questions than it answers, it's not the right experiment to run right now. In short, more than ever, run experiments you think you know the answer to, to make the arguments you need to.

The closer you get to the deadline, the fewer experiments and the more post-training analysis you should be doing. Here are some general tips for paper writing:

Back up your paper, even and especially if you're collaborating with co-authors on Overleaf. It's easy to overwrite and lose edits, and more importantly, Overleaf may go down before the deadline. Knowing this, back up Overleaf to GitHub. A little-known fact is that every Overleaf project is a git repository.
Submit regularly leading up to the deadline. Submit a week before, a day before, and every hour on your last day. You don't know if or when the submission website will crash, so best to ensure you've got the most updated copy at any moment in time.
Don't sacrifice health, unless it's for your last 1-2 days. Continue to eat, sleep, and exercise as usual. This is an extremely important tip: Don't burn out before the deadline. Quite simply said, you can't afford to. Definitely focus on the paper, but if it's looking so dire that you need to pull multiple all-nighters, it may be worth delaying until the next deadline.
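
The Overleaf backup tip above can be sketched as a small script. This is a minimal sketch, not an official Overleaf or GitHub workflow. It assumes git is installed, that your Overleaf account exposes a git URL for the project (this may depend on your plan), and that an empty backup repository already exists. The function names and all URLs are hypothetical placeholders.

```python
import subprocess


def backup_commands(overleaf_url, backup_url, repo_dir, already_cloned):
    """Build the git commands that copy an Overleaf project to a backup remote."""
    if already_cloned:
        # Fast-forward the existing local clone to Overleaf's latest state.
        sync = ["git", "-C", repo_dir, "pull", "--ff-only"]
    else:
        # First run: clone the Overleaf project into repo_dir.
        sync = ["git", "clone", overleaf_url, repo_dir]
    # Push the checked-out branch to the backup repository.
    push = ["git", "-C", repo_dir, "push", backup_url, "HEAD"]
    return [sync, push]


def run_backup(overleaf_url, backup_url, repo_dir, already_cloned):
    """Run each command in order, stopping at the first failure."""
    for command in backup_commands(overleaf_url, backup_url, repo_dir, already_cloned):
        subprocess.run(command, check=True)
```

To use it, call run_backup with your own project's git URL and backup repository URL. Pass already_cloned=False on the first run and True afterwards. Running it on a schedule in the final days gives you a copy that survives an Overleaf outage.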

Follow the two principles above, and you should be on your way to a cohesive paper. Nerve-wracking as the process may be, remind yourself that you can't regret it if you work your absolute hardest. Let a future, smarter you handle the reviews; for now, do your best to write a clear, well-justified, and impactful paper.


¹ I have often lamented, and still do, that many deep learning papers contain unnecessary amounts of math. As a result, I make this recommendation to include math only hesitantly. As I've reluctantly learned over the years, math can clarify in select scenarios. There's no need to reintroduce the cross-entropy loss if you're designing a new transformer variant. However, if you're modifying the loss function, you could definitely benefit from re-including a cross-entropy term in your final classification loss.
² The author, Richard, gave me this advice as he was writing that paper, in fact. Notice that Figure 1 illustrates the problem (lack of shift invariance). Figure 2 illustrates the insight (inserting anti-aliasing operations). Figure 3 illustrates the method in more detail (how to anti-alias an operation such as max pool). Figure 4 illustrates how this approach solves the problem on a toy example. The remaining figures show and analyze experimental results.