from Guide to the PhD on Aug 13, 2023

How to read papers

I remember being asked to conduct a "literature review" for the first time many years ago. I had so many questions: How do I read papers? What am I looking for? Why is this paper "Best Paper"? Heck, how do I find papers? Here's the guide I wish I had.

A large part of conducting research is understanding what the state of the art even is, so that you can contribute to the field. The matter is further complicated by highly opinionated researchers, each with a different idea of what the state of the art is. Fortunately, especially with the recent hype in AI, there are more and more accessible inroads to keeping up with research developments.

Rather than write for my past self, though, I'm writing for a modern version of myself who is starting research today. The news, tools, and overall accessibility of the field have improved drastically, and this post covers how to make the most of them.

How to find papers

Generally, paper searches shouldn't be done alone. Rather than seeking out the most relevant papers in a field directly, seek out relevant experts or curated lists.

Find an expert colleague. If you are a PhD student or a researcher in a large industry lab, this is fairly straightforward: there's bound to be a colleague with expertise in that field somewhere in your organization. There are two ways to find them: through word of mouth, or through a list of publications from your organization.
1. Ask your manager or adviser. Back in 2020, I needed to learn more about explainable AI broadly. Fortunately, this was simple: my adviser recommended I consult my labmate Lisa, who had been working on explainability methods for a year. We chatted in lab right after, and Lisa then suggested I speak with Prof. Bargal, who had built a career on explainability methods. As a side benefit, this outreach led to a collaboration with both experts, culminating in an ICLR 2021 publication.
2. Browse publications from colleagues. Now at Apple, I browse publications authored specifically by colleagues at the company, via sites such as machinelearning.apple.com. In short, I treat publications as proxies for expertise, then contact the authors working in the relevant field. Fortunately, given the size of the company, there are plenty of experts to consult.
Find sources of expertise. Rather than finding papers directly, look for troves of papers that are peer-reviewed or otherwise filtered by a curator of some kind. These come in multiple forms, from traditional academic venues to social media.
1. Known peer-reviewed venues. If you have no papers to begin with, start from a known venue or publication. For machine learning broadly, see ICML, ICLR, and NeurIPS. Or, look up conferences by field, such as computer vision (CVPR, ICCV, ECCV), graphics (SIGGRAPH), NLP (EMNLP), robotics (ICRA, IROS, CoRL) or systems (EuroSys, MLSys).
2. Popular social media reviewers. Several accounts on social media are known for consistently promoting good work. There is certainly noise in entrusting curation to a single account, but any curation helps when dealing with a firehose of information. Examples include industry labs (@GoogleAI, @DeepMind, @OpenAI, @MetaAI), university labs (@StanfordAILab, @berkeley_ai, @MIT_CSAIL, @CMU_Robotics), and individual curators (@AK).
3. Use awesome lists. On GitHub, "awesome" has become a code name for "comprehensive list on a topic." There's even an awesome list of all the other awesome lists: the meta-awesome list on GitHub. You can also simply Google "GitHub awesome <topic>" to find lists on almost any topic. For example, I used awesome-pruning while doing a literature review for my recent ICML 2023 paper.

At this point, you should have a set of candidate papers to consider, either from experts or other sources across the web.

Why read papers

There are many reasons to read a paper, but they fall into three broad "personas" you can assume. Each one requires a different type of reading or skimming.

Read as a critic, to assess the paper's validity. The primary goal is twofold: (1) find the core claim in the paper, and (2) check that the core claim is supported by the experimental results. This is the core function of a reviewer, and for publications that are not yet peer-reviewed, this is your role as a reader.
Read as a user, to build off of it. The primary goal here is to understand the key bits of the method and results, just enough to use the work effectively. For example, in my own research, I accelerate model inference. To do so, I'll read and reference "fundamental" papers like Attention Is All You Need repeatedly, not to pick at the paper, but to find details in the architecture that are hardware-unfriendly and need some post-hoc optimization.
Read as a competitor, to compare against it. The primary goal here is to position the paper relative to other work in the area, so the motivation in the abstract and introduction, along with the related work section, is key. Oftentimes, your goal is also to position the paper relative to your own work. This is part of assessing the paper as well: understanding how its contribution adds to the pool of knowledge in the field.

For the most part, reading as a user or a competitor is straightforward. For the former, you already know which parts you're looking for, and the goal is a general understanding of the method. For the latter, your goal is to find a weakness to exploit, whether in the motivation, the method, or the results. In light of that, we'll focus below on how to read papers as a critic.

How to read as a critic

As a critic, your main function is to assess the paper's main claim, and there are four components to this task. You can treat them as four steps in an assessment: if the paper fails any stage, there's no reason to continue to the next one.

Function #1. What is the main claim? At this stage, your goal is to simply identify and regurgitate the main claim — not to debate its worthiness or assess its validity. Here are several pointers for identifying claims.

Contributions: Find the bulleted list of contributions at the end of the introduction. These state the paper's claims in plain sight. If the paper doesn't clearly communicate its main claim, there's little hope that the evidence for that claim will be clear either. If no such bullet points exist, re-read the abstract and possibly the introduction.
One main idea: There should be a clear, central idea to the paper. There may be an assembly of different techniques that ultimately contribute to the model's final performance, but there should be one or two core ideas and takeaways. This is the case in several papers, such as MobileNet, which centered on depthwise-separable convolutions but also used a variety of training tricks to further improve performance¹.
Use the review process: As a reviewer, you have the benefit of being able to ask the authors for clarification. Most of the questions I ask as a reviewer are centered on clarifying the claims, to better understand the contribution to the field. I'll try my best to suggest a possible claim, given what was presented. As a non-reviewer, though, it's easy to simply move on to the next paper.

Function #2. Is the claim valid? There are several requirements for a claim to be valid.

Uniqueness: The contribution shouldn't be identical to another paper's, especially not to well-cited papers at known venues.
- Some claims overlap strangely with others, being too broad or vague. This may result in inappropriate baselines and missed related work. For example, a paper that quantizes neural networks for medical imaging should consider quantization for computer vision networks more broadly, at least as baselines. Even if the paper reinvents the wheel, there should be a justification for that. One possible reason is that "conventional" quantization methods catastrophically fail on highly noisy medical imagery, which would require new, noise-robust quantization techniques.
- Other claims are oddly specific, resulting in misleading statistics or selling points. For example, a 2018 paper submission, Weightless, claims up to 496x compression. However, there's a glaring omission in the abstract, which the reviewers note: this number applies only to two layers of a specific, outdated neural network. The compression ratio is severely reduced when looking at the entire network or at more modern networks, making this claim largely useless.
Significance: The contribution should be significant for a downstream metric — improving accuracy, interpretability, inference speed, training speed etc.
- For example, changing a multiplication to an addition is uninteresting, especially if the change is unmotivated and leads to insignificant empirical results, e.g., a 0.1% improvement in ImageNet accuracy. Many papers follow this pattern: an insignificant change with mediocre empirical results.
- However, there are simple changes worth noting: For example, changing an element-wise multiplication in LLaMA's MLP to an element-wise addition could be interesting; this allows you to pre-multiply two $D \times D$ matrices, saving approximately $\frac{1}{12} = 8.3\%$ of parameters during inference and for storage, for "free". The subtle difference is not in the change itself: Both the bad and good examples change one operation, but one has a downstream impact (reduced storage size and faster inference speed) whereas the other does not.
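To make the folding argument concrete, here is a minimal pure-Python sketch. It assumes the standard SwiGLU form of LLaMA's MLP, W2(silu(W1 x) * W3 x), with W1 and W3 of shape H x D and W2 of shape D x H; the additive variant, the toy sizes, and all function names are hypothetical. It only checks that the additive version lets W2 W3 be precomputed as one D x D matrix. It does not reproduce the 8.3% figure, which depends on the real layer sizes.

```python
import math
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(A, B):
    """Multiply an n x k matrix A by a k x m matrix B."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def silu(v):
    return [x / (1.0 + math.exp(-x)) for x in v]


def mlp_additive(W1, W2, W3, x):
    # Hypothetical variant: W2 (silu(W1 x) + W3 x). In the original gated
    # form, an element-wise product sits between W3 and W2 instead, which
    # blocks any folding of the two matrices.
    gate, up = silu(matvec(W1, x)), matvec(W3, x)
    return matvec(W2, [g + u for g, u in zip(gate, up)])


def mlp_additive_folded(W1, W2, W23, x):
    # By linearity, W2 (a + W3 x) = W2 a + (W2 W3) x, so W23 = W2 W3
    # can be computed once ahead of time and W3 dropped entirely.
    left = matvec(W2, silu(matvec(W1, x)))
    right = matvec(W23, x)
    return [p + q for p, q in zip(left, right)]


rng = random.Random(0)
D, H = 4, 11  # toy model and hidden sizes, chosen arbitrarily


def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


W1, W3 = rand_matrix(H, D), rand_matrix(H, D)  # H x D each
W2 = rand_matrix(D, H)                         # D x H
W23 = matmul(W2, W3)                           # D x D, computed once
x = [rng.uniform(-1, 1) for _ in range(D)]

y_unfolded = mlp_additive(W1, W2, W3, x)
y_folded = mlp_additive_folded(W1, W2, W23, x)
max_diff = max(abs(a - b) for a, b in zip(y_unfolded, y_folded))

params_unfolded = 3 * H * D          # W1, W2, W3
params_folded = 2 * H * D + D * D    # W1, W2, W23
print(max_diff, params_unfolded, params_folded)
```

The two versions agree up to floating-point rounding, while the folded one stores a D x D matrix in place of the H x D matrix W3. That is the kind of downstream impact the bullet above describes.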

Function #3. Is the claim intuitively true? Look for an explanation of why the claim should hold; even a hypothesis works. What matters is that the paper doesn't appear to throw ideas at the wall to see what sticks. If it does, there isn't much insight to carry forward: the paper reads like a laundry list of random techniques and results. See Is my project paper-ready? for a more detailed description of insight.

Obvious: If an idea sounds obvious to you as a critic, it might have appeared in a paper you've read before. If so, find that other paper. However, there's an off chance that the obvious idea hasn't been explored yet. Sometimes, an idea is only obvious in retrospect, thanks to the authors' clever framing of the problem.
Correctness (or lack thereof): This may seem strange, but the insight doesn't need to be correct per se. Here's an example: the original paper that proposed batch normalization suggested "internal covariate shift" as the reason for its effectiveness, i.e., that the distributions of layer activations were constantly changing during training. The paper never directly tested this hypothesis, but it was interesting nonetheless. Three years later, a separate paper suggested that batch norm instead works by smoothing the loss surface. These aren't necessarily contradictory ideas, but the original hypothesis wasn't the full picture, and that was perfectly a-okay.

Function #4. Is the claim supported? Once you've found the intuition, your goal is to then see if the experimental results support the main claim. There are two aspects to assessing experimental results:

Significance: Are the presented numbers significant? This is largely a function of existing work in the area. For example, 1 percentage point on ImageNet is considered significant, due to the sheer size of the dataset. Funnily enough, 1 percentage point on MNIST or CIFAR is also reasonably significant, given that neural networks should have drastically overfit to those datasets long ago. In a similar vein, a 1 dB change in PSNR for neural radiance fields is also considered significant.
Relation: Are the numbers actually related to the main claim? This second property is (too) often not assessed, and many reviewers fixate on whatever numbers are presented. The main goal here is to ensure that the baselines and metrics are appropriate for assessing the specific claim. For example, take our batch norm example from above. The central claim is that batch norm improves convergence rate; correspondingly, the paper's main figures and tables all reference convergence speed in some form or another.

Notice a curious property of the above four components: they mirror the typical first four sections of a paper. The introduction states the claim, related work validates and situates the claim, the method explains the claim's intuition, and the experimental results support the claim. This framing helps you understand what to get from each section of the paper. I talk more about the purpose of each section in How to write a paper.

There are certainly other aspects to assessing a paper, but these other rubric items are all secondary to the above.

Is the paper clearly written? The paper should be easy to read. Grammar needn't be perfect, but imperfect grammar should not interfere with understanding.
Are figures clear? The figures should clearly present the core ideas of the paper; reading the figures alone should be enough to understand the main gist of the paper.

With that, you should now be equipped to read as a critic. Use this framework as a reviewer for a venue, as a reader of a peer's submission, or even on your own paper.

Conclusion

As a researcher, staying on top of the field is a must. Use the tips above to keep afloat, but also recognize that you don't always need to be on top of day-to-day Twitter threads. The goal is to keep tabs on the state of the art just enough to situate your own contributions to the field.

This ties into my final tip: conduct periodic deep dives into related work when it comes time to write, publish, or publicize your work. Concentrate your reading into specific focus periods, or schedule a 5-minute reading period each morning. Don't waste hours each day just skimming; there's far more content on arXiv than you could possibly catch up on.


¹ Note that this practice of presenting a central idea and hiding the details is not always a good one. Several Google researchers tend to take this to the extreme. For example, EfficientNet and its follow-up papers repeated this practice to an egregious degree, leading to very poor reproducibility. That said, EfficientNet exploded in popularity for a number of years, simply because other researchers used its pre-trained checkpoints instead of reproducing them.