October 23, 2022
What defines a "good" researcher?
What makes the "ideal researcher"? No one knows, and what's worse: Without knowing your goal, you won't know what to work on. 5 years ago, this confusion stalled my research career; let me save you from that.
At the start of my research career, I had no idea what kind of qualities a top-talent researcher possessed, much less how to get there. This manifested itself in important ways: I was unable to pivot from dysfunctional ideas, results took longer or were never obtained, I got easily overwhelmed by lack of progress — the list goes on. It's honestly a crappy place to be, and the best way to summarize it is: I sucked at research. Funnily enough, as rock-bottom as any given day felt, the next day was even worse: Now, there were n+1 days without research "progress" — whatever "progress" meant.
Today, I can confidently say that I suck less at research, simply by knowing what kind of researcher I need to be. To do this, I amassed a list of ideal-researcher qualities by observing ideal researchers — the many mentors spanning industry and academia, Bichen, Richard, Lane, my former adviser Joey, and many, many more. These are researchers that produce results weekly if not daily, but despite their superhuman productivity, they sleep 8 hours a day just like the rest of us mortals.
Note: I've been told that these guidelines can apply to other "fields" too, such as entrepreneurship. However, my field of expertise is in research, so I'll stick to examples and details in research. These tips may very well generalize, so feel free to apply liberally.
To save you from the years of confusion, I'll cover 3 qualities of top researchers, so you know what qualities you need to start building. These are my own opinions, and opinions will differ from researcher to researcher. However, just knowing these criteria has given me significant peace of mind; when a paper goes well (or not), I can point to these qualities and identify what went wrong, or what went right. The next paper deadline is then a chance to try again. These are guidelines I wish I had when I started my research career, and I hope they'll serve you well, either as guidelines or as ideas for your own.
What are the most important questions in your field? Of those, which are easy to tackle? And why aren't you tackling those problems? 90% of research is finding the right problem to work on. You could work hard and fast but in the wrong direction: This leads to burnout and disappointment. Take the extra time to really home in on the right problem. Note this focus on the problem is an oft-repeated saying, not wisdom I'm sharing for the first time. Despite that, it isn't repeated enough.
Here's an easy litmus test: When you introduce what you're working on, do you start with the problem, or the solution? If the latter, really rethink whether your solution is guided by a well-defined problem. The key to impactful research is the problem itself, and oftentimes, a paper introducing a new problem can become wildly impactful despite a mediocre method. When outlining a research agenda, there are 3 tips to keep in mind for your problem statement:
Know the problem. The obvious but necessary tip is to know what the problem statement even is. Can you summarize it to a peer in 30 seconds or less? Ensure the problem statement is crisp in your head and possible to communicate. If you can't communicate it, it's likely not clear enough.
Pick an important problem. The problem should have clear applicability. Note that problem statements early in your research career don't have to be widely applicable. However, the application should be clear nonetheless. For example, say you're tackling a specific failure of image classifiers. Explain why this particular error is worthwhile to solve: perhaps it's directly related to a particular misclassification of cancerous tumors.
Pick an "easy" problem. Pick a problem that you believe will be easy to tackle, or at least, a problem you very confidently believe can be tackled. The best problems to work on — or specifically, the best solutions to work on — are ones that you believe in 100%. Then, either it works and your assumptions are validated, or it fails, and you learn something new. Note this notion of "easy" is specifically from your standpoint, so the problem itself may be difficult. However, with your particular skillset or insight, the problem suddenly becomes easy.
Taken altogether, these 3 tips culminate in a pretty obvious one: Work on easy but important problems. This means picking significant, low-hanging fruit. However, note that missing one or the other is a mistake: Tackling insignificant, low-hanging fruit gets you nowhere. Likewise, an important but unsolvable task will stump you. Yet, there are plenty of examples of papers that miss the mark:
Missing importance: There are a slew of papers that operate on colored MNIST datasets. Or on 10x10 CIFAR-10 images. It's not clear why these are important problems to solve. MNIST and CIFAR-10 themselves are already well-distanced from reasonably-sized, natural images. Further modifications push those datasets into further obscurity and impracticality.
Missing difficulty: On the other hand, there are also papers that cobble together a random collection of techniques without rhyme or reason. It's not clear what the problem is or if the random collection solved it. Worse yet, some papers "solve" already-solved problems, which prior works address. It's not clear why these problems are difficult and as a result, why they're worth tackling.
When you're just starting and brainstorming for a new project, consider this process:
Start naive. Ask yourself: "What is easy and important?" You can avoid existing works to start. Pick a problem domain that interests you. If you're later in your PhD, you may already know which general domain interests you: Pick a problem within that domain that you believe solves one of the most important problems in that domain. This initial ignorance is important to have. It lets you brainstorm constraint-free. Ask crazy questions, and put down nutty ideas. What's funny is, Joey would often do this in our 1-on-1 meetings. Every single time, I would (a) shoot down his idea in the meeting and then (b) realize a week later that his story was a very interesting take on the problem. If you're in a PhD program, your adviser can hopefully do the same — provide you with the right stories and perspectives. The same goes for your mentor, senior engineers, senior researchers — anyone with a 30,000-foot perspective.
Perform a thorough literature review. Present your ideas. After your blissful state of ignorance, return to reality and start looking up key terms. See what related works there are in object detection, depth estimation, whatever your topic is. Note that it's not usually possible for a problem statement to be scooped; more often, it's methods that can be scooped. As a result, even if you find highly-relevant papers, don't despair. If anything, (a) it means that your knack for problems is validated by other researchers and (b) you now have papers to pick at, instead of starting from ground zero. At the same time, present your problem statement to other researchers. Others may have ideas or have heard about related works. This is generally how research collaborations start — when multiple parties share an interest in the same problems.
With these two phases done, you should then move on to failing fast. Surprisingly, there exists a way to constantly make research progress, in the face of uncertainty.
Here's a hypothetical but common scenario: You have a hunch that some portion of previous methods is "wrong" — the loss, the architecture, etc. In light of this, you formulate a solution, code it up, and try it. The first iteration doesn't work, so you brainstorm 3 ideas and try them all. Unfortunately, after trying all 3, you still haven't improved over the baseline. For now, you're still optimistic, so you talk to more people and solicit 5 more ideas. You again try them all, again to no avail. This continues, with you trying new ideas and soliciting new ones. Eventually, you run out of ideas, steam, and compute credits. We're all familiar with this — by the end, you're burnt out and results-less.
The root cause? You never tested your hunch, the core hypothesis that previous methods are "wrong" in a particular way. In many cases, it turns out this hunch was wrong, and the problem never existed to begin with. Then, in retrospect, you often realize "Of course no solution for this nonexistent problem would help."
To address this, we need to fail fast, thus the title for this section. This means optimizing for disproving assumptions, to help us narrow in on the version of our hypothesis that's correct. This process is like traversing a large binary tree, where the faster we can traverse this tree, the more quickly we can make research progress. To traverse this binary tree, we need 3 steps:
First, know what each fork in the road is — what binary decision you're trying to make. Always ask yourself: "What is my hypothesis, *actually*?" You must know what question you're really asking, and this usually means digging into your thoughts to identify the assumption you're making that needs testing. The hypothesis should not be "Method X will improve accuracy". That's not a(n interesting) hypothesis. If this is your hypothesis, ask yourself: Why should method X improve accuracy? These hypotheses should be ones you believe in firmly. It should be obvious that they're true, to you at least.
Second, reduce the question to a binary one. One natural knee-jerk response is: Most questions are open-ended. Why is this tree binary? In reality, most questions should be reduced to a binary decision, so they can be answered clearly and definitively. In this case, your binary decision should be "Is my hypothesis true?". The most challenging step is the previous one: recognizing what your hypothesis actually is.
Third, fail the hypothesis quickly. This is the reason for this section's title: Fail fast. This is what my adviser likes to say, and this particular tip has been a big boon for my research productivity. Rather than trying to prove the idea, it's often worthwhile to run several quick sanity checks that aim to fail the hypothesis. If you can fail fast, you can eliminate branches of the tree quickly, and move forward.
Let's say we hypothesize "Adding cutmix regularization will improve accuracy". As we said before, this is an uninteresting hypothesis, mostly because failure doesn't teach us anything.
Context: For some background, cutmix is a regularization technique that involves mixing classes. Say we have a cat picture and a dog picture. During training, we crop a patch from the cat image and paste it onto the dog image, so that our final image is 60% cat and 40% dog. Our image's class label is then updated to be 60% cat and 40% dog. Throughout training, we change the percentage given to each class — 70%-30%, 80%-20%, etc. Intuitively, this forces our model to better understand when a cat-dog hybrid is more cat-like or more dog-like. Reference: CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features
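The mixing above can be sketched in a few lines. This is a simplified illustration, not the paper's exact scheme (CutMix draws the mix ratio from a Beta distribution); the `cutmix` helper, its arguments, and the grid-of-numbers "images" are all hypothetical stand-ins:

```python
import random

def cutmix(img_a, label_a, img_b, label_b, lam):
    """Paste a crop of img_a onto img_b covering roughly fraction `lam`
    of the area. Images are H x W grids (nested lists); labels are
    one-hot lists. Returns the mixed image and an area-weighted soft label.
    """
    h, w = len(img_b), len(img_b[0])
    # Choose a patch whose area is roughly lam * H * W.
    ph = max(1, round(h * lam ** 0.5))
    pw = max(1, round(w * lam ** 0.5))
    top = random.randint(0, h - ph)
    left = random.randint(0, w - pw)

    mixed = [row[:] for row in img_b]
    for i in range(top, top + ph):
        for j in range(left, left + pw):
            mixed[i][j] = img_a[i][j]

    # Soft label weighted by the actual pasted area, e.g. 60% cat / 40% dog.
    frac = (ph * pw) / (h * w)
    label = [frac * a + (1 - frac) * b for a, b in zip(label_a, label_b)]
    return mixed, label
```

For example, pasting a quarter-area cat patch onto a dog image yields a soft label of 25% cat, 75% dog.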
Here is the thought process you should follow to improve this hypothesis.
Revised hypothesis: Digging down one layer, we may ask "Why do we believe cutmix will improve accuracy?". Cutmix works by blending images of different classes together, so why should that help our model performance? We can then revise our hypothesis to be: "Cutmix in particular helps because our model is confused by hybrid classes, like cat-dogs".
Sanity check: This yields a very natural fail-fast experiment. Check all samples where the model can't decide between 2 or more classes. (1) If a significant portion of confusing samples are misclassified, cutmix would be an appropriate method to more properly teach the model when a cat-dog is more cat-like vs. more dog-like. (2) If most of these confusing samples are classified correctly, then cutmix is not likely to help. One extra note: If you try anyways and cutmix still helps, then this opens a brand new set of questions: Why does cutmix still help when the problem it was designed to solve does not exist? Cutmix must work differently from the way we expected.
Note that it'd be ideal if we could identify samples where the ground truth label is a mixture of two classes. However, in most image classification datasets, these mixed labels are unreasonable to obtain at scale, without significant human annotation effort. As a result, we used the next-best indicator available to us, above: Samples where the model's predictions showed confusion.
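As a concrete sketch, the "confused and misclassified" check might look like the following. The function name, the top-2 margin threshold, and the list-of-probabilities input format are all illustrative assumptions, not a fixed recipe:

```python
def confused_and_wrong(probs, labels, margin=0.2):
    """Among 'confused' samples (top-2 probability gap below `margin`),
    return the fraction that are misclassified. `probs` is a list of
    per-class probability lists; `labels` holds true class indices.
    A high fraction suggests cutmix's assumed failure mode really exists.
    """
    confused, wrong = 0, 0
    for p, y in zip(probs, labels):
        ranked = sorted(range(len(p)), key=lambda c: p[c], reverse=True)
        if p[ranked[0]] - p[ranked[1]] < margin:  # model can't decide
            confused += 1
            if ranked[0] != y:  # ... and gets it wrong
                wrong += 1
    return wrong / max(confused, 1)
```

If this fraction is near zero, most confusing samples are already classified correctly, and cutmix is not likely to help.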
Here's why this process is important: If we test cutmix blindly, we don't know whether a failure is due to a bug, a misconception, or a wrong hypothesis about confusing samples. However, by running this sanity check, we now have a surefire reason to believe cutmix should work. If cutmix then fails, we can more confidently say that there was a bug in our implementation.
Digging down even further, we may also find an even quicker fail-fast experiment. Rather than ask about cutmix in particular, we can question the need for regularization.
Revised hypothesis: We may ask "Why do we believe regularization in general will improve accuracy?". We can then revise our hypothesis to be "Regularization generally should help because our model is heavily overfitting."
Sanity check: This one is even easier to sanity check. Look at your training and validation accuracies to check for an outsized gap in accuracies. The gap size is of course subjective, but a difference of <1% means you can throw out regularization as an easy win. If the difference is 10%+, regularization will likely net you gains in validation accuracy.
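This check is simple enough to encode directly. The thresholds below mirror the rough numbers above and are illustrative, not canonical; the helper name is hypothetical:

```python
def overfit_check(train_acc, val_acc, threshold=0.10):
    """Crude fail-fast check: a large train/validation accuracy gap
    suggests overfitting, so regularization is worth trying.
    Accuracies are fractions in [0, 1]."""
    gap = train_acc - val_acc
    if gap < 0.01:
        return "skip regularization: model is not overfitting"
    if gap >= threshold:
        return "try regularization: model is overfitting"
    return "inconclusive: gap is moderate"
```

For instance, 99% train vs. 85% validation accuracy is a 14-point gap, so regularization is likely worth trying; 90% vs. 89.5% is not.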
Note you may also choose to run both sets of sanity checks in this case. It's helpful to design one fail-fast experiment that fails multiple hypotheses at once. However, in this case, cutmix may improve performance on images with class confusion, regardless of overfitting or not.
In summary, your goal is to fail fast. To do this, make sure to step back every so often and ask: What is my actual hypothesis? What is the fastest way to test that hypothesis? Conducting research involves traversing a binary tree of decisions, and you should eliminate wrong paths as quickly as possible, by designing the right sanity checks.
On a very broad level, a significant portion of research is also about selling your work. At the most technical level, it means your paper must tell a convincing story: illustrate a massive, insurmountable problem; provide the insight you had that no one else did; then show an elegant, simple, easy-to-adopt method that capitalizes on the insight.
Note: At a level most researchers don't like to admit, this also means you should be able to market your work. However, before thinking about marketing your work, keep in mind marketing acts only as a multiplier: A poorly-written paper with excellent marketing is 0 × 10 = 0, and likewise, a well-written paper with poor marketing is 10 × 0 = 0. The mixture of both is what makes successful papers and research. We'll focus on just telling the story in your paper, in this post.
There are 2 components to a good story. One or both of these components is what others refer to when they say a "good story":
Illustrate a massive, insurmountable problem: Illustrate a problem that is exceptionally difficult to solve. As my undergraduate adviser Kurt liked to say: "Illustrate the big bad dragon before you slay it". This is the most important part of the story.
This appears to contradict our previous tip: We elaborated on the criteria for a real problem, in "Find real problems," where I advised picking an "easy" and important problem. "Easy" is from your perspective, but when sharing your story, it's the reverse: Convince your reader that the problem is exceptionally difficult. This doesn't mean falsifying information. It means highlighting all of the technical hurdles you had to face, in your pursuit of a solution. It also means possibly highlighting all previous attempts from other works to tackle this problem. Even better, it may mean highlighting a dearth of works that tackle this problem. In the end, when telling your story, highlight its difficulty and importance. We also discussed more examples of telling the problem statement in "How to write your personal statement, for PhD admissions".
One of my favorite ways of illustrating a problem is by illustrating a dichotomy. For example, in a 2021 ICLR paper titled "Neural Backed Decision Trees", I wrote that previous methods either "(1) sacrifice interpretability to maintain accuracy or (2) sacrifice accuracy to maintain interpretability". Our method then improved accuracy and interpretability. I did the same in a 2020 CVPR paper titled "FBNetV2": Either (1) benefit from a large search space but with significant computational cost or (2) restrict your search space but save on computational cost. We again broke the dichotomy by increasing search space size by many orders of magnitude but keeping computational cost constant.
Share the insight everyone else missed: Identify the one key idea that all previous works missed. Perhaps all previous papers missed a part of the problem — a difficult but common scenario that occurs in the dataset. Or, perhaps all previous works missed a part of the method — a small but effective tweak that significantly improves performance. Your insight should allow you to introduce your method like this: "Using this insight, we …". This insight is sometimes omitted even from great papers, which substitute insight with a slew of very impressive results. This is the exception, not the rule, for great papers. The results should be so convincing that the research community is willing to take the effort to draw their own insights. In telling your story, including the insight increases the odds that the method and results stick.
The components above are the most critical. Here are some other tips and considerations when crafting your story. These are less critical but can certainly help with an effective story:
Employ an elegant method: Elegant methods are those that take only a small tweak to use. This could be a simple change to the architecture, an extra loss term, or an extra dataset that everyone includes during training. Elegance makes adoption easier and increases the odds of your work gaining traction. "Elegance" as we've defined it here isn't necessary per se, and complicated methods are okay, as long as they're accompanied by a clear insight.
Practice, practice, practice: To convey your story effectively, tell your story often and widely — adapting and updating your story as needed.
Even and especially if your work is not fully fleshed out, pitch it at lab meetings, tell it to other graduate students, and sell it to your adviser. This is often overlooked: The value of sitting in lab and lab meetings is for you to refine this storytelling skill. Lab meetings aren't supposed to be for fully polished works. They are designed for peer feedback and for developing your story. So, tell your story often. Chat with others about what you're working on, and they'll give feedback, suggest ideas, and overall help you to enhance your story. If other graduate students approach you, do the same in return: Ask what they're working on, and poke and prod as need be.
Get over the risk of being "scooped", as working out your story is far more important. Use discretion for your risk of being scooped of course, but as a budding researcher, honing your storytelling abilities is generally more important than the idea itself.
In summary, your goal is to share a compelling story: share how important and difficult the problem is, the insight that enabled you to solve it easily, and an elegant method that's easy to adopt. Then, practice this everywhere and tell it to everyone — even and especially if it's not well-formed.
You'll need both importance and difficulty to make for a convincing problem that draws widespread interest. Note this is true even if you're working in a well-established problem space, like object detection for self-driving for example.
Problem: In this case, your goal is to carve out a problem subspace and argue why previous methods were inadequate. For example, say your paper tackles detection for partially-occluded objects. You can start by illustrating how difficult of a problem detecting partially-occluded or occluded objects is. The challenge in itself is obvious: partially-occluded objects are not as recognizable as un-occluded objects. That's the difficulty. For importance, you can argue that most important objects are partially-occluded — a pedestrian moving out from behind a parked car, a car poking out of the T-junction.
Insight: At a certain point, however, you need to understand where this work sits relative to other work. This is where understanding related work comes in. Illustrate why prior anti-occlusion methods all failed. Perhaps they all missed a particular, difficult but common scenario. Why is it that previous methods are inadequate? For example, we may argue that square receptive fields for all convolutions are not well-suited for the irregular shapes of partially-occluded objects. Tailoring receptive field shapes may improve computational efficiency and performance. This is the insight: that vanilla convolutional filters are not well-designed for the shapes of partially-occluded objects.
Method: The method in this case would be to simply employ irregularly-shaped convolutions. In theory, this is just a single-line change for every convolution in your codebase, as PyTorch (and probably TensorFlow) supports irregular filters out-of-the-box as of today.
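To make the arithmetic concrete, here's a pure-Python sketch of "valid" cross-correlation with a rectangular kernel — the same sliding-window sum as a square filter, just with different height and width. (In PyTorch, the equivalent change is passing a tuple, e.g. `nn.Conv2d(64, 64, kernel_size=(5, 3))`.) The helper name and toy inputs are illustrative:

```python
def conv2d(image, kernel):
    """'Valid' cross-correlation of a 2D grid with a possibly
    non-square kernel. A kh x kw kernel over an h x w image yields an
    (h - kh + 1) x (w - kw + 1) output — nothing about the sliding
    window requires kh == kw."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            # Elementwise multiply the kernel against the window at (i, j).
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out
```

For example, a 2×3 kernel of ones over a 4×4 image of ones produces a 3×2 output where every entry is 6, the window's area.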
With these 3 pieces of the story thought out, your next mission is to pitch it to all sorts of people. Through pitching it to others, you should hopefully encounter resistance and some interesting questions. Here are a few examples of questions others may raise:
Previous works explore irregularly-shaped filters already, either in object detection or in other domains. How is your approach different? There are a number of papers for the search term "irregular convolution" but only a handful of marginally related ones are well-cited. There is likely a good reason why previous methods are not widely-adopted today. Note that "how is your method different?" should never be answered with trivial implementation details. Using a different learning rate or different connectivity is likely uninteresting, except in rare cases. This question is really asking "how is your method different, in a way that affects accuracy?". For example, maybe previous methods tried only extremely irregular filter sizes, like (11, 3), when in fact smaller aspect ratios like (5, 3) were most effective.
Hardware is optimized for square-shaped filters, so would your irregularly-shaped filters improve accuracy at the cost of latency, when deployed to the self-driving car? Keep in mind, this should not be a distraction. This concern is valid to address, but before you optimize irregularly-shaped filters for hardware, you should check that irregularly-shaped filters actually improve detection of irregularly-shaped objects.
With the qualities outlined above, you now know what your target is, as a researcher. However, the next natural question is: How do I improve these research qualities in myself? The tips above lend themselves to daily practice, which you can start employing immediately.
For finding real problems, read paper abstracts left and right. See how others framed the problem. If it's interesting, dig in and read the introduction and related works. Read the top-cited papers in your field. If you're not sure which those are, follow paper Twitter accounts like @AK, browse conference proceedings like ECCV 2022 (for computer vision), or talk to your mentors. Reading is the name of the game.
For failing fast, for any paper or work you come across, try to distill problems to the core hypothesis, then ask yourself what the fastest way to test that hypothesis is. This is harder than it looks, and doing this well is the key to not only your own research success but also your success as a mentor. Every one of the mentors I mentioned in the introduction has guided me in this way, and I do my best to pass this on when I can. This is relevant not just for your current work but also for the papers you read. Perhaps their wins were smaller than expected, or perhaps the method was unnecessarily complex. How would you design an experiment to simplify the method, retaining the key parts and dropping insignificant ones?
For telling stories, practice telling stories to everyone you can. It can be for your current research, or it could be for a random idea you had. It could be for a field you've always been interested in, or a topic you recently read about. Note this isn't about selling products to all your friends; it's more about reviving the lost art of small talk and deep conversation. A conversation is most interesting when you find that both of you are deeply passionate about the same problem space. Share what you can, and learn from them what you can.
If you find yourself enjoying the process of honing these research qualities, then a PhD could be up your alley. Consider reasons to pursue a PhD (and reasons not to) in "Why pursue a PhD? Is it for me?".