from Guide to the Job Hunt on Jul 16, 2023

How to succeed at AI design interviews

AI design interviews are similar to system design interviews, where you're given a generally broad prompt — only now, the prompt involves a model of some form, such as "Track a football".

Being a design-focused interview, many of the overarching principles in an open-ended AI design interview are similar to system design interviews — in short, you're brainstorming ideas with a future colleague, and your role is to organize, distill, and drive the conversation.

Prepare for the unspoken agenda: Ask, discuss, code, test.

The unspoken agenda is very similar to the coding interview's. Let's walk through the general structure of a response to an AI design question. Knowing the unspoken agenda will make you seem more prepared and your proposal more cogent.

  1. Interviewer poses question. The interviewer provides a very simple prompt, such as "Track a football".
  2. You ask questions. You clarify the scope and requirements for the problem.
  3. Simple baseline approach. Define the task: what data, model, and metrics. The quicker you get to a prototype, the better.
  4. Iterate on design. Build up in complexity, possibly by addressing interviewer feedback.
  5. Monitor post-deploy. Discuss how to monitor, evaluate, and improve model performance in production.

Let's now break down each of these steps in detail.

Step 1. Interviewer poses question. The interviewer opens with a short, simple prompt.

Step 2. You ask questions. Ask a few standard questions to understand the scope of the challenge. For example, predicting in real-time on a mobile device is far different from maximizing throughput for a model in the cloud.

  1. Clarify end usage. Understand what the predictions are being used for, in the end product.

    1. The final usage will determine what outputs your model ultimately needs to provide. For example, "tracking an athlete" could mean segmentation, detection, or keypoint estimation.
    2. Say you're tracking a football as above. If the end goal is to highlight the football in a live sportscast, then your model should segment the football pixel for pixel. Alternatively, the goal may be to crop the live camera feed to focus on the ball carrier; if that's the case, then your model could simply detect bounding boxes instead (see the sketch after this list).
  2. Clarify available sensors. Understand what your model can accept as input reasonably.

    1. The sensors can drastically change the difficulty of your modeling task. In some cases, you're allowed to set up the sensor system, picking both the sensors themselves (e.g., infrared vs. RGB camera) and their placement.
    2. For example, to track a football, one way to simplify the problem is to place a micro-pressure sensor on each athlete's palm and bicep. Although there will be some false positives for each sensor individually (e.g., a player shoving another player), there are hopefully fewer false positives when both sensors are engaged simultaneously. This isn't realistic given the size of pressure sensors today, but it's one way to (attempt to) use sensors to simplify the problem.
    3. Another example would be to place an RFID tag on each player and an RFID reader in the ball. The violent handling of the ball may damage the reader, but a player in possession of the ball for a period of time could plausibly be identified. As silly as these ideas are, shortcuts like these are useful both for simplifying the problem and for collecting accurate ground-truth data.
  3. Clarify setting. Understand where and how the predictions are being used.

    1. The setting determines the resource constraints imposed on your model. Will you need to predict in real time? What budgets (latency, power, memory, storage, etc.) does your model need to meet? Do you have access to server-grade GPUs or just on-device compute?
    2. Say you're still in the previous example and have now decided to crop a live camera feed. If you're working for NBC and they're looking to broadcast the game, you can host a segmentation model in the cloud on a beefy server. However, if you're working for a startup that's got a hot new mobile app to process the user's camera feed, you likely want to run a local model for real-time updates to the camera preview.
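
To make the output-format distinction from earlier concrete, here's a minimal sketch of the two prediction contracts. The types and the `crop_around` helper are hypothetical, purely for illustration; a real pipeline would use its detection library's own formats.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Detection:
    """A bounding box: enough to crop the feed around the ball carrier."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    confidence: float


@dataclass
class Segmentation:
    """A per-pixel mask: enough to highlight the ball in a broadcast."""
    mask: np.ndarray  # (H, W) boolean array, True where the ball is
    confidence: float


def crop_around(frame: np.ndarray, det: Detection, margin: int = 50) -> np.ndarray:
    """The 'crop the live feed' use case only needs a Detection."""
    h, w = frame.shape[:2]
    x0, y0 = max(int(det.x_min) - margin, 0), max(int(det.y_min) - margin, 0)
    x1, y1 = min(int(det.x_max) + margin, w), min(int(det.y_max) + margin, h)
    return frame[y0:y1, x0:x1]
```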

Step 3. Simple baseline approach. Design a minimal baseline that can be understood and tested quickly. Your interviewer may egg you on toward a more thorough solution, but your goal is to touch on as many concerns as possible, to see which your interviewer wants to hear more about.

We discussed some radical ways of simplifying the problem above. Here, you're not simplifying the problem but are instead setting up the infrastructure. Touch on every aspect of the system you'll need to set up.

  1. Data. Discuss how you'll collect data. As a start, it could be web-crawled and human-annotated.

    1. Are there ways to obtain annotated data for "free"? In the ideal case, there's a pre-curated dataset available, or a related dataset you can cleverly repurpose. Pseudo-labels using a pre-trained model are certainly possible, but these are noisy. Using additional sensors for ground truth as we discussed above is another way. Or, there may be unannotated data you can pretrain on.
    2. If "free" ground truth is not available, are there ways to make annotations more cost-efficient? For example, label every 10 frames in a video and use simple kinematics models to propagate labels across frames. Alternatively, ask annotators to refine pseudo-labels instead of annotating from scratch.
  2. Model. Discuss different possibilities for the model and associated architecture. Start simple with a known and popular architecture.

    1. Is there a pretrained model for a related problem you can use? For example, an action detection model for football players, or a volleyball tracker. Discuss how you would fine-tune this model to detect footballs instead (see the fine-tuning sketch after this list). In this case, the action detection model would need localization information added, possibly with CoordConv.
    2. Is there an architecture tailored to this problem? Identify a property of the data or task that is unique to this problem, and explain how the architecture can be adjusted to handle it. For example, football players are often all rushing toward the ball. Our model thus may need to leverage context up to a certain distance to help determine where the ball is.
  3. Loss. Describe the loss function you would use to train the model on the provided data.

    1. Classification would be straightforward; use cross entropy. However, you have an array of different options available to you if you're first pretraining on unlabeled data. Additionally, you may add different regularizers to training, based on the problem.
    2. For example, the football is often and easily occluded by players, either partially or fully. To handle this, we may randomly mask and infill parts of the ball during training to imitate occlusion, making the model more robust to real-world occlusions (sketched after this list). Note, however, that the infill itself may be unrealistic, giving the model an extra cue about where the ball is; that cue would be present in training data but absent from test data.
  4. Metric. Describe how your model would be evaluated.

    1. This could be interesting even for classification. As a start, you can use accuracy as a simple metric. However, you may be more interested in false positives than false negatives. Or, you may be interested in a classifier with a tunable false-positive rate, meaning you now need an ROC curve.
    2. For example, there's only one football on the field at any given moment. One metric would be how often the model predicts more than one football on the field; this is clearly wrong and an example of a catastrophic failure (see the metric sketch after this list).
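
To ground the data discussion, here's a minimal sketch of the keyframe idea: annotators label every tenth frame, and boxes are interpolated linearly in between. Linear motion is an assumption; a real pipeline might propagate labels with a kinematics model or a tracker instead.

```python
import numpy as np


def propagate_boxes(keyframes: dict[int, np.ndarray]) -> dict[int, np.ndarray]:
    """Linearly interpolate [x_min, y_min, x_max, y_max] boxes between
    human-annotated keyframes, so annotators label only every Nth frame."""
    annotated = sorted(keyframes)
    labels = dict(keyframes)
    for start, end in zip(annotated, annotated[1:]):
        for frame in range(start + 1, end):
            t = (frame - start) / (end - start)
            labels[frame] = (1 - t) * keyframes[start] + t * keyframes[end]
    return labels


# Annotators label frames 0 and 10; frames 1-9 get interpolated boxes.
keyframes = {0: np.array([100.0, 200.0, 140.0, 240.0]),
             10: np.array([180.0, 210.0, 220.0, 250.0])}
labels = propagate_boxes(keyframes)
```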
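
For the model, here's the fine-tuning sketch referenced above. As an assumption, it starts from torchvision's COCO-pretrained Faster R-CNN rather than the action-detection model from the prose (which would need localization surgery); the head-swap pattern is the same either way.

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Start from a COCO-pretrained detector and swap in a new box-prediction
# head with two classes: background and "football".
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)

# Freeze the backbone so early fine-tuning only fits the new head.
for param in model.backbone.parameters():
    param.requires_grad = False
```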
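
For the loss discussion, a minimal sketch of the occlusion augmentation, using noise as a deliberately crude infill. As noted above, crude infill is itself an unrealistic cue, so treat this as a starting point rather than the final recipe.

```python
import numpy as np


def occlude_ball(image: np.ndarray, box: np.ndarray,
                 rng: np.random.Generator) -> np.ndarray:
    """Mask a random patch inside the ball's bounding box with noise to
    imitate a player occluding the ball. A noise patch can leak the
    ball's location; plausible infill (e.g., player textures) is harder
    to generate but leaks less."""
    x0, y0, x1, y1 = box.astype(int)
    w, h = x1 - x0, y1 - y0
    pw = max(1, int(w * rng.uniform(0.2, 0.6)))  # patch width
    ph = max(1, int(h * rng.uniform(0.2, 0.6)))  # patch height
    px = int(rng.integers(x0, max(x0 + 1, x1 - pw)))
    py = int(rng.integers(y0, max(y0 + 1, y1 - ph)))
    out = image.copy()
    out[py:py + ph, px:px + pw] = rng.integers(0, 256, size=(ph, pw, 3))
    return out


rng = np.random.default_rng(0)
frame = np.zeros((720, 1280, 3), dtype=np.uint8)
augmented = occlude_ball(frame, np.array([600, 300, 660, 350]), rng)
```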
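
And for the metric, a sketch of the catastrophic-failure rate: how often the model reports more than one football in a frame. The detection format (one confidence score per candidate) is assumed for illustration.

```python
import numpy as np


def multi_ball_rate(detections_per_frame: list[np.ndarray],
                    threshold: float = 0.5) -> float:
    """Fraction of frames where the model reports more than one football
    above the confidence threshold -- a catastrophic failure, since there
    is exactly one ball on the field."""
    failures = sum(
        int((scores > threshold).sum() > 1) for scores in detections_per_frame
    )
    return failures / max(1, len(detections_per_frame))


# Frame 0: one confident ball; frame 1: two confident "balls" (a failure).
frames = [np.array([0.9, 0.2]), np.array([0.8, 0.7])]
print(multi_ball_rate(frames))  # 0.5
```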

At this point, your interviewer may already be poking and prodding at your proposal, so your interview is most likely going to naturally segue into the next step.

Step 4. Iterate on design. Any part of your above proposal can be iterated on, either to improve quality or to better meet project requirements.

Step 5. Monitor post-deploy. After deploying the model, discuss how to monitor and evaluate the quality of the deployed model. The goal of your monitoring should be to uncover cases your model performs poorly on.
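
As one concrete example, here's a minimal sketch of a monitoring signal that flags clips for human review. The thresholds and the flagging criterion are illustrative assumptions, not a standard recipe.

```python
def flag_for_review(confidences: list[float],
                    low_mean: float = 0.3, max_flicker: float = 0.4) -> bool:
    """Flag a clip when the deployed model looks unsure: its mean
    confidence is low, or confidence swings sharply between frames.
    Flagged clips become candidates for annotation and retraining."""
    mean = sum(confidences) / len(confidences)
    flicker = max(
        (abs(a - b) for a, b in zip(confidences, confidences[1:])), default=0.0
    )
    return mean < low_mean or flicker > max_flicker


print(flag_for_review([0.9, 0.88, 0.91]))  # False: confident and stable
print(flag_for_review([0.9, 0.2, 0.85]))   # True: prediction flickers
```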

In summary, prepare for the above stages in your design interview. You'll be much better equipped in your own interviews by just knowing the sequence of steps.

Your Rubric: Background, Research, Production

There are a variety of ways interviewers can throw curveballs at you in the design interview. This is done to assess three categories of knowledge and skills. Different teams and companies may use this interview differently, but you can expect the following rubric items in some form.

Apply background knowledge: You should know your basics in back-propagation and transformers, sure. You should also know when to apply what. Although testing your knowledge isn't the point of the interview, testing your application of existing knowledge is. For a primer on transformers, see Language Intuition for Transformers.

You should be familiar with why different methods are being applied — don't just memorize keywords.

Conduct research efficiently: You should understand how to break down a complex problem into small, answerable hypotheses. Those hypotheses should then be answered as directly and quickly as possible. For more details, see What defines a "good" researcher?

Be aware of production needs: You should know the real-world deployment concerns for a model: how practical it is to obtain a certain kind of data, how finicky or stable a model is to train, and possible concerns with a model's performance. In short, be able to anticipate and plan for obstacles.

The above rubric isn't exhaustive, but this is the general gist of the evaluation.

Practice, practice, practice.

Throughout this post, we introduce a large number of examples and tips. The top tip, however, is none of those. Instead, it's to practice. AI design is so unlike any other interview and job that any amount of related practice is better than none.

  1. Practice thinking aloud. As we discussed in How to succeed at coding interviews, practice talking and thinking at the same time. This is a lot harder than it sounds, but it's also critical: the entire interview is one large brainstorming session.
  2. Practice organizing brainstorming. The difficulty lies in (1) producing new ideas during the brainstorming process and (2) simultaneously keeping each suggestion in context, i.e., how it addresses one of your requirements. This is an important skill, and you can practice it by incessantly re-summarizing takeaways throughout the brainstorming session.
  3. Practice finding shortcuts. Continuously find ways to simplify the problem. You'll need shortcuts to obtain cheap ground-truth data, to solve the problem more elegantly, or to improve the model's performance. Many times, these shortcuts may be rejected by your interviewer; perhaps they're looking to discuss a specific topic, such as inference optimization. However, a clever shortcut can work wonders for the elegance of your solution.
  4. Practice discussing tradeoffs. With any design choice you make, discuss the tradeoffs for that choice. Your new grouped convolution may, for example, reduce latency but come at the cost of quality (a quick sketch of this tradeoff follows the list). You may also expand the sensor suite to collect more information, but this comes with a higher risk of sensor mis-calibration in your collected data.
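
For instance, here's a quick sketch of the grouped-convolution tradeoff, using PyTorch to compare parameter counts. The latency saving tracks the parameter (and FLOP) reduction, while quality may drop because channels in different groups no longer mix within the layer.

```python
from torch import nn


def param_count(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())


standard = nn.Conv2d(64, 64, kernel_size=3, padding=1)
grouped = nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=8)

print(param_count(standard))  # 36928 = 64*64*3*3 + 64
print(param_count(grouped))   #  4672 = 64*(64//8)*3*3 + 64
```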

Now, you know what to practice and how to practice. Grab a colleague or a friend, and practice brainstorming together. Even if you can't find a friend to practice with, practice the tips above on your own.




