
How to Prepare for a Machine Learning Job Interview

During the last 12 months I conducted a lot of job interviews for Volkswagen Commercial Vehicles because they are on a hiring spree for IT talent. I have also been recruiting for many roles in our new data team, which I really love. So I thought this would be a good time to write a blog post and give you some tips for preparing for a machine learning job interview. An interview for a machine learning job is one of the toughest to get through, especially when it comes to machine learning-related questions. With this blog post, however, you will be well prepared.


You worked long and hard on your resume, tweaking it just so and earning an interview with a great data company. Excellent! But your work is not over. In fact, it has just begun. You need to organize and present yourself well in order to be among the strongest candidates for a machine learning position that will receive many applicants. It’s not enough to convince the interviewer that you have the skills and knowledge to perform well on the job. You need to show motivation and enthusiasm, and convince the interviewer that you fit the company’s culture in order to get that coveted job offer. Today, values are almost more important than results. If you show the right values, the company is willing to invest in you to improve your results. Values are king! The time you spend researching and preparing before your interview will be reflected in your confidence and mental flexibility during the interview. You need to be ready in advance for the questions you will be asked, have done your research on both the company and the job, and have all your necessary documents and references with you. To help you succeed at the interview and land a proper job offer, I’ve broken down the process into three sections:

  1. before the interview,
  2. during the interview and
  3. after (yes, after) the interview.

Before the interview

You’ve read the job description, tweaked your resume and sent it in. You’ve already done something right because they’ve called you in to meet you. But before you do anything else, take a closer look at the job description, particularly the list of requirements, and compare your skills and experience to what is described there. Take note of the skills, experience and knowledge required to perform the job, then match them up to your own skills, experience and knowledge. Compare what the company is looking for to your abilities and make mental bridges from one to the other.

Next, get to know the company well. Thoroughly research the company you are about to interview with, using a variety of information avenues. Start by visiting the company’s website. Read through most of it to gain a better understanding of the firm’s history, range of business, products and customers. Find the company’s mission statement and read it over a few times to really understand what they are aiming for. Then read some press releases. This will give you a little insight into what the company is presently working on and where they are headed. Does the firm have a blog or a LinkedIn page? Search social media for the company and sift through their reviews. Search for the company, its CEO and other officers in the News section of Google. Use the company’s products or services, if at all possible. For example, in the case of Volkswagen Commercial Vehicles, install our ConnectFleet App. Use Glassdoor.com to read reviews of the company from its former and present employees. Take this information with a grain of salt, as it’s about as reliable as Yelp: happy people write glowing reviews, angry people write scathing ones. Throughout all of this, prepare a list of questions that you have about the company, its products and its customers. Finally, if you know the name of the person interviewing you, look them up on LinkedIn. Note anything you have in common with them or any connection you share. Keep it professional though. Don’t go overboard, looking them up on Facebook and then remarking on their summer holiday pictures. That kind of stalker behavior will not get you to the next level of interviews!

The next phase of the before-the-interview work is to try to anticipate the questions you might be asked so you can prepare and polish up your answers. Don’t just think about the easy questions. Try your best to prepare your answers to tough questions that might come up. You don’t want to be caught off-guard, unable to answer a difficult question. A simple Google search for interview questions will produce a long list for you to practice with. One such search is “the 50 most common interview questions.” In my opinion, the most important question is about your strengths and the areas where you know you need to improve. Please think about this and be able to list three areas where you could improve. It’s really important.
If it’s possible, set up a mock interview with someone you trust to give you honest, constructive feedback. A mock interview is one of the best tips I have for preparing for your interview. It will help you look prepared and confident. During a mock interview, you will see your communication weaknesses and you can work on them. Ask for feedback on your enthusiasm. Too much makes you look fake, too little and your interviewer will think you are not really interested in the position. Check your body language. Are you sitting up straight? No “manspreading.” Don’t cross your arms in front of yourself. Don’t roll your eyes. Smile! If you’re not natural at doing these things, you need to practice.

Since you are applying for a data science or machine learning engineer role, be ready to answer these 10 questions or ones very similar to these:

  1. Why are a validation set and a test set needed when training a model? How do they differ?
    When training a model, the available data is split up into three distinct sets:
    1. Training dataset: We use the training dataset for fitting the model’s parameters. The accuracy that we achieve on the training set, however, is not reliable for predicting if the model will be accurate on new samples.
    2. Validation dataset: We use the validation dataset to measure how well the model performs on examples that weren’t part of the training dataset. The metrics calculated on the validation data can be used to tune the hyperparameters of the model. Nevertheless, every time we evaluate on the validation data and make decisions based on those scores, we are leaking information from the validation data into our model. The more evaluations, the more information is leaked. So we can end up spoiling our model by over-fitting to the validation data, and once again the validation score won’t be reliable for predicting the behavior of the model in the real world.
    3. Test dataset: We use the test dataset to measure how well the model does on previously unknown examples. It should only be used once we have tuned the parameters based on the validation set. So if we leave out the test set and only work with a validation set, the validation score won’t be a good estimate of the generalization of the model.
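If the interviewer asks you to sketch this on a whiteboard, something like the following would do. It is only a minimal illustration in plain Python; the function name `three_way_split` and the 70/15/15 proportions are my own assumptions, not a fixed rule.

```python
import random


def three_way_split(samples, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle once, then cut the data into train, validation and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)

    # round() avoids surprises from floating point products such as 14.999...
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)

    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, validation, test


train, validation, test = three_way_split(range(100))
print(len(train), len(validation), len(test))
```

The point to make while writing it: the three parts never overlap, and the test part is put aside until all tuning on the validation part is finished.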
  2. When do you use a stratified cross-validation and what is it?
    Cross-validation repeatedly divides the data into training and validation sets. Typically this is done randomly. In stratified cross-validation, however, each split maintains the ratio of the categories in both the training and validation datasets. If we have a dataset with 10% of category A and 90% of category B, for example, and stratified cross-validation is used, we’ll have the same proportions in the training and validation datasets. With simple, non-stratified cross-validation, a worst-case scenario could be that the validation dataset has no samples of category A.
    Stratified cross-validation can be applied in the following scenarios:
    1. On a dataset with multiple categories: The smaller the dataset is and the more imbalanced the categories are, the more important it is to use stratified cross-validation.
    2. On a dataset with data of different distributions: In a dataset for autonomous driving, for example, we may have images taken during the day and at night. If we do not make sure that both types are present in training and validation, we may have generalization problems.
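To make the 10%/90% example concrete, here is a rough sketch of stratified folds in plain Python. The helper `stratified_folds` is a hypothetical name of mine, and dealing the indices out round-robin is just one simple way to keep the ratios; libraries such as scikit-learn offer their own implementation.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, n_folds=5, seed=0):
    """Return n_folds lists of sample indices that each keep the class ratio."""
    indices_by_class = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_class[label].append(index)

    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for indices in indices_by_class.values():
        rng.shuffle(indices)
        # Deal the indices of one class out round-robin, like playing cards.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


labels = ["A"] * 10 + ["B"] * 90
folds = stratified_folds(labels)
for fold in folds:
    print(Counter(labels[i] for i in fold))
```

Each fold serves once as the validation set while the other four are used for training, and every fold contains samples of category A.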
  3. Have you followed any challenges on Kaggle? Why do ensembles typically win these challenges with higher scores than individual models? What are their advantages and disadvantages? Shouldn’t we always go for ensembles as solutions?
    An ensemble combines the predictions of several models into a single prediction. Ensembles outperform individual models because each model makes different errors, so the errors of one model can be compensated by the correct predictions of the others. This is why the score of the ensemble is usually higher. We need a high diversity of models to create a good ensemble.
    A diverse ensemble can be achieved by:
    1. Combining different machine learning algorithms. For instance, we could use decision trees, logistic regression, and k-nearest neighbors for building a model and leveraging the strengths of each of those models.
    2. Training each model on a different random subset of the training data (sampled with replacement), which is called bagging.
    3. Weighting the samples of the training set. If you iteratively assign a different weight to the samples according to the errors of the ensemble, this is called boosting.
    A high number of winning solutions at Kaggle are based on ensembles. In real-world projects, however, machine learning engineers have to balance execution time against accuracy, which is why ensembles are not always the first choice.
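Here is a tiny, hand-constructed illustration of why diversity matters. The three "models" below are just made-up prediction lists whose mistakes never overlap, which is the ideal case; real models make correlated errors, so the gain is usually smaller.

```python
from collections import Counter

truth   = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
model_a = [0, 0, 1, 1, 0, 1, 0, 0, 1, 1]  # wrong on samples 0 and 9
model_b = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]  # wrong on samples 1 and 5
model_c = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]  # wrong on samples 2 and 6


def majority_vote(*predictions):
    """Combine several prediction lists by taking the most common vote per sample."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def accuracy(predicted, expected):
    """Fraction of samples where the prediction matches the expected label."""
    return sum(p == e for p, e in zip(predicted, expected)) / len(expected)


ensemble = majority_vote(model_a, model_b, model_c)
for name, predictions in [("a", model_a), ("b", model_b), ("c", model_c), ("ensemble", ensemble)]:
    print(name, accuracy(predictions, truth))
```

Each individual model gets 8 of 10 samples right, while the vote gets all 10, because on every sample at most one model is wrong. The price is that three models have to be run instead of one.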
  4. Can you explain regularization and give some examples of methods?
    The objective of regularization is to improve the validation score, in some cases at the cost of worsening the training score. Common regularization methods are:
    1. L1: This adds a penalty on the sum of the absolute values of the model’s parameters, which tends to produce sparse parameters.
    2. L2: This adds a penalty on the sum of the squared values of the model’s parameters, which tends to produce parameters with small values.
    3. Dropout: This method, often applied to neural networks, randomly sets some of the neurons’ outputs to zero during training. This forces the network to reduce its effective complexity and learn better representations of the data. Each neuron is forced to learn the most useful features.
    4. Early stopping: This method stops training when the validation score converges, i.e. stops improving, even though the training score may still be improving. This is important to prevent over-fitting on the training data.
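Two of these methods fit into a few lines of plain Python. This is a simplified sketch: the names `l1_penalty`, `l2_penalty` and `early_stopping_epoch` are mine, the penalty strength of 0.1 and the patience of 2 epochs are arbitrary choices, and the validation scores are invented numbers where higher means better.

```python
def l1_penalty(weights, strength):
    """Term added to the loss: strength times the sum of absolute weights."""
    return strength * sum(abs(w) for w in weights)


def l2_penalty(weights, strength):
    """Term added to the loss: strength times the sum of squared weights."""
    return strength * sum(w * w for w in weights)


def early_stopping_epoch(validation_scores, patience=2):
    """Return the epoch whose weights we keep when stopping early."""
    best_epoch = 0
    for epoch, score in enumerate(validation_scores):
        if score > validation_scores[best_epoch]:
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs in a row
    return best_epoch


weights = [0.5, -2.0, 0.0, 1.5]
print(l1_penalty(weights, 0.1), l2_penalty(weights, 0.1))
print(early_stopping_epoch([0.70, 0.78, 0.81, 0.80, 0.79, 0.85]))
```

Note that in the example training stops after the fifth epoch and keeps the third one, so the later score of 0.85 is never seen. A good follow-up remark in an interview is that the patience value trades training time against the risk of stopping too soon.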
  5. Please explain “the curse of dimensionality.” Can you list some techniques on how to cope with it?
    The curse of dimensionality can occur when the data consists of many features. If the dataset doesn’t have enough samples to cope with that many features, it gets very hard to train the model correctly. For example, it is very hard to train a model on a dataset of 100 samples with 100 features, since the model will find random relations between the features and the target during training. If the dataset, however, consists of 100,000 samples covering the same 100 features, the model will be able to learn the relationship correctly since enough data points are available.
    There are many methods to fight the curse of dimensionality:
    1. Feature engineering: It’s possible to create new features that combine multiple existing features. Statistics like the mean or median of a feature are good examples for feature engineering.
    2. Dimensionality reduction: There are several methods that help to reduce the dimensionality of the features, such as principal component analysis (PCA) or neural network based autoencoders. We also use them at Volkswagen Commercial Vehicles :-)
    3. Feature selection: The dimensionality can be reduced by training the model on a smaller subset of features instead of using all of them.
    4. L1 regularization: Since the L1 regularization method outputs sparse parameters, it also helps to deal with high-dimensional input data.
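The "random relations" from the 100-samples example can be demonstrated with pure noise. In the sketch below, both the features and the target are random numbers, so any correlation between them is spurious. The function names are my own, and I use 5,000 instead of 100,000 samples for the large dataset simply to keep the plain-Python run short.

```python
import random
import statistics


def correlation(xs, ys):
    """Pearson correlation of two equally long lists."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return covariance / (spread_x * spread_y)


def strongest_spurious_correlation(n_samples, n_features, seed=0):
    """Largest |correlation| between pure-noise features and a pure-noise target."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    features = [[rng.gauss(0, 1) for _ in range(n_samples)]
                for _ in range(n_features)]
    return max(abs(correlation(feature, target)) for feature in features)


few_samples = strongest_spurious_correlation(n_samples=100, n_features=100)
many_samples = strongest_spurious_correlation(n_samples=5000, n_features=100)
print(few_samples, many_samples)
```

With only 100 samples, at least one of the 100 noise features will look noticeably related to the target by chance; with many more samples, that illusion largely disappears.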
  6. Please explain the phrase “imbalanced dataset.” Can you list some techniques on how to cope with it?
    An imbalanced dataset is one in which the target categories occur in very different proportions. A dataset of medical images, for example, where we try to detect some illness, will typically have many more negative than positive samples. Let’s assume that 98% of images are without the illness and 2% are with the illness. That is a highly imbalanced dataset.
    There are different ways to cope with imbalanced datasets:
    1. Undersampling and oversampling: Training samples are usually drawn uniformly from the dataset. Undersampling draws fewer samples from the frequent categories and oversampling draws more from the rare ones, so that the model sees a more balanced dataset.
    2. Data augmentation: We can generate plausible synthetic data for the less frequent categories by modifying existing data in a controlled way. In our dataset of images for detecting some illness, we could flip or rotate the images with illness, or add noise to copies of them, in such a way that the illness is not removed from the image.
    3. Using the right metrics: In our example dataset, a model that always makes negative predictions would achieve an accuracy of 98% without detecting a single case of the illness. Metrics such as precision, recall, and F-score describe the quality of the model better for imbalanced datasets.
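The 98%/2% example is easy to verify with a few lines of plain Python. This is a sketch with a hypothetical helper `classification_metrics`; reporting a ratio with a zero denominator as 0.0 is a convention I assume here, since precision is strictly speaking undefined when the model never predicts the positive class.

```python
def classification_metrics(predicted, expected):
    """Accuracy, precision, recall and F1 for binary labels (1 = illness)."""
    pairs = list(zip(predicted, expected))
    true_pos = sum(1 for p, e in pairs if p == 1 and e == 1)
    false_pos = sum(1 for p, e in pairs if p == 1 and e == 0)
    false_neg = sum(1 for p, e in pairs if p == 0 and e == 1)
    correct = sum(1 for p, e in pairs if p == e)

    # Assumed convention: a ratio with a zero denominator is reported as 0.0.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


expected = [1] * 2 + [0] * 98   # 2% with the illness, 98% without
always_negative = [0] * 100     # a "model" that never predicts the illness
metrics = classification_metrics(always_negative, expected)
print(metrics)
```

The accuracy looks excellent, while recall and F1 are zero and expose that the model is useless for finding the illness.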
  7. Could you please explain the differences between unsupervised, supervised, and reinforcement learning?
    Supervised learning aims at training a model to learn the relationship between the input and output data. To achieve this we need labeled data.
    In unsupervised learning, we only have unlabeled data. The model learns a representation of the data and its underlying statistics; clustering and dimensionality reduction are typical examples. Unsupervised learning is also used to initialize the parameters of a model when we have a lot of unlabeled data and only a small fraction of labeled data: we first train an unsupervised model and afterwards leverage its weights to train a supervised model.
    Models based on reinforcement learning have input data and a reward, which depends on the output of the model. So the model learns a policy, which maximizes the reward. Reinforcement learning has been successfully applied to a number of use cases such as strategic games like Go, even the good-ole Atari video games, and Poker where you have to deal with uncertainty. Tesla tries to build their software module for autonomous driving on reinforcement learning.
  8. What are the key drivers for the rise of deep learning in the past decade?
    Basically, the success of deep learning in the past decade is driven by three main factors:
    1. More data: The access to massive labeled datasets, for example on Kaggle, enables us to train models with more parameters, cope with the curse of dimensionality, and achieve state-of-the-art scores. Other machine learning algorithms don’t scale as well as deep learning when they have to cope with high-volume datasets.
    2. GPUs: Training models on GPUs reduces the training time tremendously. This can go up to orders of magnitude compared to training on CPUs. Currently, cutting-edge models are trained on multiple GPUs in parallel or even on specialized hardware, for example from NVIDIA.
    3. Improvement in algorithms: Last but not least, there have been significant algorithmic improvements, such as ReLU activation and dropout, and new, more complex (for example deeper) network architectures have been developed. This has been another very significant factor.
  9. Can you please explain data augmentation and list some examples?
    Data augmentation is a method for creating new data by modifying existing data in a reasonable way. This means the target value is not changed, or changed in a known way. Computer vision is one of the fields where data augmentation is tremendously helpful.
    There are many modifications which we can perform on images:
    1. Resizing
    2. Flipping (horizontally or vertically)
    3. Rotating
    4. Adding noise
    5. Deforming
    6. Modifying colors
    For each problem, a customized data augmentation pipeline has to be developed. For example, on OCR problems, flipping changes the text and won’t help with the problem you want to solve. Resizing or small rotations, however, may help to improve the robustness of the model.
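A few of these image modifications can be sketched without any library by treating a grayscale image as a list of pixel rows. The function names, the noise scale and the tiny 2x3 "image" are my own illustrative choices; in a real project you would use an image library instead.

```python
import random


def flip_horizontal(image):
    """Mirror the image left to right."""
    return [list(reversed(row)) for row in image]


def rotate_90_clockwise(image):
    """Rotate the image by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def add_noise(image, scale=0.05, seed=0):
    """Add Gaussian noise to every pixel and clip the result to [0, 1]."""
    rng = random.Random(seed)
    return [[min(1.0, max(0.0, pixel + rng.gauss(0, scale))) for pixel in row]
            for row in image]


image = [[0.0, 0.5, 1.0],
         [0.1, 0.6, 0.9]]

augmented = [flip_horizontal(image), rotate_90_clockwise(image), add_noise(image)]
for variant in augmented:
    print(variant)
```

Every variant keeps the label of the original image, which is exactly the condition that the target value is not changed. For an OCR dataset you would leave `flip_horizontal` out of the pipeline.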
  10. What are convolutional networks and where can we leverage them?
    Convolutional networks belong to the family of neural networks and are based on convolutional layers instead of fully connected layers. On a fully connected layer, all the output units are connected to all input units through weights. On a convolutional layer, however, we have some weights that are repeated over the input. This means that a group of inputs is connected to one output.
    The advantage of convolutional layers over fully connected layers is that the number of parameters can be reduced tremendously. This often leads to better generalization of the model. If we want to learn a transformation with a fully connected layer, for instance, from one 10x10 pixel image to another one of the same size, we need 100 x 100 = 10,000 weights. If we instead use two convolutional layers, the count drops dramatically: for example, a first layer with nine 1x1 filters followed by a second layer with one 3x3 filter over those nine channels needs only 9 + 81 = 90 weights (ignoring biases).
    Convolutional networks are applied where data has a clear dimensionality structure. Time series analysis is an example where one-dimensional convolutions are used. Image data is an example where two-dimensional convolutions are used. Volumetric data or spatio-temporal data can be handled by three-dimensional convolutions.
    In 2012 AlexNet won the ImageNet challenge. Since then, computer vision has been dominated by convolutional networks.
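The parameter comparison for the 10x10 image is simple arithmetic, and it helps to be able to reproduce it on the spot. In the sketch below, biases are ignored, and the 1x1 kernel of the first convolutional layer is an assumption on my part, chosen because it gives a total of 90; if both layers used 3x3 kernels, the total would be 162, which is still tiny compared to the fully connected layer.

```python
def dense_weights(n_inputs, n_outputs):
    """Every output unit is connected to every input unit."""
    return n_inputs * n_outputs


def conv_weights(kernel_height, kernel_width, in_channels, n_filters):
    """Each filter has one small kernel per input channel, reused over the image."""
    return kernel_height * kernel_width * in_channels * n_filters


# Fully connected: 10x10 input pixels to 10x10 output pixels.
dense = dense_weights(10 * 10, 10 * 10)

# Two convolutional layers (assumed: 1x1 kernels first, then one 3x3 filter).
first_layer = conv_weights(1, 1, in_channels=1, n_filters=9)
second_layer = conv_weights(3, 3, in_channels=9, n_filters=1)

# Alternative reading: 3x3 kernels in both layers.
both_3x3 = (conv_weights(3, 3, in_channels=1, n_filters=9)
            + conv_weights(3, 3, in_channels=9, n_filters=1))

print(dense, first_layer + second_layer, both_3x3)  # 10000 90 162
```

Note that the convolutional counts do not depend on the image size at all, which is the real reason convolutional layers scale so well to large images.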

Okay, so after getting all of these questions into your head and understanding the answers, it’s time to think about what to wear to your interview. You want to look professional, not stuffy. You want people to take you seriously, not think you’re the cool new hire. Generally speaking, it doesn’t hurt to dress more formally for an interview than you would on the job – especially in Germany. If you can, ask an employee about the dress code.
If you don’t have anyone to ask then follow these 4 rules:

  1. Overdress rather than underdress: wear a suit or very smart business attire. Don’t wear jeans, even if you know others on the job do.
  2. Keep it neutral: choose colors that are calm and plain. This is not the place for bright colors or patterns that show off your fun-loving self.
  3. Don’t burn yourself on the iron: Yes, take the iron out, dust it off and plug it in. Nothing says “Please don’t take me seriously” like a wrinkled shirt or trousers.
  4. Carry a briefcase: even if it’s empty (though it won’t be when you read the next section). It will make you look like you are ready to get down to business.

During the interview

Okay, the day has come. You’ve Googled everything you possibly can on the company, its products, its customers, and its employees. You are prepared with a few good, solid questions about the firm and you are ready to answer any question they throw at you. So let’s talk about what to put in that briefcase. Besides your lucky rabbit’s foot, you should bring ample copies of your documents with you to the interview. If, for example, you know you are meeting with 3 people on interview day, bring 5 copies of everything. These documents include your resume, your list of references, copies of letters of reference, and samples of your work (written pieces, designs, etc.). You also want to bring a notepad and a nice pen that you are sure works well. It’s a good idea to have a list of any information you might need to fill in an application form.

On the day of the interview, you want to arrive early. Not on time, early. But not too early, as that is as bad as showing up late. 10 minutes early is ideal. Aim for that. You want your first impression to be of a person who is, and who will always be, ready. This means you will map out your route and find the nearest subway stop, the nearest parking lot, etc. Take a practice run over to the offices beforehand to time your commute so you can anticipate traffic delays or parking nightmares. When you get to the building, but before you go inside, turn off your cell phone so it isn’t buzzing or ringing during your interview. This is their time, and they have your full attention.

Speaking of attention, look out for non-verbal communication. Know that your waiting room behavior may be monitored. Make eye contact with people and smile at them. During your mock interview, you should have practiced a handshake that is firm but not crushing. Let the interviewer set the tone for the interview and follow along. Don’t let an interviewer’s friendly demeanor catch you off guard. Remain a little formal, even if the setting is a little casual. Sit up straight and keep good eye contact, but don’t stare. Be aware of nervous tics like foot tapping. If you catch yourself doing it, adjust yourself in your seat, cross your legs or move slightly to stop yourself calmly and without abruptness. Manage your reactions; be confident but don’t brag. Smile, laugh a little at jokes but don’t laugh so hard that you cry. And be respectful of the space. Don’t put anything on anyone’s desk. Instead, hand it to them personally. Be sure to ask the questions that you’ve prepared. If you weren’t able to gauge when to interject with one of your excellent questions, then wait until the end of the interview, when the interviewer will ask if you have any questions. Of all the questions on your list, be sure to ask: What are the next steps in the hiring process?

After the interview

Phew! You’ve left the interview with a firm handshake, final eye contact and a smile. If you prepared and practiced, then you’ve done a great job presenting yourself during the interview. The only thing left to do is to follow up with a few simple tasks. First, let the people on your reference list know that the company may be contacting them in the near future. There is a debate about whether or not to send a thank you note to the main person you interviewed with. It is up to you to decide whether this is a strategic move or not. A simple note would mention your gratitude for the opportunity to meet with them and your enthusiasm toward working with such a (fill-in-the-blank) company. Close with a simple way they can reach you.


I’ve laid out my best recommendations on how to prepare for a machine learning interview. It was lengthy, I know. But if you do nothing else, review all 10 of the machine learning questions above and try to get the answers into your head. If an answer is not clear to you, try to Google alternative explanations. Ideally, register at Kaggle or try to publish a paper at a conference, such as Strata, so that you show up as a data scientist on the web.

To sum it up, the right preparation can help you project confidence during your interview. These are some practical steps you can take to prepare for a job interview and stand out from the less-prepared candidates. By following the advice in this article, you will be better prepared and more confident for your interview. Good luck!