Data scientist is one of the hottest tech jobs out there, and the interview process reflects this. From screening to on-site meeting, you’re in for what could be a months-long process. Let that sink in, then let your heartbeat return to normal - you can prepare.
With some slight variation, the average data science interview consists of three stages:
- Screening
- Evaluation
- Fit Check
With the help of data science alums from companies like Facebook, Airbnb, and Google, we’ve got the ultimate resource to help you through each stage.
Stage 1: Screening
Most data science hires will go through both a technical and a behavioral screening. These can come in many forms, but you’ll come across coding challenges and quick phone interviews with an HR rep most often.
Both the coding challenge (e.g., make a prediction model based on some scraped data) and the HR screen (“tell me about yourself”) will be pretty basic. Both are meant to weed out inflated resumes.
The only ways you can fail here are through over-stressing or under-preparing. If you’ve been through practice questions on the basics of data science, and you’ve got a thorough understanding of the company you’re interviewing with and the role in question, you’ll be just fine.
Preparing for the Coding Challenge:
- Review the basics: traditional data structures, SQL, and simple projects in Python/R. For the self-directed, StackOverflow is full of sample questions on all kinds of general data science topics. If you’re looking for more structured practice:
- Practice! For a realistic sample coding challenge, work your way through this Analytics Interview Kit. You’ll do some basic data analysis in Python.
- While working on the challenge, if you’ve got the time, take a step away to clear your head before giving your answers a final read-through.
If you do fail the screening questions, don’t stress too much - some companies give ridiculous challenges, and others might be looking for very specific answers. If you’re spending hours and hours on a take-home coding challenge, ask yourself whether this time might be better spent preparing for other interviews.
Preparing for the Behavioral Screening with HR:
- Familiarize yourself with the company and the job description and cross-check with your resume. Prepare to talk about overlap and think about how to address gaps.
- Read through the company’s reported sample interview questions on Glassdoor, and prepare some answers on the most common (especially “tell me about yourself”).
- If you can, look up the recruiter on LinkedIn beforehand and check for anything you two have in common - be positively memorable.
Stage 2: Evaluation
This is the meat of the interview. You can expect a technical call and/or a take-home project and an on-site interview. You’re now a legitimate contender, and your interviewer is testing you to figure out whether you can do the job. Expect rigorous questions, and be prepared to demonstrate your thought process.
Preparing for a live evaluation, whether on-site or on a call:
You’ll likely be tested on:
Here are some sample questions you may come across.
Topic: Linear Regression Modeling
Question: You are modeling marketing return on investment (ROI). You have each month’s revenue on the Y axis and spend on the X axis.
You decide to use a simple linear regression model to evaluate whether spending more would generate more revenue. You find your linear intercept (b) is $1.5MM and gradient (a) is 2.1. Your residual standard error is 79.1 and your adjusted R-squared is 0.72 with a p-value of 1.09e-9.
A. How much of your data’s variance has your model explained and can the result be called significant?
B. Our problem requires more accuracy in modeling the data. How can we alter the linear equation to better fit the data? What regression model would you pick and why?
C. Your new model explains 98% of the data variance. How would you determine if your model is overfitting? How would you evaluate the model’s overall fit and the fit of its parameters?
HINTS BELOW: Stop here to think before moving on :)
Hint: The p-value is very small. What does that tell you?
Hint #2: Theoretically, what would increase the complexity of the model?
Short Answer to part A: The R-squared value is a statistical measure of how close the data are to the fitted regression line. R-squared values can range from 0 to 1, with a value of 1 meaning that the model explains ALL variability of the response data around its mean, so your adjusted R-squared of 0.72 indicates that your model explains roughly 72% of the variance.
Hypothesis testing checks the validity of a claim being made about a population. You want to know whether greater marketing spend will increase revenue, and you want to ensure that your result (whether the answer is “yes” or “no”) isn’t just sampling error; that is to say, it’s statistically significant. A statistically significant result is one that is unlikely to have occurred by chance alone. You’ll want to set your confidence level at 95% or higher, possibly even 98%. Your p-value, which can range from 0 to 1, is the probability of seeing a relationship at least this strong if marketing spend actually had no effect on revenue. A small p-value (typically less than or equal to 0.05, which corresponds to 95% confidence) indicates strong evidence against “no effect,” i.e. statistical significance. Your p-value of 1.09e-9 is far smaller than this, so your result is significant. For further Q&A on this data set and more, check out Exponent’s Data Science Course.
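If you want to see where numbers like these come from, here is a minimal sketch in plain Python. The spend and revenue figures are made up for illustration (they are not the data set from the question), and the p-value is estimated with a permutation test, which is a stand-in for the t-test a stats package would normally report.

```python
import random


def fit_simple_linear_regression(x, y):
    """Least-squares fit of y = a*x + b; returns (gradient a, intercept b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def r_squared(x, y, a, b):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p=1):
    """Penalize R-squared for the number of predictors p, given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Estimate how often a slope this steep shows up when x and y are unrelated."""
    rng = random.Random(seed)
    observed, _ = fit_simple_linear_regression(x, y)
    shuffled = list(y)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any real link between spend and revenue
        slope, _ = fit_simple_linear_regression(x, shuffled)
        if abs(slope) >= abs(observed):
            extreme += 1
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical data: 36 months of spend and revenue (illustration only).
random.seed(0)
spend = [random.uniform(100, 1000) for _ in range(36)]
revenue = [1500 + 2.1 * s + random.gauss(0, 300) for s in spend]

a, b = fit_simple_linear_regression(spend, revenue)
r2 = r_squared(spend, revenue, a, b)
print(f"gradient a = {a:.2f}, intercept b = {b:.1f}")
print(f"R-squared = {r2:.3f}, adjusted = {adjusted_r_squared(r2, len(spend)):.3f}")
print(f"permutation p-value = {permutation_p_value(spend, revenue):.4f}")
```

In an interview you would more likely read these values off a library summary table, but being able to explain each one from first principles is what the question is probing.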
Topic: Basic, necessary tools such as if-/else-statements and loops
Question: In any language you're comfortable with, write a program that prints the numbers from 1 to 100. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz”.
Answer: View sample code and documentation here.
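One possible answer, sketched in Python (any language works for this question). The function name fizzbuzz is our own choice; the key detail is checking the "both three and five" case before the individual ones.

```python
def fizzbuzz(n):
    """Return the FizzBuzz output for a single number."""
    if n % 15 == 0:  # multiple of both three and five, so check it first
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


for i in range(1, 101):
    print(fizzbuzz(i))
```

Splitting the rule into a small function and a loop makes it easy to talk through (and test) the logic separately from the printing.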
Topic: Feature Selection
Question: Say you’re a data scientist at a used vehicle dealer and your manager wants to know which vehicles are most likely to have a higher four-year resale value. You have access to a vehicle data set with many attributes associated with valuation. We can build a predictive model to determine which vehicles have higher four-year resale values. As is often the case, we need to perform pre-processing steps to handle the data before we are ready to build our predictive model. In this question, the missing and NA values have been handled appropriately, and non-numeric data have been converted into numeric or binary dummy features that can be easily processed.
Take a moment to load the relevant Python modules and the data set through Exponent’s Data Science Course.
How would you determine which features to include in the model training data set? Consider both simple and more advanced methods of feature selection and dimension reduction. This may include exploratory data analysis, plots, and analysis methods. Include in your answer HOW you decided which features to keep and which to eliminate.
Answer: For a full answer and more in-depth Q&A on this data set and more, check out Exponent’s Data Science Course.
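To make one of the "simple" methods concrete, here is a rough sketch of a correlation filter in plain Python: keep features that correlate with the target, and drop any feature that nearly duplicates one already kept. The column names, values, and thresholds below are invented for illustration and are not taken from the course data set.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_a * var_b)


def select_features(features, target, min_target_corr=0.3, max_pair_corr=0.95):
    """Filter a dict of {name: values} down to useful, non-redundant features."""
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], target)),
        reverse=True,
    )
    kept = []
    for name in ranked:
        if abs(pearson(features[name], target)) < min_target_corr:
            continue  # too weakly related to resale value
        if any(abs(pearson(features[name], features[k])) > max_pair_corr for k in kept):
            continue  # nearly a copy of a feature we already kept
        kept.append(name)
    return kept


# Hypothetical vehicle data (illustration only).
age_years = [1, 2, 3, 3, 4, 5, 6, 7]
features = {
    "age_years": age_years,
    "age_months": [12 * a for a in age_years],  # redundant with age_years
    "mileage_k": [15, 40, 30, 70, 45, 95, 80, 110],
    "color_code": [1, 2, 2, 1, 1, 2, 2, 1],  # unrelated to price here
}
resale_value_k = [27, 24.5, 21, 20, 17.5, 14, 12, 9]

kept = select_features(features, resale_value_k)
print(kept)
```

A filter like this only catches linear, one-at-a-time relationships, so treat it as a first pass before the more advanced methods the question asks about.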
Topic: Relational database structure
Question: When building a relational database, describe the difference between a logical data model and a physical data model.
Answer: After a high-level conceptual model has been created and the basic entities defined, the next step is to build a logical data model. The logical data model includes attributes (text, numbers, dates, etc.) and primary and foreign keys; in essence, how the entities relate to one another. The physical data model maps the specific data sources which will be linked together, and is the most detailed view of the three. It represents the specific database as implemented. Read through a more comprehensive explanation here.
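A small sketch can make the distinction easier to explain out loud. Below, a hypothetical two-entity model (Customer and Order, our own example) is written once as a logical model and once as a physical model for SQLite. The table names, column types, and index are assumptions chosen for illustration.

```python
import sqlite3

# Logical model: entities, attributes, keys, and relationships,
# described without committing to any particular database engine.
LOGICAL_MODEL = {
    "Customer": {
        "attributes": {"customer_id": "number", "name": "text"},
        "primary_key": "customer_id",
        "foreign_keys": {},
    },
    "Order": {
        "attributes": {"order_id": "number", "customer_id": "number", "order_date": "date"},
        "primary_key": "order_id",
        "foreign_keys": {"customer_id": "Customer"},
    },
}

# Physical model: the same entities as implemented in one specific database,
# with engine-specific types, constraints, and an index.
PHYSICAL_DDL = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    order_date  TEXT NOT NULL
);
CREATE INDEX idx_order_customer ON customer_order(customer_id);
"""


def build_database():
    """Create an in-memory SQLite database from the physical model."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(PHYSICAL_DDL)
    return conn


conn = build_database()
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)
```

Notice that the logical "date" attribute becomes TEXT in SQLite: decisions like that one belong to the physical model, not the logical one.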
There are no specific answers here - interviewers are more interested in your thought process. Nowadays, you’ll likely run into questions that will test your business acumen rather than the “estimation” questions asked years ago. Test yourself on the below:
Tips for Success on a Take-Home Project
This component may or may not be included, but if it is, the point is to simulate a situation you’ll deal with at work: they’ll give you some data and a simple, if vague, request, e.g., “identify trend(s) and explain them to a non-technical stakeholder.” Some real-life examples of these (per Glassdoor) are:
- Home Depot: Create a recommendation system based on in-house data. Data cleaning/processing required.
- Airbnb: Evaluate the impact of a sample initiative on bookings.
- Expedia: Develop a revenue optimization model based on historical bookings data.
- For more on take-home project Q&A, Exponent’s data science course will take you through a three-step project based on creating a recommendation algorithm for Airbnb.
You’ll have time to organize your thoughts here - most companies will give you at least a few days to complete the project. So be strategic. This means:
- Spend some time on exploratory data analysis first.
- Use all the data visualization tools at your disposal, especially Tableau (as you’re likely to run into it at work). Yes, you could code your own in Pandas, etc. but is that really the best use of your time at this stage?
- Don’t be afraid to reach out to ask clarifying questions. In this scenario, would you present to a team of engineers or to a director of marketing? Do you need more data to flesh out your ideas/hunches? Intelligent questions are more than appropriate here.
Stage 3: Fit Check
This will likely take the form of an onsite lunch interview, or a quick meet-and-greet. The hard part’s over - they’re convinced you can do the job. Now they want to make sure it’s a mutual fit.
Preparing for a Fit Check
- Be yourself. You're also evaluating this company. Can you picture yourself working with these people? In this environment?
- Do your research. Who's meeting you? Should you dress business-casual or does the culture lean toward jeans and band tees?
- Come prepared with questions. What's an average day like? What kind of benefits can you look forward to? And (importantly) how's the cafeteria?
If you're stuck on questions to ask, check out this list of 50 questions interviewers ask to check for culture fit. Flip the script and ask a few! A personal favorite is the always-illuminating: what's the best book you've read recently? You'll learn a lot about that person quickly.
As a data scientist, you’re constantly optimizing. Don’t neglect this tendency in your job search - track how you’re doing and where you’re feeling discomfort. What stood out to you as a weak area within this article? That should be your next area of focus.
And take heart when the process starts to drag. You’re not alone, and there are plenty of opportunities to support each other. This is your community, not the competition. There are more jobs than data scientists. Reach out and keep learning.
Looking for more in-depth preparation? How about a community of thousands populated by PMs and Data Scientists at the likes of Google, Facebook, and Amazon? If you’re ready to level-up your job search, Exponent’s got you covered.