Recently, many students have received the Capital One online assessment (OA) for the Data Scientist New Grad role. Overall, the difficulty is not high: the first two questions are on the easy side and mainly test basic data processing and logic; the last two questions are mostly simulations, which require deducing the final state from a series of operations and test modeling skills and attention to detail. Below I share the question types and core examination points of this Capital One DS OA, to help you form an approach in advance and avoid being slowed down by the simulation questions.
Question 1: Basic data analysis + CSV output
Requirements:
- Read the driver data (drivers.csv) and the trip data files (rides_1.csv through rides_4.csv);
- Perform basic data cleaning;
- Merge the tables and compute summary statistics;
- Save the analysis results as a CSV file.
Problem-solving ideas:
- Read the driver data, then calculate the average rating and the proportion of drivers who speak a second language.
- Concatenate the four ride tables and calculate the proportion of successful orders.
- Build the result table and save it.
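The steps above can be sketched in pandas roughly as follows. This is a minimal illustration, not the official solution: the column names `rating`, `second_language`, and `status`, the `"completed"` status value, and the output layout are all assumptions, since the post does not give the actual schema.

```python
import pandas as pd


def summarize(drivers, rides_parts):
    """drivers: DataFrame; rides_parts: list of DataFrames (rides_1 .. rides_4)."""
    # Basic cleaning (assumed): drop exact duplicate rows.
    drivers = drivers.drop_duplicates()
    rides = pd.concat(rides_parts, ignore_index=True).drop_duplicates()

    # Column names below are hypothetical.
    avg_rating = drivers["rating"].mean()                    # NaN ratings are skipped
    second_lang = drivers["second_language"].notna().mean()  # share with a non-empty value
    success = (rides["status"] == "completed").mean()        # share of successful rides

    return pd.DataFrame({
        "metric": ["avg_rating", "second_language_share", "success_share"],
        "value": [avg_rating, second_lang, success],
    })


def run(data_dir=".", out_path="result.csv"):
    """Read the CSV files, summarize, and save the result (not called here)."""
    drivers = pd.read_csv(f"{data_dir}/drivers.csv")
    rides_parts = [pd.read_csv(f"{data_dir}/rides_{i}.csv") for i in range(1, 5)]
    summarize(drivers, rides_parts).to_csv(out_path, index=False)
```

Keeping the computation in `summarize` and the file I/O in `run` makes it easy to check the numbers on a small hand-built table before touching the real files.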
Question 2: Time Features + Extended Field Analysis
Requirements:
The reference date ("today") is fixed at 2023-04-15. drivers.csv gains additional fields (such as started_driving_year), and time-derived features (such as years of driving experience) are calculated relative to that date.
Problem-solving ideas:
- Using 2023-04-15 as the reference date, process the vehicle table to get the number of days since inspection, and process the driver table to get each driver's years of driving experience.
- Concatenate the four ride tables and calculate the total number of likes grouped by driver ID.
- Use the driver table as the main table, left-join the processed vehicle and ride data onto it, and fill missing like counts with 0.
- Arrange the columns in the required order and save the results.
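A rough pandas sketch of these steps is shown below. Only `started_driving_year` and the 2023-04-15 date come from the question; `last_inspection_date`, `likes`, the `driver_id` join key on the vehicle table, and the output column names are assumptions made for illustration.

```python
import pandas as pd

TODAY = pd.Timestamp("2023-04-15")  # fixed reference date from the question


def build_features(drivers, vehicles, rides_parts):
    """Return one row per driver with time-derived features and total likes."""
    # Vehicle table: days from the (assumed) inspection date to the reference date.
    vehicles = vehicles.copy()
    inspected = pd.to_datetime(vehicles["last_inspection_date"])
    vehicles["days_since_inspection"] = (TODAY - inspected).dt.days

    # Driver table: years of driving experience relative to the reference year.
    drivers = drivers.copy()
    drivers["driving_years"] = TODAY.year - drivers["started_driving_year"]

    # Ride tables: concatenate, then total likes per driver.
    rides = pd.concat(rides_parts, ignore_index=True)
    likes = (rides.groupby("driver_id", as_index=False)["likes"].sum()
                  .rename(columns={"likes": "total_likes"}))

    # Driver table is the main (left) table; drivers without rides get 0 likes.
    out = (drivers
           .merge(vehicles[["driver_id", "days_since_inspection"]], on="driver_id", how="left")
           .merge(likes, on="driver_id", how="left"))
    out["total_likes"] = out["total_likes"].fillna(0).astype(int)

    # Column order is hypothetical; follow the order the question specifies.
    return out[["driver_id", "driving_years", "days_since_inspection", "total_likes"]]
```

The result can then be written with `build_features(...).to_csv(path, index=False)`. The left joins matter here: an inner join would silently drop drivers that have no rides or no vehicle record.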
Question 3: Driver portrait/performance indicator data set construction
Requirements:
You are given a compiled driver performance data set. The focus shifts from "calculating indicators" to understanding the business meaning of each field and what the question is actually asking.
Problem-solving ideas:
- Calculate the mean age using only the training set, use it to fill missing age values in both the training and test sets, and round the result.
- Fit the category encoding on the training set; categories that do not appear in the training set are all encoded as -1.
- Standardize the net tip value using the mean and standard deviation of the training set; the same parameters are applied to both the training and test sets.
- Apply a fixed encoding to the driver level, save the data as required, and round the net tip value to 5 decimal places.
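The key point in these steps is that every statistic is fitted on the training set only and then reused on the test set. A minimal sketch under assumed names follows: the columns `age`, `city`, and `net_tip`, the `A`/`B` mapping for `Driver_class`, and the use of the sample standard deviation (pandas' default) are all assumptions, not details given in the question.

```python
import pandas as pd

CLASS_CODES = {"A": 0, "B": 1}  # assumed fixed mapping for the driver level


def preprocess(train, test):
    """Fit all statistics on train, then apply them to both train and test."""
    train, test = train.copy(), test.copy()

    # Parameters come from the training set only (computed before any mutation).
    age_mean = train["age"].mean()
    tip_mean = train["net_tip"].mean()
    tip_std = train["net_tip"].std()  # sample std (ddof=1); an assumption
    city_codes = {c: i for i, c in enumerate(sorted(train["city"].dropna().unique()))}

    for df in (train, test):
        # Fill missing ages with the training mean, then round to whole years.
        df["age"] = df["age"].fillna(age_mean).round().astype(int)
        # Unseen (or missing) categories are encoded as -1.
        df["city"] = df["city"].map(city_codes).fillna(-1).astype(int)
        # Standardize with training parameters and keep 5 decimal places.
        df["net_tip"] = ((df["net_tip"] - tip_mean) / tip_std).round(5)
        # The label may be absent from the test set.
        if "Driver_class" in df.columns:
            df["Driver_class"] = df["Driver_class"].map(CLASS_CODES)
    return train, test
```

Computing the mean or the category list on the combined data would leak test information into the features, which is usually what this kind of question is checking for.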
Question 4: Machine learning classification task
Requirements:
Based on the cleaned data from the previous question, train a model to predict each driver's Driver_class (0/1). Class B (1) is the positive class; the goal is to maximize recall while keeping precision from dropping too low.
Problem-solving ideas:
- Read the training, validation, and test data, and drop the irrelevant ID columns.
- Merge the training and validation sets, then separate the features from the target variable (the driver class).
- Train a random forest classifier with balanced class weights on the full data.
- Use the trained model to predict on the test set and save the predicted driver class as required.
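With scikit-learn, the steps above might look like the sketch below. The `driver_id` column name, the number of trees, the random seed, and keeping the ID next to the prediction in the output are assumptions; `class_weight="balanced"` is the part that corresponds to the balanced class weights mentioned above, and it reweights the rarer class so the model is less inclined to ignore it.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def fit_and_predict(train, valid, test, id_col="driver_id", target="Driver_class"):
    """Train on train + validation and return predictions for the test set."""
    # Merge training and validation data, dropping the ID column (assumed name).
    full = pd.concat([train, valid], ignore_index=True).drop(columns=[id_col])
    X, y = full.drop(columns=[target]), full[target]

    # Hyperparameters here are illustrative, not tuned.
    model = RandomForestClassifier(
        n_estimators=200, class_weight="balanced", random_state=0
    )
    model.fit(X, y)

    # Use the same feature columns, in the same order, as during training.
    X_test = test.drop(columns=[id_col, target], errors="ignore")[X.columns]
    return pd.DataFrame({id_col: test[id_col].values, target: model.predict(X_test)})
```

The returned frame can be saved with `to_csv(path, index=False)`. Because the validation set is merged into the training data here, there is no held-out data left for checking recall and precision; if time allows, evaluating on the validation set first and merging afterwards is a reasonable variation.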
Don't want to stumble on the C1 DS OA?
If you have recently received the Capital One DS OA, it is worth familiarizing yourself with the high-frequency question types in advance, especially the simulation questions, which are particularly time-consuming. Many people know how to solve them but get stuck partway through, and once the rhythm is lost it is hard to finish in time.
We have been collecting real OA questions and high-frequency patterns from major North American companies, and we are familiar with the direction of the C1 questions. If you want a steadier result, or don't want a single online test to affect your later interviews, you can also learn more about our OA assisted support, which helps you avoid detours at key points. Many students have successfully advanced to the next round.