Capital One OA Online Test Review (NG Version) | Capital One DS OA Passed with a Perfect Score

Capital One has sent out a lot of OAs recently, and the OA is honestly not as difficult as people imagine. The first two questions are relatively straightforward, at around easy level. The real challenge lies in questions 3 and 4, which are usually medium to hard.

Question 1: Basic data analysis + CSV output

Requirements: Read the driver data (drivers.csv) and multiple ride datasets (rides_1.csv–rides_4.csv); perform basic data cleaning; merge and aggregate the data; and save the analysis results to a CSV file.

Solution Approach: Load the driver data and compute the average rating as well as the proportion of drivers who speak a second language. Merge the four ride datasets and calculate the proportion of rides with a successful status. Construct the final result dataset and save it to a CSV file.
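The steps above can be sketched with the Python standard library alone. This is a minimal illustration, not the official solution: the inline CSV text stands in for drivers.csv and the four ride files, and the column names (rating, second_language, status), the "success" status value, and the output field names are all assumptions.

```python
import csv
import io
from statistics import mean

# Hypothetical inline data standing in for drivers.csv and rides_1.csv to rides_4.csv.
DRIVERS_CSV = """driver_id,rating,second_language
1,4.5,Spanish
2,4.0,
3,,French
4,5.0,
"""
RIDES_CSVS = [
    "ride_id,driver_id,status\n1,1,success\n2,2,cancelled\n",
    "ride_id,driver_id,status\n3,1,success\n",
    "ride_id,driver_id,status\n4,3,success\n",
    "ride_id,driver_id,status\n5,4,failed\n",
]


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def summarize(drivers, rides):
    # Basic cleaning: ignore blank ratings when averaging.
    ratings = [float(d["rating"]) for d in drivers if d["rating"].strip()]
    bilingual = sum(1 for d in drivers if d["second_language"].strip())
    successes = sum(1 for r in rides if r["status"].strip().lower() == "success")
    return {
        "avg_rating": round(mean(ratings), 2),
        "second_language_share": round(bilingual / len(drivers), 2),
        "success_share": round(successes / len(rides), 2),
    }


drivers = read_rows(DRIVERS_CSV)
# Merge the four ride datasets into one list of rows.
rides = [row for text in RIDES_CSVS for row in read_rows(text)]
summary = summarize(drivers, rides)

# Write the one-row result as CSV (to a string here; a real file works the same way).
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(summary))
writer.writeheader()
writer.writerow(summary)
result_csv = out.getvalue()
```

In the actual OA the same flow would read from files on disk, and the exact rounding and output columns should follow the problem statement rather than this sketch.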

Question 2: Time Features + Extended Field Analysis

Requirements: Fix the reference date to 2023-04-15; introduce additional fields in drivers.csv (such as started_driving_year); and compute time-derived features (e.g., driving experience) based on this “current” date.

Solution Approach: Use 2023-04-15 as the reference date: process the vehicle table to compute the inspection interval (in days), and process the driver table to calculate driving experience (tenure). Merge the four ride datasets and group by driver_id to aggregate the total number of likes. Use the driver table as the primary table and left-join the processed vehicle data and aggregated ride data; fill missing like counts with 0. Reorder columns as required and save the final output.
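A rough sketch of this approach, using only the standard library, is shown below. Apart from driver_id and started_driving_year, every column name is hypothetical, and the sketch assumes that "inspection interval" means the number of days from the last inspection to the reference date; the real definition should come from the problem statement.

```python
from collections import defaultdict
from datetime import date

REFERENCE_DATE = date(2023, 4, 15)  # fixed "current" date from the problem statement

# Hypothetical rows; only driver_id and started_driving_year appear in the write-up.
drivers = [
    {"driver_id": 1, "started_driving_year": 2015},
    {"driver_id": 2, "started_driving_year": 2020},
]
vehicles = [
    {"driver_id": 1, "last_inspection_date": "2023-01-15"},
    {"driver_id": 2, "last_inspection_date": "2022-04-15"},
]
rides = [  # already merged from the four ride files
    {"driver_id": 1, "likes": 2},
    {"driver_id": 1, "likes": 3},
]

# Vehicle table: days elapsed between the last inspection and the reference date (assumed meaning).
days_since_inspection = {
    v["driver_id"]: (REFERENCE_DATE - date.fromisoformat(v["last_inspection_date"])).days
    for v in vehicles
}

# Ride table: total likes per driver_id.
likes_by_driver = defaultdict(int)
for ride in rides:
    likes_by_driver[ride["driver_id"]] += ride["likes"]

# Driver table is the primary table; the lookups behave like left joins.
result = [
    {
        "driver_id": d["driver_id"],
        "driving_experience_years": REFERENCE_DATE.year - d["started_driving_year"],
        "days_since_inspection": days_since_inspection.get(d["driver_id"]),
        "total_likes": likes_by_driver.get(d["driver_id"], 0),  # no rides -> 0 likes
    }
    for d in drivers
]
```

Driver 2 has no rides in this toy data, so the lookup falls back to 0, which mirrors filling missing like counts after the left join.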

Question 3: Driver profile / performance metrics dataset construction

Requirements: Given a preprocessed driver performance dataset, understand the business meaning of each field. This question is no longer about calculating metrics, but about correctly interpreting the data and the problem statement.

Solution Approach: Compute the mean age from the training set only, use it to fill missing age values in both the training and test sets, and round the ages to integers. Fit the categorical encoding on the training set; any category that appears only in the test set is encoded as -1. Standardize the net tip value using the training-set mean and standard deviation, applying the same parameters to both the training and test sets. Apply the fixed encoding to the driver class, round the net tip values to five decimal places, and save the processed data as required.
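The key point is that every statistic is fitted on the training set and then reused unchanged on the test set. The sketch below illustrates that with the standard library. The rows, the city column, and the column names are invented for illustration. It also makes three assumptions the write-up does not confirm: category codes follow sorted order, the standard deviation is the population one (ddof=0), and class A maps to 0 (only B = 1 is stated).

```python
from statistics import mean, pstdev

# Hypothetical rows; column names and values are invented for illustration.
train = [
    {"age": 30, "city": "Austin", "net_tip": 1.0, "driver_class": "A"},
    {"age": None, "city": "Boston", "net_tip": 2.0, "driver_class": "B"},
    {"age": 41, "city": "Austin", "net_tip": 3.0, "driver_class": "A"},
    {"age": 38, "city": "Boston", "net_tip": 4.0, "driver_class": "B"},
]
test = [
    {"age": None, "city": "Chicago", "net_tip": 5.0, "driver_class": "B"},
]

# Fit every parameter on the training set only.
known_ages = [r["age"] for r in train if r["age"] is not None]
age_fill = round(mean(known_ages))
train_tips = [r["net_tip"] for r in train]
tip_mean = mean(train_tips)
tip_std = pstdev(train_tips)  # assumption: population standard deviation (ddof=0)
# Assumption: codes follow the sorted order of the training categories.
city_codes = {c: i for i, c in enumerate(sorted({r["city"] for r in train}))}
CLASS_MAP = {"A": 0, "B": 1}  # B = 1 is from the write-up; A = 0 is assumed


def transform(rows):
    """Apply the training-set parameters to any split."""
    return [
        {
            "age": age_fill if r["age"] is None else r["age"],
            "city": city_codes.get(r["city"], -1),  # unseen category -> -1
            "net_tip": round((r["net_tip"] - tip_mean) / tip_std, 5),
            "driver_class": CLASS_MAP[r["driver_class"]],
        }
        for r in rows
    ]


train_out = transform(train)
test_out = transform(test)
```

Because transform never recomputes anything from its input, the test set cannot leak into the imputation value, the encoding, or the scaling parameters. Whether the grader expects the sample or population standard deviation is worth checking against the problem statement, since the two give different values at five decimal places.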

Question 4: Machine learning classification task

Requirements: Based on the cleaned data from the previous question, train a model to predict the driver_class (0 / 1). Class B (1) is treated as the positive class, and the goal is to maximize recall while keeping precision at an acceptable level.

Solution Approach: Load the training, validation, and test datasets, and remove irrelevant ID columns. Merge the training and validation sets, then separate features and the target variable (driver_class). Train a Random Forest classifier with class-balanced weights on the combined dataset. Use the trained model to generate predictions on the test set and save the driver_class results as required.
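Two ideas in this approach are easy to check by hand: how class-balanced weights are derived, and how precision and recall are measured for the positive class. The sketch below does not train a Random Forest; it only illustrates those two calculations on made-up labels. The weight formula, n_samples / (n_classes * class_count), is the heuristic that scikit-learn documents for class_weight="balanced"; the function names are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the positive class (class B = 1 in the write-up)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up, imbalanced training labels: six of class 0, two of class 1.
y_train = [0, 0, 0, 0, 0, 0, 1, 1]
weights = balanced_class_weights(y_train)

# Made-up predictions that favor the positive class.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
```

Here the minority class gets three times the weight of the majority class, and the toy predictions catch every positive (recall 1.0) at the cost of two false positives (precision 0.6), which is the trade-off the question asks you to manage.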

Preparation Summary

By the way, let me share the result. For this Capital One OA, I worked with ProgramHelp and used their end-to-end OA assistance. About a week later, I received an email from the recruiter informing me that I had advanced to the next round, the VO.

Jack Xu MLE | Microsoft Artificial Intelligence Technician

Ph.D. from Princeton University. He lives overseas and has worked at several major companies, including Google and Apple. He has multiple SCI papers in deep learning for NLP and a thousand-star ⭐️ GitHub project in machine learning.
