Capital One OA Online Assessment Review

Capital One has been sending out a lot of OAs recently, and the assessment is honestly not as difficult as people imagine. The first two questions are relatively straightforward, around easy level. The real challenge lies in questions 3 and 4, which are usually medium to hard.

Question 1: Basic Data Analysis + CSV Output

Requirements: Read the driver data (drivers.csv) and multiple ride datasets (rides_1.csv–rides_4.csv); perform basic data cleaning; merge and aggregate the data; and save the analysis results to a CSV file.

Solution Approach: Load the driver data and compute the average rating as well as the proportion of drivers who speak a second language. Merge the four ride datasets and calculate the proportion of rides with a successful status. Construct the final result dataset and save it to a CSV file.
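A minimal pandas sketch of this approach. The column names (`rating`, `second_language`, `status`), the `"success"` status value, and the output column names are assumptions for illustration; the actual OA defines its own schema and output layout.

```python
import pandas as pd


def summarize_drivers_and_rides(drivers, rides_parts):
    """Build a one-row summary from the driver table and a list of ride tables.

    Assumed (hypothetical) columns: drivers["rating"], drivers["second_language"],
    rides["status"] with the value "success" marking a successful ride.
    """
    # Stack the separate ride files into a single table.
    rides = pd.concat(rides_parts, ignore_index=True)

    return pd.DataFrame({
        # Mean rating across drivers (NaN ratings are skipped by default).
        "avg_rating": [drivers["rating"].mean()],
        # Share of drivers with a non-missing second language.
        "second_language_share": [drivers["second_language"].notna().mean()],
        # Share of rides whose status equals the assumed success label.
        "success_share": [(rides["status"] == "success").mean()],
    })
```

With the real files, the inputs would come from `pd.read_csv("drivers.csv")` and `[pd.read_csv(f"rides_{i}.csv") for i in range(1, 5)]`, and the result would be written with `to_csv(..., index=False)` under whatever file name the question asks for.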

Question 2: Time-Based Features + Extended Field Analysis

Requirements: Fix the reference date to 2023-04-15; introduce additional fields in drivers.csv (such as started_driving_year); and compute time-derived features (e.g., driving experience) based on this “current” date.

Solution Approach: Use 2023-04-15 as the reference date: process the vehicle table to compute the inspection interval (in days), and process the driver table to calculate driving experience (tenure). Merge the four ride datasets and group by driver_id to aggregate the total number of likes. Use the driver table as the primary table and left-join the processed vehicle data and aggregated ride data; fill missing like counts with 0. Reorder columns as required and save the final output.
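The steps above can be sketched roughly as follows. Only `driver_id`, `started_driving_year`, and the reference date come from the question as described; `last_inspection_date`, `likes`, the output column names, and the assumption that the vehicle table is keyed by `driver_id` are hypothetical. The sketch also reads "inspection interval" as days elapsed since the last inspection, which may differ from the actual definition.

```python
import pandas as pd

# Fixed "current" date given by the question.
REFERENCE_DATE = pd.Timestamp("2023-04-15")


def build_driver_features(drivers, vehicles, rides_parts):
    """Join time-derived driver/vehicle features with aggregated ride likes."""
    drivers = drivers.copy()
    # Driving experience in whole years relative to the reference date.
    drivers["driving_experience"] = (
        REFERENCE_DATE.year - drivers["started_driving_year"]
    )

    vehicles = vehicles.copy()
    # Days between the (assumed) last inspection date and the reference date.
    inspected = pd.to_datetime(vehicles["last_inspection_date"])
    vehicles["days_since_inspection"] = (REFERENCE_DATE - inspected).dt.days

    # Merge the ride files, then total the likes per driver.
    rides = pd.concat(rides_parts, ignore_index=True)
    likes = (
        rides.groupby("driver_id", as_index=False)["likes"].sum()
        .rename(columns={"likes": "total_likes"})
    )

    # Drivers are the primary table; left joins keep every driver.
    out = (
        drivers
        .merge(vehicles[["driver_id", "days_since_inspection"]],
               on="driver_id", how="left")
        .merge(likes, on="driver_id", how="left")
    )
    # Drivers with no rides get a like count of 0.
    out["total_likes"] = out["total_likes"].fillna(0).astype(int)

    # Reorder columns (the real question prescribes its own order).
    return out[["driver_id", "driving_experience",
                "days_since_inspection", "total_likes"]]
```

Using left joins from the driver table is what guarantees that drivers without vehicle records or rides still appear in the output; only the like count is filled with 0, so a missing inspection stays missing.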

Question 3: Driver Profile / Performance Metrics Dataset Construction

Requirements: Given a preprocessed driver performance dataset, understand the business meaning of each field. This question is no longer about calculating metrics, but about correctly interpreting the data and the problem statement.

Solution Approach: Compute the mean age from the training set only, use it to fill missing age values in both the training and test sets, and round the result to integers. Perform categorical encoding based on the training set; any categories that appear only in the test set are encoded as -1. Standardize the net tip value using the mean and standard deviation from the training set, and apply the same parameters to both the training and test sets. Apply fixed encoding to the driver class, save the processed data as required, and ensure the net tip values are rounded to five decimal places.
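A rough sketch of this train-only fitting pattern. The column names `age`, `city`, `net_tip`, and `driver_class`, the alphabetical category order, and the A=0 / B=1 mapping are assumptions (the B=1 part matches the next question's description). Note that pandas' `std()` is the sample standard deviation and `round()` rounds halves to the nearest even number; the OA may specify otherwise.

```python
import pandas as pd


def preprocess(train, test):
    """Fit every statistic on the training set, then apply it to both sets."""
    train, test = train.copy(), test.copy()

    # 1. Impute missing ages with the training-set mean, then round to int.
    age_mean = train["age"].mean()
    for df in (train, test):
        df["age"] = df["age"].fillna(age_mean).round().astype(int)

    # 2. Encode the categorical column using training categories only;
    #    categories seen only in the test set map to -1.
    known = sorted(train["city"].dropna().unique())
    codes = {category: code for code, category in enumerate(known)}
    for df in (train, test):
        df["city"] = df["city"].map(codes).fillna(-1).astype(int)

    # 3. Standardize net_tip with training mean/std, rounded to 5 decimals.
    mu, sigma = train["net_tip"].mean(), train["net_tip"].std()
    for df in (train, test):
        df["net_tip"] = ((df["net_tip"] - mu) / sigma).round(5)

    # 4. Fixed encoding for the target (assumed labels "A" and "B").
    class_map = {"A": 0, "B": 1}
    for df in (train, test):
        if "driver_class" in df.columns:
            df["driver_class"] = df["driver_class"].map(class_map)

    return train, test
```

The point of fitting on the training set alone is to avoid leaking test-set information into the features: the test set is transformed with parameters it had no part in estimating.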

Question 4: Machine Learning Classification Task

Requirements: Based on the cleaned data from the previous question, train a model to predict the driver_class (0 / 1). Class B (1) is treated as the positive class, and the goal is to maximize recall while keeping precision at an acceptable level.

Solution Approach: Load the training, validation, and test datasets, and remove irrelevant ID columns. Merge the training and validation sets, then separate features and the target variable (driver_class). Train a Random Forest classifier with class-balanced weights on the combined dataset. Use the trained model to generate predictions on the test set and save the driver_class results as required.
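A minimal scikit-learn sketch of this approach. The `driver_id` ID column, the number of trees, and the random seed are assumptions rather than values from the assessment, and all features are assumed to be numeric already (as they would be after the previous question's preprocessing).

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def train_and_predict(train, valid, test,
                      id_cols=("driver_id",), target="driver_class"):
    """Fit a class-balanced Random Forest on train+valid and predict on test."""
    # Combine training and validation rows and drop identifier columns.
    full = pd.concat([train, valid], ignore_index=True)
    full = full.drop(columns=list(id_cols), errors="ignore")

    # Separate features from the target.
    X, y = full.drop(columns=[target]), full[target]

    # class_weight="balanced" up-weights the rarer class, which tends to
    # raise recall on it; hyperparameters here are illustrative only.
    model = RandomForestClassifier(
        n_estimators=200, class_weight="balanced", random_state=0
    )
    model.fit(X, y)

    # Align test columns with the training features before predicting.
    X_test = test.drop(columns=list(id_cols) + [target], errors="ignore")
    X_test = X_test[X.columns]
    return pd.Series(model.predict(X_test), index=test.index, name=target)
```

The returned series can be attached to the test IDs and written out in whatever format the question requires. Before merging the two sets for the final fit, it may be worth checking recall and precision for class 1 on the validation set, since the stated goal depends on both.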

Preparation Summary

As for the result: for this Capital One OA, I worked with ProgramHelp and used their end-to-end OA assistance. About a week later, I received an email from the recruiter informing me that I had advanced to the next round, the VO.

Jack Xu MLE | Microsoft Artificial Intelligence Technician
Ph.D. from Princeton University. He lives overseas and has worked at several major companies, including Google and Apple. He has published multiple SCI-indexed papers on deep learning for NLP and maintains a thousand-star machine learning project on GitHub.