The common impression many people have of this BCGX OA is: "Finally, an OA that resembles real data science work."
It does not ask you to squeeze maximum performance out of an algorithm, and it is not a LeetCode-style brain teaser. You start from dirty data and work your way up to prediction results that can be presented to business or product teams: a complete DS/ML production pipeline.
The three questions are tightly linked and form a typical end-to-end assessment. It is strongly recommended to treat them as one small project.

Q9 – Data Cleaning & Feature Aggregation
Problem Description:
You are given a housing master table and a user activity log table.
The housing table contains basic information such as house_id, location, price, and publish_time.
The activity table records user views with fields including house_id and timestamp.
Your task is to:
- Clean and standardize column names across both datasets.
- Aggregate the number of views within a specified time window.
- Merge the aggregated results back into the housing master table.
At its core, this question checks basic data engineering skills.
The focus is not on complex logic, but on whether you can:
- Unify inconsistent column name formats (upper and lower case, spaces, underscores)
- Correctly parse time fields and filter by the time window
- Use groupby to count the number of views
- Sensibly handle listings that have no view records after the merge (usually by filling in 0)
The whole process is quite close to how behavioral features are built in real business settings.
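The steps above can be sketched in pandas roughly as follows. This is only a minimal illustration, not the official solution: the sample data, the `view_count` column name, the helper names `clean_columns` and `add_view_counts`, and the half-open window `[start, end)` are all assumptions made for the example.

```python
import pandas as pd


def clean_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Strip, lowercase, and turn spaces/hyphens in column names into underscores."""
    out = df.copy()
    out.columns = (
        out.columns.str.strip()
        .str.lower()
        .str.replace(r"[\s\-]+", "_", regex=True)
    )
    return out


def add_view_counts(houses, activity, start, end):
    """Count views per house_id inside [start, end) and merge onto the housing table."""
    houses = clean_columns(houses)
    activity = clean_columns(activity)

    # Parse timestamps; unparseable values become NaT and drop out of the filter.
    activity["timestamp"] = pd.to_datetime(activity["timestamp"], errors="coerce")
    in_window = activity[(activity["timestamp"] >= start) & (activity["timestamp"] < end)]

    counts = in_window.groupby("house_id").size().rename("view_count").reset_index()

    # Left merge keeps every listing; listings without views get 0 instead of NaN.
    merged = houses.merge(counts, on="house_id", how="left")
    merged["view_count"] = merged["view_count"].fillna(0).astype(int)
    return merged


# Hypothetical sample data with deliberately messy column names.
houses = pd.DataFrame(
    {"House ID": [1, 2, 3], " Location": ["A", "B", "C"], "Price": [100, 200, 300]}
)
activity = pd.DataFrame(
    {
        "house_id ": [1, 1, 2, 1],
        "Timestamp": ["2024-01-02", "2024-01-05", "2024-01-03", "2024-02-01"],
    }
)

result = add_view_counts(
    houses, activity, pd.Timestamp("2024-01-01"), pd.Timestamp("2024-02-01")
)
print(result)
```

The left merge is the important design choice here: an inner merge would silently drop listings that nobody viewed, which is exactly the case the question wants you to handle.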
Q10 – Data Preprocessing & Feature Engineering
Problem Description:
You are provided with pre-split training and testing datasets.
Prepare the data so that it can be directly used for machine learning models by:
- Handling missing values appropriately.
- Encoding categorical features.
- Standardizing numerical features.
Ensure that the same preprocessing logic is consistently applied to both training and testing datasets.
This question moves squarely into the standard workflow that precedes modeling.
What is really being tested is not which method you pick, but whether you understand how an ML pipeline should be built:
- Numerical features and categorical features should be treated differently
- Category mappings must be learned on train and reused on test
- Standardization (such as StandardScaler) must be fit on the training set only, then applied to test
If you refit a preprocessing step on test, it immediately exposes unfamiliarity with the modeling process.
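One way to keep the "fit on train, reuse on test" rule hard to break is to put all preprocessing into a single scikit-learn transformer. The sketch below is an assumption-laden example rather than the expected answer: the column names (`area`, `rooms`, `location`), the toy data, and the choice of median / most-frequent imputation with one-hot encoding are all illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column split; the real OA data will have its own columns.
numeric_cols = ["area", "rooms"]
categorical_cols = ["location"]

numeric_pipe = Pipeline(
    [("impute", SimpleImputer(strategy="median")), ("scale", StandardScaler())]
)
categorical_pipe = Pipeline(
    [
        ("impute", SimpleImputer(strategy="most_frequent")),
        # Categories unseen during fit are encoded as all zeros instead of raising.
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]
)

preprocessor = ColumnTransformer(
    [("num", numeric_pipe, numeric_cols), ("cat", categorical_pipe, categorical_cols)],
    sparse_threshold=0,  # always return a dense array
)

train = pd.DataFrame(
    {
        "area": [50.0, 80.0, np.nan, 120.0],
        "rooms": [1, 2, 3, 4],
        "location": ["north", "south", "north", np.nan],
    }
)
test = pd.DataFrame(
    {"area": [70.0, np.nan], "rooms": [2, 3], "location": ["south", "west"]}
)

# Statistics (medians, means, category lists) are learned from train only...
X_train = preprocessor.fit_transform(train)
# ...and merely applied to test. Calling fit_transform here would be the leak.
X_test = preprocessor.transform(test)

print(X_train.shape, X_test.shape)
```

Note that the test row with `location="west"` still produces a valid feature vector, because the encoder was told to ignore categories it did not see during fitting.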
Q11 – Model Training & Evaluation
Problem Description:
Using the preprocessed dataset, train a regression model to predict house prices.
Specifically, you should:
- Train a HistGradientBoostingRegressor on the training data.
- Generate predictions on the validation dataset.
- Output both the predictions and an evaluation metric measuring validation error.
This question is the natural conclusion of the previous two.
The model itself does not require any elaborate parameter tuning. The key points are:
- Whether you can run the full training → validation → evaluation flow end to end
- Whether you use appropriate error metrics for a regression problem (e.g. RMSE/MAE)
- Whether you keep training and validation data strictly separated
As long as the earlier feature engineering was done properly, this question follows almost automatically.
Overall Impression
In summary, BCGX's OA:
- Leans toward the real DS/ML project workflow
- Emphasizes the fundamentals of data cleaning, feature engineering, and modeling
- Does not chase fancy algorithms, but places great weight on process correctness and engineering habits
Preparation Tips
Some practical suggestions for anyone preparing for this OA:
- Be fluent with merge / groupby / datetime in pandas
- Be familiar with sklearn's preprocessing + modeling pipeline
- Be clear about the boundaries between train / validation / test
- Treat the OA as a "small project" rather than a test
If you have done projects before, this OA actually plays to your strengths.
The BCGX OA stressed me out so much that I almost had a breakdown.
... Later, a friend stepped in and introduced me to ProgramHelp. It was only with a mentor that I got through the last mile.
To be honest, I thought it was "just another job search service" at first, but after using it, I felt it was really different:
– The mentors are all working professionals with 2-8 years of experience, mostly mid- to senior-level employees at major tech companies, consulting firms, quant firms, or US-listed companies.
– Many of them recently received similar offers themselves or have just converted to full-time, so they know exactly where candidates get stuck right now.
– It is not an agency that stuffs you with templates and routines; the mentor actually walks you through the OA pipeline, Decomp breakdowns, Learning (writing code on the spot), and the grilling Government questions, one by one.
– The two things I feared most were train-test contamination in the BCGX OA and reading an unfamiliar schema on the spot at Palantir. The mentor helped me sort out my thinking and ran mock scenarios with me several times before I became steady.
– The key point is that the price is still acceptable; it is not the kind of "premium package" that costs thousands or tens of thousands.
Looking back, I passed the BCGX OA thanks to their steady support. Several classmates around me have since started using the service too, and most of them said, "If I had known, I would have signed up earlier."
If you are also stuck on things like pipelines, communication, process, or Government pitfalls, I really recommend talking to reliable people instead of struggling on your own.