BCGX OA Interview Experience | A Full Walkthrough of Three Real Questions

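The common reaction to this round of the BCGX OA was: "Finally, an OA that looks like real data science work."

It is not a set of algorithm problems where you squeeze out maximum performance, nor a collection of LeetCode-style brain teasers. Instead, it starts from dirty data and runs all the way to predictions you could show to business or product stakeholders: a complete DS/ML production pipeline.

The three questions are tightly linked, a typical end-to-end assessment, so I strongly recommend treating them as one small project.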

Q9 – Data Cleaning & Feature Aggregation

Problem Description:

You are given a housing master table and a user activity log table.
The housing table contains basic information such as house_id, location, price, and publish_time.
The activity table records user views with fields including house_id and timestamp.

Your task is to:

Clean and standardize column names across both datasets.

Aggregate the number of views within a specified time window.

Merge the aggregated results back into the housing master table.

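At its core, this question checks basic data engineering skills.

The point is not complex logic, but whether you can:

Normalize messy column names (case, whitespace, underscores)
Parse the time field correctly and filter by the time window
Count views with groupby
Handle houses with no view records sensibly after the merge (usually by filling with 0)

Overall it is very close to how behavioral features get built in real business work.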
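The steps above can be sketched in pandas roughly as follows. This is a minimal sketch, not the official solution: the toy tables, the `clean_columns` helper, the `view_count` column name, and the half-open window `[start, end)` are all assumptions made for illustration, since the original problem does not specify them.

```python
import pandas as pd


def clean_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Trim, lower-case, and underscore-separate column names (hypothetical helper)."""
    out = df.copy()
    out.columns = (
        out.columns.str.strip()
        .str.lower()
        .str.replace(r"[\s\-]+", "_", regex=True)
    )
    return out


# Hypothetical toy data; the real OA tables will differ.
houses = pd.DataFrame({
    " House ID": [1, 2, 3],
    "Location ": ["A", "B", "C"],
    "Price": [100.0, 200.0, 300.0],
    "Publish Time": ["2024-01-01", "2024-01-02", "2024-01-03"],
})
activity = pd.DataFrame({
    "House_ID ": [1, 1, 2, 1],
    " Timestamp": ["2024-01-05", "2024-01-10", "2024-01-07", "2024-02-15"],
})

houses = clean_columns(houses)
activity = clean_columns(activity)
activity["timestamp"] = pd.to_datetime(activity["timestamp"])

# Assumed window: start inclusive, end exclusive.
start, end = pd.Timestamp("2024-01-01"), pd.Timestamp("2024-02-01")
in_window = activity[(activity["timestamp"] >= start) & (activity["timestamp"] < end)]

# One row per house_id with the number of views inside the window.
views = in_window.groupby("house_id").size().rename("view_count").reset_index()

# A left join keeps every house; houses without views get 0 instead of NaN.
result = houses.merge(views, on="house_id", how="left")
result["view_count"] = result["view_count"].fillna(0).astype(int)
print(result)
```

The left join is the important design choice here: an inner join would silently drop houses that nobody viewed, which is usually not what a feature table should do.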

Q10 – Data Preprocessing & Feature Engineering

Problem Description:

You are provided with pre-split training and testing datasets.

Prepare the data so that it can be directly used for machine learning models by:

Handling missing values appropriately.

Encoding categorical features.

Standardizing numerical features.

Ensure that the same preprocessing logic is consistently applied to both training and testing datasets.

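From this question on, you are clearly in the standard pre-modeling workflow.

What is really being tested is not which method you pick, but whether you understand how an ML pipeline should be done:

Numerical and categorical features need to be treated differently
Category mappings must be built on train and reused on test
Scaling (e.g. StandardScaler) may only be fit on the training set

If you refit a preprocessing step on test, you are more or less announcing that you are unfamiliar with the modeling workflow.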
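One way to enforce the "fit on train, reuse on test" rule is to put every step into a scikit-learn `ColumnTransformer`. The sketch below is only an illustration under assumptions: the column names, the toy values, and the choice of median / most-frequent imputation with one-hot encoding are mine, not part of the original problem.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data standing in for the pre-split train / test sets.
train = pd.DataFrame({
    "area": [50.0, 80.0, np.nan, 120.0],
    "rooms": [1.0, 2.0, 3.0, np.nan],
    "location": ["north", "south", np.nan, "north"],
})
test = pd.DataFrame({
    "area": [60.0, np.nan],
    "rooms": [2.0, 4.0],
    "location": ["east", "south"],  # "east" never appears in train
})

numeric_cols = ["area", "rooms"]
categorical_cols = ["location"]

numeric_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
categorical_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    # Categories unseen during fit are encoded as all zeros instead of raising.
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocess = ColumnTransformer(
    [
        ("num", numeric_pipe, numeric_cols),
        ("cat", categorical_pipe, categorical_cols),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

# Medians, means, variances, and category lists are learned from train only...
X_train = preprocess.fit_transform(train)
# ...and merely applied to test.
X_test = preprocess.transform(test)

print(X_train.shape, X_test.shape)
```

Because `transform` never recomputes statistics, the missing `area` in the test set is filled with the training median, and the unseen `"east"` category does not change the output width.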

Q11 – Model Training & Evaluation

Problem Description:

Using the preprocessed dataset, train a regression model to predict house prices.

Specifically, you should:

Train a HistGradientBoostingRegressor on the training data.

Generate predictions on the validation dataset.

Output both the predictions and an evaluation metric measuring validation error.

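This question is the natural wrap-up of the previous two.

The model itself does not call for elaborate hyperparameter tuning. What matters is:

Whether you can run the full train → validate → evaluate flow end to end
Whether you use an appropriate error metric for a regression problem (e.g. RMSE / MAE)
Whether training and validation data are kept strictly separate

As long as the earlier feature engineering was done properly, this question largely takes care of itself.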

Overall Impression

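Overall, the BCGX OA:

Leans toward a real DS / ML project workflow
Emphasizes fundamentals in data cleaning, feature engineering, and modeling
Does not chase fancy algorithms, but cares a great deal about a correct process and sound engineering habits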

Preparation Tips

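A few very practical suggestions for those preparing next:

Be fluent with pandas merge / groupby / datetime
Get familiar with sklearn's preprocessing + modeling pipeline
Be clear about the boundaries between train / validation / test
Treat the OA as a small project, not as problem grinding

If you have done project work before, this OA actually plays to your strengths.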

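The BCGX OA put me through the wringer, and I nearly lost my composure partway through.

... Later a friend gave me a hand and introduced me to a mentor at ProgramHelp, which is how I finally got through the last mile.

Honestly, my first reaction was "yet another job-search service", but after using it I found it genuinely different:

– The mentors are all working practitioners with 2-8 years of experience, mostly mid-to-senior staff currently at big tech, consulting, quant firms, or US-listed companies

– Many of them only recently landed the same kind of offer themselves, or just converted to full-time, so they know exactly where candidates get stuck right now

– Unlike agencies that hand you a pile of templates and scripts, they actually go through the OA pipeline, Decomp breakdowns, live coding in Learning rounds, and those few probing Government questions with you one by one

– The two areas I feared most, train-test contamination in BCGX and reading an unfamiliar schema on the spot at Palantir, only became solid after a mentor helped me sort out my thinking and ran several mock sessions with me

– The price is also acceptable, not one of those "premium packages" that cost thousands or more

Looking back, I really owe passing the BCGX OA to their behind-the-scenes help. Several classmates around me have since signed up as well, and most of them say "I wish I had done it sooner".

If you are also stuck on the pipeline, communication, process, or the Government pitfalls, I really recommend talking to someone reliable instead of grinding through it alone.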

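Jory Wang, Senior Software Development Engineer at Amazon

Senior engineer at Amazon focused on core infrastructure systems, with extensive hands-on experience in system scalability, reliability, and cost optimization. Currently focused on FAANG SDE interview coaching, having helped 30+ candidates land L5 / L6 offers within one year.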
