TikTok VO Interview | Coding / System Design + Transformer Full Q&A Review


This post shares a student's TikTok Virtual Onsite (VO) interview. TikTok's VO has always been somewhat unusual: beyond the usual algorithm coding, the interviewer takes details from a paper and drills into the fundamentals behind them, especially the Transformer, the optimizer, and the training hyperparameters. All of these demand a very solid foundation.

1. Coding: Search in Rotated Sorted Array

The problem is the classic LeetCode 33:

Problem:
Given a sorted and rotated array of n distinct elements, find the given key in the array.

In other words, find the index of the target in a sorted rotated array.

The solution is also fairly standard, a modified binary search:

  • At each step, determine which half around the midpoint is sorted;
  • If the target falls inside the sorted half, continue the binary search in that half;
  • Otherwise, continue in the other half.

Complexity:

  • Time complexity: O(log n)
  • Space complexity: O(1)

The problem itself is not hard, but you need to write clean, concise code and handle the boundary cases correctly.
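
The steps above can be sketched as follows. This is a minimal version, assuming the usual LeetCode 33 convention of returning -1 when the target is absent; the function name `search_rotated` is only illustrative.

```python
from typing import List


def search_rotated(nums: List[int], target: int) -> int:
    """Return the index of target in a rotated sorted array of distinct values, or -1."""
    lo, hi = 0, len(nums) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if nums[mid] == target:
            return mid
        if nums[lo] <= nums[mid]:
            # Left half [lo, mid] is sorted.
            if nums[lo] <= target < nums[mid]:
                hi = mid - 1
            else:
                lo = mid + 1
        else:
            # Right half [mid, hi] is sorted.
            if nums[mid] < target <= nums[hi]:
                lo = mid + 1
            else:
                hi = mid - 1
    return -1
```

The boundary cases worth checking by hand are the empty array, a single element, and a two-element array such as `[3, 1]`, where the `<=` in `nums[lo] <= nums[mid]` matters because `lo` and `mid` coincide.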

2. Fundamentals: deep-dive questions based on a paper

This part was more demanding. The interviewer quoted the Transformer model and training configuration directly from a paper and then asked questions around it. The configuration was roughly as follows:

  • Transformer: 32 layers, 32 heads (64 dim), rotary embedding with dim 32
  • Context length: 2048, used flash-attention
  • Training: random init, fixed learning rate, weight decay 0.1, optimizer Adam (momentum 0.9, 0.98, epsilon 1e-7)
  • Precision: fp16 with DeepSpeed ZeRO Stage2
  • Batch size: 2048, train for 150B tokens
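
A quick back-of-the-envelope calculation helps make these numbers concrete. The sketch below rests on two assumptions that the quoted configuration does not state: that the hidden size equals heads times head dimension, and that the batch size counts full-length sequences rather than tokens.

```python
# Values quoted in the configuration above.
n_heads, head_dim = 32, 64
batch_size, context_length = 2048, 2048
total_tokens = 150e9

# Assumption: hidden size = heads * head dimension (the usual convention).
hidden_size = n_heads * head_dim  # 2048

# Assumption: batch size counts sequences, each filled to the context length.
tokens_per_step = batch_size * context_length  # 4,194,304 tokens

# Approximate number of optimizer steps needed to see 150B tokens.
num_steps = total_tokens / tokens_per_step
print(f"hidden={hidden_size}, tokens/step={tokens_per_step}, steps~{num_steps:.0f}")
```

Under those assumptions each step consumes about 4.2M tokens, so 150B tokens correspond to roughly 35.8 thousand optimizer steps.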

The interviewer's questions were also very detailed, basically taking the parameters apart one by one:

What is the role of an attention head, and can you write the formula for multi-head attention? What other attention variants are there?
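
For the formula, the standard definition is Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, with MultiHead(X) = Concat(head_1, ..., head_h) W_O, where each head applies attention to its own slice of the projected queries, keys, and values. A minimal NumPy sketch of that definition follows; it omits the causal mask, batching, and biases, and the function and argument names are illustrative.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(x, w_q, w_k, w_v, w_o, n_heads):
    """x: (seq_len, d_model); all weight matrices: (d_model, d_model)."""
    seq_len, d_model = x.shape
    d_head = d_model // n_heads

    def split_heads(t):
        # (seq_len, d_model) -> (n_heads, seq_len, d_head)
        return t.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split_heads(x @ w_q), split_heads(x @ w_k), split_heads(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (n_heads, seq, seq)
    heads = softmax(scores) @ v                           # (n_heads, seq, d_head)
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o
```

Each head attends in a separate lower-dimensional subspace, so different heads can learn different attention patterns at about the same total cost as one full-width head.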

What is rotary embedding? What are the advantages over other embedding methods?

Why use rotary embedding and what are the benefits? How to deal with long distance context?
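
One way to make the rotary embedding answer concrete is to show its key property: after rotating queries and keys by position-dependent angles, their dot product depends only on the relative offset between positions. The sketch below is a simplified illustration, not the paper's code. It pairs dimension i with dimension i + dim/2 (the "rotate half" layout; pairing adjacent dimensions is equally valid) and rotates every dimension. The configuration's "rotary embedding with dim 32" presumably rotates only 32 of each head's 64 dimensions, which is not shown here.

```python
import numpy as np


def rope(x, positions, base=10000.0):
    """Apply rotary position embedding. x: (seq_len, dim), dim even; positions: (seq_len,)."""
    half = x.shape[-1] // 2
    # One rotation frequency per 2-D pair of dimensions.
    freqs = base ** (-np.arange(half) / half)
    angles = positions[:, None] * freqs[None, :]  # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Rotate each (x1[i], x2[i]) pair by its angle.
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)


rng = np.random.default_rng(0)
q = rng.normal(size=(1, 8))
k = rng.normal(size=(1, 8))


def score(m, n):
    # Attention logit between a query at position m and a key at position n.
    return (rope(q, np.array([m])) @ rope(k, np.array([n])).T).item()


# Same offset (2) at different absolute positions gives the same score.
print(np.isclose(score(3, 1), score(10, 8)))
```

Because the rotation is applied to queries and keys rather than added to the input, no learned position table is needed, and the rotation leaves vector norms unchanged.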

What is Flash-Attention? Why is it accelerated?
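
The core idea behind FlashAttention is that exact attention can be computed tile by tile with an online softmax, so the full seq_len x seq_len score matrix never has to be written to GPU main memory. The speedup comes from fewer memory reads and writes, not fewer FLOPs. The real implementation is a fused GPU kernel; the NumPy sketch below only illustrates the arithmetic, and the function names and block size are illustrative.

```python
import numpy as np


def naive_attention(q, k, v):
    # Reference: materializes the full score matrix.
    s = q @ k.T / np.sqrt(q.shape[-1])
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    return (p / p.sum(axis=-1, keepdims=True)) @ v


def tiled_attention(q, k, v, block=4):
    """Same result as naive_attention, processing keys/values one block at a time."""
    n, d = q.shape
    out = np.zeros((n, v.shape[-1]))
    row_max = np.full((n, 1), -np.inf)  # running max of scores per query
    row_sum = np.zeros((n, 1))          # running softmax denominator
    for start in range(0, k.shape[0], block):
        kb, vb = k[start:start + block], v[start:start + block]
        s = q @ kb.T / np.sqrt(d)
        new_max = np.maximum(row_max, s.max(axis=-1, keepdims=True))
        # Rescale what was accumulated under the old maximum.
        scale = np.exp(row_max - new_max)
        p = np.exp(s - new_max)
        row_sum = row_sum * scale + p.sum(axis=-1, keepdims=True)
        out = out * scale + p @ vb
        row_max = new_max
    return out / row_sum
```

The point to make in the interview is that the result is exact, not an approximation: only the order of computation and the memory traffic change.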

Why is Adam suitable for this scenario? What exactly do its three parameters (the two momentum coefficients and epsilon) mean?
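
A single Adam update written out makes the three parameters easy to explain. In the sketch below, the defaults for `beta1`, `beta2`, and `eps` follow the quoted configuration; the learning rate is a placeholder, since the recap does not give one.

```python
import numpy as np


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.98, eps=1e-7):
    """One Adam update. t is the 1-based step count; lr here is a hypothetical value."""
    # beta1: decay rate of the running mean of gradients (first moment).
    m = beta1 * m + (1 - beta1) * grad
    # beta2: decay rate of the running mean of squared gradients (second moment).
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction, because m and v start at zero.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # eps keeps the division stable when v_hat is close to zero.
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```

On the first step the update is approximately `lr * sign(grad)` for every coordinate, which shows how Adam normalizes the step size per parameter regardless of gradient scale.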

Why choose fixed learning rate instead of warm-up?
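
To discuss this trade-off it helps to have both schedules in front of you. The sketch below contrasts a constant learning rate with a linear warm-up; the function names and the warm-up length are illustrative and do not come from the paper.

```python
def constant_lr(step, base_lr):
    # Fixed learning rate: the same value at every step.
    return base_lr


def linear_warmup_lr(step, base_lr, warmup_steps):
    # Ramp linearly from base_lr / warmup_steps up to base_lr, then hold.
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr
```

Warm-up is usually motivated by the noisy second-moment estimates Adam has in its first steps, so a good answer explains why the run in question could be stable without it.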

What is weight decay? How does it help with training?
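
A useful follow-up point is that, under Adam, weight decay implemented as an L2 penalty behaves differently from decoupled weight decay (the AdamW style). The recap does not say which variant the paper used, so the sketch below simply contrasts the two. For brevity it uses only the first-step Adam direction, which is a simplification, and all names are illustrative.

```python
import numpy as np


def adam_direction(grad, eps=1e-7):
    # Adam's update direction on the first step, after bias correction.
    return grad / (np.abs(grad) + eps)


def step_l2(param, grad, lr, wd):
    # L2 penalty: the decay term is added to the gradient, then normalized by Adam.
    return param - lr * adam_direction(grad + wd * param)


def step_decoupled(param, grad, lr, wd):
    # Decoupled decay: shrink the weight directly, outside Adam's normalization.
    return param - lr * adam_direction(grad) - lr * wd * param
```

With `param = 10`, `grad = 1`, `lr = 0.01`, and `wd = 0.1`, the L2 variant moves the weight to about 9.99 because the penalty is absorbed by the normalization, while the decoupled variant moves it to about 9.98 because the shrinkage is proportional to the weight itself.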

What is the significance of fp16 precision? What other precision formats are there (e.g. bf16, fp32)?
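
The practical issue with fp16 is its narrow range: 5 exponent bits and 10 mantissa bits, versus 8 exponent bits and 7 mantissa bits for bf16, which shares fp32's range. The sketch below shows fp16 overflow and underflow in NumPy, and how loss scaling rescues a small gradient. NumPy has no native bf16 type, so only fp16 is demonstrated; the gradient value and loss scale are illustrative.

```python
import numpy as np

FP16_MAX = float(np.finfo(np.float16).max)  # largest finite fp16 value

# Overflow: values beyond the fp16 range become infinity.
with np.errstate(over="ignore"):
    overflowed = np.float32(70000.0).astype(np.float16)

# Underflow: a tiny gradient rounds to zero in fp16.
tiny_grad = np.float32(1e-8)
underflowed = tiny_grad.astype(np.float16)

# Loss scaling: multiply before casting to fp16, divide after casting back to fp32.
loss_scale = np.float32(1024.0)
scaled = (tiny_grad * loss_scale).astype(np.float16)
recovered = scaled.astype(np.float32) / loss_scale

print(FP16_MAX, overflowed, underflowed, recovered)
```

This is why fp16 training is normally paired with loss scaling and fp32 master weights, whereas bf16 trades mantissa precision for range and usually needs no scaling.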

What is DeepSpeed ZeRO Stage2 and what does it do?
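
ZeRO is easiest to explain through memory accounting. With mixed-precision Adam, each parameter costs about 16 bytes of model state: 2 for the fp16 weight, 2 for the fp16 gradient, and 12 for the fp32 master weight plus the two Adam moments. Stage 1 partitions the optimizer states across data-parallel GPUs, Stage 2 also partitions the gradients, and Stage 3 also partitions the parameters. The sketch below reproduces that accounting; it ignores activations and temporary buffers, and the model size and GPU count in the usage note are hypothetical.

```python
def zero_bytes_per_gpu(n_params, n_gpus, stage):
    """Approximate model-state bytes per GPU for mixed-precision Adam under ZeRO."""
    params = 2 * n_params       # fp16 weights
    grads = 2 * n_params        # fp16 gradients
    opt_state = 12 * n_params   # fp32 master weights + Adam first and second moments
    if stage >= 1:
        opt_state /= n_gpus     # Stage 1: partition optimizer states
    if stage >= 2:
        grads /= n_gpus         # Stage 2: also partition gradients
    if stage >= 3:
        params /= n_gpus        # Stage 3: also partition parameters
    return params + grads + opt_state
```

For example, a hypothetical 1B-parameter model on 8 GPUs needs about 16 GB of model state per GPU without ZeRO and about 3.75 GB per GPU at Stage 2.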

As you can see, this part is not about reciting definitions. You need to explain the core principles clearly and, ideally, tie them to practical scenarios and discuss the trade-offs.

Summary

TikTok's VO emphasizes balance:

  • The coding section uses common, classic problems to test how well you implement code and how fluently you reason.
  • The fundamentals section is a deep dive into details to check whether you really understand the Transformer, the optimizer, and the training process.

Overall it is not especially tricky, but it demands a strong foundation. If you have only memorized the concepts, the follow-up questions will quickly get you stuck.

👉 To pass an interview like TikTok's, practicing coding problems is not enough. You also need to thoroughly digest the fundamentals behind the paper, especially attention, embeddings, optimizers, and training tricks. These topics are best prepared by combining the formulas with practical engineering experience.

