
Problem description
You are responsible for implementing a new analytics view: sessions. The input is a data set of individual web page visits, each carrying a visitorId generated by a tracking cookie that uniquely identifies the visitor. The goal is to build a list of sessions for each visitor from that data.
The raw event data is available at REDACTED via the dataset API.
The data set looks like this:
{
  "events": [
    {
      "url": "/pages/a-big-river",
      "visitorId": "d1177368-2310-11e8-9e2a-9b860a0d9039",
      "timestamp": 1512754583000
    },
    {
      "url": "/pages/a-small-dog",
      "visitorId": "d1177368-2310-11e8-9e2a-9b860a0d9039",
      "timestamp": 1512754631000
    },
    {
      "url": "/pages/a-big-talk",
      "visitorId": "f877b96c-9969-4abc-bbe2-54b17d030f8b",
      "timestamp": 1512709065294
    },
    {
      "url": "/pages/a-sad-story",
      "visitorId": "f877b96c-9969-4abc-bbe2-54b17d030f8b",
      "timestamp": 1512711000000
    },
    {
      "url": "/pages/a-big-river",
      "visitorId": "d1177368-2310-11e8-9e2a-9b860a0d9039",
      "timestamp": 1512754436000
    },
    {
      "url": "/pages/a-sad-story",
      "visitorId": "f877b96c-9969-4abc-bbe2-54b17d030f8b",
      "timestamp": 1512709024000
    }
  ]
}
Given this input data, we want to group the events into sessions. A session is a group of events from a single visitor with no more than 10 minutes between consecutive events.
For the example input above, the expected output is:
{
  "sessionsByUser": {
    "f877b96c-9969-4abc-bbe2-54b17d030f8b": [
      {
        "duration": 41294,
        "pages": [
          "/pages/a-sad-story",
          "/pages/a-big-talk"
        ],
        "startTime": 1512709024000
      },
      {
        "duration": 0,
        "pages": [
          "/pages/a-sad-story"
        ],
        "startTime": 1512711000000
      }
    ],
    "d1177368-2310-11e8-9e2a-9b860a0d9039": [
      {
        "duration": 195000,
        "pages": [
          "/pages/a-big-river",
          "/pages/a-big-river",
          "/pages/a-small-dog"
        ],
        "startTime": 1512754436000
      }
    ]
  }
}
Once you have the event data, transform it into sessions and POST the result to REDACTED over HTTP.
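The GET and POST steps could look roughly like the sketch below, using only the Python standard library. The two URLs are hypothetical placeholders (the real endpoints are redacted), and sending the body as JSON with a Content-Type header is an assumption rather than something the problem statement specifies.

```python
import json
import urllib.request

# Hypothetical placeholders: the real endpoints are redacted in the problem.
DATASET_URL = "https://example.invalid/dataset"
RESULT_URL = "https://example.invalid/result"


def fetch_events(url):
    """GET the dataset and return its list of event dicts."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)["events"]


def build_post_request(url, payload):
    """Build (but do not send) a POST request carrying the payload as JSON."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # Assumption: the API expects a JSON body.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_result(url, payload):
    """Send the result and return the HTTP status code."""
    with urllib.request.urlopen(build_post_request(url, payload)) as response:
        return response.status
```

Keeping the request construction separate from the send makes the payload easy to inspect before anything goes over the network.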
Approach
Our approach:
- Build a map from visitorId to that visitor's list of visits (url, timestamp), filled with the data from the GET URL.
- Sort each visitor's list by timestamp ascending, then iterate:
  - If the current session is empty, start a new one and record startTime.
  - Otherwise, check the time gap to the previous event.
    - ≤ 10 minutes: add to the current session.
    - > 10 minutes: close the current session and start a new one.
- Calculate the session fields.
  - startTime: timestamp of the first event.
  - duration: lastEventTime - firstEventTime.
  - pages: ordered list of URLs in the session.
- Compile sessions for all users into the final JSON and POST it to the API.
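The steps above can be sketched in Python as follows. This is a minimal illustration, not necessarily the code submitted for the assessment; the names build_sessions and SESSION_GAP_MS are made up for the example, and timestamps are assumed to be in milliseconds, as the sample data suggests.

```python
from collections import defaultdict

SESSION_GAP_MS = 10 * 60 * 1000  # 10 minutes, assuming millisecond timestamps


def build_sessions(events):
    """Group raw page-visit events into per-visitor sessions."""
    # Map each visitorId to its list of (timestamp, url) visits.
    visits_by_visitor = defaultdict(list)
    for event in events:
        visits_by_visitor[event["visitorId"]].append(
            (event["timestamp"], event["url"])
        )

    sessions_by_user = {}
    for visitor_id, visits in visits_by_visitor.items():
        # Sort by timestamp only; the sort is stable, so ties keep input order.
        visits.sort(key=lambda visit: visit[0])

        sessions = []
        previous = None
        for timestamp, url in visits:
            # First event, or a gap of more than 10 minutes: start a new session.
            if previous is None or timestamp - previous > SESSION_GAP_MS:
                sessions.append(
                    {"duration": 0, "pages": [], "startTime": timestamp}
                )
            current = sessions[-1]
            current["pages"].append(url)
            # duration = last event time - first event time of the session.
            current["duration"] = timestamp - current["startTime"]
            previous = timestamp
        sessions_by_user[visitor_id] = sessions

    return {"sessionsByUser": sessions_by_user}
```

Calling build_sessions on the "events" list from the sample input should reproduce the expected output shown earlier, including the second, zero-duration session for visitor f877b96c-9969-4abc-bbe2-54b17d030f8b.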
Reference
The HubSpot coding assessment
HubSpot OA