Solution StrataScratch Aggregate Listening Data

Interview Question

You’re tasked with analyzing a Spotify-like dataset that captures user listening habits. For each user, calculate the total listening time and the count of unique songs they’ve listened to. In the database duration values are displayed in seconds. Round the total listening duration to the nearest whole minute. The output should contain three columns: ‘user_id’, ‘total_listen_duration’, and ‘unique_song_count’.

COMPANY

SPOTIFY

DIFFICULTY

EASY

10367

TITLE

Aggregate Listening Data

Link to the question on Stratascratch

What you’ll learn

DataFrame groupby
.agg function
.reset_index()
nunique

Solution: Aggregate Listening Data

Disclaimer: This code is provided for educational purposes only. Please test the code thoroughly before using it in production. You are responsible for any issues that arise from using this code in your environment.

import pandas as pd

# create a new column to store the seconds in minutes
listening_habits['listen_duration_mints']=round(listening_habits['listen_duration']/60)

#create a new dataframe to store the output values
result2=(listening_habits.groupby('user_id').agg(
        total_listen_duration=pd.NamedAgg(column="listen_duration_mints",aggfunc='sum'), 
        unique_song_count=pd.NamedAgg(column='song_id',aggfunc='nunique'),
        ).reset_index()
        )
        
result2

Code Explanation

Explanation of code elements.

In above code, we are grouping the output by user_id and then aggregating it. Also, we are creating two new columns *’total_listen_duration’* and *’unique_song_count’*. For each for of these columns we are applying **NamedAgg** function which takes the column name and the function to apply (aggfunc=’sum’).
Pandas aggregation operations like groupby() and agg() will make the grouping column (user_id) into the index of the resulting DataFrame and won’t display it. To display the user_id column, we have to use reset_index().
listening_habits.groupby(‘user_id’): This groups the DataFrame by the user_id column.
.agg(): This applies the aggregation functions to the grouped data.
The NamedAgg function in pandas is used to specify a custom name for an aggregated column when using the agg function. It takes two arguments: column: The name of the column to be aggregated.aggfunc: The aggregation function to be applied.

All StrataScratch Solutions

Spotify Interview Questions

Solution StrataScratch Aggregate Listening Data

Interview Question

What you’ll learn

Solution: Aggregate Listening Data

Code Explanation

Comments

Leave a Reply Cancel reply