Solution StrataScratch Aggregate Listening Data

COMPANY

SPOTIFY

DIFFICULTY

EASY

ID

10367

TITLE

Aggregate Listening Data

Link to the question on Stratascratch

What you’ll learn

  • DataFrame groupby
  • .agg function
  • .reset_index()
  • nunique

Solution: Aggregate Listening Data

Disclaimer: This code is provided for educational purposes only. Please test the code thoroughly before using it in production. You are responsible for any issues that arise from using this code in your environment.

import pandas as pd

# create a new column to store the seconds in minutes
listening_habits['listen_duration_mints']=round(listening_habits['listen_duration']/60)

#create a new dataframe to store the output values
result2=(listening_habits.groupby('user_id').agg(
        total_listen_duration=pd.NamedAgg(column="listen_duration_mints",aggfunc='sum'), 
        unique_song_count=pd.NamedAgg(column='song_id',aggfunc='nunique'),
        ).reset_index()
        )
        
result2

Code Explanation

Explanation of code elements.

  • In above code, we are grouping the output by user_id and then aggregating it. Also, we are creating two new columns *’total_listen_duration’* and *’unique_song_count’*. For each for of these columns we are applying **NamedAgg** function which takes the column name and the function to apply (aggfunc=’sum’).
  • Pandas aggregation operations like groupby() and agg() will make the grouping column (user_id) into the index of the resulting DataFrame and won’t display it. To display the user_id column, we have to use reset_index().
  • listening_habits.groupby(‘user_id’): This groups the DataFrame by the user_id column.
  • .agg(): This applies the aggregation functions to the grouped data.
  • The NamedAgg function in pandas is used to specify a custom name for an aggregated column when using the agg function. It takes two arguments: column: The name of the column to be aggregated.aggfunc: The aggregation function to be applied.


Posted

in

,

by

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *