Conjoint Analysis on Spotify HK University Student Users
There are four main music streaming service providers in Hong Kong: Joox, KKBox, Spotify and Moov. Here is some basic information about each of them, just for reference.
| | Joox | KKBox | Spotify | Moov |
|---|---|---|---|---|
| free trial | yes | 7 days | yes | 14 days |
| price/month | 38 HKD | 58 HKD | 58 HKD | 48 HKD |
| student price/month | 31.8 HKD | 29 HKD | 29 HKD | 24 HKD |
| lyrics | yes | yes | no | yes |
| library size | 30 million | 40 million | 30 million | 1 million |
| downloadable | yes | yes | yes | yes |
| highest sound quality | 1411kbps FLAC | 320kbps AAC | 320kbps AAC | 1411kbps FLAC |
| music styles and special features | local HK artists | Japanese, Korean and Chinese songs | European and American songs | exclusive playlists and online concerts |
(source: https://moneyhero.com.hk, July 2020)
survey
To discover which attributes influence customers' decisions the most when selecting a music streaming service, we surveyed university students in Hong Kong aged 18-24, who may or may not use a music streaming service.
The survey was designed to collect the data needed for a rating-based conjoint analysis. We designed 12 imaginary music streaming service packages combining the attributes mentioned above, and had our participants rate each package from 1 to 10 based on their own preferences.
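As a rough sketch of how such packages can be assembled (the attribute levels below are hypothetical and the random sampling is only for illustration; an orthogonal or fractional-factorial design would be the usual choice), one could enumerate the level combinations and pick 12:
import itertools
import random
# hypothetical attribute levels, loosely based on the comparison table above
levels = {
    'brand': ['joox', 'kkbox', 'spotify', 'moov'],
    'price': [29, 38, 48, 58],            # HKD per month
    'library_size': [1, 30, 40],          # millions of songs
    'with_lyrics': [0, 1],
    'downloadable': [0, 1],
    'sound_quality': [256, 320, 1411],    # kbps
}
all_profiles = list(itertools.product(*levels.values()))   # full factorial
random.seed(0)
profiles = [dict(zip(levels, p)) for p in random.sample(all_profiles, 12)]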
dataset
After collecting responses from 60 university students, we cleaned and reorganized the dataset using Excel. Next, we loaded the dataset into a Jupyter notebook using Pandas, ready to conduct the conjoint analysis.
tools & skills
- tools: Python, Jupyter Notebook
- skills: Excel, data cleaning with Pandas, visualization with Seaborn and Matplotlib
data exploration & cleaning
First of all, we loaded the packages needed for the analysis, including pandas and numpy.
import pandas as pd
import numpy as np
df = pd.read_csv('data.csv')
df.head()
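Before encoding anything, it is worth a quick sanity check that the columns match the survey design; assuming the column names used in the rest of this post, something like:
df.info()                      # 'brand' should be an object column, the rest numeric
df['brand'].value_counts()     # each brand should appear across the rated profiles
df['sound_quality'].unique()   # expected levels: 256, 320 and 1411 (kbps)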
Because 'brand' was a categorical variable in the original dataset, we converted it into dummy variables using get_dummies in pandas.
# get_dummies returns columns in the sorted order of the brand values
# (assumed here to be joox, kkbox, moov, spotify), so the new names must follow that order
df[['is_joox', 'is_kkbox', 'is_moov', 'is_spotify']] = pd.get_dummies(df.brand)
df = df.drop('brand', axis=1)
df.head()
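If you would rather not hard-code the column order at all, a small alternative sketch (using a hypothetical df_raw that still contains the original 'brand' column) is to let get_dummies name the columns itself via its prefix argument:
# alternative sketch: let get_dummies generate the column names (is_joox, is_kkbox, ...)
brand_dummies = pd.get_dummies(df_raw['brand'], prefix='is')
df_alt = pd.concat([df_raw.drop('brand', axis=1), brand_dummies], axis=1)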
Likewise, we converted 'sound_quality' into dummy variables as well. Although the feature is numerical, the sound quality values are not continuous, so we treated them as categorical.
df[['sq_256', 'sq_320', 'sq_1411']] = pd.get_dummies(df.sound_quality)
df = df.drop('sound_quality', axis=1)
df.head()
Lastly, we split the DataFrame into the target y (the package rating) and the feature matrix X.
y = df['rate']
X = df[['price', 'library_size', 'with_lyrics', 'downloadable', 'is_joox',
'is_kkbox', 'is_spotify', 'is_moov', 'sq_256', 'sq_320', 'sq_1411']]
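One caveat about this encoding: every dummy level of 'brand' and of 'sound_quality' is kept and no intercept is added, so the columns are perfectly collinear (each group of dummies sums to one) and statsmodels falls back to a pseudo-inverse when fitting. A common alternative, sketched here but not used in this post, is to drop one reference level per attribute:
# alternative sketch: one reference level per attribute avoids perfect collinearity
X_alt = df[['price', 'library_size', 'with_lyrics', 'downloadable',
            'is_kkbox', 'is_spotify', 'is_moov',   # 'is_joox' as the reference brand
            'sq_320', 'sq_1411']]                  # 'sq_256' as the reference sound quality
The part-worths of the dropped levels are then implicitly zero, and the remaining coefficients are read relative to them.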
conjoint analysis
import statsmodels.api as sm
import seaborn as sns
import matplotlib.pyplot as plt
plt.style.use('bmh')
# fit an ordinary least squares model; the coefficients are the part-worth utilities
res = sm.OLS(y, X).fit()
res.summary()
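Once the model is fitted, the coefficients act as part-worth utilities, so any hypothetical package can be scored by summing the part-worths of its levels. For example (a made-up profile, not one of the 12 survey packages; units assumed to be HKD for price and millions of songs for library_size):
# score a hypothetical package: Spotify, 29 HKD, 30M songs, no lyrics, downloadable, 320kbps
package = pd.Series({'price': 29, 'library_size': 30, 'with_lyrics': 0, 'downloadable': 1,
                     'is_joox': 0, 'is_kkbox': 0, 'is_spotify': 1, 'is_moov': 0,
                     'sq_256': 0, 'sq_320': 1, 'sq_1411': 0})
predicted_rating = np.dot(package[X.columns], res.params[X.columns])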
df_res = pd.DataFrame({
'param_name': res.params.keys()
, 'param_w': res.params.values
, 'pval': res.pvalues
})
# add a column with the absolute value of each parameter
df_res['abs_param_w'] = np.abs(df_res['param_w'])
# mark whether each parameter is significant at the 95% confidence level
df_res['is_sig_95'] = (df_res['pval'] < 0.05)
# assign a bar color to each parameter (blue = significant, red = not)
df_res['c'] = ['blue' if x else 'red' for x in df_res['is_sig_95']]
# sort by the absolute parameter value
df_res = df_res.sort_values(by='abs_param_w', ascending=True)
f, ax = plt.subplots(figsize=(14, 8))
plt.title('Part Worth')
pwu = df_res['param_w']
xbar = np.arange(len(pwu))
plt.barh(xbar, pwu, color=df_res['c'])
plt.yticks(xbar, labels=df_res['param_name'])
plt.show()
# assemble, per attribute, the coefficients of every level of that attribute in a dictionary
range_per_feature = dict()
for key, coeff in res.params.items():
    sk = key.split('_')
    feature = sk[0]
    if len(sk) == 1:
        feature = key
    if feature not in range_per_feature:
        range_per_feature[feature] = list()
    range_per_feature[feature].append(coeff)
# importance per feature is range of coef in a feature
# while range is simply max(x) - min(x)
importance_per_feature = {
k: max(v) - min(v) for k, v in range_per_feature.items()
}
# compute the relative (normalized) importance of each feature
# by dividing its importance by the sum of importances across all features
total_feature_importance = sum(importance_per_feature.values())
relative_importance_per_feature = {
k: 100 * round(v/total_feature_importance, 3) for k, v in importance_per_feature.items()
}
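# as a toy illustration of the formula above (hypothetical coefficients, not our survey results):
# brand part-worths spanning -0.4 to 0.8 give a range of 1.2, sound quality spanning
# 0.0 to 0.9 gives 0.9, and a single price coefficient gives a range of 0, so the
# relative importances work out to roughly 57.1%, 42.9% and 0%:
toy_ranges = {'is': [0.8, -0.4, 0.1, 0.2], 'sq': [0.0, 0.5, 0.9], 'price': [-0.01]}
toy_importance = {k: max(v) - min(v) for k, v in toy_ranges.items()}
toy_total = sum(toy_importance.values())
print({k: round(100 * v / toy_total, 1) for k, v in toy_importance.items()})
# -> {'is': 57.1, 'sq': 42.9, 'price': 0.0}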
alt_data = pd.DataFrame(
list(importance_per_feature.items()),
columns=['attr', 'importance']
).sort_values(by='importance', ascending=False)
f, ax = plt.subplots(figsize=(12, 8))
xbar = np.arange(len(alt_data['attr']))
plt.title('Importance')
plt.barh(xbar, alt_data['importance'])
for i, v in enumerate(alt_data['importance']):
    ax.text(v, i + .25, '{:.2f}'.format(v))
plt.ylabel('attributes')
plt.xlabel('importance (coefficient range)')
plt.yticks(xbar, alt_data['attr'])
plt.show()
alt_data = pd.DataFrame(
list(relative_importance_per_feature.items()),
columns=['attr', 'relative_importance (pct)']
).sort_values(by='relative_importance (pct)', ascending=False)
f, ax = plt.subplots(figsize=(12, 8))
xbar = np.arange(len(alt_data['attr']))
plt.title('Relative importance / Normalized importance')
plt.barh(xbar, alt_data['relative_importance (pct)'])
for i, v in enumerate(alt_data['relative_importance (pct)']):
    ax.text(v, i + .25, '{:.2f}%'.format(v))
plt.ylabel('attributes')
plt.xlabel('% relative importance')
plt.yticks(xbar, alt_data['attr'])
plt.show()
conclusion
'sound_quality' and 'brand' are the two most influential attributes for university students in Hong Kong when choosing a music streaming service. Surprisingly, 'price' is not one of the important attributes. However, there might be other factors, such as music styles and special features, that are not measurable in this case. We will be able to investigate those once more detailed data are collected.
*This report was originally the final team project of the Marketing Analytics (MKTG4090) course at CUHK. I have re-organized and re-analyzed the data to create this post. Shout out to my teammates Cathy, Terrence and Alex. Thanks, guys!