Consumer Behavior Statistics of Mobile Telephone Services. Frida Åhslund


Consumer Behavior Statistics of Mobile Telephone Services. Frida Åhslund. Master of Science Thesis, Stockholm, Sweden 2006

Consumer Behavior Statistics of Mobile Telephone Services. Frida Åhslund. Master's Thesis in Computer Science (20 credits) at the School of Engineering Physics, Royal Institute of Technology, year 2006. Supervisor at CSC was Stefan Arnborg. Examiner was Stefan Arnborg. TRITA-CSC-E 2006:011, ISRN-KTH/CSC/E--06/011--SE, ISSN-1653-5715. Royal Institute of Technology, School of Computer Science and Communication, KTH CSC, SE-100 44 Stockholm, Sweden. URL: www.csc.kth.se

Consumer Behavior Statistics of Mobile Telephone Services: A Data Mining Project

Abstract

This thesis looks at how the users of mobile telephone services have behaved historically by exploring the transaction data in the Internet Payment Exchange database. Analysis of variance was used to establish what behavior to expect in the future. In addition, the content providers were clustered with the unsupervised clustering method self-organizing maps. It can be shown that 61 % of the users use one or two services per month. 77,6 % of the users use services four months per year or less. 55 % of the users use only services that are free. 37,4 % of the users who pay for some of their services spend 10 SEK or less per month. 28 % of the users are responsible for 90 % of the spending. It was possible to find a cluster of content providers that had more transactions, as well as higher spending per user and month, than other content providers. This group had an average of 3,84 transactions and 51,56 SEK per user and month.

Statistics on Consumer Behavior for Mobile Telephone Services: A Data Mining Project

Summary (Sammanfattning)

This degree project looks at how the use of mobile telephone services has developed historically by exploring the Internet Payment Exchange database. Analysis of variance was used to establish what behavior is to be expected in the future. The companies that provide mobile services were clustered using the clustering method self-organizing maps. The results showed that 61 % of the users use one or two services per month. 77,6 % of the users use services four months per year or less. 55 % of the users use only free services. Most users, 37,4 % of the users of non-free services, spend 10 SEK or less each month. 28 % of the users account for 90 % of the spending. It was possible to find a cluster of companies providing mobile services that had more transactions and higher spending per user than the other companies. The group's averages were 3,84 transactions and 51,56 SEK per user and month.

Acknowledgement

This is a thesis for a Master of Science in Engineering Physics with a specialization in Data Mining. The thesis was written for the Department of Numerical Analysis and Computer Science at the Royal Institute of Technology in Sweden. It was commissioned by Ericsson, Sweden, with the intention of helping Ericsson understand consumer behavior for mobile telephone services. However, the thesis is probably of interest to anyone interested in consumer behavior.

I want to thank my supervisor at Ericsson, Magnus Wester, and my supervisor and examiner at KTH, Professor Stefan Arnborg. I would also like to thank all the other people at Ericsson who helped me. Finally, I wish to thank my family for all their love and support.

Stockholm, Thursday, 16 February 2006

Table of contents

1 Introduction
1.1 Background
1.2 Definition of Mobile Services
1.3 Problem Definition
2 Theory
2.1 Analysis of variance
2.1.1 One-way ANOVA
2.1.2 Two-way ANOVA
2.2 K-means clustering
2.3 Self-Organizing Maps
3 Analysis
3.1 Information stored
3.2 Different types of IPX users
3.3 Questions that can be answered
3.4 Statistical proof
3.5 Clustering
4 Results
4.1 Services the IPX user uses
4.1.1 The most used technique
4.1.2 The most used content provider
4.1.3 The operator with the most frequent IPX users
4.1.4 Trends
4.2 The time the IPX user uses services
4.2.1 The day of the month the services are used the most
4.2.2 The day of the week the services are used the most
4.2.3 The hour of the day the services are used the most
4.2.4 Trends
4.3 The time when the IPX user is the most willing to spend money
4.3.1 The day of the month with highest spending
4.3.2 The day of the week with highest spending
4.3.3 The hour of the day with highest spending
4.3.4 Trends
4.4 How often the IPX user uses the services
4.4.1 The average number transactions made on average per IPX user and month
4.4.2 Number months a year the IPX user uses services
4.4.3 Number of weeks a year the IPX user uses services
4.4.4 Percentage IPX users that uses how many services per month
4.4.5 Percentage IPX user uses how many content providers per month
4.4.6 Trends
4.5 How much the IPX user is willing to spend
4.5.1 The average spending per IPX user and month
4.5.2 Percentage IPX users spend how much
4.5.3 Percentage IPX users that are responsible for how much of the spending
4.5.4 Trends
4.6 How the price influences how the IPX user buy services
4.6.1 The distribution of transactions among the tariffs
4.6.2 The days of the month low and high tariffs are used
4.6.3 The days of the week low and high tariffs are used
4.6.4 The hours of the day low and high tariffs are used
4.6.5 Trends
4.7 Clustering
4.7.1 Transactions over the hours of the day per content providers
4.7.2 Spending over the hours of the day per content providers
4.7.3 WAP transactions over the hours of the day per content providers
4.7.4 Transactions and spending per IPX user per content providers
4.7.5 Trends
5 Discussion and recommendations
6 Conclusions
References

1 Introduction

Most people today have a mobile phone, and many use it for more than calling and messaging friends: they use mobile services. Mobile phones are becoming more and more advanced, and so are the mobile services, but what does consumer behavior look like for these mobile services?

1.1 Background

Ericsson's affiliated company Internet Payment Exchange (IPX) provides a solution that makes it simple for companies that supply mobile services to deliver, and receive payment for, their content and services. Such companies are called content providers. IPX serves as a mediator between content providers and operators. Instead of being forced to interface their service to every operator, content providers only have to interface it to IPX, and IPX interfaces it to all operators, see Figure 1.

Figure 1: IPX serves as a mediator between content providers and operators.

Every time a mobile service supplied by a content provider that uses IPX services is used, the transaction passes through the IPX system. Every time a transaction is completed successfully, it is recorded in the IPX database. This information has been saved, but it has never been analyzed.

The mobile service market is relatively new and is growing very fast. However, many new content providers are also competing for the consumers, so it is very important to know the consumers: what they buy, when they buy it, and what they are willing to pay for. Although IPX does not sell any mobile services itself, it is of significant importance for IPX to know what attracts consumers. Being able to offer content providers knowledge about consumer behavior would give IPX an advantage over other mediators.

1.2 Definition of Mobile Services

The mobile services that this thesis discusses are services where either the consumer or the content provider pays for the content.

Examples of services are downloading ring tones, getting weather information, or using client games. The messaging techniques analyzed here are SMS, MMS, WAP, and web. SMS stands for short messaging service and is simply a text message. MMS stands for multimedia messaging service and allows binary content, such as pictures, audio, or small video clips, to be attached to the message. WAP stands for wireless application protocol and allows the user of the mobile phone to access the Internet for services and information. Web is IPX terminology and refers to services that are sold on the Internet but paid for using the mobile phone.

Premium messages will be distinguished from non-premium messages. Premium messages are messages where the consumer pays for the content, and non-premium messages are messages where the consumer does not pay for the content. Finally, mobile originated (MO) messages will be distinguished from mobile terminated (MT) messages. An MO message is sent from a consumer to a content provider, and an MT message is sent from a content provider to a consumer. Both MO and MT messages can be premium as well as non-premium.

1.3 Problem Definition

The transaction information that has been stored in the IPX database for several years contains a lot of valuable information, but what kind of conclusions about consumer behavior can be drawn from it?

The benefits of analyzing the data in the database are many. The analysis could tell content providers where and when it would be most profitable to show their advertisements. It could reveal whether it is more important to have many consumers who each spend a little on the services, or whether it is enough to have a few who spend a lot. The analysis could also reveal when consumers are willing to spend money and when they are not. A content provider might even make money by lowering prices at certain times, because consumers would then spend more.

The aim of this thesis is to look at how the users of IPX services have behaved historically, and which parts of this behavior to expect in the future. From now on in this thesis, the consumers of mobile services will be called IPX users.

IPX is the leading mobile payment mediator on the Swedish market, with about 30% of the market, and is also active in 20 other countries around the world. However, this thesis only analyzes consumer behavior within the Swedish market. Different markets sometimes have different consumer behavior, but when only one market is studied, the results are applicable to that entire market. The Swedish market was chosen because of IPX's large market share, which makes the statistics more reliable.

2 Theory

The theory consists of three parts. The first part is the analysis of variance, which is used for the statistics. The second part, K-means, and the third part, self-organizing maps, are two methods for unsupervised clustering.

2.1 Analysis of variance

Analysis of variance (ANOVA) is a way to determine whether an observed difference is a coincidence or something to be expected. ANOVA is used when samples are drawn under different conditions. If one factor (a set of related conditions or categories) is varied over different levels, the analysis is a one-way ANOVA; if two factors are varied, it is a two-way ANOVA, and so on.

2.1.1 One-way ANOVA

For the one-way ANOVA, if the conditions are systematic, that is, the question at issue concerns the levels of the conditions, then the model is:

$$Y_{ij} = \theta_i + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \sigma^2) \text{ and independent} \tag{1}$$

where $i = 1, 2, \ldots, k$, $j = 1, 2, \ldots, n_i$ and $N = \sum_i n_i$.

The null hypothesis for the one-way ANOVA is $H_0$: no difference between the $\theta_i$. The F statistic is calculated as follows:

$$F = \frac{\sigma_i}{\sigma} \tag{2}$$

$$\sigma_i = \frac{1}{k-1} \sum_i n_i \left( \bar{Y}_{i.} - \bar{Y}_{..} \right)^2 \tag{3}$$

$$\sigma = \frac{1}{N-k} \sum_i \sum_j \left( Y_{ij} - \bar{Y}_{i.} \right)^2 \tag{4}$$

where $\bar{Y}_{i.}$ is the mean of group $i$ and $\bar{Y}_{..}$ is the overall mean. $\sigma_i$ is the variance between groups; it reflects not only the difference between the means of the groups but also sampling error. $\sigma$ is the variance within groups, the average of the variances of the groups, and does not reflect differences between the means of the groups.

If there is a considerable difference between the means of the groups, then $\sigma_i$ becomes large and so does $F$. However, if there is no difference between the means of the groups, $\sigma_i$ as well as $\sigma$ only reflect sampling error and take values close to each other, which gives an $F$ value close to one. A high value of $F$ therefore leads to rejection of $H_0$. Under $H_0$, $F$ is F-distributed. The one-way ANOVA requires normally distributed data, as well as homogeneity of the variance within the groups.
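As a minimal sketch of equations (2) to (4), the F statistic can be computed directly from grouped samples. The function name `one_way_f` and the sample values are illustrative assumptions and are not taken from the IPX database.

```python
from typing import Sequence


def one_way_f(groups: Sequence[Sequence[float]]) -> float:
    """Return the one-way ANOVA F statistic for a list of sample groups."""
    k = len(groups)                              # number of groups (levels)
    n_total = sum(len(g) for g in groups)        # N = sum of n_i
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variance between groups, equation (3).
    between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    ) / (k - 1)

    # Variance within groups, equation (4).
    within = sum(
        (y - m) ** 2 for g, m in zip(groups, group_means) for y in g
    ) / (n_total - k)

    return between / within                      # equation (2)


# Hypothetical transaction counts for three weekdays (made-up numbers).
monday = [10.0, 12.0, 11.0]
tuesday = [20.0, 22.0, 21.0]
sunday = [30.0, 32.0, 31.0]
f_value = one_way_f([monday, tuesday, sunday])
print(f_value)
```

In this made-up example the group means differ far more than the values within each group, so F is large and the null hypothesis would be rejected. Deciding significance still requires comparing F against the F distribution with (k - 1, N - k) degrees of freedom.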

2.1.2 Two-way ANOVA

The two-way ANOVA is calculated in a similar way to the one-way ANOVA. The model for the two-way systematic ANOVA is:

$$Y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \sigma^2) \text{ and independent} \tag{5}$$

where $i = 1, 2, \ldots, r$ and $j = 1, 2, \ldots, s$. Here there are two null hypotheses:

$H_{0i}$: no difference between the $\alpha_i$
$H_{0j}$: no difference between the $\beta_j$

The F statistics are calculated in a similar way as above:

$$F_i = \frac{\sigma_i}{\sigma} \tag{6}$$

$$F_j = \frac{\sigma_j}{\sigma} \tag{7}$$

$$\sigma_i = \frac{s}{r-1} \sum_i \left( \bar{Y}_{i.} - \bar{Y}_{..} \right)^2 \tag{8}$$

$$\sigma_j = \frac{r}{s-1} \sum_j \left( \bar{Y}_{.j} - \bar{Y}_{..} \right)^2 \tag{9}$$

$$\sigma = \frac{1}{(r-1)(s-1)} \sum_i \sum_j \left( Y_{ij} - \bar{Y}_{i.} - \bar{Y}_{.j} + \bar{Y}_{..} \right)^2 \tag{10}$$

where $\sigma_i$ is the variance between groups $i$, $\sigma_j$ is the variance between groups $j$, and $\sigma$ is the variance within groups. As for the one-way ANOVA, a high value of $F$ leads to rejection of the corresponding null hypothesis, and $F$ is F-distributed. Like the one-way ANOVA, the two-way ANOVA requires normally distributed data, as well as homogeneity of the variance within the groups.

2.2 K-means clustering

K-means clustering is a clustering algorithm where K stands for a prespecified number of clusters, which means that K-means always produces K clusters. K-means is a very general clustering algorithm that can be used for a variety of applications and data types. It is a prototype-based clustering technique, meaning that the objects in a cluster are closer to the prototype that defines the cluster than to any other prototype. The prototype for K-means is the centroid, which is calculated using a distance metric between data points, for example the Euclidean distance. The K-means algorithm is as follows:

1) The K cluster centers are chosen.
2) Each data point is assigned to the closest cluster center, based on the chosen distance metric.
3) Each cluster center is recomputed as the mean of the data points associated with it.
4) Steps 2 and 3 are repeated until no data point changes cluster.
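The four K-means steps can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not the Matlab code used in the thesis: the name `k_means`, the random choice of initial centers, and the sample points are all hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def k_means(points: Sequence[Point], k: int, seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Cluster points into k groups; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(list(points), k)        # step 1: choose K initial centers
    labels = [-1] * len(points)
    while True:
        # Step 2: assign each point to the closest center (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:                 # step 4: stop when nothing changes
            return centers, labels
        labels = new_labels
        # Step 3: recompute each center as the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                          # keep the old center if a cluster is empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))


# Two well-separated groups of made-up 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers, labels = k_means(points, k=2)
print(labels)
```

Picking the initial centers at random from the data is only one common choice. The result of K-means can depend on the initialization, so several runs with different seeds are often compared.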

The number of cluster centers, K, can be chosen based on the Davies-Bouldin index. The index is calculated by dividing the scatter within clusters by the separation between clusters. More specifically, let $Q_i$ be the centre of cluster $i$, $i = 1, 2, \ldots, n$, where $n$ is the number of clusters. The Davies-Bouldin index is then:

$$DB = \frac{1}{n} \sum_{i=1}^{n} \max_{j \neq i} \frac{S_n(Q_i) + S_n(Q_j)}{S(Q_i, Q_j)} \tag{11}$$

where $S_n(Q_i)$ is the distance from the data points to their cluster centre $Q_i$, and $S(Q_i, Q_j)$ is the distance between the cluster centers $Q_i$ and $Q_j$.

2.3 Self-Organizing Maps

The Self-Organizing Maps (SOMs) discussed in this thesis are based on Teuvo Kohonen's theories, which provide a way to represent multidimensional data in much lower dimensions (Vesanto, 2000). Unlike many other neural networks, SOMs are designed to classify data without supervision. The SOM creates a network that stores information about the training set. It consists of neurons that are connected to adjacent neurons with similar properties in a topological way. Each neuron $i$ has a $d$-dimensional prototype vector $m_i = [m_{i1}, \ldots, m_{id}]$ associated with it, i.e. there is one neuron associated with each centroid. In each step, a random vector $x$ is chosen from the training set and its distance to the prototype vectors is calculated to find the best matching unit (BMU). SOM resembles prototype-based clustering techniques like K-means, with the difference that both the BMU and its topological neighbors on the map are updated during training. The SOM is trained iteratively as follows:

1) The weights of all the neurons are initialized.

2) One sample vector, $x$, is chosen at random from the input data. The BMU, the neuron closest to the input vector $x$, is searched for among all neurons $m$, often using the Euclidean distance, although the dot product metric is also used fairly often.

3) The radius around the BMU is calculated. This value starts large and then decreases with every iteration. It can be calculated, for example, as follows:

$$\sigma(t) = \sigma_0 \exp\left( -\frac{t}{\lambda} \right), \qquad t = 1, 2, 3, \ldots \tag{12}$$

where $\sigma_0$ is the radius at $t_0$, and $\lambda$ is a time constant.

4) The weight vectors of the neurons in the neighborhood of the BMU (including the BMU itself) are then adjusted; the closer a neuron is to the BMU, the more its weights are altered, see Figure 2. The weights, like the radius, are altered less with every iteration. The update of the prototype vector is therefore:

$$m_i(t+1) = m_i(t) + \alpha(t) \, h_{bi}(t) \left[ x - m_i(t) \right], \qquad t = 1, 2, 3, \ldots \tag{13}$$

where the learning rate $\alpha(t)$ can, for example, be:

$$\alpha(t) = \alpha_0 \exp\left( -\frac{t}{\lambda} \right), \qquad t = 1, 2, 3, \ldots \tag{14}$$

and $h_{bi}(t)$ can be the Gaussian kernel:

$$h_{bi}(t) = \exp\left( -\frac{\lVert r_b - r_i \rVert^2}{2\sigma^2(t)} \right), \qquad t = 1, 2, 3, \ldots \tag{15}$$

where $r_b$ and $r_i$ are the positions of the BMU $b$ and neuron $i$ on the map, $\sigma$ is as above, and $\alpha_0$ and $\lambda$ are constants.

5) Steps 2 to 4 are repeated until convergence or, if convergence does not occur, for a predetermined number of iterations $N$.

Figure 2: The BMU and adjacent neurons move closer to the input vector x (Vesanto, 2000).

The SOM can be visualized in many ways. One way is the U-matrix, a distance matrix that shows the distances between map units and their neighbors. Although the SOM provides a convenient way of representing multidimensional data, it does not always produce easily detected clusters.
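The training loop in steps 1 to 5 can be sketched as follows. This is a simplified illustration, not the SOM implementation used in the thesis: the function name `train_som`, the rectangular grid, the parameter defaults, and the sample data are assumptions, and step 5 is reduced to a fixed number of iterations.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def train_som(
    data: Sequence[Vector],
    rows: int,
    cols: int,
    n_iter: int = 500,
    sigma0: float = 1.5,
    alpha0: float = 0.5,
    lam: float = 200.0,
    seed: int = 0,
) -> Tuple[List[Tuple[int, int]], List[List[float]]]:
    """Train a small rectangular SOM; returns (grid positions, prototype vectors)."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]   # map positions r_i
    # Step 1: initialise each prototype vector m_i with a randomly chosen sample.
    weights = [list(rng.choice(data)) for _ in grid]
    for t in range(1, n_iter + 1):
        # Step 2: draw a random sample x and find its best matching unit (BMU).
        x = rng.choice(data)
        bmu = min(range(len(grid)), key=lambda i: math.dist(x, weights[i]))
        # Step 3: shrinking neighbourhood radius, equation (12).
        sigma = sigma0 * math.exp(-t / lam)
        # Learning rate, equation (14).
        alpha = alpha0 * math.exp(-t / lam)
        for i, (r, c) in enumerate(grid):
            # Gaussian neighbourhood kernel on the map grid, equation (15).
            grid_dist_sq = (r - grid[bmu][0]) ** 2 + (c - grid[bmu][1]) ** 2
            h = math.exp(-grid_dist_sq / (2.0 * sigma ** 2))
            # Step 4: move the prototype towards x, equation (13).
            weights[i] = [m + alpha * h * (xj - m) for m, xj in zip(weights[i], x)]
    # Step 5 is simplified here to a fixed number of iterations.
    return grid, weights


# Made-up 2-D samples forming two loose groups.
data = [(0.1, 0.2), (0.15, 0.25), (0.9, 0.8), (0.85, 0.9)]
grid, weights = train_som(data, rows=2, cols=3)
print(weights)
```

Because each update moves a prototype part of the way towards a sample, the prototypes always stay inside the range spanned by the data when they are initialized from data points.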

3 Analysis

Infinitely many questions could be asked about consumer behavior for mobile services, but only a limited number of them are both possible and interesting to answer with the data from the IPX database. Therefore, the approach to finding valuable statistics about consumer behavior is, first, to explore the data stored in the database today and the questions it could answer and, second, to do data mining and statistics on that data.

3.1 Information stored

The IPX database is a large database with a lot of information, but the information is stored for the limited purpose of providing accurate billing information. However, some of the information stored is also relevant to exploring consumer behavior. The data stored is:

- Phone number
- Date and time of purchase
- Type of message (MT SMS, MO SMS, MT MMS, WAP, web)
- Operator
- Content provider
- Keyword. The content provider has different keywords for different services.
- Tariff of the message in domestic currency, here Swedish kronor (SEK)
- The market where the transaction was made, which is Sweden in this case
- Information about whether the transaction was successful or not

3.2 Different types of IPX users

From the information stored in the database it is not possible to draw any conclusions about personal information such as the age, sex, or income of the IPX user. Instead, the IPX users can be divided into other segments based on the information that is stored in the database. The segments are IPX users using different:

- Operators
- Techniques
- Categories

The operators are all the mobile operators on the Swedish market. The techniques are MT SMS, MO SMS, MT MMS, WAP, and web. The categories are based on the content providers. Since many of the content providers work in several areas, this division into categories is not the best possible. It would have been better to base the categories on keywords, since every keyword is a specific service. This division was not possible because IPX does not have information about which specific service a specific keyword refers to. The categories are:

- Media: a service that is based on content from newspapers, news agencies, magazines, and TV networks. Sport news, sport results, and sport commentaries are also included in this category.
- Community: a service that includes communication between people using voice, video, images, and text. Examples are diaries, dating, and chatting.
- Enterprise: a service that provides tools for offices and businesses. Examples are handling of the answering machine, and the ability to communicate within the business or with customers using the mobile phone.
- Entertainment: a service with the purpose of entertaining, for example comic strips, horoscopes, fashion, art, film, books, video clips, real music, background pictures, screensavers, betting, games to download, and client games.
- Information: a service that provides information of different kinds that is not news. Examples are traffic information, weather information, directory assistance, and stock exchange quotations.

3.3 Questions that can be answered

What can be said about consumer behavior from the information in the IPX database? In other words: who is the IPX user? Based on the information stored in the database, these are the questions that this thesis will answer about who the IPX user is:

- Services the IPX user uses.
  - What technique is the most used one?
  - What content provider category is the most used one?
  - What operator has the most frequent IPX users?
- The time the IPX user uses services.
  - What day of the month are the services used the most?
  - What day of the week are the services used the most?
  - What hour of the day are the services used the most?
- The time the IPX user is the most willing to spend money.
  - What day of the month has the highest spending?
  - What day of the week has the highest spending?
  - What hour of the day has the highest spending?

- How often the IPX user uses the services.
  - How many transactions are made on average per IPX user per month?
  - How many months a year does the IPX user use services?
  - How many weeks a year does the IPX user use services?
  - How many IPX users use how many services per month?
  - How many IPX users use how many content providers per month?
- How much the IPX user is willing to spend.
  - What is the average spending per IPX user per month?
  - How many IPX users spend how much?
  - How many IPX users are responsible for how much of the spending?
- How the price influences how the IPX user buys services.
  - How are the transactions distributed among the tariffs?
  - What days of the month are low versus high tariffs used?
  - What days of the week are low versus high tariffs used?
  - What hours of the day are low versus high tariffs used?

These questions are then explored for the segments established in section 3.2: operators, techniques, and categories. Most of the questions are easy to examine with SQL queries against the IPX database. They are explored either over the six months from April through September 2005 or over the year from October 2004 through September 2005. The queries give the history of the behavior of the IPX users.

3.4 Statistical proof

It is well known that the past does not always predict the future. Once it is established how the IPX users have behaved in the past, what can be expected in the future?

The statistical analysis of the data is based on the assumption that the data is normally distributed. However, many of the groups of samples had large variations, which made the data hard to analyze. In many cases some samples had to be removed before a correct statistical analysis was possible. In some cases there were so many large variations that it was not possible to remove all of them.

Both the one-way ANOVA and the two-way ANOVA were used to analyze the data. For example, the one-way ANOVA was used when the number of transactions during the days of the week was examined. The days were then the factor varied over different levels (Monday through Sunday). The two-way ANOVA, on the other hand, was used when the number of transactions during the days of the week was examined for different segments, for example between the operators. The days were then one factor and the operators the other factor, each varied over different levels.
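The two-way case described above, with days as one factor and operators as the other, can be sketched with the two-way formulas from section 2.1. The thesis itself used SPSS, so this is only an illustration: the function name `two_way_f` and the tiny table of values are hypothetical, and the sketch assumes one observation per combination of day and operator.

```python
from typing import List, Sequence, Tuple


def two_way_f(table: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Return (F for the row factor, F for the column factor).

    table[i][j] holds the single observation for row level i and column level j.
    """
    r = len(table)                               # levels of the row factor (e.g. days)
    s = len(table[0])                            # levels of the column factor (e.g. operators)
    grand = sum(sum(row) for row in table) / (r * s)
    row_means: List[float] = [sum(row) / s for row in table]
    col_means: List[float] = [sum(table[i][j] for i in range(r)) / r for j in range(s)]

    # Variance between row levels and between column levels.
    var_rows = s * sum((m - grand) ** 2 for m in row_means) / (r - 1)
    var_cols = r * sum((m - grand) ** 2 for m in col_means) / (s - 1)

    # Residual variance after removing both row and column effects.
    residual = sum(
        (table[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(r)
        for j in range(s)
    ) / ((r - 1) * (s - 1))

    return var_rows / residual, var_cols / residual


# Made-up values: rows are two days, columns are two operators.
table = [[1.0, 2.0], [3.0, 6.0]]
f_days, f_operators = two_way_f(table)
print(f_days, f_operators)
```

Each F value is then compared against the F distribution with the matching degrees of freedom, (r - 1) or (s - 1) in the numerator and (r - 1)(s - 1) in the denominator.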

Since the official market shares of the operators differ greatly, the operators were weighted with their official market shares. There is also a large difference in the number of transactions between the techniques, so each technique was weighted by its percentage of the total transactions in the year of interest. The same weighting was done for the categories, i.e. each category was weighted with its percentage of all transactions in the year of interest. The statistical software SPSS 12.0 was used to perform the ANOVA.

3.5 Clustering

Apart from the obvious questions that could be asked, what hidden patterns can be found using data mining, and what interesting conclusions can be drawn from them? In this thesis the content providers are clustered to see whether it is possible to find content providers whose customers behave in the same way when looking at:

- Transactions over the hours of the day per content provider.
- Spending over the hours of the day per content provider.
- WAP transactions over the hours of the day per content provider.
- Transactions and spending per IPX user per content provider.

The first three questions explore whether certain content providers have different time patterns, for example more spending in the evening than other content providers. The last question is intended to show whether it is possible to find groups of content providers that have about the same spending level. For the first three questions the content providers are the samples and the hours of the day are the variables. For the last question the content providers are still the samples, but with only two variables: the number of transactions per user and the spending per user. These questions, and the clusters that can be found, are then explored using the SOM algorithm and K-means in Matlab, where K is determined by the Davies-Bouldin index.
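To make the role of the Davies-Bouldin index concrete, the sketch below scores two candidate clusterings of the same points; the clustering with the lower index is preferred. It uses the standard form of the index, which takes the largest ratio for each cluster. The thesis used Matlab, so the function name `davies_bouldin` and the sample points here are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def davies_bouldin(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Return the Davies-Bouldin index of a clustering (lower is better)."""
    clusters = sorted(set(labels))
    centers: Dict[int, Point] = {}
    scatter: Dict[int, float] = {}
    for c in clusters:
        members: List[Point] = [p for p, lab in zip(points, labels) if lab == c]
        # Cluster centre: the mean of the member points.
        centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
        # Within-cluster scatter: mean distance from the members to the centre.
        scatter[c] = sum(math.dist(p, centers[c]) for p in members) / len(members)

    total = 0.0
    for i in clusters:
        # Worst-case ratio of combined scatter to centre separation for cluster i.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


# Made-up 1-D points with an obvious split between the second and third point.
points = [(0.0,), (2.0,), (10.0,), (12.0,)]
db_good = davies_bouldin(points, [0, 0, 1, 1])
db_bad = davies_bouldin(points, [0, 1, 1, 1])
print(db_good, db_bad)
```

To choose K, one would run the clustering for several values of K and keep the one with the lowest index.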

4 Results

Since IPX has not specialized in one type of service, it has a representative sample of the Swedish market. Because of that, and the fact that IPX has 30% of the market in Sweden, the results of this thesis are applicable not only to the IPX users but also, with high statistical significance, to all users of mobile telephone services in Sweden. All the IPX transactions have been used as data points for the results, and since IPX is not willing to share absolute figures in this thesis, the results are given in relative numbers. The results are based on the data from October 2004 through September 2005, except where indicated in the results. Most of the results showed F to be significant beyond the 0,01 level, i.e. p<0.005. Whenever another p level is used, it is indicated in the results.

4.1 Services the IPX user uses

4.1.1 The most used technique

When comparing the average number of transactions per month, over a year, for the techniques, MT SMS and MO SMS dominate, with 61,36% and 37,93% of the IPX transactions respectively, see Chart 1.

Chart 1: Distribution of the transactions over the techniques. [Bar chart; x-axis: Techniques (MT SMS, MO SMS, MMS, WAP, WEB); y-axis: Percentage transactions.]

4.1.2 The most used content provider

When comparing the average number of transactions per month, over a year, for the categories, information dominates with 75,04% of the transactions, see Chart 2.

Chart 2: Distribution of the transactions over the categories. [Bar chart; x-axis: Categories (Media, Community, Enterprise, Entertainment, Information); y-axis: Percentage transactions.]

4.1.3 The operator with the most frequent IPX users

To compare the transactions made per month for the operators, only MO SMS was compared, since some of the operators do not support MMS and WAP. Chart 3, weighted with market shares in Sweden, shows that Operator 2, Operator 5, and Operator 3 have more transactions, and Operator 1 and Operator 4 fewer, made on average per month over a year.

Chart 3: Distribution of the transactions over the operators, weighted with market shares. [Bar chart; x-axis: Operators (Operator 1 through Operator 5); y-axis: Percentage transactions.]

4.1.4 Trends

The IPX users mostly use MT SMS, but also many MO SMS messages. The most used category is information, and the operators with the most frequent IPX users are Operator 2, Operator 5, and

Operator 3.

4.2 The time the IPX user uses services

4.2.1 The day of the month the services are used the most

In the past there have been large variations in the number of transactions during the days of the month, as seen in Chart 4. To explore whether the variations are expected or random, the one-way ANOVA is used, based on the transactions made over the year, with the days as the factor. Then i = 1, 2, ..., k, where k = 31, and j = 1, 2, ..., n_i, where n_1 through n_28 = 12, n_29 and n_30 = 11, and n_31 = 7, which gives N = 28*12 + 2*11 + 7 = 365. Because of the large variations, \sigma_i^2 and \sigma^2 get large, more specifically:

\sigma_i^2 = \frac{1}{k-1} \sum_{i=1}^{31} n_i \left( \bar{Y}_{i.} - \bar{Y}_{..} \right)^2 = \frac{1}{31-1} \cdot 2{,}98 \cdot 10^{10} = 9{,}95 \cdot 10^{8}   (16)

\sigma^2 = \frac{1}{N-k} \sum_{i=1}^{31} \sum_{j=1}^{n_i} \left( Y_{ij} - \bar{Y}_{i.} \right)^2 = \frac{1}{365-31} \cdot 1{,}05 \cdot 10^{12} = 2{,}86 \cdot 10^{9}   (17)

Then F is given by:

F = \frac{\sigma_i^2}{\sigma^2} = \frac{9{,}95 \cdot 10^{8}}{2{,}85 \cdot 10^{9}} = 0{,}35   (18)

Because of the small F, H_0 cannot be rejected. The conclusion is that it is not possible to say anything statistically about when variations during the days of the month are to be expected. For techniques a two-way ANOVA was done in a similar way, with the days as one factor and the techniques as the other factor, but this did not show any differences either between days or between techniques. The two-way ANOVA for categories and for operators did not show any differences either. All the following results are based on ANOVA tests like the one above, but only the outcome of the test is given.

Chart 4: Percentage transactions made during the days of the month. [x-axis: Day of the month (1 through 31); y-axis: Percentage transactions.]
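The F statistic of equations (16) through (18) can be sketched in a few lines of Python. This is only an illustration of the one-way ANOVA arithmetic for groups of unequal size, not the SPSS procedure used in the thesis; the function name and the toy data are assumptions made for the example. The last line recomputes the ratio from the two variance estimates quoted in equation (18).

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (between-group variance, within-group variance, F).

    `groups` is a list of lists; group i holds the n_i observations Y_ij.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(y for g in groups for y in g)
    group_means = [mean(g) for g in groups]
    # Sum of squares between groups, weighted by group size n_i.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Sum of squares within groups.
    ss_within = sum((y - m) ** 2
                    for g, m in zip(groups, group_means) for y in g)
    var_between = ss_between / (k - 1)
    var_within = ss_within / (n_total - k)
    return var_between, var_within, var_between / var_within


# Hypothetical toy data: three groups with unequal numbers of observations.
toy_groups = [[1.0, 2.0, 3.0], [2.0, 4.0], [5.0, 6.0, 7.0, 6.0]]
var_b, var_w, f_toy = one_way_anova_f(toy_groups)

# Ratio of the two variance estimates quoted in equation (18).
f_reported = 9.95e8 / 2.85e9
print(f_toy, round(f_reported, 2))
```

A small F, as in equation (18), means the variation between the days is no larger than the variation within the days, which is why H_0 cannot be rejected.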

4.2.2 The day of the week the services are used the most

Looking at the distribution of the transactions during the days of the week, fewer transactions were made on average on weekends than on weekdays, see Chart 5. This difference could be shown statistically. There is no difference between techniques, categories, or operators. However, the difference between weekdays and weekends cannot be shown statistically for the technique WAP or the category entertainment.

Chart 5: Percentage transactions made during the days of the week. [Bar chart; x-axis: Day of the week (Monday through Sunday); y-axis: Percentage transactions.]

4.2.3 The hour of the day the services are used the most

The distribution of the transactions during the hours of the day is shown in Chart 6, which is based on the transactions made April through September 2005. As seen, the variations between consecutive hours are small, and since the variations within the hours are large, it is impossible to prove statistically that there is a difference between adjacent hours, for example between 13 and 14. If the day is instead split into four-hour segments, 02-05, 06-09, 10-13, 14-17, 18-21, and 22-01, the ANOVA shows that they are all statistically different, except that 22-01 is not statistically different from 06-09, and 10-13 is not statistically different from 14-17.

Chart 6: Percentage transactions made during the hours of the day. [x-axis: Hour of the day (0 through 23); y-axis: Percentage transactions.]

It is possible to show statistically that there is a difference between the techniques, and also between enterprise and the other categories, but not between the operators. Therefore, Charts 7 through 11 show the behavior of the techniques and categories during the hours of the day, based on the transactions made April through September 2005. In Chart 8, which shows the WAP transactions over the day, an interesting peak is found around 21-23, which will be explored in section 4.7.3.

Chart 7: Percentage MO SMS and MT SMS transactions made during the hours of the day. [x-axis: Hour of the day (0 through 23); y-axis: Percentage transactions; series: MO SMS, MT SMS.]

Chart 8: Percentage WAP transactions made during the hours of the day. [x-axis: Hour of the day (0 through 23); y-axis: Percentage transactions.]

Chart 9: Percentage MT MMS transactions made during the hours of the day. [x-axis: Hour of the day (0 through 23); y-axis: Percentage transactions.]

Chart 10: Percentage web transactions made during the hours of the day. [x-axis: Hour of the day (0 through 23); y-axis: Percentage transactions.]

Chart 11: Percentage transactions made for the categories during the hours of the day. [x-axis: Hour of the day (0 through 23); y-axis: Percentage transactions; series: Entertainment, Information, Enterprise, Community, Media.]

4.2.4 Trends

All IPX users make about the same number of transactions each day of the month, regardless of technique, category, or operator. There are generally fewer transactions made on the weekends, except for the technique WAP and the category entertainment, which seem to be used as much on weekends as on weekdays. The distribution during the hours of the day differs between the techniques, and between enterprise and the other categories, but is the same for all operators.

4.3 The time when the IPX user is the most willing to spend money

4.3.1 The day of the month with highest spending

It can be shown that the spending is higher on the dates 25-31 and 1-3 than on the remaining days of the month. This can also be seen in Chart 12, which shows the average spending during the days of the month, over a year. However, it is not possible to show statistically any difference on significance level p=0,01 between techniques, between categories, or between operators.

Chart 12: Percentage spending during the days of the month. [x-axis: Day of the month (1 through 31); y-axis: Percentage spending.]

4.3.2 The day of the week with highest spending

In the past, the average spending has been about the same for all the days of the week, see Chart 13. Therefore, it is not possible to show any statistical difference between any of the days.

Chart 13: Percentage spending during the days of the week. [Bar chart; x-axis: Day of the week (Monday through Sunday); y-axis: Percentage spending.]

Recall that fewer transactions were made on the weekends. Since there is no decrease in spending on the weekends, this indicates that more expensive services are used on the weekends. It is not possible, on significance level p=0,05, to show any difference between techniques, categories, or operators. However, for enterprise it is possible to show statistically that less spending is expected on weekends, and also that higher spending is expected on Wednesdays, Thursdays, and Fridays than on Mondays and Tuesdays. For media, the highest spending is to be expected on Mondays and the least spending on the weekends. It can also be shown that MT MMS as well as web have less spending on the weekends.

4.3.3 The hour of the day with highest spending

As for transactions, the spending during the hours of the day has too small variations to prove statistically any difference between adjacent hours, see Chart 14, which is based on the transactions made April through September 2005. If, as for the transactions, ANOVA is done on the four-hour segments 02-05, 06-09, 10-13, 14-17, 18-21, and 22-01, they are all statistically different except 14-17 and 18-21. In Chart 14 the distribution of transactions is included for reference. The difference between the spending and the transactions during the hours of the day can also be shown statistically if the day is divided into eight-hour segments, 00-07, 08-15, and 16-23. There are more transactions and less spending between 08-15, and fewer transactions and higher spending between 16-23. This indicates that less expensive services are used between 08-15 and more expensive services between 16-23. This can also be seen in Chart 15, which shows the spending per transaction over the hours of the day.

Chart 14: Percentage spending and transactions during hours of the day. [x-axis: Hour of the day (0 through 23); y-axis: Percentage; series: Transactions, Spending.]

Chart 15: Spending per transaction during the hours of the day. [x-axis: Hour of the day (0 through 23); y-axis: Spending per transaction.]
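The segment comparison in section 4.3.3 can be illustrated with a short Python sketch that groups hourly values into four-hour segments starting at 02 and eight-hour segments starting at 00, and then forms the spending per transaction for each segment. The helper names and the hourly figures are hypothetical and chosen only to make the arithmetic visible; they are not the IPX data.

```python
def segment_label(hour, width, offset=0):
    """Label (e.g. '22-01') of the `width`-hour segment containing `hour`.

    Segments start at `offset` and wrap around midnight.
    """
    start = ((hour - offset) % 24) // width * width + offset
    end = (start + width - 1) % 24
    return f"{start % 24:02d}-{end:02d}"


def aggregate(hourly_values, width, offset=0):
    """Sum a list of 24 hourly values into segments of `width` hours."""
    totals = {}
    for hour, value in enumerate(hourly_values):
        label = segment_label(hour, width, offset)
        totals[label] = totals.get(label, 0.0) + value
    return totals


# Hypothetical hourly figures: more transactions 08-15, more spending 16-23.
transactions_by_hour = [2.0 if 8 <= h < 16 else 1.0 for h in range(24)]
spending_by_hour = [3.0 if h >= 16 else 1.0 for h in range(24)]

four_hour = aggregate(transactions_by_hour, width=4, offset=2)
tx = aggregate(transactions_by_hour, width=8)
sp = aggregate(spending_by_hour, width=8)
spending_per_transaction = {seg: sp[seg] / tx[seg] for seg in tx}
print(sorted(four_hour))
print(spending_per_transaction)
```

With these made-up numbers the 08-15 segment has the lowest spending per transaction and 16-23 the highest, which is the kind of pattern Chart 15 is meant to show.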

As for transactions, the difference in the spending during the hours of the day between different techniques and categories is large. It is possible to show on significance level p=0,05 that the difference is expected between techniques, and between enterprise and the other categories. Charts 16 through 21 show the spending of the different techniques and categories during the hours of the day, with the corresponding transactions included for reference. The charts are based on the transactions made April through September 2005.

Chart 16: Percentage MT SMS spending and transactions during the hours of the day. [x-axis: Hour of the day (0 through 23); y-axis: Percentage; series: Spending, Transactions.]

Chart 17: Percentage MO SMS spending and transactions during the hours of the day. [x-axis: Hour of the day (0 through 23); y-axis: Percentage; series: Spending, Transactions.]

Chart 18: Percentage MT MMS spending and transactions during the hours of the day. [x-axis: Hour of the day (0 through 23); y-axis: Percentage; series: Spending, Transactions.]

Chart 19: Percentage WAP spending and transactions during the hours of the day. [x-axis: Hour of the day (0 through 23); y-axis: Percentage; series: Spending, Transactions.]

Chart 20: Percentage web spending and transactions during the hours of the day. [x-axis: Hour of the day (0 through 23); y-axis: Percentage; series: Spending, Transactions.]

Chart 21: Percentage spending for the categories during the hours of the day. [x-axis: Hour of the day (0 through 23); y-axis: Percentage; series: Media, Community, Enterprise, Information, Entertainment.]

4.3.4 Trends

There is higher spending at the beginning and the end of the month than during the rest of the month. There is no difference in the spending during the days of the week, and since fewer transactions were made on the weekends, this indicates that more expensive services are used on the weekends. Less expensive services are used 08-15 and more expensive services 16-23. The distribution during the hours of the day differs between all techniques, and between enterprise and the other categories, but is the same for all operators.

4.4 How often the IPX user uses the services

4.4.1 The average number of transactions made per IPX user and month

The number of transactions made per month and IPX user using the corresponding technique is shown in Chart 22. That is, the number of MT SMS transactions is divided by the number of IPX users using MT SMS, and so on for all the techniques. It can be shown statistically that the average number of transactions per IPX user for WAP and web differs from that of MT SMS, MO SMS, and MT MMS, but no difference between any other techniques can be shown.

Chart 22: Transactions per month and IPX users using different techniques (MT SMS transactions per MT SMS user, etc). [Bar chart; x-axis: Techniques (MT SMS, MO SMS, MT MMS, WAP, WEB); y-axis: Transactions per user.]

The average number of transactions per IPX user using the corresponding category is shown in Chart 23. For categories the only differences that can be shown statistically are those between community-enterprise, community-entertainment, community-information, and enterprise-media. For operators, it is only possible to show statistically that Operator 3 has more transactions made per IPX user than Operator 4 and Operator 5, see Chart 24.

Chart 23: Transactions per month and IPX users using different categories (media transactions per media user, etc). [Bar chart; x-axis: Categories (Media, Community, Enterprise, Information, Entertainment); y-axis: Number transactions per user.]

Chart 24: Transactions per month and IPX users using different operators (Operator 1's transactions per Operator 1 user, etc). [Bar chart; x-axis: Operator (Operator 1 through Operator 5); y-axis: Number transactions per user.]

4.4.2 Number of months a year the IPX user uses services

Looking at how many months a year the IPX user uses the services, Chart 25 shows that 38,0% of the IPX users use the services one month a year, 19,6% two months a year, 11,4% three months a year, and 8,5% four months a year. Put another way, 77,5% use the services seldom (1-4 months per year), while 4,3% use the services every month (9-12 months per year), see Table 1. The average IPX user uses services 2,6 months a year, but since only whole months are used as the interval to measure this, it might be more interesting to know that the median IPX user uses services 2 months a year.

Chart 25: Number months per year the IPX users use services. Compare this to the average 2,6 months per year or the median 2 months per year. [Bar chart; x-axis: Number of months (1 through 12); y-axis: Percentage IPX users; data labels for 1 through 12 months: 38,0%, 19,6%, 11,4%, 8,5%, 7,9%, 4,6%, 3,3%, 2,4%, 1,5%, 1,1%, 0,8%, 0,8%.]

How often services are used               Percentage of IPX users
Every month (9-12 months a year)          4,3%
Every second month (5-8 months a year)    18,2%
Seldom (1-4 months a year)                77,5%

Table 1: How often the services are used by the IPX users.

4.4.3 Number of weeks a year the IPX user uses services

Looking at how many weeks a year the IPX user uses the services, 34,7% use the services for one week and hardly anyone uses the services more than 10 weeks a year, see Chart 26. The average IPX user uses services 4,5 weeks a year and the median IPX user uses services 2 weeks a year. The fact that more users use the services one month than one week implies that those who use the services for one month use them more than once that month. Also, the fact that more IPX users use the services 1-4 weeks than one month implies that the users who use services 2-4 weeks a year use them over a longer time period than a month.
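The grouping in Table 1 and the median of 2 months can be reproduced from the monthly shares shown as data labels in Chart 25. The sketch below assumes that the twelve labels belong to 1 through 12 months in order, which agrees with the four values quoted in section 4.4.2; the function names are illustrative only.

```python
# Percentage of IPX users by number of months of use per year (Chart 25 labels,
# assumed to be in order for 1 through 12 months).
share_by_months = {
    1: 38.0, 2: 19.6, 3: 11.4, 4: 8.5, 5: 7.9, 6: 4.6,
    7: 3.3, 8: 2.4, 9: 1.5, 10: 1.1, 11: 0.8, 12: 0.8,
}


def share_in_range(shares, low, high):
    """Total percentage of users whose number of months is in [low, high]."""
    return sum(v for months, v in shares.items() if low <= months <= high)


def median_bucket(shares):
    """Smallest number of months at which the cumulative share reaches half."""
    half = sum(shares.values()) / 2
    running = 0.0
    for months in sorted(shares):
        running += shares[months]
        if running >= half:
            return months


seldom = share_in_range(share_by_months, 1, 4)
every_second_month = share_in_range(share_by_months, 5, 8)
every_month = share_in_range(share_by_months, 9, 12)
median_months = median_bucket(share_by_months)
print(seldom, every_second_month, every_month, median_months)
```

The rounded chart labels give 4,2% for the 9-12 month group, against 4,3% in Table 1; the small gap is consistent with rounding of the individual labels.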

Chart 26: Number weeks per year the IPX users use services. Compare this to the average 4,5 weeks per year or the median 2 weeks per year. [x-axis: Number of weeks (1 through 53); y-axis: Percentage IPX users, logarithmic scale.]

4.4.4 Percentage of IPX users that use how many services per month

Chart 27 shows that 30,4% of the IPX users use one service and 30,8% use two services per month. Chart 27 also shows that every even number of services used has more IPX users than the closest odd number. This is seen in Chart 28 as a jitter, and it is explained by transactions used in pairs. Chart 29, for the MO and MT SMS transactions, does not have this jitter, but it is known that many MT SMS transactions are used together with an MO SMS transaction. Chart 28 also shows that the percentage of IPX users falls off roughly linearly on the logarithmic scale as the number of services used per month increases. The average number of transactions per month is 6,5 per IPX user, while the median number of transactions per month is 2 per IPX user.

Chart 27: Number of times per month the IPX users use services. Compare this to the average 6,5 times or the median 2 times per IPX user and month. [Bar chart; x-axis: Number services (1 through 10, >10); y-axis: Percentage IPX users per month; data labels for 1 through 10 and >10: 30,41%, 30,79%, 6,04%, 10,58%, 2,56%, 5,24%, 1,38%, 2,98%, 0,85%, 1,85%, 7,32%.]

Chart 28: Number of times per month the IPX users use services. Compare this to the average 6,5 times or the median 2 times per IPX user and month. [x-axis: Number services (1 through 100); y-axis: Percentage IPX users per month, logarithmic scale.]

Chart 29: Number of times per month IPX users that use MO SMS and MT SMS use services. [Bar chart; x-axis: Number of services (1 through 10, >10); y-axis: Percentage IPX users per month; series: MO SMS, MT SMS.]

It can be shown that there is a statistical difference between SMS, WAP, MT MMS, and web. However, the difference between MT and MO SMS in Chart 29 cannot be established statistically. Table 2 shows, based on the historical data, the percentage of WAP, MT MMS, and web users that use one service, or two or more services, per month. MT MMS has more users using one service than web, which in turn has more than WAP.

                              MT MMS    WAP      Web
1 time per month              94,5%     79,1%    89,5%
2 or more times per month     5,5%      20,9%    10,5%

Table 2: Percentage IPX users using services one time per month, and two or more times for MT MMS, WAP and web users.

The fact that MT MMS has the most users using just one service and, as seen earlier in this section, the most transactions per user, indicates that the MT MMS users who use more than one service use many more services than users of the other techniques. Finally, it is not possible to show any statistical difference on significance level p=0,05 between categories or between operators.

4.4.5 Percentage of IPX users that use how many content providers per month

When comparing the number of content providers used per month, as many as 83,9% of the IPX users have used only one content provider on average per month, and as few as 0,08% have used 5 content providers or more, see Chart 30. The fact that more IPX users use only one content provider than use only one service per month implies that the IPX users who use more than one service tend to use them from the same content provider.

Chart 30: Number of content providers used per IPX user and month. [Bar chart; x-axis: Number of content providers (1, 2, 3, 4, >=5); y-axis: Percentage IPX users per month; data labels: 83,91%, 13,36%, 2,27%, 0,38%, 0,08%.]

4.4.6 Trends

MT SMS, MO SMS, and MT MMS are the techniques with the most transactions per IPX user; enterprise is the category and Operator 3 is the operator with the most transactions per IPX user. Most IPX users use the services seldom, that is at most 4 months a year, or less than three weeks a year. The MT MMS user either uses just one, or very many, services per month. Most IPX users use just one content provider per month; even the IPX users that use more than one service per month most likely use them from the same content provider.

4.5 How much the IPX user is willing to spend

4.5.1 The average spending per IPX user and month

The average spending per IPX user differs with the technique, category, and operator used, as did the average number of transactions per month. It can be shown that there is a difference between the techniques, except between MT SMS and MO SMS, see Chart 31, and between the categories, see Chart 32. For operators it is only possible to show statistically that Operator 2 has higher spending than the other operators, see Chart 33.

Chart 31: Spending per IPX user and month for the techniques. [Bar chart; x-axis: Techniques (MT SMS, MO SMS, MT MMS, WAP, WEB); y-axis: Spending per month (SEK).]

Chart 32: Spending per IPX user and month for the categories. [Bar chart; x-axis: Categories (Media, Community, Enterprise, Information, Entertainment); y-axis: Spending per month (SEK).]

Chart 33: Spending per IPX user and month for the operators. [Bar chart; x-axis: Operators (Operator 1 through Operator 5); y-axis: Spending per month (SEK).]

To explore what the differences in spending per IPX user are due to, the spending per transaction is shown in Charts 34 through 36. Comparing these charts with Charts 31 through 33 shows that they are highly correlated. Therefore, the techniques, categories, and operators whose IPX users have high spending levels most likely use more expensive services, and not necessarily more services per IPX user. The differences between techniques and between categories were expected, since the tariffs vary with the service. However, the differences between operators were not expected. To explore this further, Chart 37 shows the operators with the corresponding distribution of the categories, based on the number of transactions made. Even though Operator 2 has the highest spending per IPX user, it does not have the most users of community, which is the category with the highest spending per IPX user. Also, Operator 2 does not have the fewest transactions in enterprise, which is the category with the lowest spending per IPX user. This, and some further study of Chart 37, leads to the conclusion that the different spending levels of the operators are independent of the categories.

Chart 34: Spending per transaction and month for the techniques. [Bar chart; x-axis: Techniques (MT SMS, MO SMS, MT MMS, WAP, WEB); y-axis: Spending per transaction (SEK).]

Chart 35: Spending per transaction and month for the categories. [Bar chart; x-axis: Categories (Media, Community, Enterprise, Information, Entertainment); y-axis: Spending per transaction (SEK).]

Chart 36: Spending per transaction and month for the operators. [Bar chart; x-axis: Operators (Operator 1 through Operator 5); y-axis: Spending per transaction (SEK).]

Chart 37: The distribution of categories within the operators, based on number transactions. [Stacked bar chart; x-axis: Operators (Operator 1 through Operator 5); y-axis: Percentage transactions; series: Remaining, Entertainment, Information, Enterprise, Community, Media.]

4.5.2 Percentage of IPX users that spend how much

Looking at how many IPX users spend how much, 55% of the IPX users use only non-premium services. Chart 38 shows only the premium users, out of which 37,4% spend 0-10 SEK per month, 26,1% spend 10-20 SEK per month, and 18,2% spend 20-30 SEK per month. Comparing Chart 38 with the number of times per month services are used in Chart 27, the percentage does not drop as fast in Chart 38. This implies that users who use one or just a few services use services more expensive than, for example, 0-10 SEK. Over the same period the average spending is 14,63 SEK per month, and the median spending interval is 10-20 SEK per month.

Chart 38: Spending intervals for IPX premium users per month. Compare this to the average spending 14,63 SEK per IPX user or the median interval 10-20 SEK. [Bar chart; x-axis: Spending intervals (0-10 through 110-120 SEK); y-axis: Percentage IPX premium users per month.]

4.5.3 Percentage of IPX users that are responsible for how much of the spending

Chart 39 shows what percentage of the IPX premium users is needed to account for a given percentage of the spending. The chart starts with the IPX users that spend the most. Table 3 shows the same numbers: 1,3% of all IPX users, or 3% of the IPX premium users, are responsible for 25% of the spending, and 28% of all IPX users, or 62% of the IPX premium users, are responsible for 90% of the spending.

Chart 39: Percentage spending versus percentage IPX premium users. [x-axis: Percentage IPX users, logarithmic scale; y-axis: Percentage spending, logarithmic scale.]
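The two user percentages quoted for each spending level are linked by the share of premium users, which is 45% since 55% of the IPX users use only non-premium services (section 4.5.2). The sketch below checks that link for the 25% and 90% levels, and adds a small helper that computes such a concentration figure from per-user spending. The helper, its name, and the toy spending list are illustrative assumptions, not the IPX data or the method used in the thesis.

```python
PREMIUM_SHARE = 0.45  # 100% minus the 55% of users with only non-premium services


def premium_to_all(premium_pct, premium_share=PREMIUM_SHARE):
    """Convert a percentage of premium users into a percentage of all users."""
    return premium_pct * premium_share


def users_needed(spending, target_share):
    """Percentage of users, counted from the highest spender down,
    needed to account for `target_share` (0..1) of the total spending."""
    ordered = sorted(spending, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, amount in enumerate(ordered, start=1):
        running += amount
        if running / total >= target_share:
            return 100.0 * count / len(ordered)
    return 100.0


# Percentages of premium users quoted in the text for 25% and 90% of spending.
all_users_25 = premium_to_all(3.0)    # text quotes 1,3% of all IPX users
all_users_90 = premium_to_all(62.0)   # text quotes 28% of all IPX users

# Hypothetical spending per user (SEK), only to show the helper at work.
toy_spending = [50, 20, 10, 10, 5, 5]
half_of_spending = users_needed(toy_spending, 0.5)
print(all_users_25, all_users_90, half_of_spending)
```

Within rounding, 3% and 62% of the premium users correspond to the 1,3% and 28% of all IPX users given in the text.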

Percent out of total spending    Percent of all IPX users    Percent of premium IPX users
25%                              1,3%                        3,0%
50%                              6,1%                        14%
75%                              16%                         37%
90%                              28%                         62%

Table 3: Percentage spending versus percentage IPX users and versus percentage IPX premium users.

4.5.4 Trends

The techniques, categories, and operators whose IPX users have high spending levels most likely use more expensive services, and not necessarily more services per IPX user. The different spending levels of the operators are independent of the categories. 55% of the IPX users use only non-premium services. Users who use one or just a few services use services more expensive than 0-10 SEK. 28% of all IPX users, or 62% of the IPX premium users, are responsible for 90% of the IPX spending.

4.6 How the price influences how the IPX user buys services

4.6.1 The distribution of transactions among the tariffs

Chart 40 shows how the transactions are distributed over the tariffs. Statistically, it can be shown that 6 SEK has more transactions than all other tariffs. Next comes 10 SEK, which is also significantly different from all the other tariffs. The remaining tariffs can be split into two groups that can be shown to be statistically different. The tariffs 2, 3, 4, 7, 12, 20, 25, 40, 49, and 50 SEK form one group, which has fewer transactions than the group of the tariffs 5, 15, and 30 SEK. Both these groups have statistically fewer transactions than 10 SEK and 6 SEK. Within the groups it is not possible to show statistically any difference in the number of transactions made.

Chart 40: Distribution of the transactions over the tariffs. [Bar chart; x-axis: Tariffs (SEK): 0,69, 2, 3, 4, 5, 6, 7, 10, 12, 15, 20, 25, 30, 40, 49, 50; y-axis: Percentage transactions.]

Chart 41 shows that the distribution of tariffs varies among the operators. The main reason is that some operators do not have services with certain tariffs. Therefore, the difference in the use of tariffs among the operators does not say anything about the IPX user and is excluded here. Different techniques use different tariffs, but that does not say anything about the IPX user either and is therefore also excluded here.

Chart 41: Distribution of the transactions over the tariffs for different operators. [x-axis: Tariffs (SEK): 2, 3, 5, 6, 7, 10, 12, 15, 20, 25, 30, 40, 50; y-axis: Percentage transactions; series: Operator 1 through Operator 5.]

4.6.2 The days of the month low and high tariffs are used

If the tariffs are divided into a cheap group with the tariffs 0-14 SEK and an expensive group with the tariffs 15-50 SEK, Chart 42 shows the distribution over the month, based on the transactions made April through September 2005. It is not possible to show statistically any difference over the month.

Chart 42: Percentage transactions made during the days of the month for low and high tariffs. [x-axis: Day of the month (1 through 31); y-axis: Percentage transactions; series: cheap, expensive.]

4.6.3 The days of the week low and high tariffs are used

Chart 43 shows that there is a difference over the days of the week between the two groups established in section 4.6.2, based on the transactions made April through September 2005. It is possible to establish statistically that the expensive services are more likely to be used on the weekends, while the cheap services are more likely to be used on the weekdays.

Chart 43: Percentage transactions made during the days of the week for low and high tariffs. [x-axis: Day of the week (Monday through Sunday); y-axis: Percentage transactions; series: Cheap, Expensive.]

4.6.4 The hours of the day low and high tariffs are used

Chart 44 shows the distribution of cheap and expensive services during the hours of the day, based on the transactions made April through September 2005. In the chart it is possible to see that expensive services are used later in the day than the cheaper services, but this difference cannot be shown statistically.

Chart 44: Percentage transactions made during the hours of the day for low and high tariffs. [x-axis: Hour of the day (0 through 23); y-axis: Percentage transactions; series: cheap, expensive.]

4.6.5 Trends

There are large variations in how much the tariffs are used. If the tariffs are divided into a cheap and an expensive group, it is possible to show that more expensive services are used on weekends than on weekdays. No other variation of the tariffs over time could be established.

4.7 Clustering

The SOM clustering is made to find clusters of content providers whose behavior is similar to that of the other content providers within the cluster but unlike that of content providers in other clusters. The U-matrix is used to visualize the cluster structure of the SOM map. The U-matrix used here has a 4-by-4 hexagonal grid, so that each centroid has six immediate neighbors. The label of each grid cell (cluster) is that of the majority of the data points associated with it. The location of a map unit is the same in all visualizations, so the location of a content provider's identification number on one map is where it can be found on all maps. The position on the map gives information on how the clusters are related to each other: the closer they are, the more similar their properties. The vertical bar next to the U-matrix is the key to the distances in the U-matrix. High values indicate larger distances, which mark cluster borders. The U-matrices of the individual variables offer some additional information. For example, it is possible to find the cluster that is most separate from the other clusters by identifying the cluster with the most separate behavior in the variables. The K-means clustering helps to identify the clusters in the U-matrix, where K is based on the Davies-Bouldin index.
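How the Davies-Bouldin index can be used to choose K for K-means is sketched below in plain Python. This is a simplified illustration, not the Matlab procedure of the thesis: K-means is applied directly to the data points rather than to SOM prototypes, the centers are initialized with a farthest-point rule chosen here to keep the example deterministic, and the twelve two-dimensional points are hypothetical. A lower index means more compact, better separated clusters.

```python
import math


def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_point(points):
    return tuple(sum(col) / len(points) for col in zip(*points))


def farthest_point_init(points, k):
    """Deterministic start: each new center is the point farthest from the others."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers


def k_means(points, k, max_iter=100):
    centers = farthest_point_init(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Keep the previous center if a cluster ends up empty.
        updated = [mean_point(c) if c else centers[i] for i, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return centers, clusters


def davies_bouldin(centers, clusters):
    """Average, over clusters, of the worst (scatter_i + scatter_j) / distance_ij."""
    kept = [(c, pts) for c, pts in zip(centers, clusters) if pts]
    scatter = [sum(dist(p, c) for p in pts) / len(pts) for c, pts in kept]
    worst = []
    for i, (ci, _) in enumerate(kept):
        worst.append(max((scatter[i] + scatter[j]) / dist(ci, cj)
                         for j, (cj, _) in enumerate(kept) if j != i))
    return sum(worst) / len(worst)


# Hypothetical content providers: (transactions per user, spending per user).
providers = [
    (1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.2),
    (9.0, 1.0), (9.2, 0.9), (8.8, 1.1), (9.1, 1.2),
]
scores = {k: davies_bouldin(*k_means(providers, k)) for k in range(2, 6)}
best_k = min(scores, key=scores.get)
print(best_k, scores)
```

For these three well-separated toy groups the index is lowest at K = 3; on real data the minimum can be much less pronounced, as in section 4.7.1 where the index suggests four clusters although the U-matrix shows no apparent borders.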

4.7.1 Transactions over the hours of the day per content providers

To find clusters of content providers with different patterns over the hours of the day, the U-matrix is examined. The U-matrix in Figure 3 shows no apparent cluster borders. Figure 4 shows that the distribution within all the variables is about the same. The first variable covers the hour between 00 and 01, the next variable the hour between 01 and 02, and so on. Even though the U-matrix does not reveal any apparent clusters, the Davies-Bouldin index suggests four clusters. The four k-means clusters are shown in Figure 5. That is, most content providers show the same usage pattern over the hours of the day (Cluster 1).

The differences between the clusters are shown in Chart 45. The patterns of the clusters are very different over the hours of the day. Cluster 2 has the sharpest and the earliest peak, at 11. Cluster 3 also has an early peak, but not as sharp. Cluster 1 and Cluster 4 are similar, except that Cluster 1 has two dips, at 12 and 18.

Figure 3: U-matrix for the number of transactions made during the hours of the day per content provider.

Figure 4: Variables for the number of transactions made during the hours of the day per content provider.
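The way the Davies-Bouldin index ranks candidate numbers of clusters can be sketched as follows. This is a minimal illustration, assuming the standard definition of the index (mean over clusters of the worst ratio of summed within-cluster scatter to between-centroid distance, where lower is better). The points and the two candidate labelings are hypothetical toy data, not the content-provider vectors of the thesis, and the labelings are given directly instead of being produced by k-means.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return [sum(coord) / len(points) for coord in zip(*points)]

def davies_bouldin(points, labels):
    """Davies-Bouldin index of a labeled partition (lower means better separated)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    keys = sorted(clusters)
    cents = {k: centroid(clusters[k]) for k in keys}
    # Scatter: mean distance of a cluster's points to its centroid
    scatter = {
        k: sum(math.dist(p, cents[k]) for p in clusters[k]) / len(clusters[k])
        for k in keys
    }
    total = 0.0
    for i in keys:
        total += max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in keys if j != i
        )
    return total / len(keys)

# Toy data (hypothetical): three compact groups of four points each
points = [[0, 0], [0, 1], [1, 0], [1, 1],
          [10, 0], [10, 1], [11, 0], [11, 1],
          [5, 10], [5, 11], [6, 10], [6, 11]]
candidates = {
    2: [0] * 8 + [1] * 4,            # first two groups merged
    3: [0] * 4 + [1] * 4 + [2] * 4,  # one cluster per group
}
scores = {k: davies_bouldin(points, labels) for k, labels in candidates.items()}
best_k = min(scores, key=scores.get)
print(scores, "suggested K:", best_k)
```

The partition with the lowest index is the one the index "suggests". Note that the index can suggest a number of clusters even when, as in Figure 3, no borders are visible in the U-matrix.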

Content provider:
Cluster 1: The majority of the content providers
Cluster 2: Content providers 33, 45, 54, 225, and 10065
Cluster 3: Content provider 10062
Cluster 4: Content providers 38 and 10063

Figure 5: Clusters of content providers based on the number of transactions made during the hours of the day.

Chart 45: Percentage transactions made of the clusters during the hours of the day. (Axes: hour of the day, 0 to 23, against percentage of transactions; series: Cluster 1 to Cluster 4.)

4.7.2 Spending over the hours of the day per content providers

To find clusters of content providers with different patterns of spending over the hours of the day, the U-matrix is examined. The U-matrix in Figure 6 shows no apparent cluster borders. Figure 7 shows that the distribution within all the variables is about the same. The Davies-Bouldin index suggests four clusters. The four k-means clusters are shown in Figure 8. The clusters suggest that, as for transactions, most content providers show the same spending pattern (Cluster 1).

The differences between the clusters are shown in Chart 46. Cluster 1 has an extraordinary pattern compared to the other clusters. The other clusters behave similarly, although Cluster 2 and Cluster 4 reach their peak later in the day than Cluster 3.
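The 24 hourly variables used as clustering input in sections 4.7.1 and 4.7.2 can be thought of as one percentage profile per content provider. The following is a minimal sketch, assuming transaction records of the hypothetical form (provider, hour, value), where the value is 1 when counting transactions or the amount paid when measuring spending. The record layout, the function names, and the sample records are illustrative assumptions, not the format of the thesis data.

```python
from collections import defaultdict

def hourly_profiles(records):
    """Map each provider to a 24-element list of percentage shares per hour."""
    sums = defaultdict(lambda: [0.0] * 24)
    for provider, hour, value in records:
        sums[provider][hour] += value
    profiles = {}
    for provider, per_hour in sums.items():
        total = sum(per_hour)
        if total > 0:  # skip providers without any activity
            profiles[provider] = [100.0 * v / total for v in per_hour]
    return profiles

def peak_hour(profile):
    """Hour of the day with the largest share."""
    return max(range(24), key=lambda h: profile[h])

# Hypothetical records (NOT the thesis data): (provider, hour, value)
records = [
    ("A", 11, 1.0), ("A", 11, 1.0), ("A", 12, 1.0),
    ("B", 21, 30.0), ("B", 20, 10.0),
]
profiles = hourly_profiles(records)
for provider, profile in profiles.items():
    print(provider, "peaks at hour", peak_hour(profile))
```

Normalizing each provider to percentages makes large and small providers comparable, so that the clustering reflects the shape of the daily pattern and not the volume.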

Figure 6: U-matrix for the spending during the hours of the day per content provider.

Figure 7: Variables for the spending during the hours of the day per content provider.

Content provider:
Cluster 1: The majority of content providers
Cluster 2: Content providers 8, 101, 158, and 173
Cluster 3: Content providers 33 and 54
Cluster 4: Content providers 45 and 10063

Figure 8: Clusters of content providers based on the spending during the hours of the day.

Chart 46: The spending of the clusters during the hours of the day. (Axes: hour of the day, 0 to 23, against percentage of spending; series: Cluster 1 to Cluster 4.)

4.7.3 WAP transactions over the hours of the day per content providers

In section 4.2.3, the WAP transactions over the hours of the day had a peculiar peak around 21-23. Whether this is due to one service or one content provider is explored here by clustering the content providers. The U-matrix in Figure 9 shows no apparent cluster borders, but a tendency toward at least 3 clusters. Figure 10 shows that the distribution within all the variables is about the same. The Davies-Bouldin index suggests six clusters, two more than it was possible to detect in the U-matrix. The clusters are shown in Figure 11, where Cluster 1 includes most of the content providers.

The differences between the clusters are shown in Chart 47, where the sixth cluster is excluded because of its irrelevance. The patterns of all the clusters are similar, and they all reach their peak late in the day. Therefore, it is not possible to single out any content providers as responsible for the late-night peak in WAP transactions that was seen in Chart 8 in section 4.2.3.

Figure 9: U-matrix for the number of transactions made during the hours of the day per content provider.

Figure 10: Variables for number of transactions made during the hours of the day per content provider.

Figure 11: Clusters of content providers based on the number of WAP transactions made during the hours of the day.

Content provider:
Cluster 1: The majority of content providers
Cluster 2: Content providers 15, 33, 90, and 131
Cluster 3: Content providers 143 and 10139
Cluster 4: Content providers 30, 31, 39, 40, and 152
Cluster 5: Content providers 18, 45, and 139

Chart 47: Percentage WAP transactions made of the clusters during the hours of the day. (Axes: hour of the day, 0 to 23, against percentage of transactions; series: Cluster 1 to Cluster 5.)

4.7.4 Transactions and spending per IPX user per content providers

To find clusters of content providers with both high spending and many transactions per IPX user, the U-matrix is examined for transactions and spending per IPX premium user. The U-matrix in Figure 12 does not show any apparent cluster borders. Figure 13 shows that the distribution within all the variables is about the same. The Davies-Bouldin index suggests five clusters; these k-means clusters are shown in Figure 14. The difference in spending per IPX user between the clusters is shown in Chart 48, and the difference in the number of transactions made per user in Chart 49. Cluster 1 has the most transactions made per IPX user as well as the highest spending per IPX user.

Figure 12: U-matrix for the transactions and spending per IPX premium user.

Figure 13: Variables for the transactions and spending per IPX user per content provider.
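The two variables clustered in this section can be derived from raw transaction records. The following is a minimal sketch, assuming records of the hypothetical form (provider, user, amount) and assuming that "per IPX user" means per distinct user who made at least one transaction with the provider. The source does not spell out this definition, and the record layout, function name, and sample records are illustrative only.

```python
from collections import defaultdict

def per_user_features(records):
    """Return {provider: (transactions per user, spending per user)}."""
    n_transactions = defaultdict(int)
    spending = defaultdict(float)
    users = defaultdict(set)
    for provider, user, amount in records:
        n_transactions[provider] += 1
        spending[provider] += amount
        users[provider].add(user)
    return {
        provider: (
            n_transactions[provider] / len(users[provider]),
            spending[provider] / len(users[provider]),
        )
        for provider in n_transactions
    }

# Hypothetical records (NOT the thesis data): (provider, user, amount)
records = [
    ("A", "u1", 5.0), ("A", "u1", 5.0), ("A", "u2", 10.0),
    ("B", "u3", 30.0),
]
features = per_user_features(records)
for provider, (tx_per_user, spend_per_user) in features.items():
    print(provider, tx_per_user, spend_per_user)
```

Because the two variables are on different scales, they would typically be standardized before being passed to a SOM; whether this was done is not stated in the source.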

Figure 14: Clusters of content providers based on the number of transactions and spending per IPX user.

Content provider:
Cluster 1: Content providers 21, 45, 153, 10058, 10066, 10077, 10091
Cluster 2: Content providers 8, 33, 49, 60, 80, 90, 94, 114, 152, 10086, 10274
Cluster 3: Content providers 16, 17, 20, 22, 34, 39, 51, 53, 95, 101, 131, 139, 141, 173, 203, 10078, 10085
Cluster 4: Content providers 1, 15, 24, 25, 27, 28, 29, 31, 32, 35, 50, 54, 102, 103, 130, 147, 225, 10043, 10156, 10229, 10232, 10287
Cluster 5: The remaining content providers

Table 4: Clusters of content providers based on the number of transactions and spending per IPX user.