CASE STUDY: WEB-DOMAIN PRICE PREDICTION ON THE SECONDARY MARKET (4-LETTER CASE)

MAY 2016
MICHAEL.DOPIRA@DATA-TRACER.COM

TABLE OF CONTENTS
SECTION 1 Research background
SECTION 2 Study design
SECTION 3 Results
APPENDIX 1 Benchmark models
APPENDIX 2 Random forest

SECTION 1 Research background

THE US DOMAIN INDUSTRY ALONE IS ESTIMATED AT $2B

US domain industry, 2015: $2B revenue, 8,250 employees, 4,539 businesses.

[Chart: Annual premium web domain sales, USD M, 2005-2015; growth annotation +17%. Callout: the market is growing rapidly, in line with the growing global e-commerce industry.]

The domain industry is growing: over 2005-2015 its CAGR was 17%. It is expected to grow further with the overall growth of web-based businesses. The explosive development of Chinese e-commerce is the latest trend fueling growth of the secondary web-domain market. The industry is already of considerable size: its market value in the US alone has reached 2 billion US dollars.

Sources: Domain Name Prices (Dnpric.es), IBISWorld, Quartz.

THE AVERAGE DOMAIN PRICE ON THE SECONDARY MARKET IS STEADILY GROWING

Average price of a sold domain, 2013-2015 (index, base year 2006); chart annotation: +8%

  Period    Index
  2013/01   156
  2013/06   163
  2014/01   181
  2014/06   201
  2015/01   212

Top 5 most expensive deals, USD M (excluding adult websites)

  Domain        Price
  Fund.com      $10.0
  We.com        $8.0
  Diamond.com   $7.5
  Z.com         $6.8
  Slots.com     $5.5

The average price of a domain on the secondary market has been growing steadily since 2013. The number of free domains (especially short and attractive ones) is constantly declining, which drives growth of the secondary market. The most sought-after domains have reached seven-digit price tags.

Sources: DNJournal, Sedo.com. Notes: 1 - National Association of Securities Dealers Automated Quotations; 2 - The Domain Name Price Index.

MACHINE LEARNING IS NECESSARY TO PREDICT DOMAIN PRICES

Share of domain sales (by number of sales) in each price segment

  Domain price    Share
  up to $100      62%
  $100-$1000      26%
  $1000-$10000    11%
  $10000+         1%

Examples of domains priced at $100 or less

  Domain     Price
  rzwv.com   $1
  ulpq.com   $3
  xcoi.com   $5
  kxoy.com   $10
  pjov.com   $20
  vugz.com   $40
  mosf.com   $80
  ogev.com   $100

The majority of domains are priced at $100 or less. However, it is extremely difficult to guess the price without applying machine learning techniques, because the lion's share of domains in this segment do not contain real words.

PROJECT FEATURES:

Project objective: predict the price of an arbitrary 4-letter domain offered on the secondary market
Data used: over 120,000 domain sales since 2000
Predictors: 200+ features reflecting linguistic, topic-interest and marketplace information
Methods employed: non-parametric regression (Random Forest)
Results: predictive accuracy on the test dataset is 82.9% (measured by goodness of fit, R²)
Possible next steps: development of a general predictive model (for all types of domains)
Out-of-the-box solutions: inclusion of Google search data as well as Peter Norvig's letter-combination popularity data

SECTION 2 Study design

THE GOAL OF THE STUDY IS THE CREATION OF A WEB-DOMAIN PRICE PREDICTION MODEL

[Diagram: linguistic characteristics, marketplace information and topic interest feed an advanced data mining tool (Random Forest), which outputs the web-domain price prediction.]

THREE TYPES OF INPUT FEATURES ARE USED

Linguistic:
  Consonant-vowel pattern
  Letter repetition pattern
  Letter place pattern
  Frequency of letter combination usage
  Undesirable letter availability
  Whether a real word is contained

Market place:
  Seller
  Date of the deal
  Price of the previous deal for the same domain

Topic interest:
  Number of Google Searches (bid & competition) for the word contained in the domain
  Domain extension (.com, .org, .tele, etc.)

Total number of variables in the dataset: 238
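A few of the linguistic features listed above could be derived from a domain name roughly as follows. This is a minimal sketch, not the study's code: the function names, the choice to treat "y" as a consonant, the A/B/C encoding of letter repetition and the set of "undesirable" letters are all assumptions made for illustration.

```python
VOWELS = set("aeiou")  # assumption: 'y' is treated as a consonant


def cv_pattern(label: str) -> str:
    """Map each letter to 'v' (vowel) or 'c' (consonant)."""
    return "".join("v" if ch in VOWELS else "c" for ch in label.lower())


def repetition_pattern(label: str) -> str:
    """Encode repetition: first distinct letter -> 'A', second -> 'B', ..."""
    seen = {}
    out = []
    for ch in label.lower():
        if ch not in seen:
            seen[ch] = chr(ord("A") + len(seen))
        out.append(seen[ch])
    return "".join(out)


def linguistic_features(domain: str, undesirable=frozenset("qxz")) -> dict:
    """Build a small feature dict; `undesirable` is a hypothetical letter set."""
    label, _, extension = domain.lower().partition(".")
    return {
        "cv_pattern": cv_pattern(label),
        "repetition_pattern": repetition_pattern(label),
        "has_undesirable_letter": any(ch in undesirable for ch in label),
        "extension": extension,
    }


print(linguistic_features("Mams.com"))
```

Categorical outputs such as the consonant-vowel pattern would still need to be encoded (for example as indicator variables) before being passed to a model.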

SECTION 3 Results

THE RANDOM FOREST MODEL HAS SUBSTANTIAL PREDICTIVE POWER

[Scatter plot: price vs. predicted price.]

Random forest performed well in forecasting domain prices. The goodness of fit is 82.9%, which means that the model explains 82.9% of the variation in domain prices. Its results were compared with linear regression and decision tree models as benchmarks, and its predictions proved statistically more powerful (details can be found in the appendix).

Note: The scatter plot shows the fit for a randomly selected sample of 100 observations, using logarithmic prices.
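Goodness of fit here is the coefficient of determination, R² = 1 - SS_res / SS_tot. The sketch below shows one way it could be computed on logarithmic prices. The function names and the use of the natural logarithm are assumptions; the deck does not say which log base or software was used.

```python
import math


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def r_squared_log(actual_prices, predicted_prices):
    """R² computed on log prices (natural log is an assumption)."""
    return r_squared(
        [math.log(p) for p in actual_prices],
        [math.log(p) for p in predicted_prices],
    )
```

A value of 1 means the predictions reproduce every price exactly; a value of 0 means they do no better than always predicting the mean.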

VARIABLES WITH THE HIGHEST PREDICTIVE POWER

Partner (Seller) indicator: indicator of the company that sold the domain name
Previous price: price of the domain at the moment of its last sale
Date indicator: year and month indicator
Consonant-vowel pattern: pattern describing the placement of consonants and vowels in the word
Frequency of 2-letter combinations: number of times the two-letter combination appeared in the set of texts analyzed by Peter Norvig
Google Searches of containing word: if the domain contains a real word, the number of Google Searches for this word
Domain extensions: indicator of the domain extension

EXAMPLE OF PRICE PREDICTION ALGORITHM

                            Thai.co    Mams.com   Yftm.com
  Is real word contained?   yes        yes        no
  What is year of deal?     2014       2016       2011
  What is the seller?       Afternic   Sedo       GoDaddy
  Predicted Price, USD      2239       3935       30.14
  True Price, USD           2200       3850       30

As the final output, the client would be given a model that returns the predicted price for a domain once its characteristics are entered.
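To make the gap between predicted and true prices in these three examples concrete, the relative error can be worked out directly from the figures on the slide. The variable and function names below are illustrative only.

```python
# (predicted, true) prices in USD, copied from the example slide
examples = {
    "Thai.co": (2239.0, 2200.0),
    "Mams.com": (3935.0, 3850.0),
    "Yftm.com": (30.14, 30.0),
}


def relative_error(predicted: float, actual: float) -> float:
    """Absolute prediction error as a fraction of the true price."""
    return abs(predicted - actual) / actual


errors = {name: relative_error(p, a) for name, (p, a) in examples.items()}
for name, err in errors.items():
    print(f"{name}: {err:.1%}")
```

For these three domains the error stays below 2.5% of the true price.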

PROJECT SUMMARY

The market set-up that explains domain prices is complex and depends on many factors. These factors cannot be easily observed, and their effects on prices are not obvious.
Low- and medium-price deals constitute the lion's share of the market. Accurate price prediction in this segment is challenging but lucrative.
To take numerous factors into account simultaneously, we used an advanced machine learning technique, Random Forest, which is robust to overfitting.
The developed statistical model is flexible and can therefore be applied to other similar problems (e.g. predicting prices for domains of any length).
The research is based on open-source data.
The introduced analytical model shows good forecasting power (R² is 82.9%).

APPENDIX 1 Benchmark models

RANDOM FOREST IS BETTER THAN BENCHMARKS

[Bar chart: Goodness of Fit and Cross-Validation* for Random Forest, Decision Tree and Linear regression model. Values shown: 87.3%, 87.0%, 82.9%, 80.6%, 77.3%, 74.5%.]

The decision tree performs almost as well as Random Forest when predicting on the total sample. However, thanks to its higher resistance to overfitting, Random Forest produces more accurate estimates on the test dataset.

*Note: Cross-Validation means that goodness of fit is measured on the basis of the test dataset (which was not used for model fitting).
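Measuring goodness of fit on data not used for fitting requires setting a test subset aside first. The sketch below shows a simple random holdout split. The 20% test fraction and the function name are assumptions, since the deck does not state how the test dataset was formed.

```python
import random


def train_test_split(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of the records and hold out a fraction for testing."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


train, test = train_test_split(list(range(10)))
print(len(train), len(test))
```

The model is then fitted on `train` only, and the cross-validation figure is the goodness of fit computed on `test`.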

APPENDIX 2 Random forest

A DECISION TREE IS THE BASIC ELEMENT OF THE RANDOM FOREST

Illustrative example of a decision tree segment built on the training data

GENERAL IDEA: A decision tree classifies cases into groups, or predicts values of a dependent (target) variable, based on the values of independent (predictor) variables. Independent variables are chosen so that the groups are separated as well as possible.

EXAMPLE EXPLANATION: The model determines how combinations of various factors affect the price of a domain. In the example only one branch of the tree is displayed in full; it shows how the average price of a domain sold on the Sedo platform changes with domain extension, price of the previous sale and consonant-vowel pattern.

[Tree diagram. Node labels: Partner Sedo (Yes/No); Extension: com, org, net, other; Previous Price <$80, Previous Price >=$80; Pattern: cvcv*, vcvc, vccv, cvvc, ccvv. Leaf average prices, in the order shown: $252, $212, $150, $140, $90. Annotations: the order of variables and the size of the tree are determined statistically; the tree grows from every node on every level (only some branches are displayed here).]

*Note: c stands for consonant, v stands for vowel.
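How a tree picks the split that separates groups best can be sketched for a single numeric predictor. The example assumes the common regression-tree criterion of minimizing the total squared error around each group's mean; the deck does not state which criterion was used, and the data below are toy values invented for illustration.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_threshold(xs, ys):
    """Return (threshold, total_sse) for the split x < threshold with lowest SSE.

    Returns (None, sse(ys)) when no split improves on the unsplit data.
    """
    best = (None, sse(ys))
    for t in sorted(set(xs))[1:]:  # every threshold leaves both sides non-empty
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (t, total)
    return best


# Toy data (hypothetical): previous sale price and current sale price, USD
previous_price = [20, 50, 70, 90, 120, 200]
sale_price = [30, 40, 35, 150, 160, 170]

threshold, error = best_threshold(previous_price, sale_price)
print(threshold, error)
```

A full tree repeats this search over every predictor at every node and keeps the best split found, which is how the order of variables ends up being determined statistically.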

THE DECISION TREE CAN BECOME QUITE LARGE AND COMPLICATED

Illustrative example of a section of the full decision tree built on the training dataset

When all predictors are used in the analysis, the tree becomes very large. However, a single tree is not a sufficiently robust method, so Random Forest is preferred.

RANDOM FOREST IS AN AGGREGATION OF DECISION TREES

[Diagram: each of Decision tree 1, Decision tree 2, ..., Decision tree N* is built from a random data subset and a random variable subset.]

The results of the individual decision trees (typically 200-1000 trees) are aggregated and average prices are computed. The importance of each variable is calculated.

*Note: The optimal number of trees is determined during analysis; usually about 500 trees are built.
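The aggregation described above can be sketched in plain Python. This is a deliberately simplified illustration, not the study's model: each "tree" is a one-split stump rather than a full tree, the random variable subset is drawn once per tree, and the data, feature layout and function names are hypothetical.

```python
import random


def fit_stump(rows, targets, feature_ids):
    """Fit a one-split regression tree using only the given feature indices."""
    overall = sum(targets) / len(targets)
    best = None  # (squared error, feature, threshold, left mean, right mean)
    for f in feature_ids:
        for t in sorted({r[f] for r in rows})[1:]:
            left = [y for r, y in zip(rows, targets) if r[f] < t]
            right = [y for r, y in zip(rows, targets) if r[f] >= t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
            if best is None or err < best[0]:
                best = (err, f, t, lm, rm)
    if best is None:  # no usable split in this sample: predict the mean
        return lambda row: overall
    _, f, t, lm, rm = best
    return lambda row: lm if row[f] < t else rm


def fit_forest(rows, targets, n_trees=500, n_features=1, seed=0):
    """Build n_trees stumps, each on a bootstrap sample and a feature subset."""
    rng = random.Random(seed)
    n = len(rows)
    all_features = list(range(len(rows[0])))
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]    # random data subset
        feats = rng.sample(all_features, n_features)  # random variable subset
        trees.append(fit_stump([rows[i] for i in idx],
                               [targets[i] for i in idx], feats))
    return trees


def predict(trees, row):
    """Aggregate the individual trees by averaging their predictions."""
    return sum(tree(row) for tree in trees) / len(trees)


# Toy data (hypothetical): (previous price in USD, 1 if extension is .com)
rows = [(20, 0), (50, 1), (70, 0), (90, 1), (120, 0), (200, 1)]
targets = [30, 40, 35, 150, 160, 170]

forest = fit_forest(rows, targets)
print(predict(forest, (200, 1)), predict(forest, (20, 0)))
```

Averaging over many trees built on different samples is what dampens the instability of any single tree.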

RANDOM FOREST IS A SUITABLE TOOL FOR DOMAIN PRICE PREDICTION

The model does not require the data to follow a specific distribution.
Both categorical and scale variables can be used.
Weak predictors are effectively incorporated in the model.
The model is robust and not prone to overfitting.
The predictive power of the model does not deteriorate when a large number of predictors is used.
The final output of the model is a price, which can be used as a predictor of the future sale of the domain.

If you have any questions, please contact us:
Skype: michael.dopira
Email: michael.dopira@data-tracer.com