An Empirical Analysis of Search Engine Advertising: Sponsored Search in Electronic Markets¹


Anindya Ghose
Stern School of Business, New York University
aghose@stern.nyu.edu

Sha Yang
Stern School of Business, New York University
shayang@stern.nyu.edu

Abstract

The phenomenon of sponsored search advertising, where advertisers pay a fee to Internet search engines to be displayed alongside organic (non-sponsored) web search results, is gaining ground as the largest source of revenues for search engines. Using a unique 6 month panel dataset of several hundred keywords collected from a large nationwide retailer that advertises on Google, we empirically model the relationship between different metrics such as click-through rates, conversion rates, cost-per-click, and ranks of these advertisements. Our paper proposes a novel framework and data to better understand what drives these differences. We use a Hierarchical Bayesian modeling framework and estimate the model using Markov Chain Monte Carlo (MCMC) methods. Using a simultaneous equations model, we quantify the impact of keyword type and length, position of the advertisement and the landing page quality on consumer search and purchase behavior as well as on the advertiser's cost per click and the search engine's ranking decision for different ads. We discuss how our empirical estimates shed light on a few key assumptions made by existing theoretical models in sponsored search advertising. Specifically, we find that (i) the monetary value of a click is not uniform across all slots because conversion rates vary with the position of the advertisement on the search engine results page, which results in variability in profits with rank, (ii) click-through rates decrease with ad position and the rate of change as one goes down the search engine screen is non-linear, (iii) search engines account for both current bid price and prior click-through rates in allocating the rank to a keyword in the auction process, and (iv) landing page quality score affects conversion rates and bid price.
Our analyses also lend quantitative insights into the relative economic impact of different kinds of keyword advertisements such as retailer-specific ads, brand-specific ads or generic keywords, as well as shorter or longer keywords. Our results provide descriptive insights to advertisers about what attributes of sponsored keyword advertisements contribute to variation in advertiser value.

Keywords: Online advertising, Search engines, Hierarchical Bayesian modeling, Paid search, Click-through rates, Conversion rates, Keyword ranking, Bid price, Electronic commerce.

¹ Anindya Ghose is an Assistant Professor of Information Systems, and Sha Yang is an Associate Professor of Marketing, both at Stern School of Business, New York University, 44 West 4th Street, New York, NY. The authors would like to thank the anonymous company that provided data for this study. The authors are listed in alphabetical order and contributed equally. We are grateful to the Editor, Associate Editor and three anonymous referees for extremely helpful comments. We also thank Susan Athey, Michael Baye, Avi Goldfarb, Greg Lewis, Bill Greene, and participants at University of Calgary, University of Connecticut, UC Irvine, Purdue University, University of Washington, McGill University, New York University, UT Dallas Marketing Conference, the 2008 IIO Conference, the 2008 NET Institute conference, the 2008 Marketing Science Institute conference, the WSDM 2008 conference, WISE 2007 conference, and INFORMS-CIST 2007 for helpful comments. Anindya Ghose acknowledges the generous financial support from an NSF CAREER award (IIS). Funding for this project was also provided by the NET Institute, the Marketing Science Institute and the NYU-Poly Seed grant. The usual disclaimer applies.

1. Introduction

The Internet has brought about a fundamental change in the way users generate and obtain information, thereby facilitating a paradigm shift in consumer search and purchase patterns. In this regard, search engines are able to leverage their value as information location tools by selling advertising linked to user-generated queries and referring users to the advertisers. Indeed, the phenomenon of sponsored search advertising, where advertisers pay a fee to Internet search engines to be displayed alongside organic (non-sponsored) web search results, is gaining ground as the largest source of revenues for search engines. The global paid search advertising market is predicted to have a 37% compound annual growth rate, to more than $33 billion, and has become a critical component of firms' marketing campaigns. Search engines like Google, Yahoo and MSN have discovered that as intermediaries between users and firms, they are in a unique position to try new forms of advertisements without annoying consumers. In this regard, sponsored search advertising has gradually evolved to satisfy consumers' penchant for relevant search results and advertisers' desire for inviting high quality traffic to their websites. These advertisements are based on customers' own queries and are hence considered far less intrusive than online banner ads or pop-up ads. The specific keywords in response to which the ads are displayed are often chosen by firms based on user-generated content in online product reviews, social networks and blogs where users have posted their opinions about firms' products, often highlighting the specific product features they value the most (Dhar and Ghose 2009). In many ways, the increased ability of users to interact with firms in the online world has enabled a shift from mass advertising to more targeted advertising.

How does this mechanism work? In sponsored search, firms who wish to advertise their products or services on the Internet submit their product information in the form of specific keyword listings to search engines.
Bid values are assigned to each individual ad to determine the position of each competing listing on the search engine results page when a user performs a search. When a consumer searches for a term on the search engine, the advertiser's web page appears as a sponsored link next to the organic search results that would otherwise be returned using the neutral criteria employed by the search engine. By allotting a specific value to each keyword, advertisers pay the assigned price only for the users who actually click on their listing to visit their website, in the most prevalent payment mechanism, known as cost-per-click (CPC). Because listings appear only when a user generates a keyword query, an advertiser can reach a more targeted audience on a relatively lower budget through search engine advertising.

Despite the growth of search advertising, we have little understanding of how consumers respond to contextual and sponsored search advertising on the Internet. In this paper, we focus on previously unexplored issues: How does sponsored search advertising affect consumer search and purchasing behavior on the Internet? More specifically, what kinds of sponsored keyword advertisement most contribute to variation in advertiser value in terms of consumer click-through rates and conversions? What is the relationship between different kinds of keywords and the advertiser's actual cost-per-click, and the search engine's keyword ranking decision? While an emerging stream of theoretical literature in sponsored search has looked at issues such as mechanism design in auctions, no prior work has empirically analyzed these kinds of questions. Given the shift in advertising from traditional banner advertising to search engine advertising, an understanding of the determinants of conversion rates and click-through rates in search advertising is essential for both traditional and Internet retailers.

Using a unique panel dataset of several hundred keywords collected from a large nationwide retailer that advertises on Google, we study the effect of sponsored search advertising on consumer and firm behavior. In particular, we propose a Hierarchical Bayesian modeling framework in which we build a simultaneous model to jointly estimate the impact of various attributes of sponsored keyword advertisements on consumer click-through and purchase propensities, on the advertiser's cost-per-click (CPC) decision and on the search engine's ad ranking decision.
The presence of retailer-specific information in the keyword is associated with an increase in click-through and conversion rates by .7% and 5.6%, respectively; the presence of brand-specific information in the keyword is associated with a decrease in click-through and conversion rates by 56.6% and .%, respectively; and the length of the keyword is associated with a decrease in click-through rates by 3.9%. Keyword rank is negatively associated with the click-through rates and conversion rates, and this relationship is increasing at a decreasing rate. An increase in the landing page quality score of the advertiser by one unit leads to an increase in conversion rates by as much as .5% and a decrease in the advertiser's cost-per-click (CPC). Further, we show that the advertiser's CPC is negatively associated with the landing page quality as well as with the presence of its own information, but positively associated with the presence of brand-specific information in the keyword. Finally, our data suggest that profits are not necessarily monotonic with rank, such that keywords that have more prominent positions on the search engine results page, and thus experience higher click-through rates, may not necessarily be the most profitable ones. In fact, profits are often higher for keywords that are ranked in the middle as opposed to those at the very top or at the bottom.

Our paper aims to make three key contributions. These are summarized as follows.

First, our paper is the first empirical study that simultaneously models and documents the impact of search engine advertising on all three entities involved in the process: consumers, advertisers and search engines. The proposed simultaneous model provides a natural way to account for endogenous relationships between decision variables, leading to a robust identification strategy and precise estimates. The model can be applied to similar data from other industries. Moreover, unlike previous work, we jointly study consumer click-through behavior and conversion behavior conditional on a click-through in studying consumer search behavior. Ignoring consumer click-through behavior can lead to selectivity bias if the error terms in the click-through probability and in the conditional conversion probability are correlated (Maddala 1983), and this is an additional contribution. The proposed Bayesian estimation algorithm provides a convenient way to estimate such a model by using data augmentation.

Second, our paper provides insights into assumptions made in the theoretical modeling literature on search engine advertising. By showing a direct negative relationship between conversion rates and rank, we show that the value per click to an advertiser is not uniform across slots. This finding refutes a commonly held assumption in prior work that the value of a click from a sponsored search campaign is independent of the position of the advertisement. Prior theoretical work (for example, Aggarwal et al. 2006, Edelman et al. 2007, Varian 2007) also makes a common assumption of uniform value per click across all ranks and shows that under this condition, sponsored search auctions maximize social welfare.
Our finding of non-uniformity in value per click paves the way for future theoretical models in this domain that could relax this assumption and design newer mechanisms with more robust equilibrium properties. The recent work by Borgers et al. (2007) and Xu et al. (2009) that incorporates non-uniform values for clicks in their theoretical models is a step in this direction. In addition to this, we demonstrate that (i) search engines are indeed taking into account both the current period's bid price as well as prior click-through rates of the keywords before deciding the final rank of an advertisement in the current period, and (ii) landing page quality investments are associated with changes in conversion rates and changes in the advertiser's cost-per-click. Our findings thus empirically validate assumptions about the search engine decision-making process made in the theoretical work in search engine advertising and corroborate claims about institutional practice in this industry.

Third, our model provides descriptive insights to advertisers about what kinds of sponsored keyword advertisements contribute to variation in advertiser value. In particular, our study quantifies the relationship between branded/retailer/generic and shorter/longer keywords and demand-side variables like click-through rates and conversion rates, a question of increasing interest to many firms. Our study also provides descriptive insights regarding how improvements in landing page quality and increases in bid prices are associated with various performance metrics.

The remainder of this paper is organized as follows. Section 2 gives an overview of the different streams of literature from marketing and computer science related to our paper. Section 3 describes the data and gives a brief background into some different aspects of sponsored search advertising that could be useful before we proceed to the empirical models and analyses. In Section 4, we present a model to study the click-through rate, conversion rate and keyword ranking simultaneously, and discuss our identification strategy. In Section 5 we discuss our empirical findings. In Section 6, we discuss some implications of our findings and then conclude the paper.

2. Literature and Theoretical Background

Our paper is related to several streams of research. A number of approaches have been built to model the effects of advertising based on aggregate data (Tellis). Much of the existing academic work (e.g., Gallagher et al., Dreze and Hussherr 2003) on advertising in the online world has focused on measuring changes in brand awareness, brand attitudes, and purchase intentions as a function of exposure. This is usually done via field surveys or laboratory experiments using individual (or cookie) level data. Sherman and Deighton and Ilfeld and Winer show, using aggregate data, that increased online advertising leads to more site visits.
In contrast to other studies which measure (individual) exposure to advertising via aggregate advertising dollars (e.g., Mela et al. 1998, Ilfeld and Winer), we use data on individual search keyword advertising exposure. Manchanda et al. (2006) look at online banner advertising. Because banner ads have been perceived by many consumers as being annoying, traditionally they have had a negative connotation associated with them. Moreover, it was argued that since there is considerable evidence that only a small proportion of visits translate into a final purchase (Sherman and Deighton, Moe and Fader 2003, Chatterjee et al. 2003), click-through rates may be too imprecise for measuring the effectiveness of banners served to the mass market. Interestingly, however, Manchanda et al. (2006) found that banner advertising actually increases purchasing behavior, in contrast to conventional wisdom. These studies therefore highlight the importance of investigating the impact of other kinds of online advertising, such as search keyword advertising, on actual purchase behavior, since the success of keyword advertising is also based on consumer click-through rates.

Our study is also related to other forms of paid placements available to retailers on the Internet, such as sponsored listings on shopping bots (e.g., Baye and Morgan, Baye et al. 2008, who have studied the role of shopping bots as information gatekeepers and estimated the impact of retailers' rank during placement on click-through rates). There is also an emerging theoretical stream of literature exemplified by Aggarwal et al. (2006), Edelman et al. (2007), Feng et al. (2007), Varian (2007), and Liu et al. (2008), who analyze mechanism design and equilibria in search engine auctions. Chen and He (2006) and Athey and Ellison (2008) build models that integrate consumer behavior with advertiser decisions, and the latter paper theoretically analyzes several possible scenarios in the design of sponsored keyword auctions. Katona and Sarvary (2007) build a model of competition in sponsored search and find that the interaction between search listings and paid links determines equilibrium bidding behavior. Gerstmeier et al. (2008) discuss some interesting bidding heuristics and highlight which of these lead to higher profits for the advertiser.

Despite the emerging theory work, very little empirical work exists in online search advertising. This is primarily because of the difficulty for researchers of obtaining such advertiser-level data. Existing work has so far focused on search engine performance (Telang et al., Bradlow and Schmittlein). Moreover, the handful of studies that exist in search engine marketing have typically analyzed publicly available data from search engines. Animesh et al. (2008) look at the presence of quality uncertainty and adverse selection in paid search advertising on search engines. Goldfarb and Tucker (2007) examine the factors that drive variation in prices for advertising legal services on Google. Agarwal et al. (2008) provide quantitative insights into the profitability of advertisements associated with differences in keyword position and show that profits may not be monotonic with rank. Ghose and Yang (2008) build a model to map consumers' search-purchase relationship in sponsored search advertising. They provide evidence of horizontal spillover effects from search advertising resulting in purchases across other product categories. Rutz and Bucklin (2007b) showed that there are spillovers between search advertising on branded and generic keywords, as some customers may start with a generic search to gather information, but later use a branded search to complete their transaction.

In an interesting paper related to our work, Rutz and Bucklin (2007a) studied hotel marketing keywords to analyze the profitability of different campaign management strategies. However, our paper differs from theirs and extends their work in several important ways. Rutz and Bucklin (2007a) only model the conversion probability conditional on a positive number of click-throughs. In contrast, our paper models click-through and conversion rates simultaneously in order to alleviate potential selectivity biases. In addition, we also model the search engine's ranking decision and the advertiser's decision on cost-per-click (CPC), both of which are absent in their paper. Our analysis reveals that it is important to model the advertiser's and the search engine's decisions simultaneously with clicks and conversions, since both CPC and Rank have been found to be endogenous. These issues are not addressed in their paper.

To summarize, our research is distinct from extant online advertising research, which has largely been limited to the influence of banner advertisements on attitudes and behavior. We extend the literature by empirically comparing the impact of different keyword characteristics on the performance of online search advertising in paid search, towards understanding the larger question of how keyword characteristics drive consumers' search and purchase behavior, as well as firms' optimal bid prices and ranking decisions.

3. Data

We first describe the data generation process for paid keyword advertisement since it differs on many dimensions from traditional offline advertisement. Advertisers bid on keywords during the auction process. A keyword may consist of one or more words. Once the advertiser gets a rank allotted for its keyword ad, these sponsored ads are displayed on the top left and right of the computer screen in response to a query that a consumer types on the search engine.
The match between a user query and the advertisement could be based on either a broad, exact or phrase match. The ad typically consists of a headline, a word or a limited number of words describing the product or service, and a hyperlink that refers the consumer to the advertiser's website after a click. The serving of the ad in response to a query for a certain keyword is denoted as an impression. If the consumer clicks on the ad, he is led to the landing page of the advertiser's website. This is recorded as a click, and advertisers usually pay on a per click basis. In the event that the consumer ends up purchasing a product from the advertiser, this is recorded as a conversion.

Our data contains weekly information on paid search advertising from a large nationwide retail chain, which advertises on Google.² The data span all keyword advertisements by the company during a period of six months in 2007, specifically for the calendar weeks from January to June. Each keyword in our data has a unique advertisement ID. Each observation is for a given keyword in a given week and is based on an exact match between the user query and the sponsored ad. It consists of the number of impressions, the number of clicks, the average cost per click (CPC), which represents the bid price, the rank of the keyword, the number of conversions, and the total revenues from a conversion. While an impression often leads to a click, it may not lead to an actual purchase (defined as a conversion). Based on these data, we compute the Click-through Rate (clicks/impressions) and Conversion Rate (conversions/clicks) variables. The product of CPC and number of clicks gives the total costs to the firm for sponsoring a particular advertisement. Based on the contribution margin and the revenues from each conversion through a paid search advertisement, we are able to compute the gross profit per keyword from a conversion. The difference between gross profits and keyword advertising costs (the number of clicks times the cost-per-click) gives the net profits accruing to the retailer from a sponsored keyword conversion. This is the Profit variable.

Finally, while we have data on the URLs of the landing page corresponding to a given keyword, we do not have data on landing page quality scores or content, since the exact algorithm used by Google to impute the landing page quality is not disclosed to the public.³ Hence, we use a semi-automated approach with content analysis to impute the landing page quality based on the three known metrics used by Google. Google uses a weighted average of Relevancy, Transparency and Navigability to impute the landing page quality of a given weblink.
We hired two independent annotators to rate each landing page based on each of these metrics and then computed the weighted average of the scores. The inter-rater reliability score was 0.73, indicating a very high level of reliability.

² The firm is a large Fortune-500 retail store chain with several hundred retail stores in the US, but due to the nature of the data sharing agreement between the firm and us, we are unable to reveal the name of the firm.

³ Google computes a quality score for each landing page as a function of the site's navigability as well as the relevance and transparency of information on that page in order to provide a better user experience after a click-through to the site. Besides these relevancy factors, the quality score is also based on click-through rates. However, the exact algorithm for computing this score is not publicly available. The quality score is then used in determining the minimum bid price, which in turn affects the rank of the ad, given the typical advertiser budget constraints.
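The derived variables described in this section (Click-through Rate, Conversion Rate, advertising cost and Profit) can be illustrated with a minimal sketch. The record layout, the field names and the example numbers below are hypothetical, and the sketch assumes that gross profit equals the contribution margin times conversion revenue, which is one plausible reading of the text rather than the authors' documented procedure.

```python
from dataclasses import dataclass


@dataclass
class KeywordWeek:
    """One keyword-week observation (hypothetical field names)."""
    impressions: int
    clicks: int
    conversions: int
    avg_cpc: float   # average cost per click, in dollars
    revenue: float   # total revenue from conversions, in dollars


def derived_metrics(obs: KeywordWeek, contribution_margin: float) -> dict:
    """Compute the derived variables for one observation.

    Assumption: gross profit = contribution_margin * revenue.
    """
    ctr = obs.clicks / obs.impressions if obs.impressions else 0.0
    conversion_rate = obs.conversions / obs.clicks if obs.clicks else 0.0
    cost = obs.avg_cpc * obs.clicks              # total advertising cost
    gross_profit = contribution_margin * obs.revenue
    return {
        "ctr": ctr,
        "conversion_rate": conversion_rate,
        "cost": cost,
        "gross_profit": gross_profit,
        "net_profit": gross_profit - cost,       # the Profit variable
    }


# Made-up numbers, for illustration only.
example = derived_metrics(
    KeywordWeek(impressions=400, clicks=20, conversions=2,
                avg_cpc=0.25, revenue=150.0),
    contribution_margin=0.5,
)
```

Guarding against zero impressions or zero clicks matters in practice because many keyword-weeks in sparse data of this kind record no clicks at all.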

Our final dataset includes 966 observations from a total of 878 unique keywords. Note that our main interest in this empirical investigation is to examine various factors that drive differences in click-throughs and conversions. Hence, we analyze click-through rates, conversion rates, cost-per-click, and rank by jointly modeling the consumers' search and purchase behavior, the advertiser's decision on cost per click, and the search engine's keyword rank allocating behavior.

Table 1 reports the summary statistics of our dataset. As shown, the average weekly number of impressions is for one keyword, among which around 6 lead to a click-through, and .85 lead to a purchase. Our data suggest the average cost-per-click for a given keyword is about 5 cents, and the average rank (position) of these keywords is about 6.9. Finally, we have information on three important keyword characteristics, which we next briefly discuss with a focus on the rationale for analyzing them. As Table 1 shows, there is a substantial amount of variation in clicks, conversion, rank and CPC of each keyword over time.

We enhanced the dataset by introducing keyword-specific characteristics such as Brand, Retailer and Length. For each keyword, we constructed two dummy variables, based on whether they were (i) branded keywords or not (for example, Sealy mattress, Nautica bedsheets), and (ii) retailer-specific advertisements or not (for example, Wal-Mart, walmart.com). To be precise, for creating the variable in (i) we looked for the presence of a brand name (either product-specific or company-specific) in the keyword, and labeled the dummy as 0 or 1, with 1 indicating the presence of a brand name. For (ii), we looked for the presence of the specific advertiser's (retailer) name in the keyword, and then labeled the dummy as 0 or 1, with 1 indicating the presence of the retailer's name.

= = Insert Table 1 = =
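The construction of the three keyword characteristics can be sketched as follows. The brand and retailer name lists are hypothetical stand-ins built from the examples in the text, and Length is assumed to be the number of words in the keyword; the actual coding was presumably done against the advertiser's full keyword list.

```python
# Hypothetical lookup lists, taken from the examples in the text.
BRAND_NAMES = {"sealy", "nautica"}
RETAILER_NAMES = {"wal-mart", "walmart", "walmart.com"}


def keyword_features(keyword: str) -> dict:
    """Return the Brand and Retailer dummies and the Length of a keyword.

    Assumption: Length counts whitespace-separated words.
    """
    words = keyword.lower().split()
    return {
        "brand": int(any(w in BRAND_NAMES for w in words)),
        "retailer": int(any(w in RETAILER_NAMES for w in words)),
        "length": len(words),
    }


features = [keyword_features(k)
            for k in ("Sealy mattress", "walmart.com bedsheets", "bedsheets")]
```

Exact token matching is the simplest design; multi-word brand names would need phrase matching instead.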
4. A Simultaneous Model of Click-through, Conversion, CPC and Keyword Rank

We cast our model in a hierarchical Bayesian framework and estimate it using Markov Chain Monte Carlo methods (see Rossi and Allenby 2003 for a detailed review of such models). We postulate that the decision of whether to click and purchase in a given week will be affected by the probability of advertising exposure (for example, through the rank of the keyword) and individual keyword-level differences (both observed and unobserved). We simultaneously model consumers' click-through and conversion behavior, the advertiser's keyword pricing behavior, and the search engine's keyword rank allocating behavior.

4.1 Theoretical setup

Assume that for search keyword $i$ at week $j$, there are $n_{ij}$ click-throughs among $N_{ij}$ impressions (the number of times an advertisement is displayed by the retailer), where $n_{ij} \le N_{ij}$ and $N_{ij} > 0$. Suppose that among the $n_{ij}$ click-throughs, there are $m_{ij}$ click-throughs that lead to purchases, where $m_{ij} \le n_{ij}$. Let us further assume that the probability of having a click-through is $p_{ij}$ and the probability of having a purchase conditional on a click-through is $q_{ij}$. In our model, a consumer faces decisions at two levels: one, when she sees a keyword advertisement, she decides whether or not to click it; two, if she clicks on the advertisement, she can take one of two actions: make a purchase or not make a purchase.

Thus, there are three types of observations. First, a person clicked through and made a purchase. The probability of such an event is $p_{ij} q_{ij}$. Second, a person clicked through but did not make a purchase. The probability of such an event is $p_{ij}(1 - q_{ij})$. Third, an impression did not lead to a click-through or purchase. The probability of such an event is $1 - p_{ij}$. Then, the probability of observing $(n_{ij}, m_{ij})$ is given by:

$$f(n_{ij}, m_{ij}, p_{ij}, q_{ij}) = \frac{N_{ij}!}{m_{ij}!\,(n_{ij}-m_{ij})!\,(N_{ij}-n_{ij})!}\,\{p_{ij} q_{ij}\}^{m_{ij}}\,\{p_{ij}(1-q_{ij})\}^{n_{ij}-m_{ij}}\,\{1-p_{ij}\}^{N_{ij}-n_{ij}} \tag{4.1}$$

4.2 Modeling the Consumer's Decision: Click-through

Prior work (Broder, Jansen and Spink 2007) has analyzed the goals for users' web searches and classified user queries in search engines into three categories of searches: navigational (for example, a search query consisting of a specific firm or retailer), transactional (for example, a search query consisting of a specific product) or informational (for example, a search query consisting of longer words). Being cognizant of such user behavior, search engines not only sell non-branded or generic keywords as advertisements, but also well-known product or manufacturer brand names as well as keywords indicating the specific advertiser in order for the firm to attract consumers to its website.⁴ Moreover, advertisers also have the option of making the keyword advertisement either generic or specific by altering the number of words contained in the keyword.
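The trinomial likelihood of observing a given number of click-throughs and purchases among the impressions can be evaluated numerically as in the following sketch. It works on the log scale for numerical stability; the function name and argument names are illustrative choices, not taken from the paper.

```python
import math


def log_likelihood(N: int, n: int, m: int, p: float, q: float) -> float:
    """Log-probability of n clicks and m purchases among N impressions.

    p is the click-through probability and q the conversion probability
    conditional on a click; both must lie strictly between 0 and 1.
    """
    if not 0 <= m <= n <= N:
        raise ValueError("require 0 <= m <= n <= N")
    if not (0.0 < p < 1.0 and 0.0 < q < 1.0):
        raise ValueError("p and q must lie strictly between 0 and 1")
    # log of N! / (m! (n-m)! (N-n)!) via the log-gamma function
    log_coef = (math.lgamma(N + 1) - math.lgamma(m + 1)
                - math.lgamma(n - m + 1) - math.lgamma(N - n + 1))
    return (log_coef
            + m * math.log(p * q)                # clicked and purchased
            + (n - m) * math.log(p * (1.0 - q))  # clicked, no purchase
            + (N - n) * math.log(1.0 - p))       # no click
```

A quick sanity check is that the probabilities of all feasible pairs of click and purchase counts for a fixed number of impressions sum to one.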
Finally, the length of the keyword is also an important determinant of search and purchase behavior, but anecdotal evidence on this varies across trade press reports. Some studies have shown that the percentage of searchers who use a combination of keywords is .6 times the percentage of those who use single-keyword queries (Kilpatrick 2003). In contrast, another study found that single keywords have on average the highest number of unique visitors (Oneupweb 2005). In our data, the average length of a keyword is about .6. In sum, the number of advertisers placing a bid, which can affect the number of clicks received by a given ad, will vary based on the kind of keyword that is advertised. Hence, we focus on the three important keyword-specific characteristics for the firm when it advertises on a search engine: Brand, Retailer and Length.

⁴ For example, a consumer seeking to purchase a digital camera is as likely to search for a popular manufacturer brand name such as NIKON, CANON or KODAK on a search engine as to search for the generic phrase "digital camera". Similarly, the same consumer may also search for a retailer such as BEST BUY in order to buy the digital camera directly from the retailer.

The click-through probability is likely to be influenced by the position of the ad (Rank), how specific or broad the keyword is (Length), and whether it contains any retailer-specific (Retailer) or brand-specific information (Brand). Hence, in equation (4.1), $p_{ij}$, the click-through probability, is modeled as:

$$p_{ij} = \frac{\exp(\beta_{i0} + \beta_{i1}\,\text{Rank}_{ij} + \alpha_1\,\text{Retailer}_i + \alpha_2\,\text{Brand}_i + \alpha_3\,\text{Length}_i + \alpha_4\,\text{Time}_{ij} + \varepsilon_{ij})}{1 + \exp(\beta_{i0} + \beta_{i1}\,\text{Rank}_{ij} + \alpha_1\,\text{Retailer}_i + \alpha_2\,\text{Brand}_i + \alpha_3\,\text{Length}_i + \alpha_4\,\text{Time}_{ij} + \varepsilon_{ij})} \tag{4.2}$$

We capture the unobserved heterogeneity with a random coefficient on the intercept by allowing $\beta_{i0}$ to vary along its population mean $\bar{\beta}_0$ as follows:

$$\beta_{i0} = \bar{\beta}_0 + \varsigma^{\beta}_{i0} \tag{4.3}$$

We also allow the Rank coefficient of the $i$th keyword to vary along the population mean $\bar{\beta}_1$ and the keyword's characteristics as follows:

$$\beta_{i1} = \bar{\beta}_1 + \gamma_1\,\text{Retailer}_i + \gamma_2\,\text{Brand}_i + \gamma_3\,\text{Length}_i + \varsigma^{\beta}_{i1} \tag{4.4}$$

$$\begin{pmatrix} \varsigma^{\beta}_{i0} \\ \varsigma^{\beta}_{i1} \end{pmatrix} \sim MVN\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \Sigma^{\beta}_{11} & \Sigma^{\beta}_{12} \\ \Sigma^{\beta}_{21} & \Sigma^{\beta}_{22} \end{pmatrix} \right) \tag{4.5}$$

4.3 Modeling the Consumer's Decision: Conversion

Next we model the conversion rates. Prior work (Brooks 2005) has shown that there is an intrinsic trust value associated with the rank of a firm's listing on a search engine, which could lead to the conversion rate dropping significantly with an increase in the rank (i.e., with a lower position on the screen). Hence, we include rank as a covariate.⁵
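To make the logit specification for the click-through probability above concrete, the following sketch evaluates it for one keyword-week, with the Rank slope shifted by the keyword's characteristics. All coefficient values are made up for illustration and are not estimates from the paper; the function and argument names are likewise hypothetical.

```python
import math


def inv_logit(x: float) -> float:
    """Logistic function exp(x) / (1 + exp(x))."""
    return 1.0 / (1.0 + math.exp(-x))


def rank_slope(beta1_bar, gamma, retailer, brand, length, shock=0.0):
    """Keyword-specific Rank coefficient: population mean plus shifts
    from the keyword characteristics plus an unobserved shock."""
    return (beta1_bar + gamma[0] * retailer + gamma[1] * brand
            + gamma[2] * length + shock)


def click_probability(rank, retailer, brand, length, time,
                      beta_i0, beta_i1, alpha, eps=0.0):
    """Click-through probability for one keyword-week."""
    utility = (beta_i0 + beta_i1 * rank
               + alpha[0] * retailer + alpha[1] * brand
               + alpha[2] * length + alpha[3] * time + eps)
    return inv_logit(utility)


# Illustrative (made-up) values, with mean-centered characteristics.
slope = rank_slope(beta1_bar=-0.2, gamma=(0.05, -0.03, 0.01),
                   retailer=0.5, brand=-0.3, length=0.4)
p_top = click_probability(1, 0.5, -0.3, 0.4, 0.0, -2.0, slope,
                          alpha=(1.0, -0.5, -0.1, 0.0))
p_low = click_probability(8, 0.5, -0.3, 0.4, 0.0, -2.0, slope,
                          alpha=(1.0, -0.5, -0.1, 0.0))
```

With a negative Rank slope, the probability for the top position (`p_top`) exceeds the one for a lower position (`p_low`), which is the direction of the rank effect the text describes.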
Another factor that can influence conversion rates is the quality of the landing page of the advertiser's website. Anecdotal evidence suggests that if online consumers use a search engine to direct them to a product but don't see it addressed adequately on the landing page, they are likely to abandon their search and purchase process. Different keywords from a given advertiser lead to different kinds of landing pages. Hence, it is important to incorporate the landing page quality as a covariate in the model. Furthermore, different keywords are associated with different products. It is possible that product-specific characteristics influence consumer conversion rates, and thus, it is important to control for the unobserved product characteristics that may influence conversion rates once the consumer is on the website of the advertiser. Hence, we include the three keyword characteristics to proxy for the unobserved keyword heterogeneity stemming from the different products sold by the advertiser.

⁵ As a robustness test, we also run our empirical analyses using a quadratic term for Rank in addition to the linear term in both the conversion rate and click-through rate equations. This helps us examine the rate of change of this variable. The qualitative nature of all our results remains unchanged when we include a non-linear term for Rank.

Thus, the conversion probability is likely to be influenced by the position of the ad on the screen, the three keyword-specific characteristics, and the landing page quality score. These factors lead us to model the conversion probabilities as follows:

$$q_{ij} = \frac{\exp(\theta_{i0} + \theta_{i1}\,\text{Rank}_{ij} + \delta_1\,\text{Retailer}_i + \delta_2\,\text{Brand}_i + \delta_3\,\text{Length}_i + \delta_4\,\text{Landing Page Quality}_i + \delta_5\,\text{Time}_{ij} + \eta_{ij})}{1 + \exp(\theta_{i0} + \theta_{i1}\,\text{Rank}_{ij} + \delta_1\,\text{Retailer}_i + \delta_2\,\text{Brand}_i + \delta_3\,\text{Length}_i + \delta_4\,\text{Landing Page Quality}_i + \delta_5\,\text{Time}_{ij} + \eta_{ij})} \tag{4.6}$$

As before, we capture the unobserved heterogeneity with a random coefficient specified on both the intercept and the Rank coefficient, as follows:

$$\theta_{i0} = \bar{\theta}_0 + \varsigma^{\theta}_{i0} \tag{4.7}$$

$$\theta_{i1} = \bar{\theta}_1 + \kappa_1\,\text{Retailer}_i + \kappa_2\,\text{Brand}_i + \kappa_3\,\text{Length}_i + \kappa_4\,\text{Landing Page Quality}_i + \varsigma^{\theta}_{i1} \tag{4.8}$$

$$\begin{pmatrix} \varsigma^{\theta}_{i0} \\ \varsigma^{\theta}_{i1} \end{pmatrix} \sim MVN\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \Sigma^{\theta}_{11} & \Sigma^{\theta}_{12} \\ \Sigma^{\theta}_{21} & \Sigma^{\theta}_{22} \end{pmatrix} \right) \tag{4.9}$$

Thus, equations (4.1) - (4.9) model the demand for a keyword, i.e., the consumer's decision.

4.4 Modeling the Advertiser's Decision: Cost Per Click

Next, we model the advertiser's (i.e., the firm's) strategic behavior. The advertiser has to decide on how much to bid for each keyword $i$ in week $j$ and thus the cost per click (CPC) that it is willing to incur.⁶
6 The advertiser decides on its CPC by tracking the performance of a keyword over time such that the current CPC is dependent on past performance of that keyword.7 Specifically, the keyword's current CPC is a function of the rank of the same keyword in the previous period. In keeping with the institutional practices of Google, which decides the minimum bid price of any given keyword ad as a function of the landing page quality associated with that keyword, we control for the landing page quality in the advertiser's CPC decision.

Different keyword attributes determine the extent of competitiveness in the bidding process for that keyword, as can be seen in the number of advertisers who place a bid. For example, a retailer keyword is likely to be far less competitive since the specific advertiser is usually the only firm that will bid on such a keyword. On the other hand, branded keywords are likely to be much more competitive since there are several advertisers (retailers who sell that brand) who will bid on that keyword. Similarly, shorter keywords typically tend to indicate more generic ads and are likely to be much more competitive, whereas longer keywords typically tend to indicate more specific ads and are likely to be less competitive. Hence, the advertiser's CPC for a given keyword also depends on the three keyword attributes. Thus, the CPC will be influenced by the rank of the ad in the previous time period, the three keyword-specific characteristics, and the landing page quality. This leads to the following equation:

$$\ln(CPC_{ij}) = \omega_{i0} + \omega_{i1}\,Rank_{i,j-1} + \lambda_1\,Retailer_i + \lambda_2\,Brand_i + \lambda_3\,Length_i + \lambda_4\,LandingPageQuality_i + \lambda_5\,Time_{ij} + \mu_{ij} \tag{4.11}$$

$$\omega_{i0} = \bar{\omega}_0 + \varsigma^{\omega}_{i0} \tag{4.12}$$

$$\omega_{i1} = \bar{\omega}_1 + \rho_1\,Retailer_i + \rho_2\,Brand_i + \rho_3\,Length_i + \rho_4\,LandingPageQuality_i + \varsigma^{\omega}_{i1} \tag{4.13}$$

The error terms in equations (4.12)-(4.13) are distributed as follows:

$$\varsigma^{\omega}_{i} \sim MVN\left(0,\ \Sigma^{\omega}\right) \tag{4.14}$$

where $\varsigma^{\omega}_{i}$ stacks the keyword-level error terms and $\Sigma^{\omega}$ is their full covariance matrix.

6 Since we do not have data on actual bids, we use the actual cost-per-click (CPC) as a proxy for the bid price. According to the firm whose data we use, they are very strongly correlated, and hence it is a very reasonable proxy.

7 This information about current bids being based on past performance (lagged rank) was given to us by the advertiser. The qualitative nature of all our results is robust to the use of both one-period lagged rank and one-period lagged profits from a given keyword ad, which is another heuristic used by other advertisers to decide on how much to bid in a given period.
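To make the structure of the CPC specification concrete, the following Python sketch evaluates ln(CPC) for one keyword in one period. It is only an illustration under stated assumptions: the function names, the argument layout, and every numeric value are hypothetical placeholders rather than estimates from this paper, and the time effect is passed in as a single precomputed number.

```python
import math


def rank_slope(base_slope, rho, attrs):
    """Keyword-specific coefficient on lagged rank.

    base_slope: average effect of lagged rank (assumed value).
    rho:        interaction coefficients, one per attribute (assumed values).
    attrs:      mean-centered (Retailer, Brand, Length, LandingPageQuality).
    """
    return base_slope + sum(r * a for r, a in zip(rho, attrs))


def log_cpc(intercept, base_slope, rho, lam, attrs, lag_rank,
            time_effect=0.0, error=0.0):
    """ln(CPC) = intercept + slope_i * lagged rank + lam . attrs + time + error."""
    slope = rank_slope(base_slope, rho, attrs)
    direct = sum(l * a for l, a in zip(lam, attrs))
    return intercept + slope * lag_rank + direct + time_effect + error


# Hypothetical inputs, chosen only to exercise the functions.
example = log_cpc(
    intercept=-1.0,
    base_slope=-0.1,
    rho=(0.0, 0.0, 0.05, 0.0),
    lam=(-0.5, 0.3, 0.0, -0.2),
    attrs=(1.0, 0.0, 2.0, 0.5),
    lag_rank=4.0,
)
cpc_level = math.exp(example)  # back-transform from logs to a CPC level
```

Because the dependent variable is in logs, the back-transformed CPC is always positive, which is one practical reason for a log-linear form.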

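4.5 Modeling the Search Engine's Decision: Keyword Rank

Finally, we model the search engine's decision on assigning ranks for a sponsored keyword advertisement. During the auction, search engines like Google, MSN and Yahoo decide on the keyword rank by taking into account both the current CPC bid and a Quality Score that is determined by the prior click-through rate (CTR) of that keyword (Varian 2007, Athey and Ellison 2008), amongst other factors. Since more recent CTR is given higher weightage by the search engine in computing this score, we use the one-period lagged value of CTR. The three keyword attributes are used to control for unobserved characteristics such as the extent of competition in the auction bidding process, as before in the CPC decision. Hence the rank is modeled as being dependent on these three keyword attributes. This leads to the following equation for the Rank of a keyword in sponsored search:

$$\ln(Rank_{ij}) = \phi_{i0} + \phi_{i1}\,CPC_{ij} + \phi_{2}\,CTR_{i,j-1} + \tau_1\,Retailer_i + \tau_2\,Brand_i + \tau_3\,Length_i + \tau_4\,Time_{ij} + \nu_{ij} \tag{4.15}$$

$$\phi_{i0} = \bar{\phi}_0 + \varsigma^{\phi}_{i0} \tag{4.16}$$

$$\phi_{i1} = \bar{\phi}_1 + \pi_1\,Retailer_i + \pi_2\,Brand_i + \pi_3\,Length_i + \varsigma^{\phi}_{i1} \tag{4.17}$$

The error terms in equations (4.16) and (4.17) are distributed as follows:

$$\begin{pmatrix} \varsigma^{\phi}_{i0} \\ \varsigma^{\phi}_{i1} \end{pmatrix} \sim MVN\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \Sigma^{\phi}_{11} & \Sigma^{\phi}_{12} \\ \Sigma^{\phi}_{21} & \Sigma^{\phi}_{22} \end{pmatrix} \right) \tag{4.18}$$

Finally, to model the unobserved co-variation among click-through, conversions, CPC bid and the keyword ranking, we let the four error terms be correlated in the following manner:

$$\begin{pmatrix} \varepsilon_{ij} \\ \eta_{ij} \\ \mu_{ij} \\ \nu_{ij} \end{pmatrix} \sim MVN\left( \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \Omega_{11} & \Omega_{12} & \Omega_{13} & \Omega_{14} \\ \Omega_{21} & \Omega_{22} & \Omega_{23} & \Omega_{24} \\ \Omega_{31} & \Omega_{32} & \Omega_{33} & \Omega_{34} \\ \Omega_{41} & \Omega_{42} & \Omega_{43} & \Omega_{44} \end{pmatrix} \right) \tag{4.19}$$

A couple of clarifications are useful to note here. First, the three characteristics of a keyword (Retailer, Brand, Length) are all mean centered. This means that $\bar{\beta}_1$ is the average of the keyword-specific Rank effect $\beta_{i1}$ in the click-through equation. A similar interpretation applies to the parameters $\theta$, $\omega$, and $\phi$. Second, in the click-through, conversion, CPC and Rank equations, we have controlled for temporal effects by estimating time-period effects that capture unobserved industry dynamics.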
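The correlated error structure across the four equations can be illustrated with a short simulation. The sketch below, in plain Python, draws vectors of four correlated normal errors through a Cholesky factor of a covariance matrix. The matrix entries are made-up values chosen only to be positive definite, and the helper names are hypothetical; this is not the estimation code used in the paper.

```python
import math
import random


def cholesky(a):
    """Lower-triangular factor L with L * L^T == a (a must be positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def draw_errors(low, rng):
    """One draw of correlated errors: L times a vector of independent N(0, 1)."""
    z = [rng.gauss(0.0, 1.0) for _ in low]
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(low))]


# Assumed covariance for (click-through, conversion, CPC, rank) errors.
OMEGA = [
    [1.0, 0.2, 0.0, 0.3],
    [0.2, 1.0, 0.0, 0.2],
    [0.0, 0.0, 1.0, 0.4],
    [0.3, 0.2, 0.4, 1.0],
]

factor = cholesky(OMEGA)
sample = draw_errors(factor, random.Random(0))  # four correlated error terms
```

Non-zero entries in the last row and column are what tie the rank error to the other three equations; setting them to zero would correspond to treating the equations as unrelated.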

4.6 Identification

To ensure that the model is fully identified even with sparse data (data in which a large proportion of observations are zero), we conduct the following simulation. We picked a set of parameter values and generated the number of click-throughs, the number of purchases, CPC bid, and ranking for each keyword, which mimicked their actual observed values in the data, according to the model and the actual independent variables observed in our data. We then estimated the proposed model with the simulated dataset and found that we were able to recover the true parameter values. This relieves a potential concern about empirical identification of the model due to the sparseness of the data.

In order to show any endogeneity issues and the identification of the proposed system of simultaneous equations, we provide a sketch of the model below. Note that our proposed model boils down to the following simultaneous equations:

$$p = f_1(Rank, X_1, \varepsilon_1) \tag{4.20}$$

$$q = f_2(Rank, X_2, \varepsilon_2), \quad \text{conditional on the number of click-throughs} > 0 \tag{4.21}$$

$$CPC = f_3(X_3, \varepsilon_3) \tag{4.22}$$

$$Rank = f_4(CPC, X_4, \varepsilon_4) \tag{4.23}$$

Here $p$ is the click-through probability, $q$ is the conversion probability conditional on click-through, CPC is cost-per-click and Rank is the position of a keyword in the listing. $X_1$ to $X_4$ are the exogenous covariates corresponding to the four equations. $\varepsilon_1$ to $\varepsilon_4$ are the error terms associated with the four equations, respectively. These error terms mainly capture information that is observed by the decision makers (consumer, advertiser, and search engine) but not observed by the researcher. Further, if $\varepsilon_1$ or $\varepsilon_2$ is correlated with $\varepsilon_4$, Rank will be endogenous. If $\varepsilon_3$ is correlated with $\varepsilon_4$, CPC will be endogenous. Our proposed simultaneous model closely resembles the triangular system in standard econometric textbooks (Lahiri and Schmidt 1978, Greene 1999). To see this more clearly, CPC is modeled as exogenously determined (modeled as the advertiser's decision and a function of the advertiser's past performance with the same keyword and other keyword-related characteristics).
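The endogeneity condition stated above (Rank becomes endogenous when its error is correlated with the error of an outcome equation) can be seen in a small simulation. The sketch below uses a deliberately simplified linear triangular system, not the model estimated in this paper, and all parameter values, including the sign and size of the error correlation, are assumptions made for illustration. It compares the true effect of rank with the slope from a naive least-squares regression that ignores the correlation.

```python
import math
import random


def simulate(n, true_slope, error_corr, seed=0):
    """Simulate cpc -> rank -> outcome with correlated rank and outcome errors."""
    rng = random.Random(seed)
    ranks, outcomes = [], []
    for _ in range(n):
        cpc = rng.gauss(0.0, 1.0)        # exogenous bid proxy
        e_rank = rng.gauss(0.0, 1.0)     # error in the rank equation
        z = rng.gauss(0.0, 1.0)
        # Outcome error with correlation error_corr to the rank error.
        e_out = error_corr * e_rank + math.sqrt(1.0 - error_corr ** 2) * z
        rank = -0.5 * cpc + e_rank       # assumed rank equation
        ranks.append(rank)
        outcomes.append(true_slope * rank + e_out)  # assumed outcome equation
    return ranks, outcomes


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


ranks, outcomes = simulate(20000, true_slope=-1.0, error_corr=0.6)
naive = ols_slope(ranks, outcomes)   # ignores the error correlation
```

Under these assumed values the naive slope converges to -1 + 0.6 / 1.25 = -0.52 rather than the true -1, that is, it is pulled toward zero; with `error_corr=0.0` the same regression recovers the true slope.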
CPC, in turn, affects the search engine's ranking decision, and finally Rank affects both click-through and the conversion probabilities. As shown in Lahiri and Schmidt (1978) and discussed in Greene (1999), a triangular system of simultaneous equations can be identified without any further identification constraint such as nonlinearity or correlation restriction. In particular, the identification of such a triangular system comes from the likelihood function. This is also noted by Hausman (1975), who observes that in a triangular system the Jacobian term in the likelihood function vanishes, so that the likelihood function is the same as for the usual seemingly unrelated regressions problem. Hence, a GLS (generalized least squares) or SURE (seemingly unrelated regression) based estimation leads to uniquely identified estimates in a triangular system with a full covariance on error terms (Lahiri and Schmidt 1978).

We also provide the parameters produced by the estimation of this system under the assumption of diagonality (restricting covariance elements to be zero) in order to be able to compare them to the generalized results. These are given in the tables in Appendix B. These estimates show that it is important to control for endogeneity, since the parameter estimates are attenuated when we restrict the covariance elements to be zero, and thus biased. For example, in the case of estimating CTR and conversion rates, the parameter estimates on Rank are much closer to zero under the assumption of diagonality than otherwise. Similarly, in the case of estimating Rank, the parameter estimates on Lag_CTR and CPC are significantly closer to zero under the assumption of diagonality than otherwise.

Note that the conversion probability $q$ is only defined when the number of click-throughs is greater than zero. In this case, if $\varepsilon_1$ and $\varepsilon_2$ are correlated, as in our data, then the conditional mean of $\varepsilon_2$, conditional on a positive click-through probability, is not going to be zero. Then, a model in which one only looks at the conversion conditional on a positive number of click-throughs (i.e., does not model the click-through behavior simultaneously) is going to suffer from selection bias. By jointly modeling click-through and conversion behavior, our proposed model accounts for such selectivity issues. The proposed Bayesian estimation approach also offers a computationally convenient way to deal with the selectivity problem by augmenting the unobserved click-through intention when there are no clicks.

5. Empirical Analysis

Next, we discuss our empirical findings. We first discuss the effects of various keyword characteristics and keyword ranking on click-through rates of the sponsored search advertisements.

5.1 Results

The coefficient of Retailer in Table 2a is positive and significant, indicating that keyword advertisements that contain retailer-specific information lead to a significant increase in click-through rates. Specifically, this corresponds to a .7% increase in click-through rates with the presence of retailer information. Further, the coefficient of Brand in Table 2a is negative and significant, indicating that keyword advertisements that contain brand-specific information can lead to a 56.6% decrease in click-through rates. These results are useful for managers because they imply that keyword advertisements that explicitly contain information identifying the advertiser lead to higher click-through rates, while those that explicitly contain information identifying the brand lead to lower click-through rates than keywords which lack such information. On the other hand, the coefficient of Length in Table 2a is negative, suggesting that longer keywords typically tend to experience lower click-through rates. Specifically, we find that, all else equal, an increase in the length of the keyword by one word is associated with a decrease in the click-through rates by 3.9%.

= = Insert Tables 2a and 2b = =

Intuitively, this result has an interesting implication if one were to tie it to the literature on consideration sets in marketing. A longer keyword typically tends to suggest a more directed or specific search, whereas a shorter keyword typically suggests a more generic search. That is, the shorter the keyword is, the less information it likely carries and the larger the context that should be supplied to focus the search (Finkelstein et al.). This implies that the consideration set for the consumer is likely to shrink as the search term becomes narrower in scope. Danaher and Mullarkey (2003) show that user involvement during search (whether the user is in a purchasing or surfing mode) plays a crucial role in the effectiveness of online banner ads. Since the consumers in our data get to see the ads displayed by all the retailers who are bidding for that keyword at the time of the search, the probability of a goal-directed consumer clicking on the retailer's advertisement decreases unless the retailer carries the specific product that the consumer is searching for.
In contrast, a consumer who does not have a goal-directed search (has a wider consideration set and is in the surfing mode) is likely to click on several advertising links before she finds a product that induces a purchase.

Some additional substantive results are as expected. Rank has an overall negative relationship with CTR in Table 2a. This implies that the lower the rank of the advertisement (i.e., the higher the location of the sponsored ad on the computer screen), the higher the click-through rate. The position of the advertisement link on the search engine page clearly plays an important role in influencing click-through rates. This kind of primacy effect is consistent with other empirical studies of the online world. Ansari and Mela (2003) suggested a positive relationship between the serial position of a link in an email and recipients' clicks on that link. Similarly, Drèze and Zufryden (2004) implied a positive relationship between a link's serial position and site visibility. Brooks (2004) showed that the higher the link's placement in the results listing, the more likely a searcher is to select it. In the context of shopping search engines, Baye et al. (2008) find that there is a 7.5% drop in click-through rates when a retailer is moved down one position on the screen. Thus, ceteris paribus, website designers and online advertising managers would place their most desirable links toward the top of a web page or email and their least desirable links toward the bottom of the web page or email. A robustness test wherein we include a quadratic term for Rank highlights that the negative relationship between CTR and Rank increases at a decreasing rate. This finding has useful implications for managers interested in quantifying the impact of Rank on CTR.

When we consider the interaction effect of these variables on the relationship of Rank with click-through rates, we find that keywords that contain retailer-specific or brand-specific information lead to an increase in the negative relationship between Rank and CTR. That is, for keywords that contain retailer-specific or brand-specific information, a lower rank (better placement) leads to even higher click-through rates. On the other hand, we find that the coefficient of Length is statistically insignificant, suggesting that longer keywords do not seem to affect the negative relationship between click-through rates and ranks. As shown in Table 2b, the estimated unobserved heterogeneity covariance is significant, including all of its elements. This suggests that the baseline click-through rates and the way that keyword ranking predicts the click-through rates are different across keywords, driven by unobserved factors beyond the three observed keyword characteristics.

Next consider Tables 3a and 3b with findings on conversion rates. Our analysis reveals that the coefficient of Brand, $\delta_2$, is negative and significant, indicating that keywords that contain information specific to a brand (either product-specific or manufacturer-specific) experience lower conversion rates on average.
Specifically, the presence of brand information in the keyword decreases conversion rates by .%. Similarly, the presence of retailer information in the keyword increases conversion rates by 5.6%. In contrast, Length is not statistically significant in its overall effect on conversion rates. We find a significant relationship between Rank and conversion rates such that the lower the Rank (i.e., the higher the sponsored keyword on the screen), the higher the conversion rate. A decrease in the rank from the maximum possible position or worst case scenario (which is 3 in our data) to the minimum position or best case scenario (which is in our data) increases conversion rates by 9.5%. This finding can have an important implication for existing theoretical models in the domain of sponsored search advertising, which have typically assumed that the value per click to an advertiser is uniform across all ranks. Our estimates suggest that the value per click is not uniform and motivate future theoretical models that modify the common assumption of uniformity in click values and re-examine the social welfare maximizing properties of generalized second price keyword auctions like those in Google. The inclusion of a quadratic term for Rank highlights that the negative relationship between conversion rates and Rank increases at a decreasing rate. This finding is relatively new in the literature on online advertising.

As speculated in trade press reports, our analysis empirically confirms that Landing Page Quality has a positive relationship with conversion rates. To be precise, an increase in landing page quality score from the lowest possible score (equal to ) to the highest possible score (equal to ) is associated with an increase in the conversion rates by .5%. These analyses suggest that, in terms of magnitude, the rank of a keyword on the search engine has a larger impact on conversion rates than the quality of the landing pages.

When we consider the effect of these keyword characteristics on the relationship of Rank with conversion rates, we find that none of the keyword attributes has a statistically significant effect on the relationship between rank and conversion rates. As shown in Table 3b, the estimated unobserved heterogeneity covariance is significant, including all of its elements. This suggests that the baseline conversion rates and the way that keyword ranking predicts the conversion rates are different across keywords, driven by unobserved factors.

= = Insert Tables 3a and 3b = =

Next, we turn to the firm's behavior. Interestingly, the analysis of cost-per-click reveals that there is a negative relationship between CPC and Retailer, but a positive relationship between CPC and Brand. This implies that the firm incurs a lower cost per click for advertisements that contain retailer information and a higher cost per click for those advertisements that contain brand information.
This is consistent with theoretical predictions because Retailer keywords are far less competitive than Brand keywords, on average. While Length does not have a direct statistically significant effect on CPC, it indirectly affects CPC through the interaction with Rank. There is a negative and statistically significant relationship between CPC and Landing Page Quality, implying that advertisers tend to place lower bid prices on keywords that are linked to landing pages with higher quality. Further, there is a negative relationship between CPC and Lag Rank such that a lower position on the search engine results screen is associated with a lower cost per click (and hence a lower bid price). These results are indicative of the fact that while there is some learning exhibited by the firm based on past performance metrics, it may not necessarily be bidding in an optimal manner.

= = Insert Tables 4a and 4b = =

Finally, on the analysis of Rank, we find that all three covariates (Retailer, Brand and Length) have a statistically significant and negative relationship with Rank, suggesting that the search keywords that have retailer-specific information or brand-specific information or are more specific in their scope generally tend to have lower ranks (i.e., they are listed higher up on the search engine results screen).

How do search engines decide on the final rank? Anecdotal evidence and public disclosures by Google suggest that it incorporates a performance criterion along with bid price when determining the ranking of the advertisers. The advertiser in the top position might be willing to pay a higher price per click than the advertiser in the second position, but there is no guarantee that its ad will be displayed in the first slot. This is because past performance measures such as prior click-through rates are factored in by Google before the final ranks are published. The coefficients of CPC and Lag CTR are negative and statistically significant in our data. Thus, our results from the estimation of the Rank equation confirm that the search engine is indeed incorporating both the current CPC bid and the previous click-through rates in determining the final rank of a keyword. Note from Table 5a that the coefficient of CPC is almost twice the coefficient of Lag CTR, suggesting that current bid price (CPC) has a larger role to play in determining the final rank than the quality score related factors like prior click-through rates.

= = Insert Tables 5a and 5b = =

Finally, it is worth noting in Table 6 that the unobserved covariances (i) between click-through propensity and keyword rank, (ii) between conversion propensity and keyword rank, and (iii) between CPC and keyword rank all turn out to be statistically significant. This suggests the endogenous nature of CPC and Rank.
Therefore, it is important to simultaneously model the consumer's click-through and purchase behavior, and the advertiser's and search engine's decisions.

= = Insert Table 6 = =

As mentioned before, we provide the parameter estimates produced by the estimation of this system under the assumption of diagonality (restricting covariance elements to be zero) for comparison with the generalized results. Refer to the tables in Appendix B. These estimates further demonstrate that it is important to control