smart devices in people s daily lives. The ubiquitous sensors embedded in pervasive smart devices incessantly generate

Size: px

Start display at page:

Download "smart devices in people s daily lives. The ubiquitous sensors embedded in pervasive smart devices incessantly generate"

Gerald Greer
5 years ago
Views:

1 486 IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 35, NO. 2, FEBRUARY 2017 Trading Data in the Crowd: Profit-Driven Data Aquisition for Mobile Crowdsensing Zhenzhe Zheng, Student Member, IEEE, Yanqing Peng, Fan Wu, Member, IEEE, Shaojie Tang, Member, IEEE, and Guihai Chen, Senior Member, IEEE Abstrat As a signifiant business paradigm, data trading has attrated inreasing attention. However, the study of data aquisition in data markets is still in its infany. Mobile rowdsensing has been reognized as an effiient and salable way to aquire large-sale data. Designing a pratial data aquisition sheme for rowd-sensed data markets has to onsider three major hallenges: rowd-sensed data trading format determination, profit maximization with polynomial omputational omplexity, and payment minimization in strategi environments. In this paper, we jointly onsider these design hallenges, and propose VENUS, whih is the first profit-driven data aquisition framework for rowd-sensed data markets. Speifially, VENUS onsists of two omplementary mehanisms: VENUS-PRO for profit maximization and VENUS-PAY for payment minimization. Given the expeted payment for eah of the data aquisition points, VENUS-PRO greedily selets the most ost-effiient data aquisition points to ahieve a sub-optimal profit. To determine the minimum payment for eah data aquisition point, we further design VENUS-PAY, whih is a data prourement aution in Bayesian setting. Our theoretial analysis shows that VENUS- PAY an ahieve both strategy-proofness and optimal expeted payment. We evaluate VENUS on a publi sensory data set, olleted by Intel Researh, Berkeley Laboratory. Our evaluation results show that VENUS-PRO approahes the optimal profit, and VENUS-PAY outperforms the anonial seond-prie reverse aution, in terms of total payment. Index Terms Data marketplae, mobile rowdsensing, aution theory. I. INTRODUCTION THE past few years have witnessed the proliferation of smart devies in people s daily lives. The ubiquitous sensors embedded in pervasive smart devies inessantly generate Manusript reeived April 30, 2016; revised September 30, 2016; aepted November 28, Date of publiation January 26, 2017; date of urrent version Marh 31, This work was supported in part by the State Key Development Program for Basi Researh of China (973 Projet) under Grant 2014CB340303, in part by the China NSF under Grant , Grant , Grant , Grant , Grant , and Grant , in part by the Shanghai Siene and Tehnology Fund under Grant , in part by the CCF-Tenent Open Fund, and in part by the Sientifi Researh Foundation for the Returned Overseas Chinese Sholars. The work of Z. Zheng was supported by a Google Ph.D. Fellowship and a Mirosoft Asia Ph.D. Fellowship. The work of S. Tang was supported by the China NSF under Grant Z. Zheng, Y. Peng, F. Wu, and G. Chen are with the Shanghai Key Laboratory of Salable Computing and Systems, Department of Computer Siene and Engineering, Shanghai Jiao Tong University, Shanghai , China ( zhengzhenzhe@sjtu.edu.n; yanqing.sjtu@ gmail.om; fwu@s.sjtu.edu.n; ghen@s.sjtu.edu.n). S. Tang is with the Department of Information Systems, The University of Texas at Dallas, Rihardson, TX USA ( tangshaojie@gmail.om). Color versions of one or more of the figures in this paper are available online at Digital Objet Identifier /JSAC tremendous volumes of sensed data by seamlessly monitoring a diverse range of human ativities and environment phenomena. However, urrently, most of operators exlusively analyze the olleted data for their own appliation purposes, whih introdues a serious barrier for the wide availability of rowd-sensed data, resulting in a number of isolated data islands. Reognizing the great benefit of data sharing [5], several open platforms, suh as Terbine [50], Thingful [51] and Thingspeak [52], have emerged to enable rowd-sensed data to be exhanged on the web, aiming to unlok the potential eonomi values underlying the rowd-sensed data. However, due to lak of effiient data aquisition sheme, the amounts of rowd-sensed data in these platforms are very limited, whih has signifiantly suppressed the inreasing market demand for data. The suess of data markets highly relies on the suffiient amounts of data for trading. On one hand, the data broker needs to aggregate various types of data from exogenous data soures to satisfy the diverse demand of data onsumers. On the other hand, the data broker has to periodially supply fresh data into data markets, beause the data beome less aurate, and even useless, when the ontextualized environments evolve over time. Mobile rowdsensing have been reognized as a highly effiient and salable way to ollet large-sale data [32], [58]. For example, Thingspeak [52] has reently launhed a rowdsensing platform to ollet rowd-sensed data. In mobile rowdsensing, the data providers onsume their own physial resoures, and spend manual effort in olleting data. Thus, the data broker should offer suffiient payments to inentivize data providers to ontribute data. The frugal data broker always wants to proure enough data with a minimum payment, whih an be formulated as the problem of payment minimization. In data markets, the ultimate goal of the data broker is to maximize profit, whih is defined as the differene between the revenue generated from selling data (possibly data-based servies) and the expenditure on data aquisition. Although the data broker an obtain a large revenue by providing high quality data servies, she has to disburse expensive expenditure to ollet enough data, suh that the data servies maintain at a high quality level. Therefore, in order to maximize profit, the data broker should make a trade-off between revenue and data aquisition expenditure, whih an be formulated as a profit maximization problem. Although a number of data aquisition mehanisms with different optimization objetives have been developed in the literature [10], [11], [32], [58], few of them onsidered the monetary profit produed by trading IEEE. Personal use is permitted, but republiation/redistribution requires IEEE permission. See for more information.

2 ZHENG et al.: TRADING DATA IN THE CROWD: PROFIT-DRIVEN DATA ACQUISITION FOR MOBILE CROWDSENSING 487 the rowd-sensed data in the market, whih deviates from the goal of the data broker. To fill this gap, we propose a profit-driven data aquisition mehanism. We summarize three major hallenges in designing a pratial profit-driven data aquisition mehanism for rowd-sensed data markets. The first design hallenge is to determine the rowdsensed data trading format, onsidering both the harateristis of rowd-sensed data and diverse market demand for data. To evaluate the profit of data in the market, we have to identify the speifi data trading format, whih is still an open problem in both eonomis and omputer siene ommunities. The rowd-sensed data is normally unertain and has omplex orrelation, making the rowd-sensed data quite different from the traditional information good [3], [35], and introduing additional diffiulties in determining the data trading format. On one hand, the rowd-sensed data is inomplete, impreise, and erroneous, making it improper to diretly feed raw data into the data market. On the other hand, the rowd-sensed data may be orrelated in multiple dimensions, and has rih semati information behind suh orrelation [10], resulting in that separately selling piees of data beomes meaningless. Furthermore, we should determine the data trading format aligned with the diverse market demand, meaning that data onsumers, from different market segments, would request for the rowd-sensed data with different quality levels. Enabling data onsumers to express their diverse market demands would inur a heavy burden of designing a onise and simple data trading format. Due to various types of unertain fators, omplex orrelation and diverse market demand for data, it is nontrivial to determine an appropriate and flexible rowdsensed data trading format. Yet, another design hallenge is the hardness of maximizing profit in a ompliated data market environment. From the definition, the value of profit rests on the attained revenue from data trading and the distributed expenditure on data aquisition. Due to the speial ost struture of data, 1 the pries of data should be linked to data onsumers valuations over the data, rather than the prodution ost. Thus, we an express the revenue with the data onsumers valuation distributions. However, it is hard to analyze the property of the revenue (and then the profit), beause the valuations always follow ompliated distributions in pratial market environments. Furthermore, even if we an figure out the maximum expeted revenue, finding the minimum expenditure on data aquisition an be proven to NP-Hard, and is normally omputationally intratable. Therefore, in order to approah the optimal profit of data trading, we have to overome the ompliated formats of valuation distributions and the high omputational omplexity in solving the problem of aquisition expenditure minimization. The last design hallenge is to simultaneously guarantee both strategy-proofness and minimum payment in data prourement autions. As ompetitive bidding leading to a lower disbursed payment, the data broker would ondut data prourement autions to determine minimized payments for 1 Data have a fixed prodution ost, and tend to indue negligible marginal osts for reprodution. data providers. Sine the data providers are rational and selfish, they always tend to misreport their private data olletion osts, if doing so an inrease their utilities. Suh a selfish behavior inevitably hurts the other data providers utilities, and signifiantly inrease the data broker s data aquisition expenditure. Therefore, a strategy-proof data prourement mehanism is desirable in suh strategi environment. However, it is extremely diffiult in simultaneously ahieving both strategyproofness and optimum in aution theory [2], [17], [44], [46]. In forward autions with Bayesian valuation setting, one of the mature tehniques to guarantee strategy-proofness and optimal revenue is to reserve the trading items in the instanes that all the bids are below a seleted reserve prie [39]. This reserve prie-based tehnique does not work in the ontext of data prourement autions, beause the data broker has to purhase one piee of data from data providers in all the instanes. New priing tehniques have to be developed to derive new theoretial results in prourement autions. The previous works have also proved some negative results about revenue maximization (payment minimization) in strategyproof aution design. In Bayesian valuation setting, Ronen and Saberi laimed that no deterministi polynomial time asending aution an ahieve an approximation ratio better than 3/4 in terms of revenue maximization [46]. When the osts of data providers are ompletely private, no strategyproof aution mehanism an give any performane guarantee on the payment [2], [17]. In this paper, by jointly onsidering the above three design hallenges, we ondut an in-depth study on the profitaware data aquisition design for rowd-sensed data markets. To probe in the benefit of model-based data trading format, we build a statistial model upon the raw data, to apture data unertainty and omplex orrelation among data. We regard the resulting statistial model as an information ommodity, and further partition the ommodity into multiple versions with different quality levels, to satisfy the diverse market demand of data onsumers. Seondly, we propose VENUS-PRO to deompose the problem of profit maximization into revenue maximization and data aquisition expenditure minimization. Given data onsumers valuation distributions, VENUS-PRO adopts a post-priing mehanism to determine the trading prie for eah version, and then alulates the maximum expeted revenue for eah version. Considering the high omputational omplexity for the optimum, VENUS-PRO obtains a sub-optimal data aquisition expenditure, assuming that the payments for data aquisition points are given in advane. Combing with the alulated revenue and data aquisition expenditure, VENUS-PRO ahieves a onstant approximation ratio in terms of profit maximization. Finally, to determine the minimum payment for eah of data aquisition points, we propose a strategy-proof and optimal data prourement aution mehanism, namely VENUS-PAY, in Bayesian setting, wherein the data providers osts are drawn from publilyknown probability distributions. We summarized our ontributions in this paper as follows. First, we present a system model, inluding a data trading model and a data purhasing model, for rowd-sensed data markets. For the data trading model, we build a joint

3 488 IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 35, NO. 2, FEBRUARY 2017 Fig. 1. Crowd-Sensed data market. probability distribution to apture data unertainty and omplex data orrelation, and adopt a versioning tehnique to satisfy the diverse market demand. We also model the data purhasing proess as a reverse aution in Bayesian environment. Seond, we design a profit-driven data aquisition mehanism, namely VENUS-PRO. Given the data onsumers valuation distributions and expeted payment of eah of data aquisition points, we propose a post-priing mehanism and a greedy data aquisition point seletion algorithm to ahieve a sub-optimal profit. Third, we further onsider the problem of payment minimization for data aquisition, and propose VENUS-PAY, whih is a data prourement aution in Bayesian setting, ahieving both strategy-proofness and optimal payment. Finally, we evaluate the performane of VENUS-PRO and VENUS-PAY based on a real-world publi sensed data set. Our evaluation results show that VENUS-PRO obtains a nearoptimal profit, while VENUS-PAY performs better than the lassial seond-prie reverse aution. The rest of this paper is organized as follows. In Setion II, we present a system model for the rowd-sensed data market. Given the expeted payment for eah data aquisition point, we design VENUS-PRO in Setion III. In Setion IV, we propose VENUS-PAY to determine the minimum payments of data aquisition points. We present our evaluation results in Setion V, and review the related work in Setion VI. Finally, we onlude the paper in Setion VII. II. PRELIMINARIES In this setion, we desribe data trading model and data purhasing model for profit-driven data aquisition in rowdsensed data markets. A. Data Trading Model As illustrated by Figure 1, in a rowd-sensed data market, the data broker wants to exhange the real-time information about the environment phenomena of a monitoring region for some profit, the data onsumers would like to pay for this information to failitate their data driven servies, and the data providers want to earn payments for their ontributed data. The data broker virtually deploys several Data Aquisition Points to approximately represent the phenomena of the monitoring region. The data aquisition points an be regarded as some kind of Points of Interest (PoIs), on whih the data broker wants to ollet neessary sensed data to train the real-time information. Aording to the required quality of servie (QoS) in speifi rowd-sensed appliations, the data broker an determine the quantity, density and physial loations of data aquisition points by exploiting some mahine learning tehniques, suh as ative learning [8]. Generally, if the data broker aims to extrat aurate knowledge of the monitoring region, she would deploy fine-grained data aquisition points, but at the same time she has to disburse high payments for purhasing raw data from data providers. For example, indoor loation servie provider needs high preise and multi-dimensional sensed data, suh as position, size, oordinates and orientation information of indoor landmark objetives, to reonstrut the map of indoor floor plan [15]. Due to the omplex indoor environment, the servie provider needs to deploy dense data aquisition points (at eah store entrane) to ollet suffiient sensed data, inurring a high data aquisition expenditure. In another senario, the government an simply deploy sparse data aquisition points along the major roads or around shopping malls to roughly profile the noise level of metropolis [41], [42]. We denote the L Data Aquisition Points by Y = {y 1, y 2,, y L }. We assoiate a disrete random variable X y for eah data aquisition point y Y, representing the possible measurements of the monitoring environment phenomena, and assoiate a set of disrete random variables X Y with a set of data aquisition points Y Y. We note that the random variables may be orrelated in multiple dimensions, e.g., the temperatures of geographially proximate loations are likely to hange synhronously [10]. We use the following major notations to define the data trading model. 1) Information Commodity: In the rowd-sensed data market, the information ommodity is the joint distribution of random variables X Y over T time slots. We do not restrit the joint distribution to any speifi format, suh that the joint distribution an apture different types of unertainty and omplex orrelation among data. The probability mass funtion p ( ) x Y assigns a probability for a possible valuation vetor x Y = (x 1, x 2,, x L ) to the random variables X Y. We an use historial data and expert knowledge to onstrut a rough prior probability mass funtion, and update it using the new observations from the data aquisition module. Suppose that we observe values x O for the seleted random variables X O X Y, we an use Bayes rule to ondition our joint probability mass funtion p(x Y ) on these observations: p(x Y X O = x O ) = p(x Y, x O), (1) p(x O ) where X Y = X Y \X O is the set of unobserved random variables. The posterior distribution is more ertain than the prior distribution. Here, we use entropy to quantify the unertainty of a distribution, onsidering its onise expression and nie properties, e.g., monotoniity and submodularity. 2 Speifially, the onditional entropy of the unobserved normal random 2 We an also use some other alternative metris, suh as Kullbak-Leibler divergene and Mutual Information [9], to quantify the unertainty of the distributions. However, Kullbak-Leibler divergene is too omplex to be used in data trading, and mutual information does not satisfy neither monotoniity nor submodularity, whih signifiantly inreases the omplexity of profit maximization.

4 ZHENG et al.: TRADING DATA IN THE CROWD: PROFIT-DRIVEN DATA ACQUISITION FOR MOBILE CROWDSENSING 489 variables X Y, after observing the seleted random variables X O is: H (Y O) = p(x Y, x O ) log p(x Y x O ). (2) x Y dom X Y x O dom X O As time passes by, our belief about the observations of the random variables X O will be spread out, inreasing the unertainty of the distribution. Thus, the data broker has to periodially ollet new observations to maintain the entropy of the information ommodity at a low level. 2) Version: In the rowd-sensed data market, data onsumers may have diverse quality requirements over the information ommodity, resulting in different valuations for the information ommodity. To satisfy the diverse quality requirements of data onsumers, the data broker would launh multiple versions for the information ommodity, where a version is a posterior distribution with some seleted observation random variables. We define the quality of the version p(x Y x O ) as a funtion of its onditional entropy H (Y O): Q 1 H (Y O) H (Y ) = H (O) H (Y ). The seond part of the equation holds given the property H (Y O) = H (Y ) H (O). From this definition, the quality of the version p(x Y x O ) is diretly proportional to the joint entropy of the seleted random variables H (X O ), implying that the version will have a high quality if we hoose the random variables with large joint entropy to observe. Generally, the data onsumers with various quality requirements would have different willingness to pay. In order to extrat revenue from these heterogeneous data onsumers, the data broker leverages the tehnique of differential priing [54] by selling the different versions of an information ommodity at different pries. By onduting the standard market tehnique, suh as survey, the data broker an determine the K andidate versions and a quality vetor Q = (Q 1, Q 2,, Q K ),where Q i < Q k, for all 1 i < k K. 3 We define the quality gap between two suessive versions k and k + 1 as: k Q k+1 Q k, and denote the minimum value of all the s by min min k { k }. In pratie, the quality gap would be large enough to distinguish two suessive versions, and we assume that min > max i H ({i})/h (Y ). We adopt the postpriing mehanism for data trading beause of its onveniene and popularity in pratie. In the ontext of data trading, the post-priing mehanism has several advantages ompared with other trading formats, suh as aution mehanisms. For example, the post-priing mehanism an guarantee the robust eonomi properties, suh as strategy-proofness, and handle the dynamial features of markets in a onise way. The data broker only has to determine a take-it-or-leave prie, whih is independent on the valuations and arrival sequene of data onsumers. Speifially, the data broker assigns a prie p k for the kth version, and denote the prie menu for 3 The determination of the number of versions and the orresponding quality for eah version is beyond the sope of this paper. Several previous works [4], [43], [53], [54] shed light on possible solutions for this problem. all the K versions by p = (p 1, p 2,, p K ). We disuss the determination of the optimal prie for eah version in Setion III. 3) Data Consumers: There are N single-minded data onsumers in the data market. Eah data onsumer is interested in only one version of the information ommodity, and has a valuation over this version. 4 We assume that the version preferene of the data onsumers follows a distribution with a probability mass funtion g(k), meaning that the data onsumers have a probability g(k) to hoose the kth version as her interested version. Therefore, there are N k = N g(k) data onsumers, who are interested in the kth version, in expetation. For the kth version, we assume that the valuations of the N k data onsumers are drawn from a distribution with umulative distribution funtion V k (x). We denote the vetor of all valuation distributions by V = (V 1 ( ), V 2 ( ),, V K ( )). By learning the historial transations, the data broker an obtain the knowledge of the distribution g(k) and the vetor V. This assumption is also alled as Bayesian assumption in eonomi literature [29]. 4) Revenue: If the data onsumer s valuation is larger than the trading prie of the kth version p k, then she would purhase this version. In this ase, the data broker would reeive a revenue of p k. The expeted revenue of selling the kth version to the N k data onsumers is: r k = N g(k) (1 V k (p k )) p k. (3) When the data broker reates the kth version, despite of obtaining the revenue r k, she an also extrat revenue r i from eah of the version 1 i < k lower than k. This is beause the data broker an degrade a high quality version to lower ones without induing additional osts, e.g., simply adding artifiial noises or using less observations. Thus, the expeted umulative revenue of the kth version should be R k = k r i. B. Data Purhasing Model Sine the olleted data beomes less aurate over time, the data broker has to periodially supply fresh data into the market. In the data trading model, the data broker would selet different sets of data aquisition points to generate different versions of the information ommodity. For eah seleted data aquisition point in one speifi time slot, if the previous observation has been expired, meaning that it is not aurate enough to represent the environment phenomena, the data broker would purhase one new observation from a pool of ative data providers. Aording to the freshness of the urrent observation and the auray requirement of eah data aquisition point, the data broker determines the time slots to launh the data purhasing proess for eah data aquisition point. Thus, we an assume that the data olletion proedures for different data aquisition points in different time slots are independent, so we fous on the data purhasing proess for one data aquisition point in one speifi time 4 We initialize the examination of data trading model with a simple purhasing behaviour model for data onsumers, and leave more omplex models to our future works.

5 490 IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 35, NO. 2, FEBRUARY 2017 slot in the following disussion. 5 We model the proess of data purhasing as a single-item data prourement aution, in whih the data broker, also alled as an autioneer or a buyer, wants to buy one piee of data from m y ompetitive data providers, also alled as suppliers or sellers, for the data aquisition point y O in one speifi time slot. 6 The item being autioned is the right to supply data, whih an be onsidered as one kind of sare resoure. Aution mehanism is believed to be an effetive way to alloate the sare resoure, beause in prourement aution, the data broker an disover the true olletion ost of data providers, and exploit the ompetition among the data providers to redue the prourement payment. In a diret-revelation data prourement aution, the data providers simultaneously delare their bids to the data broker, who thereafter makes a deision on winner determination and payment to winner. We use some useful notations to define the data prourement aution model. 1) Data Provider: We denote the data providers by M y = {1, 2,, m y }. Eah data provider i M y has a data olletion ost i, whih is private information to her, and is known as type in mehanism design. We onsider a Bayesian setting, in whih ost i is drawn from a publily-known distribution F i (x) with a density funtion f i (x) in the range [ i, i ].Let F = (F 1 ( ), F 2 ( ),, F m y ( )) denote the ost distributions of all data providers. We assume that the ost distributions are independent, but is not neessary to be idential. Eah data provider i M y delares a bid b i to the data broker, meaning that she requests for a ompensation of at least b i to over her ost. Sine data providers are rational and selfish, they may not truthfully reveal their osts, i.e., the bids may not neessarily be equal to the ost i. We denote the ost and bidding profile of all data providers by = ( 1, 2,, m y ) and b = (b 1, b 2,, b m y ), respetively. After olleting bidding profile, the autioneer selets a winner, and determines payments for data providers. That is, a data prourement aution has two major omponents: Seletion Rule: Choose a feasible seletion rule x(b) = (x 1 (b), x 2 (b),, x m y (b)) as a funtion of bidding profile b. x i (b) = 1 if data provider i is the winner; otherwise x i (b) = 0. Payment Rule: Determine a payment vetor w(b) = (w 1 (b), w 2 (b),,w m y (b)), wherew i (b) is the payment for the data provider i when the bidding profile is b. Data provider i M y has a quasi-linear utility u i (b) on the bidding profile b, whih is defined as the differene between payment and ost: u i (b) w i (b) i x i (b). 2) Expeted Payment: The data broker determines T time slots to ollet data for the data aquisition point y Y,and 5 Considering the dependene among data purhasing proesses in temporal dimensions would signifiantly inrease the omplexity of designing the data prourement aution. In dynami data aquisition senario, we should model the interdependent data proesses as repeated prourement autions or generalized online autions, in whih the data providers an partiipate in the data purhasing proess in suessive time slots. As the data providers an repeatedly interat with the data broker, there are more strategi behaviours for data providers to manipulate the aution to further inrease their long-term utilities [1]. We will relax this assumption in our future work. 6 Our results an be extended to more flexible aution formats, e.g., multiunit autions or ombinatorial autions, adapting to the senario, where one data aquisition point needs multiple observations to guarantee fault tolerane. the expeted aumulated payment is: [ m y ] y = T E F w i (), (4) where F means that the expetation is over the ost distributions. We use ={ y y Y } to denote the expeted payments of all data aquisition points Y. C. Problem Statement In data market, the data broker faes two losely relevant optimization problems: Profit Maximization and Payment Minimization. We formulate these two problems as follows. 1) Profit Maximization: Although the data broker an obtain large revenue by launhing a version with high quality, at the same time, she has to disburse expensive expenditure to selet more data aquisition points. Hene, the data broker prefers to selet the profitable and heap data aquisition points O to observe, suh that the obtained profit (O) is maximum. We define the profit as the differene between the obtained revenue and the disbursed expenditure: (O) R(O) S(O), (5) where R(O) is the revenue generated by the seleted data aquisition points O, ands(o) is the data aquisition expenditure for the data aquisition points O, i.e., S(O) = y O y. We note that R(O) is equal to the revenue R k of the version k, whih is the highest version that the seleted points O an reah, i.e., k arg max k {(H (O)/H (Y )) Q k }. We an state the problem of profit maximization as: seleting a subset of data aquisition points O, suh that the obtained profit is maximized, i.e., O = arg max O Y (R(O) S(O)). In profit maximization, we assume that the expeted payment y for eah data aquisition point y Y is known in advane, and determine this expeted payment in payment minimization module. Given the minimum payment for data aquisition points and the optimal revenue for eah version, the profit maximization problem is atually a data aquisition point seletion problem. 2) Payment Minimization: For eah of data aquisition points, the frugal data broker always wants to purhase one piee of data with the lowest payment. In Bayesian environment, the data broker intends to design a data prourement aution that ahieves the lowest expeted payment y,where the expetation is with respet to the ost distributions F. The problems of profit maximization and payment minimization are losely related. On one hand, the output of payment minimization is the input of profit maximization. The minimum expeted payment on eah of data aquisition points, determined by the payment minimization proedure, diretly affets the data aquisition point s probability of being seleted in the profit maximization subroutine. On the other hand, the result of profit maximization, i.e., the seleted data aquisition points, also has an impat on payment minimization. The solution to profit maximization is to make a trade-off between the revenue extrated from data trading and expenditure for data aquisition. The intuitive idea behind the solution is to selet the data aquisition points with a lower payment and a

6 ZHENG et al.: TRADING DATA IN THE CROWD: PROFIT-DRIVEN DATA ACQUISITION FOR MOBILE CROWDSENSING 491 high revenue ontribution to final revenue. Under this seletion rule, the data aquisition points with a large payment and a signifiantly high ontribution would also have hanes to be seleted. In the long term, the data aquisition points with a large expeted payment would attrat more data providers, inreasing the ompetition among bidders, and thus dereasing the payment in the end. III. VENUS-PRO In this setion, we propose a profit-driven data aquisition mehanism, namely VENUS-PRO, for profit maximization in the rowd-sensed data market. A. Detailed Design VENUS-PRO onsists of two omponents: Revenue Determination and Aquisition Expenditure Calulation. 1) Revenue Determination: In ontrast to physial goods, information ommodity, regarded as one kind of digital goods, has a different ost struture, i.e., a fixed ost of prodution, e.g., a substantial data aquisition expenditure, but negligible marginal osts, i.e., a lower ost of produing an additional opy. Under this speial ost struture, the pries of the information ommodity should be linked to the valuations of data onsumers rather than the prodution osts. Thus, given the data onsumers valuation distribution of the kth version: V k (x), the expeted revenue of the kth version with a trading prie p is: (1 V k (p)) p. Therefore, the data broker an set the optimal prie p k for the kth version by p k arg max [(1 V k (p)) p]. p Under this optimal priing strategy, the expeted revenue for the kth version is: k k R k = r i = [N g(i) (1 V i (p i )) p i ]. (6) Given the optimal revenue for eah version and the minimum payment for eah of data aquisition points, the profit maximization problem is atually a data aquisition point seletion problem: seleting a subset of data aquisition points O, suh that the obtained profit is maximized, i.e., O = arg max O Y (R(O) S(O)). Considering that the version preferene distribution g(i) and the valuation distributions V i (p) an be quite ompliated in the pratial data market, the speifi format of the revenue R(O) (or R k )(and then the profit (O)) has unlear properties, making it diffiult to diretly solve the above profit maximization problem by adopting the lassial optimization problem. In the following, we overome this issue by transforming profit maximization problem to a solvable expenditure minimization problem, and designing an approximation algorithm for it. 2) Aquisition Expenditure Calulation: In the pratial market, the ommodity normally has a onstant number of versions as a result of the trade-off between effiieny and omplexity [47], [53]. It is obvious that the optimal number of the version for an information ommodity is equal to the number of types of data onsumers in the data market. Data onsumers from different market segments have signifiantly different valuations for one data set, beause they use the data set in diverse appliation senarios. As the possible appliations for one data set may be large, the data broker may not have a lear idea of the exat number of types of data onsumers. For the market with no obvious market segments, some previous works [20], [48] in marking suggest that the optimal number of versions is three: a high-end version, a middle version, and a low-end version, whih are also alled as Goldiloks priing in the literature. Furthermore, the maximum number of versions in pratial data marketplaes is always small, e.g., in Windows Azure Data Marketplae [56], the maximum number versions is no more than ten, and in Quandl [45], a finanial and eonomi data trading platform, the data vendor offers most of data sets with four versions. Under this observation, our basi idea to solve the profit maximization is to enumerate the profit of eah version and selet the maximum one as the result. Speifially, in order to alulate the profit of the kth version, by the definition of profit, we should know the possible highest revenue R k and its orresponding lowest data aquisition expenditure S(O). Although we an exatly alulate the highest revenue R k by Equation (6), it is nontrivial to figure out the minimum data aquisition expenditure S(O). We now formulate the problem of expenditure minimization for the kth version as follows. Problem: Expenditure Minimization for the kth version Objetive: Minimize S(O) Subjet to: H (O) H (Y ) Q k, O Y. Here, the data broker attempts to selet a set of aquisition points O with the lowest aquisition expenditure, to satisfy the quality requirement of the kth version. Unfortunately, the problem of expenditure minimization an be proven to NP-Hard by reduing from the general set overing problem [57]. Considering the omputational intratability of the expenditure minimization problem, we present an alternative solution with a greedy aquisition points seletion algorithm, to ahieve a near-optimal expenditure in polynomial time. To this end, we take the advantage of the submodularity of entropy funtion H ( ). We first give the definition of submodular funtion. Definition 1 (Submodular Funtion): Let X be a finite set. A funtion f : 2 X R is a submodular funtion if for any A B X and x X\B: f (A {x}) f (A) f (B {x}) f (B). We show the entropy H ( ) is submodular and non-dereasing. Lemma 1: The entropy funtion H : 2 Y R is submodular, non-dereasing, and non-negative. Proof: To prove the submodularity, we first introdue an interesting property of entropy: the information never hurts priniple [9]: H (y O) H (y), foranyy Y and O Y, i.e., in expetation, observing the random variables X O annot inrease the unertainty about the random variable X y. Sine the marginal entropy inrease an be expressed as H (y O) H (O) = H (y O), the submodularity of the entropy

7 492 IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 35, NO. 2, FEBRUARY 2017 Algorithm 1 Greedy Data Aquisition Point Seletion Input: A set of data aquisition points Y, asetof expeted payments, the quality Q k of the kth version. Output: The set of seleted random variables O T. 1 t 0; O 0 ; 2 do 3 t t + 1; 4 π t arg min y Y\Ot 1 { 5 O t O t 1 {π t }; 6 while H(O t ) H(Y) < Q k and t < L; 7 T t; 8 return O T ; y H(y O t 1 ) funtion is simply the onsequene of the information never hurt priniple: for any set O O Y,wehave: } ; H (y O) H (O) = H (y O) H (y O ) = H (y O ) H (O ). Contrary to the differential entropy, whih an be negative, in the disrete ase, the entropy is non-negative, i.e., H (y O) H (y) = H (y O) 0 for any set O Y.Furthermore, H ( ) = 0. This demonstrates that the entropy funtion H ( ) is non-negative and non-degreasing. By Lemma 1, the expenditure minimization is a submodular set overing problem. Greedy approah is a nature fit for submodular optimization [34], [57]. One nature greedy rule is to selet the most ost-effiient aquisition point in eah iteration, i.e., the aquisition point with a lower expeted payment and high marginal entropy. This simple and effiient heuristi rule has been widely adopted in the other variations of submodular set overing problems [14], [16], [55], [57]. Other greedy naive greedy rules, suh as keep hoosing the aquisition points with minimum expeted payment, or keep hoosing the aquisition points with the highest marginal entropy an experiene arbitrary bad results in some extreme ases. We now desribe the greedy data aquisition point seletion algorithm for the problem of expenditure minimization in Algorithm 1. We use O t to denote the set of seleted aquisition points until the tth iterations, and initialize O 0 to be. Inthetth iteration, the data broker will selet the data aquisition y Y \O t 1 that has the smallest y /H (y O t 1 ), where H (y O t 1 ) = H (y O t 1 ) H (O t 1 ) represents the marginal entropy of the aquisition point y, given the urrently seleted aquisition points O t 1. 7 Let π t denote the aquisition point seleted in the tth iteration. This seletion proess iterates until the normalized joint entropy of the seleted aquisition points H (O t )/H (Y ) is higher than the quality of the kth version Q k, or there are no more aquisition points to selet (Lines 2 to 6). Algorithm 1 outputs the set of T aquisition points O T as the result. Sine we have to hek eah unseleted aquisition point in eah iteration, and there 7 In large sale mobile rowdsensing systems, it is hard to ompute the exat onditional entropy. We an adopt sampling approahes [33] to effiiently alulate suh approximate onditional entropy within a tolerant error bound. Algorithm 2 VENUS-PRO for Profit Maximization Input: A vetor of valuation distributions V, a version preferene distribution g(k), a set of random variables Y, a set of expeted payments, a quality vetor Q. Output: A pair of profit and seleted version (, k ). 1 0; k 0; 2 for k = 1 to K do 3 p k arg max p [(1 V k (p)) p]; 4 for k = 1 to K do 5 R k k [N g(i) (1 V i (p i )) p i ]; 6 O k GDY_ALG(Y,,Q k ); 7 S(O k ) y O k y ; 8 k R k S(O k ); 9 if k > then 10 k ; k k; 11 return (, k ); are at most L iterations, the time omplexity of Algorithm 1 is O(L 2 ),wherel is the number of aquisition points. Combining revenue alulation with aqusition expenditure determination, we desribe the detailed steps of VENUS-PRO in Algorithm 2. VENUS-PRO first alulates the optimal trading prie for eah version (Lines 2 to 3). Using this optimal trading prie strategy, VENUS-PRO an figure out the maximum expeted revenue R k for eah version k (Line 5). Although VENUS-PRO annot obtain the minimum expenditure, it an get the approximate one by running Algorithm 1 (GDY_ALG for short) (Lines 6 to 7). Upon obtaining the expeted revenue R k and the approximate expenditure S(O k ), VENUS-PRO an alulate the approximate profit k = R k S(O k ) for eah version k. Among these K andidate versions, VENUS-PRO hooses the one with the maximum approximate profit as the final result (Lines 9 to 10). Sine VENUS-PRO alls GDY_ALG algorithm K times, the omputational omplexity of VENUS-PRO is O(KL 2 ),wherek is the number of andidate versions. B. Analysis In this setion, we analyze the approximation ratio of VENUS-PRO. We first present the performane guarantee of the greedy algorithm (i.e., Algorithm 1) for the problem of data aquisition expenditure minimization. Theorem 1: We use O to denote the optimal set of aquisition points for the problem of aquisition expenditure minimization. If Algorithm 1 is applied, we are guaranteed to obtain: S(O T ) S(O ) 1 + log e β, H (Y ) H ( ) where β = min{β 1,β 2,β 3 }, and β 1 = H (Y ) H (O T 1 ), β 2 = { } max H (y O 0 ) y OT,1 t T H (y O t ) : H (y O t )>0, β 3 = θ T θ. The funtion H ( ) is defined by H (O) min{h (O), H (Y ) Q k }, and 1 θ t = π t H (π t O t 1 ).

8 ZHENG et al.: TRADING DATA IN THE CROWD: PROFIT-DRIVEN DATA ACQUISITION FOR MOBILE CROWDSENSING 493 Proof: Designing greedy algorithms with good approximation fators for the problem of submodular set overing have been widely studied in submodular optimization literature [21], [22], [34], [57]. We an redue the problem of aquisition expenditure minimization { to a speial submodular overing maximization: min O Y S(O) : H (O) = H (Y ) }, by introduing a new submodular and non-dereasing funtion H (O) = min{h (O), H (Y ) Q k }. It is easy to hek that suh speial submodular overing formulation is equivalent to the original formulation desribed in Setion III-A. Wolsey analyzed the approximation ratio of the greedy algorithm for this speial submodular overing maximization problem in [57]. Thus, using the similar analysis tehnique, we an immediately obtain the approximation ratio of Algorithm 1 for the problem of aquisition expenditure minimization. Before presenting the approximation ratio of VENUS-PRO, we show a useful lemma. Lemma 2: In the expenditure minimization for the version k, the optimal solution O satisfies Q k H (O )/ H (Y )<Q k+1. Proof: We first show that for any random variable i O,wehaveH(O i )/H (Y )<Q k,whereo i = O \{i}. If this inequality does not hold, i.e., H (O i )/H (Y ) Q k, we an get a better solution O i for the problem of expenditure minimization. This is beause O i is a feasible solution when H (O i )/H (Y ) Q k, and the expenditure of random variables O i is ertainly less than that of O i.e., S(O i ) S(O ). Therefore, O i is better than the optimal solution O,whih makes a ontradition. We now prove H (O )/H (Y ) < Q k+1 by ontradition. Assume that H (O )/H (Y ) Q k+1. On one hand, ombining with H (O i )/H (Y )<Q k,wehave: H (O ) H (O i ) > Q k+1 Q k = k min > H ({i}) H (Y ) H (Y ). On the other hand, aording to the submodularity of entropy funtion, we have: H (O ) H (O i ) H ({i}) H ( ) = H ({i}). Here, we get a ontradition, and thus H (O )/H (Y )<Q k+1. It is obvious that Q k H (O )/H (Y ). Therefore, we an onlude that Q k H (O )/H (Y )<Q k+1. We now show that VENUS-PRO an ahieve sub-optimal profit for eah version. We use k and k to denote the optimal profit and the approximate profit of the version k, respetively. It is obvious that R k S(O ), and we further assume that the revenue should be larger than the approximate expenditure, i.e., R k (1 + log e β k )S(O ) S(O T ).Otherwise, the data broker may get negative approximate profit, and she would not sell the information ommodity. Lemma 3: For eah version k, we have the following relation between the optimal profit and the approximate profit: k k ξ k 1 ξ k 1 log e β k, where ξ k denote the ratio between the revenue and the expenditure, i.e., ξ k = R k /S(O ), and ξ k 1 + log e β k. Proof: By Lemma 2, the highest version that the optimal solution O an ahieve is exatly the version k. By the definition of profit, we have: Furthermore, by Theorem 1, we have: k = R k S(O ). (7) k = R k S(O T ) R k (1 + log e β k )S(O ). (8) Together with Equations (7) and (8), we have: k k R k S(O ) R k (1 + log e β k )S(O ) ξ k 1. ξ k 1 log e β k Based on Lemma 3, We now present the approximation ratio of VENUS-PRO. Theorem 2: For the profit maximization, { the approximation ξ fator of VENUS-PRO is max k 1 k ξ k 1 log e β k }. Proof: We use APX and OPT to denote the profit obtained by VENUS-PRO and the optimal solution, respetively. VENUS-PRO selets the maximum approximate profit as the final result, i.e., APX = max k { k }, and the optimal profit is the maximum profit of all versions, i.e., OPT = max k { k }. Using Lemma 3, we immediately have: OPT APX = max k k { } k ξ k 1 max max. max k k k k k ξ k 1 log e β k IV. VENUS-PAY In VENUS-PRO, we assumed that the expeted payment for eah aquisition point is known. In this setion, we determine this payment by designing an optimal and strategyproof data prourement aution, namely VENUS-PAY. We first briefly review related solution onepts used in this setion from game theory. Seondly, we prove a useful theorem in the ontext of prourement autions: expeted payment is equal to expeted virtual soial welfare, extending the main results of seminal Myerson s work [39]. Thirdly, ombining this theorem with the knowledge of ost distributions, we alulate the value of expeted minimum payment before onduting the data prourement aution, and regard suh obtained payments as the inputs of VENUS-PRO. Finally, we designed VENUS-PAY to realize this expeted minimum payment. A. Solution Conepts A strong solution onept from game theory is dominant strategy. Definition 2 (Dominant Strategy [13]): Strategy s i is player i s dominant strategy, if for any strategy s i = s i and any other players strategy profile s i : u i (s i, s i ) u i (s i, s i ). The onept of dominant strategy is the basis of inentiveompatibility (IC), whih means that there is no inentive for any player to lie about her private information, and thus revealing truthful information is the dominant strategy for every player. An aompanying onept is individualrationality (IR), whih means that every player partiipating in the game expets to gain no less utility than staying outside. As the utility of not partiipating in the game is normally zero, the individual-rationality requires that the utility of eah player

9 494 IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 35, NO. 2, FEBRUARY 2017 should be non-negative. We now an introdue the definition of Strategy-Proof Mehanism. Definition 3 (Strategy-Proof Mehanism [38]): A mehanism is strategy-proof when it satisfies both inentiveompatibility and individual-rationality. Aording to Myerson s theorem [39], a single-parameter prourement aution, in whih bidders have single private information, i.e., data olletion ost in this paper, is strategyproof when its seletion rule is monotone. Theorem 3 (Myerson s Theorem [39]): A single parameter prourement aution is strategy-proof if and only if: Monotone Seletion: A seletion rule x is monotone if for every bidder i and bids b i by the other bidders, the seletion rule x i (b i, b i ) to i is non-inreasing in its bid b i. In the seminal paper [39], Myerson proved that for strategyproof prourement autions, the monotone seletion rule implies a unique payment alulation rule: w i (b i, b i ) = z d b i dz x i(z, b i ) dz. (9) Here, we assume that x i (z, b i ) is differentiable. 8 Aording to Theorem 3, we an transform the design of a strategy-proof mehanism, guaranteeing the properties of inentive-ompatibility (IC) and individual-rationality (IR) to the searh for the monotone seletion rule with good performane guarantee. Another standard solution onept from game theory is Nash Equilibrium (NE). A strategy profile s is a Nash Equilibrium of a game, if for any player i and any strategy s i = si, u i (s i, s i ) u i(s i, s i ). However, NE does not provide an ideal solution to the problem of data prourement. There are two reasons: (1) NE is not a very strong solution onept. Speifially, when in NE, the game player has inentives to keep her equilibrium strategy only under the assumption that all the other players are also keeping their equilibrium strategies. Without this assumption, NE no longer provides inentives for game player. (2) More importantly, NE usually has no guarantee on system performane, whih means that the system performane is not optimized. When the system onverge to one of NEs, the orresponding performane, suh as soial welfare or revenue, may be lower. In ontrast to NE, the Dominant Strategy Equilibrium (DSE) or the strategyproofness ensures every player has inentive to use the equilibrium strategy, regardless of the other players strategies. We show that when the data prourement aution onverge to DSE, the optimal expeted payment is also ahieved. Thus, the DSE-based aution mehanism is more desirable than the above NE-based mehanism in data aquisition proess. B. Design Rationale The data prourement aution, in whih the data broker wants to purhase one observation from a pool of ompetitive data providers with a minimum expeted payment, an be onsidered as a reversed version of the Myerson aution [39]. 8 Atually, by standard advaned alulus, the same formula holds for an arbitrary monotone seletion funtion, inluding pieewise onstant funtion, for a suitable interpretation of the derivative and the orresponding integral. The main result in Myerson aution (i.e., maximizing expeted revenue an be redued to maximizing expeted virtual soial welfare) ollapses in our ontext of data prourement aution. We theoretially prove a powerful theorem: minimizing expeted payment is equal to minimizing expeted virtual soial welfare. This theorem is the basis of designing the reverse aution with the goal of payment minimization. Before proving this theorem, we formally define virtual ost in data prourement autions. Definition 4 (Virtual Cost): In a data prourement aution, the virtual ost of the data provider i M y with the ost i drawn from F i is defined as: ϕ i ( i ) i + F i ( i ) f i ( i ). (10) As the previous works [32], [39], we assume that the ost distribution F i ( ) is regular, i.e., the virtual ost funtion ϕ i ( i ) is a stritly inreasing funtion, whih is met by most of the distribution funtions, suh as uniform distributions, exponential distributions, and lognormal distributions. We prove our main result for data prourement aution. Theorem 4: In a data prourement aution, the expeted payment is equal to the expeted virtual soial welfare, i.e., [ m y ] [ m y ] E F w i () = E F ϕ i ( i ) x i (). Proof: We fix the osts of the other data providers as i, and onsider the expeted payment of the provider i: E i F i [w i ()] = = = = w i () f i ( i ) d i [ z d ] i dz x i(z, i ) dz f i ( i ) d i [ ] z f i ( i ) d i z d dz x i(z, i ) dz F i (z) z d dz x i(z, i ) dz (11) In the first equation, we exploit the independene of ost distributions, i.e., thefixedvalueof i has no impat on the distribution F i. The seond equation omes from the Myerson s payment formula (9). We reverse the integration order in the third equation. We adopt the method of integration by parts to make the integral a more interpretable form. (11) = F i (z) z x i (z, i ) }{{} = = + =0 0 x i (z, i ) (F i (z) + zf i (z)) dz ) x i (z, i ) f i (z) dz ( z + F i (z) f i (z) ϕ i ( i ) x i ( i, i ) f i () d = E i F i [ϕ i ( i ) x i ()]. Finally, we have the equation for every bidder i and a ost vetor i : E i F i [w i ()] = E i F i [ϕ i ( i ) x i ()].

10 ZHENG et al.: TRADING DATA IN THE CROWD: PROFIT-DRIVEN DATA ACQUISITION FOR MOBILE CROWDSENSING 495 We reall that i is a vetor of aquisition points, and we take the expetation, with respet to i, of both sides of this equation to obtain: E F [w i ()] = E F [ϕ i ( i ) x i ()]. Applying linearity of expetations twie, we an get: [ m y ] m y E F w i () = E F [w i ()] m y = E F [ϕ i ( i ) x i ()] [ m y ] = E F ϕ i ( i ) x i (). We refer to m y ϕ i( i ) x i () as the virtual soial welfare with a ost profile. Thus, we have proved our laim. By Theorem 4, we will always hoose the data provider with the lowest virtual ost as the winner. The minimum expeted payment for the data aquisition point y Y an be expressed as: y = T E F [min i ϕ i ( i )]. We an alulate this optimal payment with the knowledge of ost distributions F. For easy illustration, we introdue some notations. Let ϕ min () denote the minimum virtual ost for a given ost profile, i.e., ϕ min () min i ϕ i ( i ). We denote the umulative distribution funtion of ϕ i ( i ) by G i (z), whih an be derived from the ost distribution F i (x), i.e., G i (z) = F i (ϕi 1 (z)). LetG min (z) denote the umulative distribution funtion of ϕ min (): G min (z) = Pr(ϕ min () z) = 1 Pr(ϕ min () >z) m y = 1 Pr(ϕ i ( i )>z) m y = 1 (1 G i (z)). The payment y for eah aquisition point y Y is: y = T E ϕmin () G min [ϕ min ()] ϕ = T z g min (z) dz ϕ ( ) ϕ = T ϕ G min (z) dz, (12) where g min (z) is the probability density funtion of the random variable ϕ min (), and the range [ϕ, ϕ] is the support of the random variable ϕ min (), ϕ min i ϕ i ( i ) and ϕ max i ϕ i ( i ). We adopt the method of integration by parts in the last part of Equation (12). How should we design a seletion rule x to realize this optimal payment? We have no ontrol over the ost distributions F or the virtual ost funtions ϕ i ( i ), so the natural approah is to design the seletion rule x(), suh that the ahieved virtual soial welfare is minimum for every possible input ost profile. With this observation, we design an optimal prourement aution in next subsetion. ϕ Algorithm 3 VENUS-PAY for Payment Minimization Input: The number of data providers m y, a bidding profile b, a vetor of ost distributions F, a vetor of orresponding probability density funtion f. Output: A pair of seletion result and payment result (x(b), w(b)). 1 x(b) 0; w(b) 0; 2 // Winner Seletion 3 for i = 1 to m y do 4 ϕ i (b i ) b i + F i (b i ) f i (b i ) ; 5 i arg min i ϕ i (b i ); 6 x i (b) 1; 7 // Payment Calulation 8 î arg min i =i ϕ i (b i ); ( 9 w i (b) min ϕi 1 (ϕî(bî)), i 10 return (x(b), w(b)); C. Detailed Design VENUS-PAY onsists of two major omponents: Winner Seletion and Payment Calulation. We depit the pseudo-ode of VENUS-PAY in Algorithm 3. 1) Winner Seletion: After olleting the bids b, the data broker alulates the virtual bid for eah data provider by Equation (10) (Lines 3 to 4). Based on Theorem 4, minimizing the expeted payment is equal to minimizing the expeted virtual soial welfare. Thus, in order to minimize the expeted payment, the data broker hooses the data provider with the lowest virtual bid as the winner, i.e., x i (b) = 1 for i = arg min i {ϕ i (b i )} (Lines 5 to 6). We note that seleting the data provider with the lowest bid does not lead to an optimal data prourement aution. We break the tie following any bid-independent rule, e.g., the lexiographi order of data provider s ID. Due to the regularity of ost distributions, this winner seletion rule is monotone. Lemma 4: The seletion rule in VENUS-PAY is monotone. Proof: To prove the monotoniity of the seletion rule, we have to show that any winning data provider i will still be seleted as a winner when she dereases her ost, i i. Sine the ost distribution is regular, i.e., the virtual ost funtion is a stritly inreasing funtion, the virtual ost of i will not be larger than that of i, i.e., ϕ i ( i ) ϕ i ( i ). Therefore, the data provider i will still be the winner, and the seletion rule is monotone. 2) Payment Calulation: By Theorem 3, the monotone seletion rule implies a unique payment alulation rule. We note that in the data prourement aution, the seletion rule x is a pieewise onstant monotone funtion, meaning that x i (b i, b i ) slump from 1 to 0 at some threshold point. In this ase, the payment alulated by Myerson s formula (9) is equal to ritial bid, whih is defined as follows. Definition 5 (Critial Bid): The ritial bid r(i) for data provider i M y is a threshold suh that if i bids lower than r(i), she wins; otherwise she loses. The ritial bid of the winner i an be alulated by the following steps. If the data provider i still wants to be the ) ;

496 IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 35, NO.

Sine the virtual ost funtion is a stritly inreasing funtion, there must exist a largest bid that satisfies the above winning ondition.

arg min i =i ϕ i (b i ). Finally, the payment of the winner i is set as her ritial bid r(i ), and the payments of the losers are zero (Lines 8 to 9).

Proof: We first show that data provider i M y annot obtain a higher utility by bidding untruthfully. We disuss the analysis in the following two ases.

Suppose the data provider still wins the aution when she heats the bid, i.e., b i = i.

If the data provider loses the aution when she heats the bid, her utility is zero, whih is not better than the non-negative utility when bidding

If she still loses when bidding untruthfully, her utility annot be hanged.

We represent the virtual bids ϕ i (b i ) and ϕ i (b i ) when the data onsumer i bids truthfully and untruthfully, respetively.

As the virtual ost funtion ϕ i ( ) is stritly inreasing with respetive to the delared bid. We an get b i r(i) b i.

From the above analysis of two ases, we an see that thedataprovideri annot inreases her utility by bidding any other value than i, and thus bidding

We next prove that VENUS-PAY satisfies the property of individual rationality. On one hand, data provider s utility is zero if she loses in the aution.

bidder î, i.e., î = arg min i =i ϕ i (b i ).

On the other hand, i is the upper bound of the bid, i.e., ( b i i.

Map of Intel Berkeley Lab deployment, with the plaement of 54 sensors shown in dark hexagons.

Therefore, we an onlude that VENUS-PAY satisfies individual rationality.

Remark: We note that VENUS-PAY with i.i.d bidders and regular ost distributions F is simply the onventional seondprie aution.

The bidder with the lowest virtual ost is also the bidder with the lowest ost.

In the asymmetri ase, the ost distributions are non-idential, but still independent and regular.

11 496 IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 35, NO. 2, FEBRUARY 2017 winner in the aution, her virtual ost ϕ i (b i ) must be lower than the minimum virtual ost of the remaining data providers, i.e., ϕ i (b i ) min i =i ϕ i (b i ); otherwise she will lose the aution. Sine the virtual ost funtion is a stritly inreasing funtion, there must exist a largest bid that satisfies the above winning ondition. Considering that this largest bid may be larger than i, we set the ritial bid of the winner i as: ( r(i ) min ϕ 1 i ( ( ϕî bî )), i ), (13) where î = arg min i =i ϕ i (b i ). Finally, the payment of the winner i is set as her ritial bid r(i ), and the payments of the losers are zero (Lines 8 to 9). By Lemma 4 and Theorem 3, we have following theorem. Theorem 5: VENUS-PAY is a strategy-proof data prourement aution. Proof: We first show that data provider i M y annot obtain a higher utility by bidding untruthfully. We disuss the analysis in the following two ases. The data provider i wins the aution and gets an utility u i 0 when bidding truthfully, i.e., b i = i. Suppose the data provider still wins the aution when she heats the bid, i.e., b i = i. The utility of the data provider remains the same, beause the payment is unhanged. If the data provider loses the aution when she heats the bid, her utility is zero, whih is not better than the non-negative utility when bidding truthfully. The data provider i loses the aution when bidding truthfully, resulting in the utility to be zero. If she still loses when bidding untruthfully, her utility annot be hanged. We onsider the senario, in whih she heats the bid b i = i and wins the aution. We represent the virtual bids ϕ i (b i ) and ϕ i (b i ) when the data onsumer i bids truthfully and untruthfully, respetively. Aording to the winner seletion priniple, we have ϕ i (b i ) ϕ i(r(i)) ϕ i (b i ). As the virtual ost funtion ϕ i ( ) is stritly inreasing with respetive to the delared bid. We an get b i r(i) b i. Her utility now beomes non-positive: u i = r(i) i b i i = i i = 0. From the above analysis of two ases, we an see that thedataprovideri annot inreases her utility by bidding any other value than i, and thus bidding truthfully is a dominant strategy for eah data provider. Therefore, VENUS-PAY satisfies inentive ompatibility. We next prove that VENUS-PAY satisfies the property of individual rationality. On one hand, data provider s utility is zero if she loses in the aution. On the other hand, winning data provider gets utility: ( ) u i = r(i) i = min ϕi 1 (ϕî(bî)), i b i, where ϕî(bî) is the virtual bid of the ritial bidder î, i.e., î = arg min i =i ϕ i (b i ). On one hand, sine the data provider i is a winner and ϕ i ( ) is a stritly inreasing funtion, we have ϕ i (b i ) ϕî(bî) and then b i ϕi 1 (ϕî(bî)). On the other hand, i is the upper bound of the bid, i.e., ( b i i. Combining these two equalities, we have b i min ϕi 1 (ϕî(bî)), i ), implying that the utility of the data provider is always non-negative in Fig. 2. Map of Intel Berkeley Lab deployment, with the plaement of 54 sensors shown in dark hexagons. The green irles represent data aquisition points in the 12 regions. this senario. Therefore, we an onlude that VENUS-PAY satisfies individual rationality. Sine VENUS-PAY satisfies both inentive ompatibility and individual rationality, aording to Definition 3, VENUS-PAY is a strategy-proof mehanism. Remark: We note that VENUS-PAY with i.i.d bidders and regular ost distributions F is simply the onventional seondprie aution. In this senario, all the ost distributions F redue to a ommon distribution F, and thus the virtual ost funtions ϕ i ( i ) are the same for all bidders. The bidder with the lowest virtual ost is also the bidder with the lowest ost. Furthermore, aording to the expression of ritial bid (Equation (13)), the payment of the winner is exatly the seond lowest bid. In the asymmetri ase, the ost distributions are non-idential, but still independent and regular. VENUS-PAY does not generally resemble any autions used in pratie. In next setion, we will show that VENUS-PAY outperforms the seond prie aution in the asymmetri ase. V. EVALUATION RESULTS In this setion, we present the evaluation results of VENUS based on a real-world sensed data set. A. Sensed Data Set We first introdue the publi data set olleted by Intel Researh, Berkeley Lab [19]. As shown in Figure 2, researhers deployed 54 Mia2Dot sensors in the lab to measure multiple environmental phenomena, e.g., light, humidity, temperature and voltage readings. in a real time manner. We tailor the data set, and fous on the temperature measurements from all the 54 sensor nodes at 30 seonds intervals between February 28th, 2004 to April 5th, We disretize the olleted data into 5 bins of 3 degrees Celsius eah. We artifiially partition the lab into 12 non-overlapped regions, and virtually deploy one data aquisition point in eah of region, to represent the average of the readings measured by the sensors loated in the orresponding region. We an use the olleted samples to build a joint distribution over the 12 data aquisition points. For some seleted random variables X O, we an alulate its probability mass funtion p(x O ) by projeting the joint distribution p(x Y ) over X O. With Equations (1) and (2), we an alulate the onditional distribution and the orresponding onditional entropy for any seleted random variables.

12 ZHENG et al.: TRADING DATA IN THE CROWD: PROFIT-DRIVEN DATA ACQUISITION FOR MOBILE CROWDSENSING 497 B. Evaluation Setup We set the number of versions as K = 8, and set the orresponding quality vetor as Q = (0.25, 0.38, 0.53, 0.63, 0.70, 0.78, 0.89, 0.98). We fix the number of data onsumers as N = 500 through the evaluation 9. In order to examine the performane of VENUS under different data onsumers purhasing behavior models, we hoose two typial version preferene distributions and two ommon valuation distributions. Speifially, we adopt the Poisson distributions with two different parameters (parameter λ an be either 3 or 8) to be the version preferene distributions. We set two types of valuation distributions as follows. Uniform Distribution: The data onsumers valuations on the kth version are uniformly distributed over a range [2k, 4k]. Gaussian Distribution: The data onsumers valuations over the kth version are drawn from a Gaussian distribution with mean 3k and variane 3, and with a lower bound 0. Upon these distributions, we an obtain four different behavior models, and use Poisson-λ, Uniform (or Gaussian) to denote the Poisson distribution with parameter λ and the uniform (or Gaussian) valuation distribution. While the uniform distribution desribes that data onsumers have diverse valuations over the same set of data, the Gaussian distribution an apture the senario that data onsumers have similar valuations loated in a entralized range. We note that the parameter setting of the valuation distributions guarantees that the valuation over the high version is always larger than that of the lower version. We evaluate the performane of VENUS-PAY for one randomly seleted data aquisition point in a single time slot, i.e., T = 1. The number of data providers for this data aquisition point inreases from 6 to 10 sequentially. We set the ost distribution of the data providers as exponential distributions. Speifially, the ost of data provider i is drawn from the exponential distribution: { α i e α i x, if x 0, f i (x) = 0, otherwise, where the support of the distribution is (0, ). Wefurther onsider idential ost distributions and non-idential ost distributions senarios in VENUS-PAY. In the idential ase, we set the parameters α to be for all data providers, and in non-idential ase, we set α i = e i /10 5 for the data provider i. All the results of performane are averaged over 1000 runs. 1) Performane of VENUS-PRO: We implement VENUS- PRO, and ompare its performane with the optimal algorithm and random algorithm. We obtain the optimal profit, denoted by OPT, using the brute-fore searh method for the problem of profit maximization. The OPT result is served as the referene of upper bound of profit. In Random algorithm, we first randomly selet a version as the result, and then use a set of random data aquisition points to satisfy the quality 9 It is worth emphasizing that all parameters an be different from the ones used here. Considering that the evaluation results of using different parameters are idential, we only show the results for these parameters in this paper. Fig. 3. Revenue urves for the first three versions when the version preferene distribution is Poisson distribution with parameter 3. requirement of the seleted version. The existing works about data aquisition for mobile rowdsensing foused on other optimization objetives, suh as ost minimization [32] and data quality [23], and ignored the revenue extrated from data trading in the market. Thus, we do not ompare VENUS-PRO with the existing data aquisition mehanisms. With the knowledge of data onsumers purhasing behavior models, we an plot the revenue urve r k in Equation (3) with respetive to prie, for the first three versions. We set the version preferene distribution as the Poisson distribution with parameter 3. We further examine the evaluation result of uniform valuation distribution and Gaussian valuation distribution in Figure 3(a) and Figure 3(b), respetively. We obtain the optimal expeted revenue of eah version by alulating the optimal prie point of its orresponding revenue urve. After that, we an plot the optimal revenue for all of the versions in Figure 4(a). From Figure 4(a), we an see that the data broker obtains a large revenue by providing high version for the information ommodity. This is beause the high version an satisfy the data onsumers with high quality requirements, extrating additional revenue from these data onsumers. From Figure 4(a), we an also see that the revenue funtion with respetive to version has different trends and properties, under different purhasing behavior models. This result demonstrates that the property of revenue funtion is indeed omplex, and is hard to analyze in ompliated data markets in terms of diverse purhasing behavior models, making diretly solving the problem of profit maximization infeasible. In order to alulate the data aquisition expenditure, we have to know the minimum expeted payment at eah data aquisition point. In this set of experiments, we assume that data providers have the idential ost distributions, and the number of data providers at eah data aquisition point is randomly hosen from [6] and [17]. With this information, we an alulate the minimum expeted payment for eah data aquisition point by Equation (12). For a fixed version, VENUS-PRO applies the greedy algorithm, i.e., Algorithm 1, to alulate an approximate data aquisition expenditure, and the result is shown in Figure 4(b). From Figure 4(b), we an see that the expenditure beomes large when high version is provided in the data market. The reason is that we need to selet more data aquisition points to assure the high quality requirement. We an also see that VENUS-PRO always outperforms the Random algorithm, and approahes to the optimum. Based on the obtained revenue (i.e., Figure 4(a)) and the aquisition expenditure (i.e., Figure 4(b)), we an plot the

13 498 IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 35, NO. 2, FEBRUARY 2017 Fig. 4. Performane of VENUS-PRO, OPT, and Random. satisfy the strategy-proofness. Although the first prie aution always ahieves the lowest disbursed payment, we an not apply it to the ontext of data prourement, beause it has not any guarantee on eonomi properties. VI. RELATED WORK In this setion, we briefly review related work. Fig. 5. Performane of VENUS-PAY, First-Prie Aution, and Seond-Prie Aution. profits under different data onsumers purhasing behavior models in Figure 4(). The result shows that VENUS-PRO is very lose to the optimum in all the four representative purhasing behavior models, demonstrating the effiieny of VENUS-PRO for the rowd-sensed data trading. We note that in the Poisson-3, Uniform model, the expeted profit of the random algorithm is negative, so the data broker would not sell the information ommodity. We omit the profit of the random algorithm in this ase. 2) Performane of VENUS-PAY: We now present the evaluation results of VENUS-PAY. We implement VENUS-PAY, and ompare its performane with two lassial autions: first-prie aution and seond-prie aution. The non-truthful first-prie aution always disburse less payment ompared to VENUS-PAY, beause VENUS-PAY overpays the winner to guarantee strategy-proofness. We ompare the results of first-prie aution and VENUS-PAY to illustrate the system performane degradation aused by the requirement of strategy-proofness. By varying the number of data providers, we ollet a set of performane results, as illustrated in Figure 5. In Figure 5(a), VENUS-PAY has the same performane as the seond-prie aution in the idential ost distribution senario. This oinides with our analysis that VENUS-PAY redues to the seond-prie aution when the osts of data providers follow the same distribution. Figure 5(b) shows the evaluation results in the non-idential senario. From Figure 5(b), we an see that VENUS-PAY outperforms the seond-prie aution, whih does not take advantage of the knowledge of ost distributions. This result demonstrates that exploiting the information of ost distributions an redue the expeted payment to some extent. In both idential and non-idential ases, the result of VENUS-PAY is lose to that of the first-prie aution, denoting that VENUS-PAY sarifies limited performane to A. Data Market Design In reent years, designing priing mehanisms for online data markets attrats inreasing interests, espeially from database researh ommunity [3], [30], [31], [35]. These previous works mainly foused on designing omputationally effiient and eonomi-robust data priing mehanisms, ahieving two important axioms, i.e., arbitrage-free and disount-free [31]. Koutris et al. [30] showed that the pries of a large lass of queries an be omputed using an ILP solver. Later Lin and Kifer [35] designed an arbitrage-free priing funtion for arbitrary query formats. However, these works did not onsider the problem of data aquisition in data marketplaes. While there are a number of priing mehanisms for different kinds of network servies [27], [36], [37], [49], they annot be diretly applied into data marketplaes due to the unique harateristis of rowd-sensed data in terms of ost struture and unertain feature. B. Mobile Crowdsensing Reently, mobile rowdsensing has emerged as a new paradigm to generate olletive knowledge about phenomena at interested regions. Data aquisition is a ritial omponent in mobile rowdsensing system, and various inentive mehanisms have been proposed to motivate mobile users to ontribute data [7], [11], [18], [23], [25], [32], [58]. Yang et al. [58] applied Stakelberg game and reverse aution theory to design inentive mehanisms for two basi data aquisition models. Chen et al. [6] onsidered the network effet in user reruitment in rowdsouing. Cheung et al. [7] designed an asynhronous and distributed algorithm to reruit mobile users for time sensitive tasks. Considering the loations of tasks and the movements of mobile users, He et al. [18] proposed two inentive mehanisms based on disountedreward TSP algorithm and bargaining theory. Inspired by opportunisti networks, Karaliopoulos et al. [25] examined a pratial rowdsensing senario, in whih mobile users an play the roles of both data olletors and data transmitter.

14 ZHENG et al.: TRADING DATA IN THE CROWD: PROFIT-DRIVEN DATA ACQUISITION FOR MOBILE CROWDSENSING 499 Karaliopoulos et al. [24] studied the payment distribution problem in light of learning the user profiles. Our data aquisition model for the problem of payment minimization is similar to Koutsopoulos [32], in whih he designed an optimal reverse aution, ahieving Bayesian Nash equilibrium. In this paper, we proposed an optimal data prourement aution with the guarantee of strategy-proofness, whih is a stronger solution onept than Bayesian Nash equilibrium. Furthermore, we built a data trading model to apture the eonomi value of data in the market. Thus, our ultima goal is to extrat maximum profit from data trading, whih is different from the objetive of minimizing the expeted payment in [32]. C. Aution Mehanism Design Myerson [39] initially studied the optimal single-item forward aution, and proved that the prevalent reserve-prie-based autions an ahieve maximum expeted revenue. In ontrast to the forward aution, few of works studied the prourement aution design. Prourement autions, introdued to omputer siene already in [40], were at first studied to minimize soial welfare [12], [40], whih is different from our objetive of payment minimization. Reently, researhers have studied the problem of payment optimization under different definitions of frugality ratio, whih measures the amount by whih an aution overpays [26], [28]. By extending Myerson s seminal work, we designed the first strategy-proof and optimal prourement aution, and applied it into a new ontext of Internet eonomi system, i.e., rwod-sensed data markets. VII. CONCLUSION In this paper, by jointly onsidering the problems of profit maximization and payment minimization, we have proposed the first framework of profit-driven data aquisition, namely VENUS, in the rowd-sensed data marketplae. Given the expeted payment for eah data aquisition point, we have proposed VENUS-PRO to ahieve a sub-optimal profit. In order to determine the minimum payment for eah data aquisition point, we have designed VENUS-PAY, whih is an optimal, strategy-proof data prourement aution in Bayesian setting. We have implemented VENUS, and evaluated its performane on a real-world data set. Our evaluation results have shown that VENUS-PRO approahes the optimal profit, and VENUS-PAY outperforms the anonial seond-prie aution in terms of payments. ACKNOWLEDGMENT The authors would like to thank Wenxin Li for helpful disussions in the prourement aution design in Setion III. The opinions, findings, onlusions, and reommendations expressed in this paper are those of the authors and do not neessarily reflet the views of the funding agenies or the government. REFERENCES [1] K. Amin, A. Rostamizadeh, and U. Syed, Learning pries for repeated autions with strategi buyers, in Pro. Adv. Neural Inf. Proess. Syst. (NIPS), South Lake Tahoe, CA, USA, De. 2013, pp [2] A. Arher, C. Papadimitriou, K. Talwar, and É. Tardos, An approximate truthful mehanism for ombinatorial autions with single parameter agents, in Pro. 14th Annu. ACM-SIAM Symp. Disrete Algorithms (SODA), Baltimore, MD, USA, Jan. 2003, pp [3] M. Balazinska, B. Howe, and D. Suiu, Data markets in the loud: An opportunity for the database ommunity, Pro. VLDB Endowment, vol. 4, no. 12, pp , [4] H. K. Bhargava and V. Choudhary, Researh note When is versioning optimal for information goods? Manage. Si., vol. 54, no. 5, pp , [5] J.-M. Bohli, C. Sorge, and D. Westhoff, Initial observations on eonomis, priing, and penetration of the Internet of Things market, SIGCOMM Comput. Commun. Rev., vol. 39, no. 2, pp , Apr [6] Y. Chen, B. Li, and Q. Zhang, Inentivizing rowdsouring systems with network effets, in Pro. 35th IEEE Int. Conf. Comput. Commun. (INFOCOM), San Franiso, CA, USA, Apr. 2016, pp [7] M. H. Cheung, R. Southwell, F. Hou, and J. Huang, Distributed timesensitive task seletion in mobile rowdsensing, in Pro. 16th ACM Symp. Mobile Ad Ho Netw. Comput. (MobiHo), Hangzhou, China, Jun. 2015, pp [8] D. Cohn, L. Atlas, and R. Ladner, Improving generalization with ative learning, Mah. Learn., vol. 15, no. 2, pp , May [9] T. M. Cover and J. A. Thomas, Elements of Information Theory. Hoboken, NJ, USA: Wiley, [10] A. Deshpande, C. Guestrin, S. R. Madden, J. M. Hellerstein, and W. Hong, Model-driven data aquisition in sensor networks, in Pro. 13th Int. Conf. Very Large Data Bases (VLDB), Toronto, ON, Canada, Aug. 2004, pp [11] L. Duan, T. Kubo, K. Sugiyama, J. Huang, T. Hasegawa, and J. Walrand, Inentive mehanisms for smartphone ollaboration in data aquisition and distributed omputing, in Pro. IEEE INFOCOM, Mar. 2012, pp [12] J. Feigenbaum, C. Papadimitriou, R. Sami, and S. Shenker, A BGPbased mehanism for lowest-ost routing, Distrib. Comput., vol. 18, no. 1, pp , Jul [13] D. Fudenberg and J. Tirole, Game Theory. Cambridge, MA, USA: MIT Press, [14] T. Fujito, Approximation algorithms for submodular set over with appliations, IEICE Trans. Inf. Syst., vol. 83, no. 3, pp , Mar [15] R. Gao et al., Jigsaw: Indoor floor plan reonstrution via mobile rowdsensing, in Pro. 20th Int. Conf. Mobile Comput. Netw. (MobiCom), Maui, HI, USA, Sep. 2014, pp [16] L. Gargano and M. Hammar, A note on submodular set over on matroids, Disrete Math., vol. 309, no. 18, pp , Sep [17] A. V. Goldberg and J. D. Hartline, Collusion-resistant mehanisms for single-parameter agents, in Pro. 16th Annu. ACM-SIAM Symp. Disrete Algorithms (SODA), Vanouver, BC, Canada, Jan. 2005, pp [18] S. He, D.-H. Shin, J. Zhang, and J. Chen, Toward optimal alloation of loation dependent tasks in rowdsensing, in Pro. 33rd IEEE Int. Conf. Comput. Commun. (INFOCOM), Toronto, ON, Canada, Apr. 2014, pp [19] Intel Researh Berkeley Sensor Network Data. (2004). [Online]. Available: [20] I. Simonson and A. Tversky, Choie in ontext: Tradeoff ontrast and extremeness aversion, J. Marketing Res., vol. 29, no. 3, pp , [21] R. K. Iyer and J. Bilmes, Submodular optimization with submodular over and submodular knapsak onstraints, in Pro. Adv. Neural Inf. Proess. Syst. (NIPS), South Lake Tahoe, CA, USA, De. 2013, pp [22] R. Iyer, S. Jegelka, and J. A. Bilmes, Fast semidifferentialbased submodular funtion optimization, in Pro. 30th Int. Conf. Mah. Learn. (ICML), Atlanta, GA, USA, Jun. 2013, pp [23] H. Jin, L. Su, D. Chen, K. Nahrstedt, and J. Xu, Quality of information aware inentive mehanisms for mobile rowd sensing systems, in Pro. 16th ACM Symp. Mobile Ad Ho Netw. Comput. (MobiHo), Hangzhou, China, Jun. 2015, pp [24] M. Karaliopoulos, I. Koutsopoulos, and M. Titsias, First learn then earn: Optimizing mobile rowdsensing ampaigns through data-driven user profiling, in Pro. 17th ACM Symp. Mobile Ad Ho Netw. Comput. (MobiHo), Paderborn, Germany, Jul. 2016, pp

500 IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 35, NO. 2, FEBRUARY 2017 [25] M. Karaliopoulos, O. Telelis, and I.

Kempe, Beyond VCG: Frugality of truthful mehanisms, in Pro. 46th Annu. IEEE Symp. Found. Comput. Si. (FOCS), Ot. 2005, pp. 615 624. [27] G. S. Kasbekar and S.

Moore, Frugal and truthful autions for vertex overs, flows and uts, in Pro. 51th Annu. Symp. Foud. Comput. Si. (FOCS), Las Vegas, NV, USA, Ot. 2010, pp. 745 754. [29] G. Koop, D. J. Poirier, and J. L. Tobias, Bayesian Eonometri Methods.

15 500 IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 35, NO. 2, FEBRUARY 2017 [25] M. Karaliopoulos, O. Telelis, and I. Koutsopoulos, User reruitment for mobile rowdsensing over opportunisti networks, in Pro. 34th IEEE Int. Conf. Comput. Commun. (INFOCOM), Hong Kong, Apr. 2015, pp [26] A. R. Karlin and D. Kempe, Beyond VCG: Frugality of truthful mehanisms, in Pro. 46th Annu. IEEE Symp. Found. Comput. Si. (FOCS), Ot. 2005, pp [27] G. S. Kasbekar and S. Sarkar, Spetrum priing games with spatial reuse in ognitive radio networks, IEEE J. Sel. Areas Commun., vol. 30, no. 1, pp , Jan [28] D. Kempe, M. Salek, and C. Moore, Frugal and truthful autions for vertex overs, flows and uts, in Pro. 51th Annu. Symp. Foud. Comput. Si. (FOCS), Las Vegas, NV, USA, Ot. 2010, pp [29] G. Koop, D. J. Poirier, and J. L. Tobias, Bayesian Eonometri Methods. Cambridge, U.K.: Cambridge Univ. Press, [30] P. Koutris, P. Upadhyaya, M. Balazinska, B. Howe, and D. Suiu, Toward pratial query priing with QueryMarket, in Pro. ACM SIGMOD Int. Conf. Manage. Data (SIGMOD), New York, NY, USA, Jun. 2013, pp [31] P. Koutris, P. Upadhyaya, M. Balazinska, B. Howe, and D. Suiu, Query-based data priing, J. ACM, vol. 62, no. 5, Nov. 2015, Art. no. 43. [32] I. Koutsopoulos, Optimal inentive-driven design of partiipatory sensing systems, in Pro. IEEE INFOCOM, Apr. 2013, pp [33] A. Krause and C. Guestrin, Near-optimal nonmyopi value of information in graphial models, in Pro. 21st Conf. Unertainty Artif. Intell. (UAI), Edinburgh, Sotland, Jul. 2005, pp [34] A. Krause and C. Guestrin, A note on the budgeted maximization on submodular funtions, Dept. Comput. Si., Carnegie Mellon Univ., Pittsburgh, PA, USA, Teh. Rep. CMU-CALD , [35] B.-R. Lin and D. Kifer, On arbitrage-free priing for general data queries, Pro. VLDB Endowment, vol. 7, no. 9, pp , May [36] R. T. B. Ma, D. M. Chiu, J. C. S. Lui, V. Misra, and D. Rubenstein, Internet eonomis: The use of shapley value for ISP settlement, IEEE/ACM Trans. Netw., vol. 18, no. 3, pp , Jun [37] P. Marbah and R. Berry, Downlink resoure alloation and priing for wireless networks, in Pro. 21st Annu. IEEE Conf. Comput. Commun. (INFOCOM), New York, NY, USA, Jun. 2002, pp [38] A. Mas-Colell, M. D. Whinston, and J. R. Green, Miroeon. Theory. London, U.K.: Oxford Univ. Press, [39] R. B. Myerson, Optimal aution design, Math. Oper. Res., vol. 6, no. 1, pp , [40] N. Nisan and A. Ronen, Algorithmi mehanism design (extended abstrat), in Pro. 31st Annu. Symp. Theory Comput. (STOC), Atlanta, GA, USA, 1999, pp [41] NoiseMap: A Researh Projet at Tehnishe Universität Darmstadt. [Online]. Available: researh/smarturban-networks/noisemap/ [42] NoiseTube: A Researh Projet at the Sony Computer Siene Laboratory in Paris. (2008). [Online]. Available: [43] A. Odlyzko, Paris metro priing for the Internet, in Pro. ACM Symp. Eletron. Commere (EC), Denver, CO, USA, Ot. 1999, pp [44] C. Papadimitriou, M. Shapira, and Y. Singer, On the hardness of being truthful, in Pro. 49th Annu. Symp. Foud. Comput. Si. (FOCS), Philadelphia, PA, USA, Ot. 2008, pp [45] Quandl. (2011). [Online]. Available: [46] A. Ronen and A. Saberi, On the hardness of optimal autions, in Pro. 43rd Annu. Symp. Foud. Comput. Si. (FOCS), Vanouver, BC, Canada, Ot. 2002, pp [47] C. Shapiro and H. R. Varian, Versioning: The smart way to sell information, Harvard Bus. Rev., vol. 107, no. 6, pp , [48] G. E. Smith and T. T. Nagle, Frames of referene and buyers pereption of prie and value, California Manage. Rev., vol. 38, no. 1, pp , [49] J. Tadrous, A. Eryilmaz, and H. El Gamal, Priing for demand shaping and proative download in smart data networks, in Pro. IEEE INFOCOM, Apr. 2013, pp [50] Terbin. (2013). [Online]. available: [51] Thingful. (2014). [Online]. Available: [52] Thingspeak. (2010). [Online]. Available: [53] V. Valanius, C. Lumezanu, N. Feamster, R. Johari, and V. V. Vazirani, How many tiers?: Priing in the Internet transit market, in Pro. ACM SIGCOMM Conf. Appl., Tehnol., Arhit., Protools Comput. Commun. (SIGCOMM), Toronto, ON, Canada, Aug. 2011, pp [54] H. R. Varian, Versioning information goods, Univ. California, Berkeley, Berkeley, CA, USA, Teh. Rep., [55] P.-J. Wan, D.-Z. Du, P. Pardalos, and W. Wu, Greedy approximations for minimum submodular over with submodular ost, Comput. Optim. Appl., vol. 45, no. 2, pp , Mar [56] Windows Azure Data Marketplae. [Online]. Avbailable: datamarket:azure:om/browse/data. [57] L. A. Wolsey, An analysis of the greedy algorithm for the submodular set overing problem, Combinatoria, vol. 2, no. 4, pp , De [58] D. Yang, G. Xue, X. Fang, and J. Tang, Crowdsouring to smartphones: Inentive mehanism design for mobile phone sensing, in Pro. 18th Int. Conf. Mobile Comput. Netw. (Mobiom), Aug. 2012, pp Zhenzhe Zheng (S 16) is urrently pursuing the Ph.D. degree with the Department of Computer Siene and Engineering, Shanghai Jiao Tong University, China. His researh interests inlude algorithmi game theory, resoure management in wireless networking, and data enter. He is a student member of ACM and CCF. Yanqing Peng reeived the B.Eng. degree in omputer siene and engineering from Shanghai Jiao Tong University, in He is urrently pursuing the Ph.D. degree with the Shool of Computing, University of Utah, USA. His researh interests inlude wireless networking, data enter networking, algorithmi game theory, and large-sale data management. Fan Wu (M 14) reeived the B.S. degree in omputer siene from Nanjing University, in 2004, and the Ph.D. degree in omputer siene and engineering from The State University of New York at Buffalo, in He is urrently an Assoiate Professor with the Department of Computer Siene and Engineering, Shanghai Jiao Tong University. He has visited the University of Illinois at Urbana Champaign, as a Post-Dotoral Researh Assoiate. He has published over 100 peer-reviewed papers in tehnial journals and onferene proeedings. His researh interests inlude wireless networking and mobile omputing, algorithmi game theory and its appliations, and privay preservation. He is a reipient of the first-lass prize for Natural Siene Award of China Ministry of Eduation, the NSFC Exellent Young Sholars Program, the ACM China Rising Star Award, the CCF-Tenent Rhinoeros Bird Outstanding Award, the CCF-Intel Young Faulty Researher Program Award, and the Pujiang Sholar. He has served as the Chair of CCF YOCSEF Shanghai, on the Editorial Board of Elsevier Computer Communiations, and as the member of the Tehnial Program Committees of over 60 aademi onferenes.

ZHENG et al.: TRADING DATA IN THE CROWD: PROFIT-DRIVEN DATA ACQUISITION FOR MOBILE CROWDSENSING 501 Shaojie Tang (M 15) reeived the Ph.D. degree in omputer siene from the Illinois Institute of Tehnology, in 2012.

His researh interest inludes soial networks, mobile ommere, game theory, e-business, and optimization. He reeived the Best Paper Awards in ACM MobiHo 2014 and the IEEE MASS 2013.

16 ZHENG et al.: TRADING DATA IN THE CROWD: PROFIT-DRIVEN DATA ACQUISITION FOR MOBILE CROWDSENSING 501 Shaojie Tang (M 15) reeived the Ph.D. degree in omputer siene from the Illinois Institute of Tehnology, in He is urrently an Assistant Professor with the Naveen Jindal Shool of Management, The University of Texas at Dallas. His researh interest inludes soial networks, mobile ommere, game theory, e-business, and optimization. He reeived the Best Paper Awards in ACM MobiHo 2014 and the IEEE MASS He also reeived the ACM SIGMobile Servie Award in He served in various positions (as hairs and TPC members) at numerous onferenes, inluding the ACM MobiHo and the IEEE ICNP. He is an Editor of the Elsevier Information Proessing in the Agriulture and the International Journal of Distributed Sensor Networks. Guihai Chen (SM 16) reeived the B.S. degree from Nanjing University, in 1984, the M.E. degree from Southeast University, in 1987, and the Ph.D. degree from The University of Hong Kong, in He has been invited as a Visiting Professor to many universities, inluding Kyushu Institute of Tehnology, Japan, in 1998, The University of Queensland, Australia, in 2000, and Wayne State University, USA, from 2001 to He is a Distinguished Professor with Shanghai Jiaotong University, China. He has published over 200 peer-reviewed papers, and over 120 of them are in well-arhived international journals, suh as the IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, the Journal of Parallel and Distributed Computing, Wireless Network, The Computer Journal, the International Journal of Foundations of Computer Siene, and Performane Evaluation. His researh interests inlude sensor network, peer-to-peer omputing, high-performane omputer arhiteture, and ombinatoris. He is also in well-known onferene proeedings, suh as HPCA, MOBIHOC, INFOCOM, ICNP, ICPP, IPDPS, and ICDCS.