Electronic Data Collection in Accommodation Statistics


Juha-Pekka Konttinen 1, Maria Velasco Gimeno 2
1 Statistics Finland, juha-pekka.konttinen@stat.fi
2 INE Spain, mrvelasco@ine.es

Abstract

Tourism statistics in the European Union are currently covered by Directive 95/57/EC and include statistics on the capacity and occupancy (arrivals, nights spent) of tourist accommodation establishments. Although it is indispensable to produce statistical information of high quality, the European Statistical System is increasingly being called upon to reduce the administrative burden on European enterprises. Therefore, initiatives need to be taken to find a better balance between user needs and the burden put on producers. One approach is to introduce more efficient data collection methods. Eurostat has been supporting this development and an ESSnet project involving eight Member States was launched in This paper is a joint effort between two participating partners, Statistics Finland and INE Spain, who have introduced a system of automated data collection from tourist accommodation establishments. The aim of this paper is to describe the national experiences in electronic data collection and especially in automated data collection. Moreover, the paper will discuss the current situation related to the ESSnet project and possible future developments. The outcomes of the methodological and technological developments presented are relevant not only to tourism statisticians but to all domains of official statistics that are based on sample surveys. Further, this joint paper illustrates the possibilities that cooperation within the ESS offers to create synergies that lead to a win-win situation for all partners concerned: the statistical authorities, the reporting units and, last but not least, the users who expect timely data of high quality.

Keywords: Electronic data collection using XML, burden reduction, quality

1. Introduction

Tourism statistics in the European Union are currently covered by Directive 95/57/EC and include statistics on the capacity and occupancy (arrivals, nights spent) of tourist accommodation establishments. To produce statistical information of high quality, it is indispensable (especially in the specific context of the Schengen Agreement, where border surveys are no longer a feasible tool) to collect the basic information from European businesses, in this case enterprises providing accommodation services.

However, the European Statistical System is increasingly being called upon to reduce the administrative burden on European enterprises. Therefore, initiatives need to be taken to find a better balance between user needs and the burden put on producers. The latter group consists in the first place of the reporting enterprises, but also includes the statistical authorities in charge of collecting and compiling the statistics. One approach to reducing the burden is to reduce the volume of the collected information; another is to introduce more efficient data collection methods.

In the recent context of reengineering official statistics and making optimal use of technological developments, Eurostat has been supporting the development of more efficient ways of collecting data, also in the field of tourism statistics. The introduction of a widespread system of automated data collection from tourist accommodation establishments can lead to a significant reduction in the reporting burden for enterprises and in the processing and compilation burden for the Member States' statistical authorities.

An ESSnet project involving eight Member States was launched in The aims of this project are to i) reduce response burden, ii) improve timeliness and iii) enhance the international comparability and quality of statistics collected from tourist accommodation establishments. The ultimate objective is to develop a system which can generate statistical information automatically from the management information system(s) used by tourist accommodation establishments.

This paper is a joint effort between two pioneers in this area, coming from two different situations in terms of geographical location and tourism market: Finland and Spain.
INE Spain has been carrying out accommodation statistics since 1960; Statistics Finland has been producing these data since In both countries, a sample survey is used, but given differences in the tourism economy in Spain and Finland, the sample includes (depending on the season) from to establishments in Spain and from to establishments in Finland.

In the past, the monthly data were collected by means of a paper questionnaire, sometimes supported by phone or fax contacts. A system of electronic data collection was introduced in 2005 in Finland and in 2008 in Spain. In a first stage, an internet questionnaire was used, but since data entry and transmission through the internet questionnaire were still time-consuming, the need for automated data collection became clear. This system is based on XML files that are formed automatically from the hotels' information (or booking) management systems and sent as an encrypted electronic transmission to the database of the respective statistical institute, where a set of logical validation tests is executed. After transmission, the data can still be checked and, if necessary, revised. In the Spanish case, as a good practice, the hotel receives an automatic feedback report with information on the rates and revenues of the establishment and a comparison with competitors in the same sector and/or region. In the Finnish case, similar feedback is sent quarterly.

The results of the automated data collection have been encouraging. Its introduction has led to a significant reduction in the reporting burden for accommodation establishments and to a notable reduction in the processing and compilation burden for the statistical offices.

1 Countries participating in the ESSnet project: Spain, Belgium, Bulgaria, Finland, Latvia, Lithuania, Poland and Slovakia

Notwithstanding the success story described above, it needs to be mentioned that there is still room for improvement in the take-up of the new transmission method by the respondents. Automated data collection has not spread as quickly as one would expect. The implementation of the system is highly dependent on the software houses which create the hotel management systems. Global software houses and/or hotel chains do not necessarily consider one individual country an important market area and are therefore not easily persuaded to implement the system.

It would be a major advantage if the reporting system were identical in different countries. In this case software vendors would need to build only one system, which would work in all countries. Consequently, the idea of having an international standard for automated data collection was tested in the Nordic countries during 2008 and 2009, resulting in a common XML file. The above-mentioned ESSnet project launched by Eurostat has the ambition to agree on a common XML file structure and to define validation rules to ensure the quality and coherence of the data contained in the XML file.

The aim of this paper is to describe the national experiences in electronic data collection and especially in automated data collection. Moreover, the paper will discuss the current situation related to the ESSnet project and possible future developments.

2. Electronic and automated data collection

In automated data collection, statistical information is generated from the respondent's management system into a specified file. This paper examines the current situation in accommodation statistics at Statistics Finland and INE Spain, but the architecture and idea could also be used in other areas of statistics. In Finland and in Spain, the format of the file is determined by an XML Schema, but it could also be in another format.
The main idea in automated data collection is that the data in XML format (or in another format) are sent directly as an encrypted electronic transmission from the hotel management system into the database of the national statistical institute (NSI). The procedure is more or less automatic: the respondent simply presses a button in the hotel management system and the data are sent to the NSI. After the file has been received, the data are validated logically and, if needed, manually before they are transferred to the production database of the NSI.
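As an illustration of the idea described above, the following is a minimal Python sketch of how a hotel management system might form such a file from its daily records. The element and attribute names (`OccupancyReport`, `Day`, `Arrivals`, `NightsSpent`), the establishment identifier and the sample figures are hypothetical and do not reproduce the actual XML Schema of Statistics Finland or INE Spain; encryption and transmission are left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical daily records as a hotel management system might hold them.
# All names and figures are illustrative; they are NOT the real schema or data.
DAILY_RECORDS = [
    {"date": "2010-06-01", "country": "FI", "arrivals": 12, "nights": 20},
    {"date": "2010-06-01", "country": "SE", "arrivals": 3, "nights": 5},
    {"date": "2010-06-02", "country": "FI", "arrivals": 8, "nights": 19},
]


def build_report(establishment_id, month, records):
    """Serialise the daily records into one XML document (returned as a string)."""
    root = ET.Element("OccupancyReport", establishment=establishment_id, month=month)
    for rec in records:
        day = ET.SubElement(root, "Day", date=rec["date"], country=rec["country"])
        ET.SubElement(day, "Arrivals").text = str(rec["arrivals"])
        ET.SubElement(day, "NightsSpent").text = str(rec["nights"])
    return ET.tostring(root, encoding="unicode")


def total_nights(xml_text):
    """Receiving side: parse the file and sum the nights spent over all days."""
    root = ET.fromstring(xml_text)
    return sum(int(day.findtext("NightsSpent")) for day in root.iter("Day"))


report_xml = build_report("H-0001", "2010-06", DAILY_RECORDS)
print(total_nights(report_xml))
```

In a real implementation the structure would be fixed by the published XML Schema, so that every software supplier produces files the NSI can parse in the same way.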

3. National experiences in Finland

Up to the beginning of 2005, there were more or less two types of respondents: i) those who answered by faxing reports generated from their hotel management system to Statistics Finland and ii) those who filled in paper questionnaires. Either way, the data had to be recorded manually at Statistics Finland. In addition, the response burden in accommodation establishments was high because of the manual work. Therefore, Statistics Finland decided to develop new modes of responding that would be less burdensome both to the data suppliers and to the NSI.

In early 2005, a questionnaire that could be filled in on the Internet was introduced alongside the previously used answering modes. This step was part of an already ongoing pilot project for the development of an XML-based questionnaire formed with the application developed by Statistics Finland for the implementation of data collection via the Internet.

Nevertheless, all of these modes of responding involve manual phases, either for the data suppliers or for Statistics Finland, or for both. Hence, a pilot project on automated data collection was launched. Statistics Finland already had an existing collection mode, the XML-based questionnaire resulting from the pilot study. In addition, this collection mode already included an application for mass dispatching of e-mails as well as an application for transferring data from the collection database to the production database. At the same time, the logical consistency of the data could be verified. The major challenge was finding a way in which XML files could be formed from the data suppliers' information systems. Statistics Finland and representatives of software suppliers concluded that the shared objective was to make the data collection easier and quicker.
It was agreed that Statistics Finland would draw up the required specifications and documents, such as a description of the XML file, according to which the automated reporting of data could be implemented. Accordingly, the participating software suppliers agreed to add a new reporting facility to their own software. A system of automated data collection was introduced in autumn 2005. With the help of the system, data suppliers can transmit the XML file to Statistics Finland directly from the management system of the accommodation establishment simply by pressing a button.

In 2008 the Nordic countries (Finland, Sweden, Denmark, Norway and Iceland) launched a co-operation project which aimed to create a common Nordic automated data collection system for accommodation statistics. The group created a common Nordic File which has a standard data delivery specification, file format and file description. The same file can be used in all Nordic countries.

As a result, electronic data collection at Statistics Finland comprises two alternative modes. One is the Internet-based questionnaire and the other is automated data collection. Figure 1 describes the process in Statistics Finland.

Figure 1. The architecture of the electronic data collection in Statistics Finland

The process is quite similar for the two alternatives. If the respondent delivers data by automated data collection, the received data are transferred after a short delay to the Internet questionnaire. Thus, the respondent can quite easily view and use the data sent. Otherwise, the respondent must manually key the data into the Internet questionnaire. After this stage the process is the same for both alternatives. Data are transferred from the Internet questionnaire to the temporary database and then on to the production database after data control and logical verification. The team responsible for the accommodation statistics carries out these procedures. The respondents also receive an automated feedback report quarterly.

In 2010, Statistics Finland received approximately 65 per cent of the data electronically (automated data collection and Internet questionnaire) and 35 per cent by other modes of reporting (paper questionnaire, e-mail, fax etc.). Overall, about 15 per cent of the data (130 accommodation establishments) were received automatically. The development has been encouraging but unfortunately quite slow, at least in automated data collection.

Figure 2. The number of respondents by reporting method in Statistics Finland

The introduction of both electronic and automated data collection has led to a notable reduction in the processing and compilation burden for Statistics Finland. During the years working hours used for data collection, editing, reminders and feedback have fallen by 35 per cent.

Figure 3. Working hours used for data collection, editing, reminders and feedback in Statistics Finland

To conclude, the experiences in electronic and automated data collection have been encouraging. Once the accommodation establishment has implemented the system, the response burden is practically zero. Previously, and in the other reporting modes, the monthly response burden is on average between 30 minutes and 2 hours. In addition, the compilation burden has been reduced significantly. Moreover, Statistics Finland receives data earlier, so there is more time to analyze and go through the data. This has also improved the quality of the statistics.

It has to be noted that the implementation seems to be surprisingly slow. There are various reasons for this, but the most important one seems to be that accommodation establishments use a lot of different management systems and software, or no software at all. This means that it takes time and money to update all software. In addition, once the automated data collection function is implemented, it seems to be rather challenging to get the updated version of the system to the customers (accommodation establishments) and to introduce the new function. The person reporting the data and the person responsible for the updates are not necessarily the same. Thus, those reporting the data might not be aware of the new function.

It is also challenging to get bigger hotel chains to implement the system. Global software houses, which often supply the management systems to the hotel chains, consider one country a small market area and are not easily convinced to update their systems. In addition, small (often seasonal) establishments do not have the appropriate software, resources and/or interest to invest money and effort in a new system.

It is encouraging that once the new system has been built, there have not been any major technical problems during implementation and the delivery process.

4. National experiences in Spain

4.1 Preparatory works

The XML file project for the Hotel Occupancy Survey was launched in 2004 and developed in the following phases.

a) Market research

During , INE Spain carried out a market analysis of the hotel management software installed in hotel establishments. Both the market shares of the software products and their characteristics were identified.

b) Discussion and design of the structure of the XML file

In June 2006, a working group consisting of some regional statistical institutes in Spain 2 and the National Statistics Institute (INE) was launched. The main objective of the working group was to agree on the design of an XML file that could be used to collect the variables of the different hotel occupancy surveys carried out by the members of the group. This phase lasted over 15 months. After months of meetings, the aim was reached and the file had the following structure 3:

- Identification variables of the establishment (location variables, category, capacity, no. of rooms etc.).
- Arrivals, departures and overnight stays for each day of the reference month, broken down by residence of guests (country, if the guest resides outside Spain, or province/island, if the guest resides in Spain).
- No. of occupied rooms for each day of the reference month, distinguishing the type of room.
- Information on ADR (Average Daily Rate) depending on type of client.
- Data on employed personnel.

2 In Spain, some regions carry out their own statistics for their geographical scope

Different associations and hotel enterprises were consulted to check that all the information was recorded in the management software or was easy to obtain from other software used in the establishments.

The problematic issue of asking for variables such as ADR (Average Daily Rate), RevPAR (Revenue Per Available Room) and employed personnel was discussed during the meetings of the working group. These data might be recorded in databases outside the management software, and establishments might be reluctant to provide this information by the new method. It was therefore decided to divide the file into two parts. The first part includes everything except the variables on prices and employment, which make up the second part of the file. The first part is compulsory for all establishments that decide to send the data using the XML file. They may send the second part either by XML or by one of the other methods. Nevertheless, all variables must be reported.

In order to ensure the coherence of the data, some validation rules were agreed on. The respondent is not able to send the XML file if the file does not comply with these rules.
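The paper does not list the agreed validation rules, so the following Python sketch only illustrates the kind of coherence check that could block a file before it is sent. The three rules shown (no negative values, occupied rooms not exceeding the number of rooms, and guests staying overnight being consistent with the previous day's figure plus arrivals minus departures) are assumptions made for illustration, as are the `DayRecord` fields and the `validate` function.

```python
from dataclasses import dataclass


@dataclass
class DayRecord:
    """One day of the reference month (hypothetical field names)."""
    day: int
    arrivals: int
    departures: int
    overnight_stays: int
    occupied_rooms: int


def validate(records, total_rooms, guests_before_month=0):
    """Return a list of error messages; an empty list means the file may be sent.

    The rules are illustrative assumptions, not the rules agreed by INE Spain.
    """
    errors = []
    previous = guests_before_month
    for r in records:
        values = (r.arrivals, r.departures, r.overnight_stays, r.occupied_rooms)
        if min(values) < 0:
            errors.append(f"day {r.day}: negative value")
        if r.occupied_rooms > total_rooms:
            errors.append(f"day {r.day}: occupied rooms exceed number of rooms")
        if r.overnight_stays != previous + r.arrivals - r.departures:
            errors.append(f"day {r.day}: overnight stays inconsistent with arrivals and departures")
        previous = r.overnight_stays
    return errors


good = [DayRecord(1, 10, 0, 10, 6), DayRecord(2, 4, 3, 11, 7)]
bad = [DayRecord(1, 10, 0, 10, 60)]

print(validate(good, total_rooms=50))
print(validate(bad, total_rooms=50))
```

Running such checks on the respondent's side means that incoherent files are rejected before transmission, which is consistent with the behaviour described above.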
The process of sending the XML file is made easier by keeping the validation rules limited.

3 The detailed structure of the XML file can be downloaded from the following website:

c) Diffusion of the new system

Several channels were used to inform hotel establishments and software enterprises that data collection by XML files was going to be possible.

- The agreed design of the XML file, the rules and some classifications were published on the INE web site and on the web sites of the regional statistical institutes.
- Meetings were held with the main hotel chains in Spain that have their own management software, to explain the project and motivate them to use the XML file. Hotel associations were also informed about the project in the meetings.
- Software businesses were contacted to give them an incentive to install a new tool in the management software to extract the necessary data and produce the XML file for the Hotel Occupancy Survey.
- Letters explaining the new way of transmitting the questionnaire of the Hotel Occupancy Survey were sent to the sample units.

These meetings and contacts took place over five months.

d) Test phase

Once the new system had been announced, a test period was started in order to give hotels and software houses time to adapt to it. Moreover, they had a chance to test whether the XML files and the validation rules were correctly designed and programmed. For at least one month, hotels, hotel chains and software houses sent and tested XML files so that they could find out whether there were any problems.

e) Introduction of the new alternative to collect data

Since May 2008, the hotel establishments in the Hotel Occupancy Survey sample have had the opportunity to send the questionnaire as an XML file. Nevertheless, hotel establishments can choose how to send the questionnaire to the National Statistics Institute: by post, e-mail, Internet (connecting to a website where the questionnaire can be filled in manually), fax or, now, by sending the XML file electronically.

The methodology of the survey was also adjusted. The data collection software and the estimation program had to be ready to receive the XML files and merge these data with those sent by respondents by traditional means.

4.2 Present Situation

The first XML files were received with data of June Ten establishments decided to send the requested information using this new method. In July, there were 13 establishments and in August, 17. It was known that the process would be slow, but the number of respondents is increasing and there are many hotel establishments and hotel chains (with many establishments) interested in using automated data collection.
Nowadays, more than 80 establishments send XML files monthly to answer the inquiry.

4.3 Customized Reports

Since January 2009, the establishments that send their data in this way have received a customized report including information about the establishment and a comparison with its competitive group. The variables included are ADR, RevPAR and room occupancy rate over the last 13 months (tables and graphs).

This report has two aims: to provide useful information to respondents, so that they can see that the data they provide yield further information and can improve its quality; and to thank them for sending the questionnaires by this new method.
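The paper does not state how INE Spain computes the report variables. As a hedged illustration, the Python sketch below uses the definitions commonly applied in the hotel industry: ADR as room revenue divided by rooms sold, RevPAR as room revenue divided by rooms available, and the room occupancy rate as rooms sold divided by rooms available. The function name and all figures are invented for the example, and non-zero denominators are assumed.

```python
def hotel_indicators(room_revenue, rooms_sold, rooms_available):
    """Compute ADR, RevPAR and occupancy using common industry definitions.

    These definitions are an assumption; the paper does not give INE's formulas.
    """
    adr = room_revenue / rooms_sold            # Average Daily Rate
    revpar = room_revenue / rooms_available    # Revenue Per Available Room
    occupancy = rooms_sold / rooms_available   # room occupancy rate
    return {"ADR": adr, "RevPAR": revpar, "occupancy": occupancy}


# Invented monthly figures for one establishment and for its competitive group.
own = hotel_indicators(room_revenue=90_000.0, rooms_sold=1_200, rooms_available=1_500)
group = hotel_indicators(room_revenue=400_000.0, rooms_sold=5_000, rooms_available=8_000)

# Difference between the establishment and its competitive group, per indicator.
comparison = {name: own[name] - group[name] for name in own}

for name, diff in comparison.items():
    print(f"{name}: own={own[name]:.3f} group={group[name]:.3f} diff={diff:+.3f}")
```

Under these definitions RevPAR equals ADR multiplied by the occupancy rate, so the three indicators in the report are mutually consistent by construction.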

5. Conclusions

As discussed earlier, the results of the automated data collection have been encouraging. Its introduction has led to a significant reduction in the reporting burden for accommodation establishments and to a notable reduction in the processing and compilation burden for the statistical offices. At Statistics Finland, the number of working hours spent on data collection, editing, reminders and feedback has dropped by 35% from hours in 2004 to hours in Even if the data still need to be checked using various routines, automated data collection reduces the 'manual' work for both the respondent and the NSI. In addition, the overall quality and especially the timeliness have improved, for instance through the possibility of disseminating flash estimates just a few days after the end of the reference month. This system of automated data collection has shown that a reduction in burden can go hand in hand with an increase in quality.

From the perspective of software suppliers and the further diffusion of automated data collection, it would be a major advantage if transmissions in the same format could be received in as many countries as possible. In this way, automated data collection could be introduced as rapidly as possible into as many software products as possible. If each country defines its own data transmission rules, the software suppliers have to produce a separate transmission procedure for every country. If the reporting system were identical in different countries, it would minimize the effort of the software houses and speed up the implementation of the system in all countries. Moreover, it would imply that the data and the definitions would be fully comparable between these countries, at least as regards the automated data collection.

The most important goal of the current ESSnet project is to create international standards for file description and data transfer in NSIs.
In this way, a single standard file would include all the necessary data in all countries. Software vendors would need to build only one system that would work in every country. As a result, international implementation would probably be much faster than in the current situation, where countries have different definitions.

References

Cortina, Fernando, Mayo, Rafaela & María Velasco (2008) Experience of National Statistic Institute Collecting Data Using XML-files, 9th International Forum on Tourism Statistics, Paris

European Union (1995) Council Directive 95/57/EC of 23 November 1995 on the collection of statistical information in the field of tourism, Official Journal L 291, 06/12/1995,

Vertanen, Ville (2008) Automated Data Collection on Tourism Statistics, 9th International Forum on Tourism Statistics, Paris