UKSG - USAGE STATISTICS. A Publisher's Perspective. Name: Michael Zeyfert. Position: Editorial Data Analyst. Date: 11th September 2012


1 UKSG - USAGE STATISTICS: A Publisher's Perspective. Name: Michael Zeyfert. Position: Editorial Data Analyst. Date: 11th September 2012

2 CONTENTS
- WHY MEASURE USAGE?
- HOW AND WHAT DO WE MEASURE?
- CAVEATS AND PITFALLS
- CASE STUDIES
- THE FUTURE
- QUESTIONS AND DISCUSSION

3 Section 1.0 WHY MEASURE USAGE? The drivers behind usage reporting. UKSG Usage Statistics Training Seminar, Michael Zeyfert, 11th September 2012

4 Who's Asking?
- Librarians: The primary driver - renewals
- Editors/Societies: Journal-level decisions
- Publishers: Sales/Marketing/Editorial etc.
- Authors: Tenure, peer-recognition
- Readers: Discoverability

5 Section 2.0 HOW AND WHAT DO WE MEASURE? Logs, scripts and other sources

6 Two Ways to Capture Usage Data
Script-based:
- user views web page
- script (program) runs in browser
- sends data to third party
- data in a database
Log file based:
- server records pages served
- server logs are processed

7 Starting Point - A Lot of Logs
- 3 log files per journal per day
- >40 GB to process per month
- >300M page views per year

8 We Process the Log Files to Give:
Log File Item -> Equivalent To
- Date/Time
- Domain -> Journal
- URL -> Web Page Viewed
- Referring Site -> Linking Source
- Referring Parameter -> Search Engine Query
- User IP -> User Location
- User Client -> User Browser and Operating System
- Authentication Credentials -> Customer Account and Access Method
- Session Information -> What Else User Viewed
We add as standard (Standard Additional Data):
- Customer Account Details
- Customer Name
- User Location/Institution
- Article Metadata (type, author, title etc.)
We can also add (Custom Additional Data):
- Subscription Type
- Access Control Status (e.g. Open Access)
- Pathway Information (site navigation)
- Third Party Data (e.g. citations, MeSH)
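The mapping from raw log items to their meaning can be illustrated with a minimal Python sketch. The slides do not state the actual log layout, so the sketch assumes an Apache-style "combined" format with the journal domain prepended; the pattern, the `parse_log_line` helper, the output field names, the `q` search parameter and the sample line are all hypothetical.

```python
import re
from datetime import datetime
from urllib.parse import urlsplit, parse_qs

# Assumption: Apache-style "combined" log line with the journal's domain
# prepended. The real log layout is not given in the slides.
LOG_PATTERN = re.compile(
    r'(?P<domain>\S+) (?P<ip>\S+) \S+ (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_log_line(line):
    """Turn one raw log line into the 'equivalent to' fields, or None."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    referrer = urlsplit(m.group("referrer"))
    # Assumption: the search engine passes the query in a "q" parameter.
    query = parse_qs(referrer.query).get("q", [None])[0]
    return {
        "timestamp": datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "journal_domain": m.group("domain"),        # Domain -> Journal
        "page_viewed": m.group("url"),              # URL -> Web Page Viewed
        "linking_source": referrer.netloc or None,  # Referring Site
        "search_query": query,                      # Referring Parameter
        "user_ip": m.group("ip"),                   # User IP -> User Location
        "user_client": m.group("agent"),            # User Client
        "status": int(m.group("status")),
    }

# Made-up sample line for illustration only.
sample = ('jhered.example.org 192.0.2.10 - - [11/Sep/2012:10:15:32 +0000] '
          '"GET /content/103/5/1.full HTTP/1.1" 200 5120 '
          '"http://www.google.com/search?q=turtle+mating" "Mozilla/5.0"')
record = parse_log_line(sample)
```

Resolving the IP to a location or institution, and attaching customer and article metadata, would need lookups against other data sources that this sketch does not cover.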

9 Detailed Data
At the most granular level:
- Each web page viewed
- Associated article metadata
- Who viewed it (or at least which institution)
- How they found it
- What else they viewed
Detailed data is often interesting and sometimes useful.

10 Detailed Data but often overwhelming in quantity! So how do we usefully summarise usage?

11 COUNTER Reports
COUNTER provide a code of practice for reporting usage statistics for books, journals and other online works:
- Definitions (full-text etc.)
- Report specifications
- Annual audit required for compliance

12 COUNTER Reports
Journal Report 1 (JR1) - Full-text usage
Sample: Journal Report 1 (R3), Number of Successful Full-text Article Requests by Month and Journal. Sample Institution. Date run: 13/04/2011
Columns: Journal, Publisher, Platform, Print ISSN, Online ISSN, Jan-2010 <snip> Dec-2010, YTD Total, YTD HTML, YTD PDF
Rows (usage figures not shown):
- Total for all Journals (Oxford Journals, Highwire)
- Acta Biochimica et Biophysica Sinica (Oxford Journals, Highwire)
- Adaptation (Oxford Journals, Highwire)
- African Affairs (Oxford Journals, Highwire)
- <snip>
- Writing Systems Research (Oxford Journals, Highwire)
- Yearbook of European Law (Oxford Journals, Highwire)
- Yearbook of International Environmental Law (Oxford Journals, Highwire)
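A JR1-style table of this shape could be assembled from individual full-text requests roughly as follows. This is only a sketch: the `build_jr1` helper, the input tuple layout and the sample requests are hypothetical, the counts are made up, and none of COUNTER's own processing rules are implemented.

```python
from collections import defaultdict

def build_jr1(requests):
    """Aggregate (journal, 'YYYY-MM', format) requests into JR1-like rows.

    Only full-text formats (HTML and PDF) are counted; anything else,
    such as an abstract view, is ignored.
    """
    rows = defaultdict(
        lambda: {"by_month": defaultdict(int), "ytd_html": 0, "ytd_pdf": 0}
    )
    for journal, month, fmt in requests:
        if fmt not in ("HTML", "PDF"):
            continue
        row = rows[journal]
        row["by_month"][month] += 1
        row["ytd_html" if fmt == "HTML" else "ytd_pdf"] += 1

    report = {}
    for journal, row in sorted(rows.items()):
        report[journal] = {
            "by_month": dict(row["by_month"]),
            "ytd_total": row["ytd_html"] + row["ytd_pdf"],
            "ytd_html": row["ytd_html"],
            "ytd_pdf": row["ytd_pdf"],
        }
    return report

# Made-up requests for illustration; journal names are taken from the sample.
sample_requests = [
    ("African Affairs", "2010-01", "HTML"),
    ("African Affairs", "2010-01", "PDF"),
    ("African Affairs", "2010-12", "PDF"),
    ("Adaptation", "2010-01", "HTML"),
    ("Adaptation", "2010-02", "ABSTRACT"),
]
jr1 = build_jr1(sample_requests)
```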

13 COUNTER Reports
COUNTER release 4:
- Due for implementation before the end of 2013
- New reports for journals
  - Gold Open Access split out
  - Usage by Year of Publication
  - Access Denials
A Publisher's Perspective:
- COUNTER reports are important because they are important to customers
- COUNTER reports are basic
- I would expect to see more article-level detail mandated

14 Section 3.0 CAVEATS AND PITFALLS Beware of the graphs!

15 Beware!
- Usage is cheap. Really cheap.
  - Not necessarily human
  - Quick and easy to generate
  - Susceptible to internet phenomena
    - a network of users / going viral
    - sexual or general interest content
- Not all usage is equal
- A poor proxy for quality (cf. citations)
  - false positives
  - field-dependent
- Google policy dictates usage trends

16 Beware of Gross Numbers! 500,000 requests from distributed IPs BotNet?

17 Beware of Google! Homepage usage rise (probably blame Google)

18 Beware of Users!
Users are not necessarily human
- We exclude well-behaved robots
- We exclude robots on our lists
- We exclude usage peaks from non-customers
But that still leaves a lot of robotic traffic
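The three exclusion rules above can be sketched in Python. This is an illustration under stated assumptions, not the actual filtering used: the user-agent markers, the `filter_hits` helper, the per-IP peak threshold and the sample hits are all hypothetical.

```python
from collections import Counter

# Assumption: well-behaved robots identify themselves in the user-agent
# string. These markers are illustrative, not an authoritative list.
ROBOT_AGENT_MARKERS = ("bot", "crawler", "spider")

def filter_hits(hits, known_robot_ips, customer_ips, peak_threshold=100):
    """Drop hits matching any of the three exclusion rules.

    hits: list of dicts with "ip" and "agent" keys (hypothetical layout).
    """
    hits = list(hits)
    hits_per_ip = Counter(h["ip"] for h in hits)
    kept = []
    for h in hits:
        agent = h["agent"].lower()
        # Rule 1: self-identifying (well-behaved) robots.
        if any(marker in agent for marker in ROBOT_AGENT_MARKERS):
            continue
        # Rule 2: robots on a maintained list.
        if h["ip"] in known_robot_ips:
            continue
        # Rule 3: usage peaks from non-customers.
        if h["ip"] not in customer_ips and hits_per_ip[h["ip"]] > peak_threshold:
            continue
        kept.append(h)
    return kept

# Made-up hits for illustration only.
sample_hits = [
    {"ip": "192.0.2.1", "agent": "Mozilla/5.0"},
    {"ip": "192.0.2.2", "agent": "Googlebot/2.1"},
    {"ip": "203.0.113.7", "agent": "Mozilla/5.0"},
] + [{"ip": "198.51.100.9", "agent": "Mozilla/5.0"}] * 5

kept = filter_hits(
    sample_hits,
    known_robot_ips={"203.0.113.7"},
    customer_ips={"192.0.2.1"},
    peak_threshold=3,
)
```

A robot that uses a browser-like user agent, is not on any list, and stays under the threshold passes all three rules, which is consistent with the slide's point that a lot of robotic traffic remains.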

19 Beware of Users!
Users are not always looking for what they find
- False positives
- Popular search terms
- X-rated search terms

Search Term                          Visits
journal of heredity                  7419
easypop                              628
brachydactyly type D                 607
bryophyllum                          571
stages of meiosis                    512
cream colored dog                    473
pedigree symbols                     469
SSR markers                          437
turtle reproductive system           345
genepop                              333
turtle mating                        332
journal heredity                     328
computer notes pdf                   327
turtle reproduction                  297
mapchart                             288
gestation period for cows            279
journal of heredity impact factor    277
turtles mating                       274
xxsex                                269
turtle mating habits                 256
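A search-term table like this one could be tallied from referring URLs along the following lines. The sketch assumes the search engine passes the query in a `q` parameter; the `tally_search_terms` helper and the sample referrers are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs

def tally_search_terms(referrers, param="q"):
    """Count visits per search term found in the referring URLs."""
    counts = Counter()
    for ref in referrers:
        values = parse_qs(urlsplit(ref).query).get(param)
        if values:
            # Normalise case and whitespace so variants count together.
            counts[values[0].strip().lower()] += 1
    return counts

# Made-up referrers for illustration only.
referrers = [
    "http://www.google.com/search?q=turtle+mating",
    "http://www.google.com/search?q=Turtle+Mating",
    "http://www.google.com/search?q=journal+of+heredity",
    "http://www.example.org/links.html",
]
top_terms = tally_search_terms(referrers).most_common()
```

Deciding which terms are false positives still needs human judgement; the tally only makes them visible.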

20 Usage as a Quality Metric
(Gross) Usage is a BAD proxy for quality
As things stand:
- A high proportion of usage is false-positive
- Very field/subject dependent
  - General public vs. academic specialists?
- May be dependent on access control
If a Usage Factor is published:
- Very easy to game if there is an incentive
- Competing demands on publishers
  - High usage or accurate robot removal?

21 Section 4.0 CASE STUDIES Some examples for you

22 Example Usage Report for Editors

23 Example Usage Report for Publisher Does free content have higher usage? Dissemination versus profit?

24 Example Usage Report for Marketing

25 Section 5.0 THE FUTURE My personal predictions

26 The Future
- Usage will be used for journals pricing, but pricing will not be solely based on usage
- Article-level usage data will be freely available
- Future COUNTER releases will give more detailed data to librarians
- A Usage Factor may be adopted by some Publishers
- Social media kudos will be more important than gross usage (see

27 The Future
BUT
- There will be widely-publicised cases of usage abuse/gaming/farming
- Pre-fetching/web-acceleration will decrease the accuracy of reported statistics
- Usage (as it stands) will lose credibility as a quality metric

28 Section 6.0 QUESTIONS AND DISCUSSION Over to you

29 Thank you. Michael Zeyfert