SUPPLEMENTARY INFORMATION

Conform and be funded: Supplementary information

Joshua M. Nicholson and John P. A. Ioannidis

Methods

Authors of extremely highly-cited papers

To identify first and last authors of publications in the life/health sciences cited 1000 times or more within the years , we queried the Scopus database. The Scopus results also included the number of authors per publication, whether the publication was classified as a review or an article, and the authors' affiliations. We reassessed each review/article assignment, since indexing errors occur in a few cases. The main analysis focuses on papers published in venues catalogued by Scopus under Life Sciences and Health Sciences (i.e. excluding Physical Sciences and Social Sciences and Humanities), so as to align with the main mission of NIH, and on those with an affiliation from the USA. However, we also collected data for sensitivity analyses in which all papers with ≥1000 citations were included regardless of type of venue and regardless of location of affiliations.

The main analysis considered papers regardless of their type and sub-field within the biosciences. 27% of them were reviews, and another 5% were methods papers. For such extremely-cited work, one can argue that the influence and value are not decisively diminished or enhanced by the type of the paper or the methods/design of the research. We nevertheless performed comparisons separating articles from reviews in a case-control study (see below). Moreover, there can be modest differences in the typical number of citations accrued by papers in different sub-fields within the life/health sciences, but these pertain to average papers close to the middle of the distribution; it is unclear whether these differences should be taken into account for extremely highly-cited papers, which represent the far tail (<0.01%) of citation distributions. Finally, citation counts depend on the year of publication, with older papers having an advantage.
This is why we focused on the last 12 years ( ). While several of the most recent papers may also exceed 1000 citations in the future, we preferred to avoid extrapolating and to focus on those papers that have already reached this landmark, so as to maximize the specificity of extreme impact for the eligible papers.

Members of NIH study sections

Rosters for NIH regular standing study sections were copied from the Center for Scientific Review website as of April 2012, and both permanent and temporary members were compiled into a master list of individuals in Excel. We extracted 182 study sections, comprising 8,517 discrete individuals merged into one sheet. The study section date ranged from years . Names were disambiguated by also perusing institutional websites and cross-checking curricula vitae. Of the 8,517 individuals, 102 resided in a country other than the USA and 8,415 (98.8%) were located in the USA. A total of 75 NIH study section members (0.9% of the entire roster) were among the identified single, first, or last authors of an extremely highly-cited paper in any discipline (not just life/health sciences). All of them were located in the USA. Of those, 72 (0.8% of the entire roster) had published their extremely highly-cited papers in the life/health sciences (total n=84 papers; some authors had 2 eligible papers). These 72 scientists represented only 6.1% of the 1,172 scientists who had published an extremely highly-cited paper in the life/health sciences with an author affiliation from the USA as single, first, or last author in .

Supplementary information 6 December 2012 NATURE

Evaluation of NIH funding status of authors of extremely highly-cited papers

For analysis of NIH funding status, we stratified authors of extremely highly-cited papers according to whether or not they were members of NIH study sections. The former group comprises all 72 scientists described in the previous paragraph. For the latter group, we randomly selected 200 top-cited papers from the Scopus list where either the first or last author had a USA affiliation at the time of publication. We determined authors' affiliations as they were listed in Scopus. Where we could not determine an author's affiliation in this way, we searched the respective publication and/or the author's website. Further, we excluded papers where either the first or last author was an NIH study section member. NIH funding was then determined for each individual by searching the NIH Reporter for all active NIH grants. Current active NIH funding as principal investigators was estimated separately in the strata of NIH study section members and non-members. The extremely highly-cited papers of the two groups were also compared in terms of year of publication, type of paper (article versus review), single authorship and median number of authors. We also compared case and control authors on the number of eligible highly-cited papers and the proportion with more than one highly-cited paper. The analysis of funding was also done with further stratification according to whether authors had one or more than one eligible highly-cited paper.
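As a quick arithmetic check, the roster proportions quoted under "Members of NIH study sections" can be recomputed from the counts given in the text. The sketch below is illustrative only: the helper name `pct` and the one-decimal rounding are our assumptions, not part of the original analysis (which was carried out in Excel and Prism).

```python
# Counts quoted in the text above.
ROSTER_TOTAL = 8517          # discrete study section members
ROSTER_USA = 8415            # members located in the USA
ROSTER_NON_USA = 102         # members located outside the USA
HIGHLY_CITED_ANY = 75        # roster members with a >=1000-citation paper, any discipline
HIGHLY_CITED_LIFE = 72       # ... restricted to the life/health sciences
USA_LIFE_AUTHORS = 1172      # USA single/first/last authors of such papers


def pct(part: int, whole: int, ndigits: int = 1) -> float:
    """Percentage of `whole` represented by `part` (hypothetical helper)."""
    return round(100 * part / whole, ndigits)


if __name__ == "__main__":
    print("USA share of roster:", pct(ROSTER_USA, ROSTER_TOTAL))
    print("Highly cited, any discipline:", pct(HIGHLY_CITED_ANY, ROSTER_TOTAL))
    print("Highly cited, life/health:", pct(HIGHLY_CITED_LIFE, ROSTER_TOTAL))
    print("Share of USA highly-cited authors:", pct(HIGHLY_CITED_LIFE, USA_LIFE_AUTHORS))
```

The recomputed values agree with the percentages reported in the text (98.8%, 0.9%, 0.8% and 6.1%).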
Finally, to avoid potential lack of independence when both the first and last authors are selected from the same highly-cited paper, we performed analyses limited to first (or single) authors and, separately, analyses limited to last (or single) authors. Results were similar. Statistical analyses were performed in Prism.

Similarity of grants

To determine the similarity of an individual's grant to other currently funded NIH grants, for both roster members and non-members, we first identified one grant per individual in NIH Reporter according to the following rules. If the individual had funding, we selected a single grant to compare to other NIH grants. If they had multiple grants, we selected R01 grants first; if they had multiple R01s, we selected the largest in terms of dollars. If they had no R01s, we selected any other research grant whose code begins with R. If they had none of these grants, we selected the largest grant they did have. We then compared each selected grant to 100 similar grants using NIH Reporter. We collected the match score listed next to each grant in the list of 100 and averaged these scores to give a similarity score of the grant, and in turn of the individual, to currently funded NIH grants. Match scores were determined by the degree of overlap among the projects' keywords (the set of project terms appearing on the Description tab of each Project Information page).

Random sample of 100 roster members

Citation analysis for the NIH study section members was performed on a subset of the roster by randomly selecting 100 members and individually analyzing their publication records using Scopus, cross-checking the results with Harzing's Publish or Perish software in Google Scholar to ensure that no influential publications were missed. When common names could not be fully disambiguated, we further verified authors' publications by visiting their websites and examining their work and their lists of publications, whenever available. For the primary analysis, we focused on extracting information only on the most-cited paper of each scientist (as single/first/last author and as author in any position). We also used the Scopus analytical tools to calculate overall-impact metrics, including the total number of citations and the Hirsch h-index (J. E. Hirsch Proc. Natl Acad. Sci. USA 102, ; 2005). We caution that Scopus counts total citations starting after 1995, and only papers published after 1995 are considered in the calculation of the h-index. These overall-impact metrics are provided as indicators of the cumulative influence of the work of these scientists in the last 17 years. Data on the most-cited paper are more pertinent to the hypotheses and arguments raised in our paper, since they are better suited to detecting extremely influential work, whereas overall-impact metrics may be more pertinent to everyday, average situations, e.g. evaluation for promotion. Several scientists who have been awarded the highest recognitions, e.g. Nobel prizes, do not have very high h-indices, but they all have at least one scientific work that has been extremely influential.
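The grant-selection rules and the averaging step described under "Similarity of grants" can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure as actually executed (which was done by hand in NIH Reporter): the `Grant` record, its field names, and the functions `select_grant` and `similarity_score` are hypothetical. The text says "any other research grant" beginning with R without specifying a tie-break, so choosing the largest one is our assumption.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence


@dataclass
class Grant:
    activity_code: str  # e.g. "R01", "R21", "P01" (hypothetical field name)
    dollars: float      # award size in dollars (hypothetical field name)


def largest(grants: Sequence[Grant]) -> Grant:
    """Return the grant with the highest dollar amount."""
    return max(grants, key=lambda g: g.dollars)


def select_grant(grants: Sequence[Grant]) -> Optional[Grant]:
    """Pick one grant per individual following the rules in the text."""
    if not grants:
        return None  # no active funding, nothing to compare
    r01s = [g for g in grants if g.activity_code == "R01"]
    if r01s:
        return largest(r01s)  # R01 first; largest R01 if several
    other_r = [g for g in grants if g.activity_code.startswith("R")]
    if other_r:
        return largest(other_r)  # assumption: tie-break by size
    return largest(grants)  # otherwise the largest grant held


def similarity_score(match_scores: Sequence[float]) -> float:
    """Average the match scores of the (up to 100) similar grants."""
    return mean(match_scores)
```

For example, an individual holding a P01 and two R01s would be represented by the larger of the two R01s, even if the P01 is the largest award overall.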

Appendix 1: Citation impact of NIH roster members

We assessed the citation impact of the 100 randomly chosen NIH roster members. Searches were based on Scopus, for consistency, and independent searches were also performed in Google Scholar to ensure that no major influential publications were missed. Common names were disambiguated by carefully screening the websites of the scientists to identify their work and full publication lists, whenever available. The presented results refer to Scopus citations as of May . As shown in Figure 1, only 1 and 7 of the 100 scientists had published a paper with at least 1000 citations as single/first/last author and as author in any position, respectively, at any time in their career (1 and 3, respectively, when limited to the period ). The median (interquartile range) number of citations of the most-cited paper of the NIH roster members was 136 (90-229) for single/first/last authorship and 211 ( ) for any authorship position. The median (interquartile range) of their total citations was 1824 ( ) and of their h-index was 20 (14-27). Only 2 of the 100 had received over 10,000 citations (maximum 12,338). These numbers suggest modestly good impact, but not extremely high impact.
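For readers unfamiliar with the h-index reported above, the following minimal sketch computes it from a list of per-paper citation counts, using the standard definition (the largest h such that h papers each have at least h citations). The function name `h_index` and the example counts are ours; the values in this appendix were taken from the Scopus analytical tools, which, as noted in the Methods, only consider papers published after 1995.

```python
from typing import Iterable


def h_index(citations: Iterable[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited paper first
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank   # the top `rank` papers all have >= rank citations
        else:
            break      # counts only decrease from here on
    return h


if __name__ == "__main__":
    # Illustrative citation counts, not data from the study.
    print(h_index([10, 8, 5, 4, 3]))
```

Because the index is capped by the number of papers, a scientist with a single paper cited thousands of times still has an h-index of 1, which is why the most-cited paper is reported separately here.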

[Figure 1 plot. Vertical axis: Percentage of study section members with at least that many citations. Horizontal axis: Number of citations of the most highly cited paper. Series: First or last author; Author in any position.]

Figure 1: Proportion of scientists with different numbers of citations in their most-cited paper (as single/first/last author and as author in any authorship position) for 100 randomly selected NIH roster members. Eighty-three of these 100 scientists are principal investigators in NIH grants, but, as shown, only 1 has published as a first or last author an article with over 1000 citations. Authors of papers with over 1000 citations rarely participate in NIH study sections and most of them do not receive NIH funding.
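The quantity plotted in Figure 1 is, for each citation threshold, the percentage of scientists whose most-cited paper has at least that many citations. A minimal sketch of that calculation is given below; the function name `pct_at_least` and the toy citation counts are hypothetical and are not data from the study.

```python
from typing import Dict, Sequence


def pct_at_least(most_cited: Sequence[int],
                 thresholds: Sequence[int]) -> Dict[int, float]:
    """Percentage of scientists whose most-cited paper reaches each threshold."""
    n = len(most_cited)
    if n == 0:
        raise ValueError("need at least one scientist")
    return {
        t: 100 * sum(1 for c in most_cited if c >= t) / n
        for t in thresholds
    }


if __name__ == "__main__":
    # Toy example: citation counts of the most-cited paper of 4 scientists.
    toy = [50, 150, 150, 1200]
    for threshold, pct in pct_at_least(toy, [100, 1000]).items():
        print(f">= {threshold} citations: {pct:.0f}%")
```

Running the same function once on the single/first/last-author counts and once on the any-position counts would give the two series shown in the figure.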

Supplementary Information (Raw datasets available upon request)

Data used for analysis are contained in different sheets of the supplementary Excel workbook. The SUMMARY sheet provides an overall view of the results, comparing NIH roster members who have an extremely highly-cited publication with non-roster members who also have an extremely highly-cited publication. The ROSTER sheet contains the names of all 8,517 NIH regular standing study section members. The NON-US ROSTER sheet lists roster members with a country of origin other than the US. The TOPALLSCI sheet shows the most highly cited publications across all scientific disciplines, while the TOPLIFESCI sheet shows the most highly cited publications across the health/life sciences. Included in these sheets are the titles of the publications, the journal they were published in and the type of publication (i.e. Article vs. Review).

The D1-HIGHLYCITEDAUTHORSROSTERMEMBERS sheet (D1 stratum) shows the names of NIH roster members with an extremely highly-cited publication, the publication itself, their current NIH funding status, the number of authors on the publication, and the publication type. The D2-HIGHLYCITEDAYTHORSNONROSTERMEMBERS sheet (D2 stratum, data on authors of highly-cited publications who are not NIH roster members) shows the 159 discrete papers randomly selected from the TOPLIFESCI list and used for analysis, and includes the number of authors per publication, the status of current NIH funding, the article title, the journal, the article type, and the country of origin. The D1 (FIRST, LAST) sheet shows D1 authors separated by authorship (first or last) along with funding status. Similarly, the D2 (FIRST,LAST) sheet shows D2 authors separated by authorship. The ARTICLEVSREVIEW sheet shows the proportion of articles and reviews across the D1 and D2 strata. The #OFAUTHORS sheet shows the number of authors per publication in the two strata, along with a summary of the statistical test used.

The 100 ROSTER SAMPLE sheet is a random selection of 100 NIH roster members from the ROSTER list, with their citation metrics, their rank, their institution, and their NIH funding status. The SIMILARITY sheet shows the average match score per individual in the D1 and D2 strata. The MULTI PUBS sheet lists the authors with multiple publications in the D1 and D2 strata as well as their current NIH funding status. The YEARS PUBLISHED sheet shows the year of publication for both D1 and D2 publications.