I am shocked – shocked – to read in Nature that the NIH peer review “system fails to account for individual bias, and places undue weight on panel members who have not even read the proposals.”
Valen Johnson, a biostatistician at the University of Texas M. D. Anderson Cancer Center, applies an innovative statistical model to R01 peer-review ratings in this week’s Proc Natl Acad Sci. Using data from CSR reviews of ~19,000 proposals from 2005 (involving ~14,000 reviewers, or roughly 2.8 reviewers per proposal), Johnson looked for trends in the evaluations. He found that “variability inherent to rater scores, and differences in the criteria used by individual raters to assign scores to proposals, have an enormous impact on funding decisions.” As summarized by Nature, Johnson also “found that the top grants were largely unaffected by reader bias, but that such bias did impact grants closer to the funding cut-off line.”
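To see why bias would bite hardest near the cutoff, a back-of-the-envelope simulation helps. The sketch below is not Johnson’s model or his data; it assumes a crude generative story (observed score = latent quality + reviewer bias + noise) with invented parameter values, borrowing only the rough proposal and reviewer counts from the figures above:

```python
import numpy as np

rng = np.random.default_rng(0)

n_props, n_raters, per_prop = 19_000, 14_000, 3  # ~3 raters apiece, echoing the ~2.8 above
quality = rng.normal(0.0, 1.0, n_props)          # latent proposal quality (invented scale)
bias = rng.normal(0.0, 0.5, n_raters)            # each reviewer's systematic severity/leniency
noise_sd = 0.5                                   # within-reviewer scoring noise

# Each proposal is scored by a small random subset of reviewers.
mean_score = np.empty(n_props)
for i in range(n_props):
    r = rng.choice(n_raters, per_prop, replace=False)
    mean_score[i] = (quality[i] + bias[r] + rng.normal(0.0, noise_sd, per_prop)).mean()

k = int(0.10 * n_props)                          # fund the top 10% (invented payline)
funded = set(np.argsort(-mean_score)[:k])        # awards under noisy, biased scores
ideal = set(np.argsort(-quality)[:k])            # awards if quality were known exactly

print(f"overall agreement with the quality-only pool: {len(funded & ideal) / k:.0%}")

best = set(np.argsort(-quality)[: k // 10])      # the very best proposals (top 1% overall)
print(f"of the top 1% by quality, still funded: {len(best & funded) / len(best):.0%}")
```

Under these made-up parameters, nearly all of the very best proposals survive the noise, while a sizeable fraction of the awards near the payline get shuffled, which is the qualitative pattern Nature describes.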
He notes that his model “accounts for differences in reviewer scoring criteria, provides a model for the sequential rating of items by various subsets of reviewers, and quantifies uncertainty associated with final proposal ratings” (his Bayesian hierarchical statistical model is available as supporting material).
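His exact specification is in the supporting material; purely as a reader’s crib, a generic hierarchical rater model with the properties he lists might be written as follows (the notation here is mine, not the paper’s):

```latex
% A generic hierarchical rater model (notation mine, not the paper's):
% y_{ij} = reviewer j's score for proposal i
\begin{align*}
  y_{ij} &= \mu_i + \beta_j + \varepsilon_{ij},
     & \varepsilon_{ij} &\sim \mathcal{N}\!\left(0, \sigma_j^{2}\right)\\
  \mu_i  &\sim \mathcal{N}\!\left(0, \tau^{2}\right)
     & &\text{(latent proposal quality)}\\
  \beta_j &\sim \mathcal{N}\!\left(0, \omega^{2}\right)
     & &\text{(reviewer $j$'s severity or leniency)}
\end{align*}
```

Letting both the offset β_j and the variance σ_j² differ across reviewers is what captures “differences in reviewer scoring criteria,” and the posterior distribution of μ_i is what “quantifies uncertainty associated with final proposal ratings.”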
In discussing his results, he suggests validation studies that CSR could conduct to examine the “discussion effect”; these could in turn be used to “assess the tradeoff between the cost of conducting SRG meetings and the cost of collecting additional, independent ratings of applications.” He also proposes an alternative approach to making awards, one that would change the pool of funded proposals by 25–35%: take into account the cost of proposals that fall close to the payline, and reward applicants who ask for less than the maximum amount they could justify.
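His actual award rule is laid out in the paper; as a purely hypothetical illustration of how cost-awareness reshuffles the funded pool, compare a hard score-only payline with a naive score-per-dollar heuristic (all numbers invented):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2_000
score = rng.normal(0.0, 1.0, n)       # peer-review merit (higher is better)
cost = rng.uniform(0.25, 2.5, n)      # requested budget in $M (invented range)
budget = 300.0                        # total pot in $M (invented)

def fund(priority):
    """Fund proposals in descending priority until the next one no longer fits."""
    funded, spent = set(), 0.0
    for i in np.argsort(-priority):
        if spent + cost[i] > budget:
            break                     # hard payline: stop at the first misfit
        funded.add(i)
        spent += cost[i]
    return funded

by_score = fund(score)                # score-only payline
by_value = fund(score / cost)         # naive cost-aware heuristic, my invention

moved = len(by_score ^ by_value) / len(by_score | by_value)
print(f"{len(by_score)} awards by score alone vs {len(by_value)} cost-aware; "
      f"{moved:.0%} of the combined pool differs")
```

With a fixed pot, the cost-aware rule funds more, cheaper projects and displaces some expensive near-payline ones, which is exactly the property Scarpa objects to below.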
Nature reports a little tiff between CSR Director Toni Scarpa and Valen Johnson: “After Scarpa became director of the CSR in 2005, it asked Johnson to return the data. Johnson returned the original reviews, but was able to keep copies by placing a Freedom of Information Act request. Scarpa says that the CSR had heard Johnson present preliminary results and was not interested in pursuing the project further. Although the center is interested in revising its scoring procedures, ‘there was relatively little enthusiasm’ about Johnson’s analysis, says Scarpa.”
Not surprisingly (and perhaps correctly), Scarpa “chafes” at the suggestion that funding “less expensive grants would allow the agency to fund more projects … and thus increase the likelihood that they have supported the best applications.” Indeed, “Some studies are inherently more expensive than others, and a proposal that includes a clinical trial should not be penalized for being more expensive than a proposal that does not,” he says.
Certainly, a more rigorous statistical analysis of the peer-review rating system was needed, and one hopes the CSR will not discard these data, or the model, out of hand.