Update: Nature has published a news item entitled "Entire-paper plagiarism caught by software."
Update: We already have an article retraction resulting from use of this database, and both Deja Vu and the underlying tool, eTBLAST, were mentioned (though apparently not used) in conjunction with a massive case of fraud (70 articles published over a 4-year period retracted).
In Nature this week, we have a Tale of Two Citations. A tale with which most of us are all too familiar. Best of times and worst of times and all that. A tale, um, retold in The Chronicle of Higher Education (though as a report about the Nature commentary).
In a nutshell: “Although duplicate publication and plagiarism are often discussed, it seems that discussion is not enough. Two important contributing factors are the level of confusion over acceptable publishing behaviour and the perception that there is a high likelihood of escaping detection. The lack of clear standards for what level of text and figure re-use is appropriate (for example in the introduction and methods) is a well known problem; but the belief that one can get away with re-use is probably the single most important factor.”
In addressing the latter issue, the authors, Mounir Errami and Harold Garner (both from the University of Texas Southwestern Medical Center), used the search engine eTBLAST to look for similar language among papers and found 70,000 hits in Medline. These and duplicate citations from other scientific literature databases are deposited in Deja Vu. The authors have manually read through 2,600 abstracts from these hits and would like help from the scientific community in sorting through this repository. Dr. Garner has received a Research on Research Integrity R01 award to support this work.
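For readers curious what "looking for similar language" might involve, here is a minimal sketch in Python. It is not eTBLAST's actual algorithm, which the commentary does not spell out here; it swaps in a much simpler stand-in, Jaccard similarity over three-word shingles. The function names, the threshold of 0.5, and the sample abstracts are all hypothetical and chosen only for illustration.

```python
import re
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word runs) in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Too short to shingle: treat the whole text as one unit.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def flag_similar(abstracts, threshold=0.5, k=3):
    """Compare every pair of abstracts and return pairs at or above threshold.

    abstracts: dict mapping an identifier to abstract text.
    Returns a list of (id1, id2, score), highest score first.
    The threshold is an illustrative assumption, not a published cutoff.
    """
    sets = {key: shingles(text, k) for key, text in abstracts.items()}
    flagged = []
    for (id1, s1), (id2, s2) in combinations(sets.items(), 2):
        score = jaccard(s1, s2)
        if score >= threshold:
            flagged.append((id1, id2, score))
    return sorted(flagged, key=lambda item: -item[2])


if __name__ == "__main__":
    # Made-up abstracts, for demonstration only.
    sample = {
        "a": "We measured the effect of drug X on blood pressure in adult patients",
        "b": "We measured the effect of drug X on blood pressure in elderly patients",
        "c": "Galaxies form through hierarchical merging of dark matter halos",
    }
    for id1, id2, score in flag_similar(sample):
        print(f"{id1} vs {id2}: similarity {score:.2f}")
```

A pairwise comparison like this only flags candidates. As the authors' manual reading of 2,600 abstracts suggests, a person still has to decide whether a flagged pair is a legitimate update, an erratum, or an actual duplicate. Comparing every pair would also be far too slow for a database the size of Medline, so a real tool would need some form of indexing.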
In the Nature article, the authors suggest that if journal editors “use more frequently the new computational tools to detect incidents of duplicate publication — and advertise that they will do so — much of the problem is likely to take care of itself.” Perhaps, especially once manuscripts start being returned and word spreads along the grapevine.
Unfortunately, one possible source of a high duplicate hit rate could be the increasing tendency toward e-pub ahead of print, which could account for identical (or nearly identical) abstracts and authors appearing within months of each other in the same publication. However, this would not be a problem for journal editors considering a newly submitted manuscript. Perhaps the International Committee of Medical Journal Editors would consider addressing this possibility (use by journal editors of electronic tools to flag plagiarism, self or otherwise) in a future update of their guidelines. Perhaps the folks involved with the NIH Public Access Policy and PubMed Central might consider taking a look-see at their own repository.