Duplication in Scientific Publications

Update: Nature has published a news item entitled “Entire-paper plagiarism caught by software”

Update: We already have an article retraction resulting from use of this database, and both Deja Vu and the underlying tool, eTBLAST, were mentioned (though apparently not used) in conjunction with a massive case of fraud (70 articles published over a 4-year period retracted).

In Nature this week, we have a Tale of Two Citations. A tale with which most of us are all too familiar. Best of times and worst of times and all that. A tale, um, retold in The Chronicle of Higher Education (though as a report about the Nature commentary).

In a nutshell: “Although duplicate publication and plagiarism are often discussed, it seems that discussion is not enough. Two important contributing factors are the level of confusion over acceptable publishing behaviour and the perception that there is a high likelihood of escaping detection. The lack of clear standards for what level of text and figure re-use is appropriate (for example in the introduction and methods) is a well known problem; but the belief that one can get away with re-use is probably the single most important factor.”

In addressing the latter issue, the authors, Mounir Errami and Harold Garner (both from University of Texas Southwestern Medical Center), used the search engine eTBLAST to look for similar language among papers and found 70,000 hits in Medline. These and duplicate citations from other scientific literature databases are deposited in Deja Vu. The authors have manually read through 2,600 abstracts from these hits and would like help from the scientific community in sorting through this repository. Dr. Garner has received a Research on Research Integrity R01 award to support this work.
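By way of illustration only — this is not eTBLAST’s actual algorithm, and the sample abstracts and threshold below are invented — a crude version of this kind of similarity screening could score word overlap between abstracts and flag high-scoring pairs:

```python
# Crude sketch of duplicate-abstract screening: score word overlap
# (Jaccard similarity) between abstracts and flag high-scoring pairs.
# Real tools such as eTBLAST are far more sophisticated; the sample
# abstracts and the 0.6 threshold here are invented for illustration.
from itertools import combinations

def tokenize(text):
    """Lowercase an abstract and return its set of words, minus punctuation."""
    return {word.strip(".,;:()").lower() for word in text.split()}

def jaccard(a, b):
    """Fraction of shared words between two token sets."""
    return len(a & b) / len(a | b)

def flag_duplicates(abstracts, threshold=0.6):
    """Return (id1, id2, score) for abstract pairs above the threshold."""
    tokens = {pmid: tokenize(text) for pmid, text in abstracts.items()}
    flagged = []
    for x, y in combinations(tokens, 2):
        score = jaccard(tokens[x], tokens[y])
        if score >= threshold:
            flagged.append((x, y, round(score, 2)))
    return flagged

abstracts = {
    "A1": "We report a novel inhibitor of kinase X in murine models.",
    "A2": "We report a novel inhibitor of kinase X in murine models of disease.",
    "A3": "Survey results on publication ethics among graduate students.",
}
print(flag_duplicates(abstracts))  # → [('A1', 'A2', 0.92)]
```

Near-verbatim pairs like A1/A2 score close to 1 while unrelated abstracts score near 0, which is the sort of signal a journal could check against prior publications at submission time.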

In the Nature article, the authors suggest that if journal editors “use more frequently the new computational tools to detect incidents of duplicate publication — and advertise that they will do so — much of the problem is likely to take care of itself.” Perhaps so, especially once manuscripts are returned and word spreads along the grapevine.

Unfortunately, one possible source of the high duplicate hit rate could be the increasing tendency toward e-publication ahead of print, which could account for identical (or nearly identical) abstracts and author lists appearing within months of each other in the same publication. However, this would not be a problem for journal editors considering a newly submitted manuscript. Perhaps the International Committee of Medical Journal Editors would consider addressing this possibility (use by journal editors of electronic tools to flag plagiarism, self or otherwise) in a future update of their guidelines. Perhaps the folks involved with the NIH Public Access Policy and PubMedCentral might consider taking a look-see at their own repository.



  1. CC said

    I tried to provoke Drugmonkey into taking this one on, but apparently he was too busy rolling around in his new ScienceBlogs wealth. I’m glad someone thought it of interest!

    My take on the Errami and Garner article: utter garbage. Look at this. Or this. A lot of others look like poster abstracts of results published elsewhere as papers. Others are, as you say e-pub/print overlaps. And these entries are supposedly curated!

    Frankly, the more I look, the more I think it rises to the level of misconduct in its own right, publicly accusing people by name over nonsense like this.

    I’ll agree that the public presentation of duplicates in Deja Vu seems premature given the significant number of false positives. But there are true positives in there, and they do put a disclaimer up front leaving it up to the visitor as to how the information is used. However, I’d rather the bulk of the introductory text be less zealous and accusatory given the lack of, um, peer review here, and the authors acknowledge the limitations of the data included. The tool clearly works and is needed but will only be practically useful at the level of a journal checking for prior publication, or perhaps screening PubMedCentral, which won’t include meeting abstracts or e-pub-ahead-of-print duplicates … but not all of PubMed or any other citation database in bulk. – writedit

  2. CC said

    However, I’d rather the bulk of the introductory text be less zealous and accusatory given the lack of, um, peer review here, and the authors acknowledge the limitations of the data included.

    One correction on my part: I’d thought that the website contained only curated entries, which is not the case, although the “false positive” rates they claim are far lower than what I’m perceiving. OK, let’s look at some double-curated “true positives”. This is a routine bit of salami publication, with resulting similarity between abstracts. Not exactly the proudest moment in the history of science, but hardly misconduct. This, on the other hand, seems like a completely legitimate overlap between two reviews on the same subject.

    Anyway, how is some grad student even supposed to know whether an abstract in an obscure, possibly defunct, foreign-language journal is for a poster or a paper? No, I’m going to have to vehemently differ with “the authors acknowledge the limitations of the data included”.

  3. […] Biomedical Research Ethics, Biomedical Writing/Editing, Research News First, an early casualty of Deja Vu as reported in Nature: “A review article written by a rheumatologist at Harvard Medical School […]

  4. […] Try telling these folks that plagiarism is a “victimless” crime. No help from Deja Vu in this case […]

  5. writedit said

    Today in Nature: Entire-paper plagiarism caught by software

    Garner estimates that among the 181 papers they have identified so far as duplicates, 85% of the text is similar on average, but one-quarter share close to 100%. For a full list of the most similar pairs of articles, click here. There are currently 22 ‘repeat offenders’ in the database. These are authors who have published at least two articles that do not share authors (and so are putative or known plagiarisms). On average these people have ‘authored’ four papers, ranging from two to ten, and spanning 12 countries.

    Garner has begun to systematically contact editors and authors of the duplicates he has identified to assess how other cases have been followed up, and is submitting the results for publication. Many journal editors seem reluctant to pursue cases of plagiarism, and half of the articles that editors are alerted to remain uncorrected, Garner says. Few journals have communicated their retraction decision to PubMed, the most widely used abstracts database.

    When confronted with their plagiarism, some researchers can be brazen. One offender, whose paper shared 99% of its text with an earlier report, wrote to Garner: “I seize the opportunity to congratulate [the authors of the original paper] for their previous and fundamental paper — in fact that article inspired our work.”

  6. writedit said

    A letter in Nature suggests the software could also be used to flag plagiarism in research proposals – a fear of many applicants:

    Nature 456, 30 (6 November 2008)
    Detectors could spot plagiarism in research proposals
    Victor Maojo, Miguel García-Remesal & Jose Crespo

    Your News story ‘Entire-paper plagiarism caught by software’ (Nature 455, 715; 2008) follows other reports of systems to detect plagiarism (see M. Errami and H. Garner Nature 451, 397–399; 2008, and S. L. Titus et al. Nature 453, 980–982; 2008). Having all been involved in proposal evaluation, we believe the studies indicate that a text-matching analysis of research proposals could reduce plagiarism in subsequent publications.

    For instance, when European Commission evaluators have met in the past to evaluate research proposals, they received printed copies which had to be returned before the panel members left, and had no computer access during deliberations. A plagiarism-detector using text-mining methods could be used instead of the current security measures. Such a system could, in principle, detect similarities to previous submissions and uncited sources using advanced document segmentation.

    Only official agencies have access to confidential proposals and the funds to experiment with automated plagiarism-detectors. It is important that they should investigate these approaches to reducing the possibility of scientific misconduct.
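    Purely as a sketch of what the letter’s “document segmentation” might amount to — the window size and sample texts below are my own invention, not the authors’ method — one could split each proposal into overlapping word windows and look each window up in an index built from prior submissions:

```python
# Illustrative sketch only: segment-level screening of a proposal
# against prior submissions. Each document is cut into overlapping
# 8-word windows; any window shared with a prior document is a hit.
# The window size and sample texts are invented for this example.

def segments(text, size=8):
    """Yield overlapping windows of `size` words from a document."""
    words = text.lower().split()
    for i in range(max(1, len(words) - size + 1)):
        yield " ".join(words[i:i + size])

def build_index(prior_docs):
    """Map every segment of every prior document to its source IDs."""
    index = {}
    for doc_id, text in prior_docs.items():
        for seg in segments(text):
            index.setdefault(seg, set()).add(doc_id)
    return index

def screen(proposal, index):
    """Return IDs of prior documents sharing at least one segment."""
    hits = set()
    for seg in segments(proposal):
        hits |= index.get(seg, set())
    return hits

prior = {
    "P1": "the quick brown fox jumps over the lazy dog near the river bank today",
    "P2": "completely unrelated text about grant budgets and travel costs for meetings",
}
index = build_index(prior)
print(screen("our study shows the quick brown fox jumps over the lazy dog near a fence", index))  # → {'P1'}
```

    A funding agency holding the corpus of confidential past proposals is, as the letter notes, the only party positioned to build such an index at scale.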

  7. writedit said

    Mauno Vihinen has written to Nature about a flaw in the Deja Vu database that flags 4 of his publications as unverified duplicates when in fact they are distinct scientific reports. His concern is that reputations could be tarnished by listings that have not been analyzed to confirm the potential for self-plagiarism. I would certainly recommend that everyone search their own names and clear up any mistaken listings.
