Finding duplicate images in biology papers

Authors:

Markus Zlabinger

Allan Hanbury

Type:

Poster presentation with proceedings

Proceedings:

32nd ACM SIGAPP Symposium On Applied Computing

Publisher:

SAC '17 Proceedings of the Symposium on Applied Computing

Pages:

957 - 959

ISBN:

Year:

2017

Abstract:

Duplicated images in biology papers are a possible indicator of plagiarism or data fabrication. Manual detection of such duplicates can be time-consuming or even infeasible for very large image collections. In this paper, a semi-automatic duplicate detection approach is proposed. The approach can detect duplicates that cover only a fraction of the full image, that are transformed (e.g. rotated), and that occur either between images or within a single image (i.e. single-image duplicates). Single-image duplicates are detected between sub-images (i.e. sub-figures) using a connected-component approach, while duplicates between images are detected via the min-hashing technique. The approach was evaluated on 1.7 million images extracted from biology papers. After applying various filtering methods to remove false-positive detections, only a small amount of manual effort was necessary to find 3041 potentially serious duplicates in papers that had not been retracted so far.
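As an illustration of the min-hashing technique named in the abstract, the following minimal sketch estimates the Jaccard similarity of two images from short signatures. It is not the authors' implementation. It assumes each image has already been reduced to a set of integer feature ids (for example quantized local descriptors). The abstract does not describe that step, and all function names and parameters here are hypothetical.

```python
import random

# Mersenne prime 2^31 - 1; feature ids are assumed to be smaller than this.
PRIME = 2_147_483_647


def make_hash_params(num_hashes, seed=0):
    """Draw (a, b) pairs for hash functions h(x) = (a * x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(features, params):
    """Return the min-hash signature of a non-empty set of feature ids."""
    if not features:
        raise ValueError("feature set must not be empty")
    # One signature entry per hash function: the smallest hash value in the set.
    return [min((a * x + b) % PRIME for x in features) for a, b in params]


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching signature entries."""
    matches = sum(1 for u, v in zip(sig_a, sig_b) if u == v)
    return matches / len(sig_a)


def jaccard(a, b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(a & b) / len(a | b)


# Hypothetical feature sets for two images that share half of their features.
params = make_hash_params(num_hashes=256)
image_a = set(range(0, 100))
image_b = set(range(50, 150))
estimate = estimate_jaccard(minhash_signature(image_a, params),
                            minhash_signature(image_b, params))
print("exact:", jaccard(image_a, image_b), "estimated:", estimate)
```

Signatures have a fixed length regardless of how many features an image has, so candidate duplicate pairs in a large collection can be found by comparing or bucketing signatures instead of comparing full feature sets.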

TU Focus:

Information and Communication Technology

Reference:

M. Zlabinger, A. Hanbury:
"Finding duplicate images in biology papers";
Poster: Symposium on Applied Computing (SAC), Morocco; 04.04.2017 - 06.04.2017; in: "32nd ACM SIGAPP Symposium On Applied Computing", SAC '17 Proceedings of the Symposium on Applied Computing, (2017), pp. 957 - 959.

Additional information

PDF Link:

http://publik.tuwien.ac.at/files/publik_264250.pdf

Last changed:

12.12.2017 19:46:55

TU Id:

264250

Accepted:

Accepted

Invited:

Department Focus:

Business Informatics

Info Link:

https://publik.tuwien.ac.at/showentry.php?ID=264250&lang=1

Abstract German:

Author List:

M. Zlabinger, A. Hanbury
