This webpage presents the L3iDocCopies dataset, which was introduced in the following publication:
- S. Eskenazi, P. Gomez-Krämer, and J.-M. Ogier, “Evaluation of the stability of four document segmentation algorithms,” in Proc. of International Workshop on Document Analysis Systems (DAS), pages 1–6. 2016.

[Figure: four sample images from the dataset, labeled Sample 1 to Sample 4]

This dataset contains 18 color copies of each of 55 documents taken from the PRImA dataset used in 2009 [1]. It contains real print and scan noise and can be used for document segmentation analysis and for print and scan noise analysis.
This dataset was created using the following hardware:
- Printer: Lexmark x543 PS
- Printer: Canon iR Advance C9060 Pro
- Printer: Konica Minolta C5501
- Scanner: Konica Minolta bizhub 223 at 300dpi and 600dpi twice
- Scanner: Konica Minolta bizhub C364e at 600dpi
- Scanner: Fujitsu fi 6800 at 300dpi
- Scanner: Lexmark x543 PS at 300dpi
Each document was printed on all three printers (3 prints). Every print was then scanned with all six scanner configurations, giving 3*6=18 copies per document. With 55 documents, this makes a total of 55*18=990 images.
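
The counting above can be checked with a short Python sketch that enumerates every printer and scan combination. This is only an illustration, not part of the dataset tooling. The names `PRINTERS`, `SCANS`, and `enumerate_copies` are hypothetical. The split of the bizhub 223 scans into one pass at 300dpi and two passes at 600dpi is an assumption, chosen because it yields the six scan configurations implied by 3*6=18. The sketch makes no claim about how the image files are actually named or organized.

```python
from itertools import product

# Printers listed in the hardware description.
PRINTERS = [
    "Lexmark x543 PS",
    "Canon iR Advance C9060 Pro",
    "Konica Minolta C5501",
]

# Scan configurations as (scanner, dpi, pass number).
# ASSUMPTION: the bizhub 223 contributes one scan at 300 dpi and two
# scans at 600 dpi, which gives the six configurations behind 3*6=18.
SCANS = [
    ("Konica Minolta bizhub 223", 300, 1),
    ("Konica Minolta bizhub 223", 600, 1),
    ("Konica Minolta bizhub 223", 600, 2),
    ("Konica Minolta bizhub C364e", 600, 1),
    ("Fujitsu fi 6800", 300, 1),
    ("Lexmark x543 PS", 300, 1),
]

NUM_DOCUMENTS = 55


def enumerate_copies(num_documents=NUM_DOCUMENTS):
    """Return one (document, printer, scanner, dpi, scan_pass) tuple per image."""
    documents = range(1, num_documents + 1)
    return [
        (doc, printer, scanner, dpi, scan_pass)
        for doc, printer, (scanner, dpi, scan_pass) in product(documents, PRINTERS, SCANS)
    ]


copies = enumerate_copies()
copies_per_document = len(PRINTERS) * len(SCANS)
print(copies_per_document, len(copies))  # prints "18 990"
```

A list of tuples like this could serve as an index when grouping the images by printer or by scanner for noise analysis, provided the mapping to the real files is established separately.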
[1] A. Antonacopoulos, D. Bridson, C. Papadopoulos, and S. Pletschacher. A realistic dataset for performance evaluation of document layout analysis. In Proc. of 10th International Conference on Document Analysis and Recognition (ICDAR), pages 296–300. IEEE, 2009.
The dataset is 4.2GB in size and is hosted on an FTP server of the University of La Rochelle. Please contact "mluqma01(at)univ-lr(dot)fr" to request access.