This webpage describes the L3iDocCopies dataset, which was introduced in the following publication:

[Figure: four sample images from the dataset (Sample 1 to Sample 4)]

This dataset contains 18 color copies of each of 55 documents taken from the PRImA dataset published in 2009 [1]. The copies contain real print and scan noise, so the dataset can be used both for document segmentation analysis and for print and scan noise analysis.

This dataset was created using the following hardware:
Printer: Lexmark x543 PS
Printer: Canon iR Advance C9060 Pro
Printer: Konica Minolta C5501
Scanner: Konica Minolta bizhub 223 at 300dpi (once) and at 600dpi (twice)
Scanner: Konica Minolta bizhub C364e at 600dpi
Scanner: Fujitsu fi 6800 at 300dpi
Scanner: Lexmark x543 PS at 300dpi

Each document was printed on all three printers (3 prints). Every print was then scanned with each of the 6 scanning configurations listed above, resulting in 3*6=18 copies per document. With 55 documents, this makes a total of 55*18=990 images.
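
The counts above can be checked with a short sketch that enumerates every printer and scan-configuration pair. The device names and resolutions are taken from the hardware list. The `pass` index is only an illustrative label, and the split of the bizhub 223 scans into one pass at 300dpi and two passes at 600dpi is an assumption chosen to match the stated total of 6 configurations. The sketch does not reflect the actual file naming or directory layout of the dataset, which is not described here.

```python
from itertools import product

# Printers listed in the hardware section.
PRINTERS = [
    "Lexmark x543 PS",
    "Canon iR Advance C9060 Pro",
    "Konica Minolta C5501",
]

# Scan configurations as (scanner, dpi, pass).
# Assumption: the bizhub 223 was used once at 300dpi and twice at 600dpi,
# which gives the 6 configurations implied by 3*6=18.
SCAN_CONFIGS = [
    ("Konica Minolta bizhub 223", 300, 1),
    ("Konica Minolta bizhub 223", 600, 1),
    ("Konica Minolta bizhub 223", 600, 2),
    ("Konica Minolta bizhub C364e", 600, 1),
    ("Fujitsu fi 6800", 300, 1),
    ("Lexmark x543 PS", 300, 1),
]

NUM_DOCUMENTS = 55

# Every print is scanned under every configuration.
copies = list(product(PRINTERS, SCAN_CONFIGS))
total_images = NUM_DOCUMENTS * len(copies)

print(f"copies per document: {len(copies)}")
print(f"total images: {total_images}")
```

A table like `copies` can also serve as a grouping key when comparing noise across devices, for example by collecting all images that share a printer or a scanner.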

[1] A. Antonacopoulos, D. Bridson, C. Papadopoulos, and S. Pletschacher. A realistic dataset for performance evaluation of document layout analysis. In Proc. of 10th International Conference on Document Analysis and Recognition (ICDAR), pages 296–300. IEEE, 2009.

The dataset is 4.2GB in size and is hosted on an FTP server of the University of La Rochelle. To request access, please contact "mluqma01(at)univ-lr(dot)fr".