Dataset ICDAR 2019
ICDAR 2019 Competition on Recognition of Handwritten Mathematical Expressions and Typeset Formula Detection
Downloads
| File | Type | Size | Mirrors |
|---|---|---|---|
| TC11_package_CROHME2019.zip (Zip file with data, tools and papers) | other | 364 MB | Mirror 1 |
Description
This package provides training and test data from the competitions CROHME 2011, 2012, 2013, 2014, 2016 and 2019.
Dataset Information
This package bundles the CROHME training and test data accumulated across several editions of the competition, together with the Typeset Formula Detection (TFD) extension introduced in the 2019 edition.
Ground Truth
Math expressions for online and offline handwriting
The ground truth is available in InkML format (with a LaTeX string and MathML structure), as Stroke Label Graphs (SLG, associating each stroke with its ground truth) and as Object Layout Graphs (symbol layout trees independent of the strokes). This ground truth allows training and evaluation for both online and offline recognition tasks.
Typeset Formula Detection
The ground truth gives the locations of math expressions in a set of scientific documents and is taken from the GTDB datasets.
Research Tasks
Online Handwritten Formula Recognition
For the traditional task in CROHME, participants must convert a list of handwritten strokes, captured as a list of polylines from a tablet or similar device, to a Symbol Layout Tree (SLT). This SLT captures the segmentation of strokes into symbols, symbol classification, and the spatial relationships between symbols. SLTs are represented using labeled directed graphs, so that all segmentation, classification, and relationship (parsing) errors can be automatically identified and compiled using tools developed for CROHME (CROHMELib and LgEval).
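As a rough illustration of the input side of this task, the sketch below reads strokes from an InkML document using only the Python standard library. It assumes the W3C InkML layout, in which each `<trace>` element holds comma-separated points with whitespace-separated coordinates. The function name `parse_traces` and the sample ink are made up for this example and are not part of the CROHME tools.

```python
import xml.etree.ElementTree as ET

# W3C InkML namespace; assumed to be the one declared in the files.
INKML_NS = "{http://www.w3.org/2003/InkML}"


def parse_traces(inkml_text):
    """Return {trace_id: [(x, y), ...]} for every <trace> element."""
    root = ET.fromstring(inkml_text)
    strokes = {}
    for trace in root.iter(INKML_NS + "trace"):
        points = []
        # Points are separated by commas, coordinates by whitespace.
        for chunk in (trace.text or "").split(","):
            fields = chunk.split()
            if len(fields) >= 2:
                # Keep x and y only; ignore any extra channels such as time.
                points.append((float(fields[0]), float(fields[1])))
        strokes[trace.get("id")] = points
    return strokes


# Hypothetical two-stroke sample, not taken from the dataset.
SAMPLE = """<ink xmlns="http://www.w3.org/2003/InkML">
  <trace id="0">10 20, 11 22, 13 25</trace>
  <trace id="1">30 20, 30 40</trace>
</ink>"""

strokes = parse_traces(SAMPLE)
```

Each value in `strokes` is one polyline. Grouping those polylines into symbols and relationships is the recognition problem itself and is not attempted here.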
Offline Handwritten Formula Recognition
For offline recognition of handwritten inputs, we render images from the (x,y) points in the CROHME InkML files. As in the previous task, for a given test image, participating systems must produce one .lg file.
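The rendering procedure used to produce the official images is not described here, so the following is only a minimal sketch of the general idea: scale the (x,y) points into a fixed grid and draw straight segments between consecutive points. The function name `render_strokes`, the grid size, and the padding are assumptions chosen for illustration.

```python
def render_strokes(strokes, size=32, pad=2):
    """Rasterize polylines into a size x size grid of 0/1 pixels."""
    image = [[0] * size for _ in range(size)]
    xs = [x for stroke in strokes for x, _ in stroke]
    ys = [y for stroke in strokes for _, y in stroke]
    if not xs:
        return image  # nothing to draw

    min_x, min_y = min(xs), min(ys)
    # One scale for both axes, so the aspect ratio is preserved.
    span = max(max(xs) - min_x, max(ys) - min_y) or 1.0
    scale = (size - 1 - 2 * pad) / span

    def to_pixel(x, y):
        col = round((x - min_x) * scale) + pad
        row = round((y - min_y) * scale) + pad
        return col, row

    for stroke in strokes:
        pixels = [to_pixel(x, y) for x, y in stroke]
        if len(pixels) == 1:
            pixels = pixels * 2  # a single point is drawn as one dot
        for (c0, r0), (c1, r1) in zip(pixels, pixels[1:]):
            # Sample the segment densely enough to leave no gaps.
            steps = max(abs(c1 - c0), abs(r1 - r0), 1)
            for i in range(steps + 1):
                col = round(c0 + (c1 - c0) * i / steps)
                row = round(r0 + (r1 - r0) * i / steps)
                image[row][col] = 1
    return image


# One diagonal stroke rendered into a small 8 x 8 grid.
image = render_strokes([[(0, 0), (10, 10)]], size=8, pad=1)
```

A real pipeline would more likely use an imaging library with a chosen stroke thickness and anti-aliasing. The nested-list grid is used here only to keep the example dependency-free.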
Detection of Formulas in Document Pages
In this task, for a given document page, participating systems identify the location of formulas using bounding boxes. Evaluation is done by calculating the intersection over union (IoU) with the ground-truth annotations.
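For readers unfamiliar with the metric, the IoU of two axis-aligned boxes can be sketched as follows. The `(x_min, y_min, x_max, y_max)` box layout and the function name `iou` are assumptions for this example. The official evaluation tools may use a different box encoding, and the matching threshold they apply is not stated here.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b

    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h

    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)
    union = area_a + area_b - inter

    # Degenerate (zero-area) inputs are treated as no overlap.
    return inter / union if union > 0 else 0.0


# Two 10 x 10 boxes overlapping in a 5 x 5 region: 25 / 175.
score = iou((0, 0, 10, 10), (5, 5, 15, 15))
```

A detection is then typically counted as correct when its IoU with a ground-truth box reaches some chosen threshold.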