About this catalog
Mission
The catalog is stewarded jointly by members of IAPR TC10 and TC11 (Intelligent Reading Systems) for the broader Document Image Analysis and Recognition (DIAR) community.
- TC10 emphasises interoperability among researchers working at the intersection of document imaging and graphical pattern recognition.
- TC11 emphasises architectures, corpora and evaluation for machine reading pipelines.
Together the committees aggregate pointers so that researchers can efficiently locate datasets, tooling, evaluations and reproducibility artefacts.
Where this prototype came from
Maintainers experimented with statically generated hubs on GitHub Pages to reduce the friction of submitting datasets and ancillary material, while exposing a smaller attack surface than classic dynamic CMS stacks. Read our September 2021 motivation note for the rationale (its wording is adapted from the original Jekyll-era site; the site is now generated with Eleventy).
Contributors to this experiment include Joseph Chazalon, Pau Riba, and Muzzamil Luqman.
Interested in collaborating? Visit the Feedback page (issues, curated survey links, mailing-list pointers) or describe the scope you have in mind in a GitHub issue.
Scope
Artefacts fall into a controlled set of kinds:
| Kind | Highlights |
|---|---|
| Datasets | Ground truth, manifests, mirrored downloads, integrity hashes (MD5, optionally SHA-256). |
| Software & tooling | Evaluation kits, preprocessing libraries, annotation-tool front-ends (the /software/ page summarises navigation). |
| Pre-trained models | Canonical weights plus model cards and provenance anchors. |
| Competitions | Challenges, leaderboard pointers, organisers, artefacts. |
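Because dataset entries list MD5 hashes (and sometimes SHA-256), a downloader can check a mirrored file locally. The following is a minimal sketch using Python's standard `hashlib`; the helper names `file_digest` and `verify_download` are hypothetical and not part of any catalog tooling.

```python
import hashlib
from pathlib import Path
from typing import Optional


def file_digest(path: Path, algorithm: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    h = hashlib.new(algorithm)
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path: Path, md5: str, sha256: Optional[str] = None) -> bool:
    """Check a file against its listed MD5 hash and, when provided, its SHA-256 hash."""
    if file_digest(path, "md5") != md5.lower():
        return False
    if sha256 is not None and file_digest(path, "sha256") != sha256.lower():
        return False
    return True
```

For example, `verify_download(Path("dataset.zip"), md5="<hash from the entry>")` returns `False` if the mirror served a corrupted or different file. Reading in chunks keeps memory flat even for multi-gigabyte archives.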
The catalog indexes artefacts hosted on Zenodo, the Hugging Face Hub, institutional SFTP buckets, institutional Git servers and elsewhere; files do not live only inside the GitHub Pages output.
Standalone academic papers generally appear as downloads attached to an entry rather than as catalog records of their own.
Curation lifecycle
- Contributors open pull requests, use the guided in-browser submission, file curated issues (for datasets), or email tc101-demo(at)googlegroups.com for heavy attachments.
- Continuous integration validates Markdown front matter against a JSON Schema and runs ancillary cross-check scripts.
- A TC curator reviews scope, taxonomy fit, mirror health, licences and duplication.
- Link monitoring is run manually or through workflows dispatched by maintainers; there is no nightly automation today.
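The front-matter check in continuous integration can be illustrated with a stdlib-only sketch. It is a simplified stand-in for a full JSON Schema validator (such as the `jsonschema` package): it assumes flat `key: value` front matter between `---` lines and understands only the `required`, `type` and `enum` keywords. The `SCHEMA` below, including its field names and kind values, is hypothetical rather than the catalog's actual schema.

```python
import re
import sys
from typing import Any

# Hypothetical schema for illustration only; not the catalog's real schema.
SCHEMA: dict[str, Any] = {
    "required": ["title", "kind", "licence"],
    "properties": {
        "title": {"type": "string"},
        "kind": {"type": "string", "enum": ["dataset", "software", "model", "competition"]},
        "licence": {"type": "string"},
        "tags": {"type": "array"},
    },
}

FRONT_MATTER = re.compile(r"\A---\n(.*?)\n---[ \t]*(?:\n|\Z)", re.DOTALL)
TYPES = {"string": str, "array": list}


def parse_front_matter(markdown: str) -> dict[str, Any]:
    """Parse flat `key: value` front matter; `[a, b]` values become lists of strings."""
    match = FRONT_MATTER.match(markdown)
    if match is None:
        raise ValueError("missing front matter block")
    data: dict[str, Any] = {}
    for line in match.group(1).splitlines():
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed line: {line!r}")
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            data[key.strip()] = [v.strip() for v in value[1:-1].split(",") if v.strip()]
        else:
            data[key.strip()] = value
    return data


def validate(data: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    """Return error messages for a tiny JSON Schema subset: required, type, enum."""
    errors = [f"missing required field '{k}'" for k in schema.get("required", []) if k not in data]
    for key, rules in schema.get("properties", {}).items():
        if key not in data:
            continue
        expected = TYPES.get(rules.get("type", ""))
        if expected is not None and not isinstance(data[key], expected):
            errors.append(f"'{key}' should be of type {rules['type']}")
        if "enum" in rules and data[key] not in rules["enum"]:
            errors.append(f"'{key}' must be one of {rules['enum']}")
    return errors


if __name__ == "__main__":
    # Usage in a CI step: python check_front_matter.py entries/*.md
    failed = False
    for name in sys.argv[1:]:
        with open(name, encoding="utf-8") as fh:
            text = fh.read()
        try:
            problems = validate(parse_front_matter(text), SCHEMA)
        except ValueError as exc:
            problems = [str(exc)]
        for problem in problems:
            print(f"{name}: {problem}")
        failed = failed or bool(problems)
    sys.exit(1 if failed else 0)
```

A non-zero exit status is what makes a CI job fail, so any schema violation blocks the pull request until it is fixed.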
Further historical workflow commentary lives in the submission guide.
Technology
The static site is built with Eleventy. Client-side search uses Pagefind, with a supplementary JSON file, /search-index.json, powering fallback filtering on the Browse page.
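The fallback filtering on Browse runs in the browser, but its logic can be sketched in Python. This sketch assumes /search-index.json is a top-level JSON array of records with `title`, `kind` and `tags` fields; that record shape and the `filter_index` helper are hypothetical and may not match the real index.

```python
import json
from typing import Any, Optional


def filter_index(
    records: list[dict[str, Any]], query: str = "", kind: Optional[str] = None
) -> list[dict[str, Any]]:
    """Keep records of the requested kind whose title or tags contain every query term."""
    terms = query.lower().split()
    results = []
    for record in records:
        if kind is not None and record.get("kind") != kind:
            continue
        haystack = " ".join([record.get("title", ""), *record.get("tags", [])]).lower()
        if all(term in haystack for term in terms):
            results.append(record)
    return results


if __name__ == "__main__":
    # Assumes a local copy of /search-index.json saved next to this script.
    with open("search-index.json", encoding="utf-8") as fh:
        index = json.load(fh)
    for hit in filter_index(index, "handwriting", kind="dataset"):
        print(hit.get("title"))
```

Matching every term with a plain substring test is deliberately simple: it needs no ranking model, which suits a fallback that only has to work when the Pagefind index is unavailable.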
Source hub: https://github.com/TC101-demo/TC101-demo.github.io.
Acknowledgements
Thanks are due to the organisers of long-standing TC11 infrastructure and dataset-hosting communities, especially the collaborators who seeded IAPR artefacts in Zenodo’s IAPR-TC11 community collection.