
Awesome Lung CT Datasets
A curated, open-source collection of 15+ publicly available lung CT datasets with segmentation annotations — designed to help researchers quickly find the right data for medical image analysis.
Why This Project?
If you've ever worked on a medical imaging research project, you know the pain: finding the right public dataset is harder than it should be. Datasets are scattered across TCIA, Zenodo, Grand Challenge, Mendeley, and random university pages. Licenses vary, formats differ, and annotations come in all shapes.
I built Awesome Lung CT Datasets to solve this problem: a single, well-organized reference for publicly available lung CT datasets with segmentation and related annotations. Whether you're working on nodule detection, COVID-19 lesion segmentation, airway extraction, or multi-organ segmentation, this repository gets you to the data in seconds.
What's Inside
The repository currently indexes 15+ datasets covering 100,000+ CT scans across five categories.
Scan Count Distribution
The following chart shows the number of scans available in each dataset. Note the logarithmic scale — NLST alone contains over 75,000 scans, while specialized datasets like AeroPath focus on quality with 27 carefully annotated volumes.
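To see why a logarithmic axis is needed here, the short sketch below draws a text bar per dataset with length proportional to log10 of the scan count. The counts are copied from the tables in this post; NLST is listed as "75,000+", so 75,000 is used as a lower bound, and the helper name `log_bar` is only illustrative.

```python
import math

# Scan counts copied from the tables in this post.
# NLST is listed as "75,000+", so 75_000 is treated as a lower bound.
scan_counts = {
    "NLST": 75_000,
    "TotalSegmentator": 1_228,
    "LIDC-IDRI": 1_018,
    "ATM'22": 500,
    "AeroPath'23": 27,
}


def log_bar(count, width_per_decade=8):
    """Return a text bar whose length is proportional to log10(count)."""
    return "#" * round(math.log10(count) * width_per_decade)


# Print the datasets from largest to smallest with their log-scaled bars.
for name, count in sorted(scan_counts.items(), key=lambda kv: -kv[1]):
    print(f"{name:<18}{count:>7,}  {log_bar(count)}")

# Gap between the largest and smallest dataset, in orders of magnitude.
decades = math.log10(scan_counts["NLST"] / scan_counts["AeroPath'23"])
print(f"NLST is roughly {decades:.1f} orders of magnitude larger than AeroPath'23")
```

On a linear axis the bar for AeroPath'23 would be invisible next to NLST; on the log axis every dataset stays readable.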
General Lung Segmentation
| Dataset | Scans | Annotations | Format |
|---|---|---|---|
| LIDC-IDRI | 1,018 | Nodule annotations by 4 radiologists | DICOM + XML |
| LUNA16 | 888 | Lung nodules ≥3mm | MetaImage (.mhd/.raw) |
| NSCLC Radiogenomics | 211 | Tumor segmentations + genomic data | DICOM |
| Medical Decathlon Lung | 96 | Lung tumor segmentations | NIfTI |
COVID-19 Lung CT
| Dataset | Scans | Annotations | Format |
|---|---|---|---|
| COVID-19 CT Seg Dataset 1 | 100 slices | Ground-glass, consolidation, effusion | NIfTI |
| COVID-19 CT Seg Dataset 2 | 829 slices | COVID-19 lesions | NIfTI |
| COVID-19 CT Lung & Infection | 20 volumes | Left/right lung + infections | NIfTI |
Thoracic Organ Segmentation
| Dataset | Scans | Annotations | Format |
|---|---|---|---|
| SegTHOR | 60 | Heart, trachea, aorta, esophagus | NIfTI |
| TotalSegmentator | 1,228 | 117+ anatomical structures | NIfTI |
| LCTSC 2017 | 60 | Lung, heart, spinal cord, esophagus | DICOM |
Airway Segmentation
| Dataset | Scans | Annotations | Format |
|---|---|---|---|
| ATM'22 | 500 | Full airway tree + centerlines | NIfTI |
| AeroPath'23 | 27 | Trachea + bronchi (challenging pathology) | NIfTI |
| AIIB23 | 285 | ILD masks + airway-informed biomarkers | NIfTI |
Specialized Datasets
| Dataset | Scans | Annotations | Format |
|---|---|---|---|
| LUNG-PET-CT-DX | 355 | Tumor bounding boxes | DICOM + XML |
| NLST | 75,000+ | Low-dose screening scans | DICOM |
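If you want to query the five tables above programmatically, one option is to transcribe them into plain records and filter them. The sketch below is only an illustration, not how the repository stores its data: the `Dataset` record, the `CATALOG` list and the `find` helper are hypothetical names, counts without a unit in the tables are assumed to be whole scans, and NLST's "75,000+" is stored as a lower bound.

```python
from collections import namedtuple

# One record per table row. `unit` separates whole scans from 2D slices,
# because the COVID-19 rows mix the two.
Dataset = namedtuple("Dataset", "category name count unit")

CATALOG = [
    Dataset("General Lung Segmentation", "LIDC-IDRI", 1018, "scans"),
    Dataset("General Lung Segmentation", "LUNA16", 888, "scans"),
    Dataset("General Lung Segmentation", "NSCLC Radiogenomics", 211, "scans"),
    Dataset("General Lung Segmentation", "Medical Decathlon Lung", 96, "scans"),
    Dataset("COVID-19 Lung CT", "COVID-19 CT Seg Dataset 1", 100, "slices"),
    Dataset("COVID-19 Lung CT", "COVID-19 CT Seg Dataset 2", 829, "slices"),
    Dataset("COVID-19 Lung CT", "COVID-19 CT Lung & Infection", 20, "scans"),
    Dataset("Thoracic Organ Segmentation", "SegTHOR", 60, "scans"),
    Dataset("Thoracic Organ Segmentation", "TotalSegmentator", 1228, "scans"),
    Dataset("Thoracic Organ Segmentation", "LCTSC 2017", 60, "scans"),
    Dataset("Airway Segmentation", "ATM'22", 500, "scans"),
    Dataset("Airway Segmentation", "AeroPath'23", 27, "scans"),
    Dataset("Airway Segmentation", "AIIB23", 285, "scans"),
    Dataset("Specialized Datasets", "LUNG-PET-CT-DX", 355, "scans"),
    Dataset("Specialized Datasets", "NLST", 75000, "scans"),  # lower bound
]


def find(catalog, category=None, min_count=0, unit="scans"):
    """Return records in `category` (any if None) with at least `min_count` items of `unit`."""
    return [
        d for d in catalog
        if (category is None or d.category == category)
        and d.unit == unit
        and d.count >= min_count
    ]


categories = sorted({d.category for d in CATALOG})
print(len(CATALOG), "datasets in", len(categories), "categories")

# Example query: airway datasets with at least 100 scans.
for d in find(CATALOG, category="Airway Segmentation", min_count=100):
    print(d.name, d.count)
```

Keeping the unit next to the count avoids accidentally comparing slice counts with volume counts when ranking datasets by size.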
Annotation Coverage
The datasets span multiple annotation types essential for different research tasks. Pixel-level segmentation dominates, but the collection also covers bounding boxes, multi-organ labels, airway trees with centerlines, and multi-radiologist consensus annotations.
- Pixel-level segmentation: LIDC-IDRI, COVID-19 datasets, TotalSegmentator
- Bounding boxes: LUNG-PET-CT-DX
- Multi-organ labels: SegTHOR, TotalSegmentator (117+ structures)
- Multi-radiologist consensus: LIDC-IDRI (up to 4 expert annotations per nodule)
- Airway trees with centerlines: ATM'22
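The list above goes from annotation type to datasets. When you start from a dataset instead, it helps to invert that mapping. The following is a minimal sketch under the assumption that the list is complete; `COVERAGE` and `invert` are illustrative names, and "COVID-19 datasets" is kept as a single entry because the list does not name them individually.

```python
# Annotation type -> datasets, copied from the list above.
COVERAGE = {
    "pixel-level segmentation": ["LIDC-IDRI", "COVID-19 datasets", "TotalSegmentator"],
    "bounding boxes": ["LUNG-PET-CT-DX"],
    "multi-organ labels": ["SegTHOR", "TotalSegmentator"],
    "multi-radiologist consensus": ["LIDC-IDRI"],
    "airway trees with centerlines": ["ATM'22"],
}


def invert(coverage):
    """Build a dataset -> sorted list of annotation types mapping."""
    by_dataset = {}
    for annotation_type, names in coverage.items():
        for name in names:
            by_dataset.setdefault(name, []).append(annotation_type)
    return {name: sorted(types) for name, types in by_dataset.items()}


by_dataset = invert(COVERAGE)

# Print each dataset with the annotation types it offers.
for name in sorted(by_dataset):
    print(f"{name}: {', '.join(by_dataset[name])}")
```

Datasets that show up under more than one type, such as LIDC-IDRI and TotalSegmentator, are the natural candidates when a project needs several kinds of labels at once.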
Research Impact
Based on publication counts, the datasets in this collection have been used in thousands of research papers. LIDC-IDRI alone appears in over 1,000 publications, making it one of the most widely used lung CT datasets.
- LIDC-IDRI — 1,000+ publications
- LUNA16 — 500+ publications
- Medical Decathlon — 200+ publications
- COVID-19 Segmentation — 150+ publications
Who Is This For?
- PhD students who are starting a new lung imaging project and need data fast
- Researchers benchmarking their segmentation models across multiple datasets
- Engineers building medical AI products and looking for training data
- Radiologists interested in open datasets for validation studies
Contributing
The repository is open source under the MIT license. If you know of a public lung CT dataset that's missing, you can submit a pull request to add it. Contributions welcome!
Check out the full repository on GitHub — and star it if it helps your research!