Short Intro
In the world of medical imaging, AI can quickly measure the shape and size of organs. But it only works well if the data is clean. In a recent Frontiers study, researchers used CT scans of the spleen to show that outliers and odd cases can throw off AI models. They also shared steps to find and fix these problems so machines get better at their job.
3 Takeaways
• Outliers in CT data—like artifacts or rare pathologies—can mislead AI models used for organ measurement.
• Automated and manual checks, including simple shape and intensity filters, help spot these anomalies early.
• Cleaning and curating training and test sets boosts the accuracy and reliability of AI morphometry tools.
Main Story
Background on AI morphometry
We often rely on CT scans to measure organs. Doctors use morphometry to understand disease, track growth, or plan surgeries. AI can automate this work by learning patterns in large datasets. However, AI depends on the quality of its data. If the scans contain errors or odd cases, the model may learn the wrong lessons. This study focuses on how outliers and anomalies affect AI that measures the spleen.
Why the spleen?
The spleen varies a lot in size and shape. It can swell (splenomegaly) or shrink in disease. It may sit close to other organs or have lesions. All these factors make spleen segmentation a tough test. The team gathered over 1,000 abdominal CT scans from multiple hospitals. The scans came in different resolutions, contrast settings, and slice thicknesses. This diversity is good for training AI but also raises the risk of strange cases.
Finding outliers and anomalies
Outliers are data points that don’t follow the normal pattern. An anomaly is a specific kind of outlier that could be due to an error or an unusual case. In CT data, these might include:
• Motion artifacts when the patient moves.
• Metal artifacts from surgical clips or implants.
• Uncommon contrast timing that makes organs look too dark or bright.
• Cases with extreme pathology, such as huge tumors or severe atrophy.
• Anatomical variants, like a wandering spleen or an accessory spleen.
To spot these problems, the researchers used both manual and automated steps:
1. Visual inspection: Experts looked at random scans and found odd cases. This step is simple but not scalable.
2. Statistical filters: They calculated basic metrics such as spleen volume, which should fall within a known range. Volumes that were too large or too small were flagged.
3. Intensity checks: CT values (Hounsfield units) for spleen tissue usually sit within a narrow band. Values outside that range could mean a bad scan or incorrect labeling.
4. Shape analysis: The spleen should form a smooth, curved shape. Sharp edges or holes in the segmentation often point to artifacts. (A simple version of checks 2–4 is sketched in the first code block after this list.)
5. Deep anomaly detectors: The team trained a small neural network to learn what a “normal” spleen segmentation looks like. Scans that deviated too far from this learned pattern were flagged. (See the second code block below.)
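The article summarized here doesn't reproduce its filtering code, but a minimal sketch of checks 2–4 could look like the following, assuming each case provides the CT volume in Hounsfield units, a binary spleen mask, and the per-voxel volume. The thresholds are illustrative placeholders, not values from the study.

```python
import numpy as np
from scipy import ndimage

# Illustrative thresholds only -- not values reported in the study.
VOLUME_RANGE_ML = (50.0, 1500.0)   # plausible spleen volume bounds in millilitres
HU_RANGE = (20.0, 200.0)           # rough band for spleen tissue on contrast CT

def flag_spleen_case(ct_hu, spleen_mask, voxel_volume_ml):
    """Return a list of reasons this case should go to manual review."""
    reasons = []
    if spleen_mask.sum() == 0:
        return ["empty spleen segmentation"]

    # Check 2: spleen volume outside the expected range.
    volume_ml = spleen_mask.sum() * voxel_volume_ml
    if not VOLUME_RANGE_ML[0] <= volume_ml <= VOLUME_RANGE_ML[1]:
        reasons.append(f"volume out of range: {volume_ml:.0f} mL")

    # Check 3: median Hounsfield value of the labeled tissue outside the band.
    median_hu = float(np.median(ct_hu[spleen_mask > 0]))
    if not HU_RANGE[0] <= median_hu <= HU_RANGE[1]:
        reasons.append(f"median HU out of range: {median_hu:.0f}")

    # Check 4: holes inside the mask often point to artifacts or bad labels.
    filled = ndimage.binary_fill_holes(spleen_mask)
    hole_fraction = (filled.sum() - spleen_mask.sum()) / max(filled.sum(), 1)
    if hole_fraction > 0.02:
        reasons.append(f"mask has internal holes ({hole_fraction:.1%})")

    return reasons
```

In practice, thresholds like these are best derived from the distribution of known-clean cases rather than fixed by hand.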
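The detector itself is described only as a small network that learns what a normal segmentation looks like. One common way to realize that idea is a convolutional autoencoder whose reconstruction error serves as an anomaly score; the architecture, input size, and threshold rule below are assumptions for illustration, not the authors' model.

```python
import torch
import torch.nn as nn

class MaskAutoencoder(nn.Module):
    """Tiny 3D autoencoder over downsampled binary spleen masks (assumed shape 1x32x64x64)."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(8, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(16, 8, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(8, 1, kernel_size=4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def anomaly_scores(model, mask_batch):
    """Per-case mean reconstruction error; high scores suggest unusual segmentations."""
    model.eval()
    with torch.no_grad():
        recon = model(mask_batch)
    return ((recon - mask_batch) ** 2).mean(dim=(1, 2, 3, 4))

# After training on clean masks only, a case is flagged when its score exceeds
# a threshold chosen on held-out clean data (e.g. mean + 3 standard deviations).
```

A one-class classifier over simple shape features would serve the same purpose; the key point is that the detector learns only from normal examples.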
Results of outlier detection
In the training set, about 7% of scans had at least one anomaly. The most common issues were metal artifacts and extreme splenomegaly. In the test set, around 5% of scans were problematic. If these went unnoticed, the model’s segmentation error jumped by up to 15% in some cases.
Impact on AI model performance
The team trained a standard U-Net segmentation model on three versions of the dataset:
• Raw data with no cleaning.
• Data cleaned by manual review only.
• Data cleaned by a combined manual and automated pipeline.
They measured performance by the Dice Similarity Coefficient (DSC), a common metric for segmentation. The raw data model scored an average DSC of 0.89. Manual cleaning lifted the score to 0.91. The full pipeline pushed it to 0.94. Not only did the final model perform better on clean data, but it also generalized better to new scans from different hospitals.
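DSC measures the overlap between the predicted mask and the reference label, ranging from 0 (no overlap) to 1 (perfect agreement). A quick NumPy version of the standard definition:

```python
import numpy as np

def dice_coefficient(pred_mask, true_mask, eps=1e-7):
    """DSC = 2|A ∩ B| / (|A| + |B|)."""
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    intersection = np.logical_and(pred, true).sum()
    return 2.0 * intersection / (pred.sum() + true.sum() + eps)
```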
Why human + automated checks matter
Manual review catches odd cases that algorithms might miss, but it takes time, and tired reviewers can overlook rare anomalies. Automated checks can scan thousands of images quickly and alert reviewers to suspicious cases. Combining the two keeps human experts where they add the most value and frees them from hunting for obvious errors.
Best practices and recommendations
Based on their findings, the researchers outline a clear workflow:
1. Gather diverse data from multiple sites to capture real-world variation.
2. Run basic statistical checks on volumes, intensities, and shapes.
3. Apply automated anomaly detection models tuned to your organ of interest.
4. Have experts review the flagged cases and decide whether to fix or remove them.
5. Retrain the AI model on the curated dataset and compare performance.
6. Repeat checks on new data before deploying your model.
This process takes extra work up front. But cleaner data pays off in more trustworthy AI. It also speeds up deployment, since models trained on messy data often break in real clinics.
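To make the workflow concrete, here is a rough sketch of how the automated steps could be wired into a single curation pass, reusing the hypothetical flag_spleen_case and anomaly_scores helpers from the sketches above. The case attributes and output format are assumptions, not details from the paper.

```python
import csv

def build_review_queue(cases, detector, score_threshold, out_path="review_queue.csv"):
    """Run the automated checks on every case and write flagged ones to a CSV for expert review."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["case_id", "reasons"])
        for case in cases:
            # Rule-based checks (volume, intensity, shape) from the earlier sketch.
            reasons = flag_spleen_case(case.ct_hu, case.spleen_mask, case.voxel_volume_ml)
            # Learned check: reconstruction error of the mask autoencoder.
            score = float(anomaly_scores(detector, case.mask_tensor)[0])
            if score > score_threshold:
                reasons.append(f"autoencoder score {score:.3f} above threshold")
            if reasons:
                writer.writerow([case.case_id, "; ".join(reasons)])
```

Experts then work only through the queue, deciding whether each flagged case should be fixed, kept, or removed, and the same pass is rerun on new data before deployment (step 6).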
3-Q FAQ
Q1. What counts as an outlier in medical imaging data?
A1. An outlier is any scan or measurement that differs significantly from typical values. In CT, this can mean odd volumes, unusual intensity levels, or strange shapes caused by errors, artifacts, or rare conditions.
Q2. How do automated anomaly detectors work?
A2. They learn normal patterns from clean examples. When a new scan deviates too much—based on statistical or learned features—the system flags it for review. Common methods include autoencoders and one-class classifiers.
Q3. Can this process apply to other organs or imaging types?
A3. Absolutely. While the study focuses on the spleen in CT, the same ideas work for MRI, ultrasound, or X-rays. You just need to adapt the statistical checks and anomaly models to your organ and imaging modality.
Call to Action
Ready to clean up your datasets and boost your AI’s performance? Dive deeper into the full Frontiers article for step-by-step details and code examples. Join our newsletter to get the latest tips on AI in medical imaging and be the first to learn about new data-quality tools.