Scientific Research in Trustworthy Artificial Intelligence for Medicine

Authors: Anxhelo Shehu, Kleida Mati, and René Natowicz – Journal of Clinical Oncology (Ranked Q1 journal)

Breast cancer detection is not a one-dimensional problem; it is inherently multidimensional. The decision of whether a mammogram conceals a malignancy (cancer) is influenced by dozens of factors: age, family history, hormonal status, tissue morphology, image quality, and many others. Yet within this complex space of variables, two dimensions emerge as the most critical.

The first dimension is the probability that cancer is present: how likely it is, for a given patient, that the disease exists. The second dimension is the difficulty of identifying it: how opaque the mammogram is to the radiologist’s eye, and how easily a tumor can remain hidden within the surrounding tissue.

If these two dimensions were independent, each case would simply occupy a point within the plane they define, and only a small fraction would fall into the dangerous corner where both are simultaneously high. But they are not independent. There is a single factor that simultaneously constrains both axes, pushing cases precisely along the 45° diagonal where risk and diagnostic blindness coincide: breast density.

Dense fibroglandular tissue simultaneously increases the likelihood that a patient will develop cancer — high density is an independent risk factor, associated with a one- to six-fold increase compared with predominantly fatty breasts — and reduces mammographic sensitivity, because dense tissue appears bright, just like many tumors, thereby masking them. Breast density is therefore not merely one factor among many; it is the junction where the two most critical dimensions of the problem intersect. A high-density case is, at the same time, a high-cancer-probability case and a difficult-to-classify case — precisely the combination that a screening system cannot afford to overlook.

For this reason, we consider breast density assessment to be one of the fundamental steps in cancer detection. Accurate and reproducible evaluation is not a secondary technical detail; it is the prerequisite for understanding where a patient lies along that diagonal and, consequently, how aggressive surveillance should be, whether supplemental imaging (ultrasound, MRI) is required, and how much confidence should be placed in a “negative” mammogram.

How Breast Density Is Measured Today — and Why This Is a Problem

In clinical practice, radiologists assign each breast to one of four BI-RADS categories: A, B, C, or D. However, this assessment remains subjective, with substantial variability between radiologists and even within the same radiologist over time. This is precisely where automated systems come into play. The problem is that nearly all such systems today rely on convolutional neural networks (CNNs) with hundreds of thousands to millions of parameters. They are powerful, but they are also “black boxes”: we do not know which image characteristics drive their decisions, making them difficult to explain, reproduce, and validate — precisely the three requirements for trustworthy artificial intelligence in healthcare.

Our paper, published in the Journal of Clinical Oncology (a Q1-ranked journal), asks a simple yet provocative question: are CNNs the only path forward?

How Does It Work?

Our answer begins with the qualitative BI-RADS definitions themselves and translates them into a small set of quantitative indicators. Fatty tissue appears dark on a mammogram, whereas dense fibroglandular tissue appears bright. This simple observation allowed us to construct four interpretable indicators:

Indicator M1 — the bimodality of the pixel-intensity histogram. To achieve this, we had to define an entirely new bimodality metric that makes no assumptions about the shape of the distribution. This indicator alone separates A–B breasts from C–D breasts with high accuracy. It also naturally revealed four brightness bands: very dark, dark, bright, and very bright.
Indicators M2, M3, and M4 — the “heterogeneity” described in BI-RADS category C was translated into Shannon entropy computed within these brightness bands. Three additional indicators emerged, each strongly associated with different subsets of the BI-RADS categories.
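
To make the two kinds of indicator concrete, here is a minimal Python sketch computed on a toy list of gray levels. It is not the method from the paper. The bimodality metric defined there is not described in this article, so the sketch substitutes Otsu's separability ratio (between-class variance divided by total variance) as a stand-in. The band boundaries in HYPOTHETICAL_BANDS, the function names, and the toy pixel values are all assumptions made for illustration.

```python
import math

def histogram(pixels, levels=256):
    """Count how many pixels fall on each integer gray level in [0, levels)."""
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1
    return counts

def otsu_separability(counts):
    """Stand-in bimodality proxy (NOT the paper's metric): the largest ratio of
    between-class variance to total variance over all thresholds."""
    total = sum(counts)
    if total == 0:
        return 0.0
    weighted_sum = sum(i * c for i, c in enumerate(counts))
    mean_all = weighted_sum / total
    var_all = sum(c * (i - mean_all) ** 2 for i, c in enumerate(counts)) / total
    if var_all == 0:
        return 0.0
    best, w0, s0 = 0.0, 0, 0  # w0: pixels at or below t, s0: their intensity sum
    for t, c in enumerate(counts[:-1]):
        w0 += c
        s0 += t * c
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = s0 / w0, (weighted_sum - s0) / w1
        between = (w0 / total) * (w1 / total) * (m0 - m1) ** 2
        best = max(best, between / var_all)
    return best

def band_entropy(counts, lo, hi):
    """Shannon entropy (in bits) of the gray-level distribution inside [lo, hi)."""
    band = counts[lo:hi]
    n = sum(band)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in band if c > 0)

# Hypothetical, evenly spaced band limits; the paper derives its own bands.
HYPOTHETICAL_BANDS = {
    "very dark": (0, 64),
    "dark": (64, 128),
    "bright": (128, 192),
    "very bright": (192, 256),
}

# Toy "image": a dark cluster and a bright cluster of pixels.
pixels = [20] * 500 + [30] * 100 + [210] * 300 + [220] * 100
counts = histogram(pixels)
bimodality_proxy = otsu_separability(counts)
entropies = {name: band_entropy(counts, lo, hi)
             for name, (lo, hi) in HYPOTHETICAL_BANDS.items()}
```

A histogram with two well-separated clusters pushes the proxy toward 1, while a band containing a single gray level (or no pixels at all) has zero entropy. Which bands feed M2, M3, and M4 is specified in the paper, not here.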

The four indicators were then fed into an extremely small deep-learning classifier: four input neurons (one per indicator), a single hidden layer with 16 neurons, and task-specific outputs. In total, only 256 parameters were required for A/B/C/D prediction — compared with the millions of parameters typically found in CNNs.
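
The shape of such a classifier can be sketched in a few lines of plain Python. This is an untrained illustration under assumptions, not the published model: the ReLU activation, the bias terms, the softmax output head, the random weights, and the example indicator values are all hypothetical, so the parameter count of this sketch is not meant to reproduce the figure reported above.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Return (weights, biases) with small random weights; untrained, shape only."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def dense(x, layer):
    """Affine map: one weighted sum plus bias per output neuron."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]

def relu(values):
    return [max(0.0, v) for v in values]

def softmax(values):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def predict(indicators, hidden, output):
    """4 indicators -> 16 hidden units -> one probability per density category."""
    return softmax(dense(relu(dense(indicators, hidden)), output))

rng = random.Random(0)
hidden = make_layer(4, 16, rng)   # four inputs, one per indicator
output = make_layer(16, 4, rng)   # assumed head: one output per category A, B, C, D
probs = predict([0.8, 0.3, 0.5, 0.1], hidden, output)  # made-up indicator values
```

Because the whole model is two small matrices, every prediction can be traced by hand from the four indicator values to the output probabilities.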

Why Does It Matter?

Because the network does not “discover” hidden features directly from raw images, as CNNs do. The features were explicitly chosen by us and derive directly from radiologists’ definitions. As a result, training is extremely fast, robustness evaluation through cross-validation takes only a few minutes, and every decision can be explained step by step. This means explainability and reproducibility are not promises, but properties built directly into the design itself.
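
For readers unfamiliar with cross-validation, the splitting step can be sketched as follows. The article does not state the number of folds or whether the splits were stratified by category, so the function name k_fold_indices, the choice of five folds, and the plain shuffled split are assumptions made for illustration.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists; each sample is in exactly one test fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

# Toy run: 10 samples, 5 folds (both numbers are arbitrary).
splits = list(k_fold_indices(10, 5))
```

With only four numbers per image as input, the model can be retrained once per fold quickly, which is consistent with the evaluation time described above.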

And all of this is achieved without sacrificing performance. On 7,622 images from the public VinDr-Mammo database:

A–B versus C–D: Accuracy 94.25%, F1-score 93.86%, AUC 94.82%
A/B/C/D (all four categories): Accuracy 83.18%, F1-score 81.40%, AUC 89.57%

These results are on par with the best CNN-based designs, while being evaluated on a substantially larger dataset than most prior studies.
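
As a reminder of what the accuracy and F1 figures measure, the sketch below computes both on a handful of made-up labels. The article does not say how the F1-score was averaged across categories, so the macro average used here is an assumption, and the toy labels are not study data.

```python
def accuracy(y_true, y_pred):
    """Fraction of cases where the predicted category equals the reference one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-category F1 scores (macro averaging is assumed)."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Made-up reference and predicted categories for six images.
y_true = ["A", "B", "C", "D", "C", "B"]
y_pred = ["A", "B", "C", "C", "C", "A"]
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred, labels=["A", "B", "C", "D"])
```

On this toy example four of six predictions are correct, and the missed category D pulls the macro F1 well below the accuracy, which is why both numbers are worth reporting.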

Where Can It Be Applied?

Standardizing breast density assessment across institutions, reducing inter-radiologist variability.
Supporting patient stratification for supplemental screening pathways (ultrasound, MRI).
Deployment in resource-limited environments or high-volume screening settings, since it runs on standard computers without requiring specialized hardware.
Providing potential external biomarkers for future breast density or cancer-risk assessment models.

This work is the result of a collaboration between researchers from Metropolitan University Tirana, American Hospital (Tirana), and ESIEE Paris — Université Gustave Eiffel. I would like to thank the Embassy of France in Albania for its support.

All source code is publicly available to ensure reproducibility:

https://github.com/AI-Lab-UMT/Breast_Density

Journal of Clinical Oncology:

https://doi.org/10.1200/JCO.2026.44.16_suppl.e12558
