TY - GEN
T1 - Efficient Local Intrinsic Dimensionality Estimation in Evolving Deep Representations
AU - Houle, Michael E.
AU - Oria, Vincent
AU - Xu, Hao
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
PY - 2026
Y1 - 2026
N2 - Local intrinsic dimensionality (LID) provides insight into the behavior of individual training points in deep neural networks, with applications including adversarial detection, prevention of dimensional collapse in self-supervised learning, and identification of untruthful responses from large language models (LLMs). In such contexts, efficient LID estimation has depended on the use of mini-batches, due to the high cost of computing neighborhoods in latent space. However, estimation with respect to small subsets of the training data usually reflects the dimensionality of the global manifold structure rather than the intended local distribution around each point. In this paper, we propose the Nearest Distance Cache (NDC), a method that improves the locality of LID estimation by reusing nearest-neighbor distances observed in past mini-batches. This strategy faces two key challenges: representations evolve over time, and limited memory prevents storing all past distances. To address these, NDC maintains a compact cache of nearest distances per example and uses window-based change detection to discard outdated samples affected by distributional drift. We evaluate NDC on two tasks: an autoencoder trained on synthetic data with known ground-truth LID, and a ResNet trained on CIFAR-10. Results show that NDC captures local properties of deep representations not revealed by single mini-batch estimates.
AB - Local intrinsic dimensionality (LID) provides insight into the behavior of individual training points in deep neural networks, with applications including adversarial detection, prevention of dimensional collapse in self-supervised learning, and identification of untruthful responses from large language models (LLMs). In such contexts, efficient LID estimation has depended on the use of mini-batches, due to the high cost of computing neighborhoods in latent space. However, estimation with respect to small subsets of the training data usually reflects the dimensionality of the global manifold structure rather than the intended local distribution around each point. In this paper, we propose the Nearest Distance Cache (NDC), a method that improves the locality of LID estimation by reusing nearest-neighbor distances observed in past mini-batches. This strategy faces two key challenges: representations evolve over time, and limited memory prevents storing all past distances. To address these, NDC maintains a compact cache of nearest distances per example and uses window-based change detection to discard outdated samples affected by distributional drift. We evaluate NDC on two tasks: an autoencoder trained on synthetic data with known ground-truth LID, and a ResNet trained on CIFAR-10. Results show that NDC captures local properties of deep representations not revealed by single mini-batch estimates.
KW - Deep Representations
KW - Distributional Drift Detection
KW - Local Intrinsic Dimensionality
KW - Nearest Distance Cache
UR - https://www.scopus.com/pages/publications/105020383201
UR - https://www.scopus.com/pages/publications/105020383201#tab=citedBy
U2 - 10.1007/978-3-032-06069-3_4
DO - 10.1007/978-3-032-06069-3_4
M3 - Conference contribution
AN - SCOPUS:105020383201
SN - 9783032060686
T3 - Lecture Notes in Computer Science
SP - 41
EP - 55
BT - Similarity Search and Applications - 18th International Conference, SISAP 2025, Proceedings
A2 - Amato, Giuseppe
A2 - Mic, Vladimir
A2 - Traina, Agma
A2 - Messina, Nicola
A2 - Amsaleg, Laurent
A2 - Þór Guðmundsson, Gylfi
A2 - Þór Jónsson, Björn
A2 - Vadicamo, Lucia
PB - Springer Science and Business Media Deutschland GmbH
T2 - 18th International Conference on Similarity Search and Applications, SISAP 2025
Y2 - 1 October 2025 through 3 October 2025
ER -