TY - GEN
T1 - Context-Aware Search and Retrieval Over Erasure Channels
AU - Ghasvarianjahromi, Sara
AU - Yakimenka, Yauhen
AU - Kliewer, Jörg
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
AB - This paper introduces and analyzes a search and retrieval model that adopts key semantic communication principles from retrieval-augmented generation. We specifically present an information-theoretic analysis of a remote document retrieval system operating over a symbol erasure channel. The proposed model encodes the feature vector of a query, derived from term-frequency weights of a language corpus, using a repetition code with an adaptive rate that depends on the contextual importance of the terms. At the decoder, we select between two documents based on the contextual closeness of the recovered query. By leveraging a jointly Gaussian approximation for both the true and reconstructed similarity scores, we derive an explicit expression for the retrieval error probability, i.e., the probability that the less similar document is selected. Numerical simulations on synthetic and real-world data (Google NQ) confirm the validity of the analysis. They further demonstrate that assigning greater redundancy to critical features effectively reduces the error rate, highlighting the effectiveness of semantic-aware feature encoding in error-prone communication settings.
UR - https://www.scopus.com/pages/publications/105029036404
DO - 10.1109/ITW62417.2025.11240323
M3 - Conference contribution
AN - SCOPUS:105029036404
T3 - 2025 IEEE Information Theory Workshop, ITW 2025
BT - 2025 IEEE Information Theory Workshop, ITW 2025
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2025 IEEE Information Theory Workshop, ITW 2025
Y2 - 29 September 2025 through 3 October 2025
ER -