TY - GEN
T1 - Information-Theoretic Bounds on Transfer Generalization Gap Based on Jensen-Shannon Divergence
AU - Jose, Sharu Theresa
AU - Simeone, Osvaldo
N1 - Publisher Copyright:
© 2021 European Signal Processing Conference. All rights reserved.
PY - 2021
Y1 - 2021
N2 - In transfer learning, training and testing data sets are drawn from different data distributions. The transfer generalization gap is the difference between the population loss on the target data distribution and the training loss. The training data set generally includes data drawn from both source and target distributions. This work presents novel information-theoretic upper bounds on the average transfer generalization gap that capture (i) the domain shift between the target data distribution P'_Z and the source distribution P_Z through a two-parameter family of generalized (α_1, α_2)-Jensen-Shannon (JS) divergences; and (ii) the sensitivity of the transfer learner output W to each individual sample Z_i of the data set via the mutual information I(W; Z_i). For α_1 ∈ (0, 1), the (α_1, α_2)-JS divergence can be bounded even when the support of P_Z is not included in that of P'_Z. This contrasts with the Kullback-Leibler (KL) divergence D_KL(P_Z || P'_Z)-based bounds of Wu et al. [1], which are vacuous in this case. Moreover, the obtained bounds hold for unbounded loss functions with bounded cumulant generating functions, unlike the φ-divergence-based bound of Wu et al. [1]. We also obtain new upper bounds on the average transfer excess risk in terms of the (α_1, α_2)-JS divergence for empirical weighted risk minimization (EWRM), which minimizes the weighted average training losses over the source and target data sets. Finally, we provide a numerical example to illustrate the merits of the introduced bounds.
AB - In transfer learning, training and testing data sets are drawn from different data distributions. The transfer generalization gap is the difference between the population loss on the target data distribution and the training loss. The training data set generally includes data drawn from both source and target distributions. This work presents novel information-theoretic upper bounds on the average transfer generalization gap that capture (i) the domain shift between the target data distribution P'_Z and the source distribution P_Z through a two-parameter family of generalized (α_1, α_2)-Jensen-Shannon (JS) divergences; and (ii) the sensitivity of the transfer learner output W to each individual sample Z_i of the data set via the mutual information I(W; Z_i). For α_1 ∈ (0, 1), the (α_1, α_2)-JS divergence can be bounded even when the support of P_Z is not included in that of P'_Z. This contrasts with the Kullback-Leibler (KL) divergence D_KL(P_Z || P'_Z)-based bounds of Wu et al. [1], which are vacuous in this case. Moreover, the obtained bounds hold for unbounded loss functions with bounded cumulant generating functions, unlike the φ-divergence-based bound of Wu et al. [1]. We also obtain new upper bounds on the average transfer excess risk in terms of the (α_1, α_2)-JS divergence for empirical weighted risk minimization (EWRM), which minimizes the weighted average training losses over the source and target data sets. Finally, we provide a numerical example to illustrate the merits of the introduced bounds.
UR - http://www.scopus.com/inward/record.url?scp=85123198986&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85123198986&partnerID=8YFLogxK
U2 - 10.23919/EUSIPCO54536.2021.9616270
DO - 10.23919/EUSIPCO54536.2021.9616270
M3 - Conference contribution
AN - SCOPUS:85123198986
T3 - European Signal Processing Conference
SP - 1461
EP - 1465
BT - 29th European Signal Processing Conference, EUSIPCO 2021 - Proceedings
PB - European Signal Processing Conference, EUSIPCO
T2 - 29th European Signal Processing Conference, EUSIPCO 2021
Y2 - 23 August 2021 through 27 August 2021
ER -