TY - GEN
T1 - An Information-Theoretic Analysis of the Impact of Task Similarity on Meta-Learning
AU - Jose, Sharu Theresa
AU - Simeone, Osvaldo
N1 - Funding Information:
The authors are with King’s Communications, Learning, and Information Processing (KCLIP) lab at the Department of Engineering of King’s College London, UK (emails: sharu.jose@kcl.ac.uk, osvaldo.simeone@kcl.ac.uk). The authors have received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 Research and Innovation Programme (Grant Agreement No. 725731).
Publisher Copyright:
© 2021 IEEE.
PY - 2021/7/12
Y1 - 2021/7/12
N2 - Meta-learning aims at optimizing the hyperparameters of a model class or training algorithm from the observation of data from a number of related tasks. Following the setting of Baxter [1], the tasks are assumed to belong to the same task environment, which is defined by a distribution over the space of tasks and by per-task data distributions. The statistical properties of the task environment thus dictate the similarity of the tasks. The goal of the meta-learner is to ensure that the hyperparameters yield a small loss when used for training on a new task sampled from the task environment. The difference between the resulting average loss, known as the meta-population loss, and the corresponding empirical loss measured on the available data from related tasks, known as the meta-generalization gap, is a measure of the generalization capability of the meta-learner. In this paper, we present novel information-theoretic bounds on the average absolute value of the meta-generalization gap. Unlike prior work [2], our bounds explicitly capture the impact of task relatedness, the number of tasks, and the number of data samples per task on the meta-generalization gap. Task similarity is gauged via the Kullback-Leibler (KL) and Jensen-Shannon (JS) divergences. We illustrate the proposed bounds on the example of ridge regression with meta-learned bias.
AB - Meta-learning aims at optimizing the hyperparameters of a model class or training algorithm from the observation of data from a number of related tasks. Following the setting of Baxter [1], the tasks are assumed to belong to the same task environment, which is defined by a distribution over the space of tasks and by per-task data distributions. The statistical properties of the task environment thus dictate the similarity of the tasks. The goal of the meta-learner is to ensure that the hyperparameters yield a small loss when used for training on a new task sampled from the task environment. The difference between the resulting average loss, known as the meta-population loss, and the corresponding empirical loss measured on the available data from related tasks, known as the meta-generalization gap, is a measure of the generalization capability of the meta-learner. In this paper, we present novel information-theoretic bounds on the average absolute value of the meta-generalization gap. Unlike prior work [2], our bounds explicitly capture the impact of task relatedness, the number of tasks, and the number of data samples per task on the meta-generalization gap. Task similarity is gauged via the Kullback-Leibler (KL) and Jensen-Shannon (JS) divergences. We illustrate the proposed bounds on the example of ridge regression with meta-learned bias.
UR - http://www.scopus.com/inward/record.url?scp=85102671229&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85102671229&partnerID=8YFLogxK
U2 - 10.1109/ISIT45174.2021.9517767
DO - 10.1109/ISIT45174.2021.9517767
M3 - Conference contribution
AN - SCOPUS:85102671229
T3 - IEEE International Symposium on Information Theory - Proceedings
SP - 1534
EP - 1539
BT - 2021 IEEE International Symposium on Information Theory, ISIT 2021 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 IEEE International Symposium on Information Theory, ISIT 2021
Y2 - 12 July 2021 through 20 July 2021
ER -