Consider a device connected to an edge processor via a communication channel. The device holds local data that is to be offloaded to the edge processor in order to train a machine learning model, e.g., for regression or classification. Both the transmission of the data to the edge processor and the training based on stochastic gradient descent (SGD) must be completed within a time limit. Assuming that communication and computation can be pipelined, this letter investigates the optimal choice of packet payload size, given the overhead of each data packet transmission and the ratio between the computation and communication rates. This amounts to a tradeoff between bias and variance, since communicating the entire data set first reduces the bias of the training process but may not leave sufficient time for learning. Analytical bounds on the expected optimality gap are derived to enable an effective optimization, which is validated by numerical results.
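The payload-size tradeoff described in the abstract can be illustrated with a toy timeline. The sketch below is not the letter's model or its bounds: the function name `pipeline_schedule`, its parameters, and the simplifying assumptions (time measured in units of one sample's transmission time, a fixed per-packet overhead, SGD starting as soon as the first packet arrives and running at a constant update rate) are all hypothetical and chosen only for illustration.

```python
import math


def pipeline_schedule(n_samples, payload, overhead, deadline, updates_per_unit):
    """Toy timeline for pipelined data offloading and SGD training.

    Assumptions (illustrative only, not taken from the letter):
    - one time unit is the time needed to transmit one data sample;
    - every packet carries up to `payload` samples plus `overhead` time units;
    - a packet counts only if it fully arrives by `deadline`;
    - SGD starts when the first packet arrives and then performs
      `updates_per_unit` updates per time unit until the deadline.
    """
    if payload <= 0:
        raise ValueError("payload must be positive")

    t = 0.0            # current time on the channel
    delivered = 0      # samples received by the edge processor so far
    first_arrival = None

    while delivered < n_samples:
        size = min(payload, n_samples - delivered)  # last packet may be shorter
        t += size + overhead
        if t > deadline:
            break  # this packet would miss the deadline
        delivered += size
        if first_arrival is None:
            first_arrival = t

    compute_time = deadline - first_arrival if first_arrival is not None else 0.0
    return {
        "samples_delivered": delivered,
        "first_arrival": first_arrival,
        "sgd_updates": math.floor(updates_per_unit * compute_time),
    }


if __name__ == "__main__":
    # Sweep a few payload sizes for a fixed deadline and overhead.
    for payload in (1, 10, 100):
        print(payload, pipeline_schedule(100, payload, 2, 60, 1.0))
```

Under these assumptions, small payloads let training start early but waste channel time on overhead, so fewer samples arrive before the deadline; large payloads deliver data more efficiently but delay the start of training, and a single packet holding the whole data set may not arrive in time at all.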
All Science Journal Classification (ASJC) codes
- Modeling and Simulation
- Computer Science Applications
- Electrical and Electronic Engineering
Keywords
- Machine learning
- Mobile edge computing
- Stochastic gradient descent