TY - JOUR
T1 - A comparison of supervised machine learning algorithms for mosquito identification from backscattered optical signals
AU - Genoud, Adrien P.
AU - Gao, Yunpeng
AU - Williams, Gregory M.
AU - Thomas, Benjamin P.
N1 - Funding Information:
Research reported in this publication was supported by the National Institutes of Health [award number R03AI138133]. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Publisher Copyright:
© 2020 Elsevier B.V.
PY - 2020/7
Y1 - 2020/7
N2 - The surveillance of mosquito populations is paramount in the fight against mosquito-borne diseases that affect millions of people every year. Evaluating the efficiency of mitigation methods requires extensive, long-term surveys, which can be costly and time-consuming. The recent development of optical sensors gives access to alternative methods for entomological monitoring, but these methods require efficient classification algorithms to be successful. In this contribution, supervised machine learning algorithms such as Linear Discriminant Analysis, Decision Trees, Support Vector Machine, K-Nearest Neighbors and Naïve Bayes are compared for the identification of mosquitoes through optical signals. Based on predictor variables derived from the wing beat frequency and optical cross section of mosquitoes, these algorithms were trained to perform different classification tasks: the identification of species, sex and/or gravidity of mosquitoes present in New Jersey, USA. Results show that the most versatile machine learning algorithm for mosquito identification is Support Vector Machine, which performs on average over all tasks between 0.65% and 7.3% better than the other algorithms. Moreover, Support Vector Machine performed best on the most complex tasks, more than 2% above the second best; it is therefore the best suited for real-world studies where several species of mosquito can be expected at a single location. A close second is Linear Discriminant Analysis, which is outperformed by Support Vector Machine by only 0.65% over all tasks and performs best when studying mosquito gravidity. Finally, the Decision Trees algorithm came close to perfection in identifying the sex of a single mosquito species, with 99.9% accuracy, 1.3% more than the second-best-performing algorithm on this task.
These results demonstrate that optical sensors, coupled with machine learning, can be a viable alternative or complementary methodology for the monitoring of mosquito populations. Furthermore, this methodology can perform non-intrusive, automatic, time-resolved measurements of insect population dynamics over extended periods of time without the need for laboratory analysis of captured specimens as in most traditional survey methods.
AB - The surveillance of mosquito populations is paramount in the fight against mosquito-borne diseases that affect millions of people every year. Evaluating the efficiency of mitigation methods requires extensive, long-term surveys, which can be costly and time-consuming. The recent development of optical sensors gives access to alternative methods for entomological monitoring, but these methods require efficient classification algorithms to be successful. In this contribution, supervised machine learning algorithms such as Linear Discriminant Analysis, Decision Trees, Support Vector Machine, K-Nearest Neighbors and Naïve Bayes are compared for the identification of mosquitoes through optical signals. Based on predictor variables derived from the wing beat frequency and optical cross section of mosquitoes, these algorithms were trained to perform different classification tasks: the identification of species, sex and/or gravidity of mosquitoes present in New Jersey, USA. Results show that the most versatile machine learning algorithm for mosquito identification is Support Vector Machine, which performs on average over all tasks between 0.65% and 7.3% better than the other algorithms. Moreover, Support Vector Machine performed best on the most complex tasks, more than 2% above the second best; it is therefore the best suited for real-world studies where several species of mosquito can be expected at a single location. A close second is Linear Discriminant Analysis, which is outperformed by Support Vector Machine by only 0.65% over all tasks and performs best when studying mosquito gravidity. Finally, the Decision Trees algorithm came close to perfection in identifying the sex of a single mosquito species, with 99.9% accuracy, 1.3% more than the second-best-performing algorithm on this task.
These results demonstrate that optical sensors, coupled with machine learning, can be a viable alternative or complementary methodology for the monitoring of mosquito populations. Furthermore, this methodology can perform non-intrusive, automatic, time-resolved measurements of insect population dynamics over extended periods of time without the need for laboratory analysis of captured specimens as in most traditional survey methods.
KW - Entomology
KW - Lidar
KW - Machine learning
KW - Mosquito
KW - Remote sensing
UR - http://www.scopus.com/inward/record.url?scp=85083716343&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85083716343&partnerID=8YFLogxK
U2 - 10.1016/j.ecoinf.2020.101090
DO - 10.1016/j.ecoinf.2020.101090
M3 - Article
AN - SCOPUS:85083716343
SN - 1574-9541
VL - 58
JO - Ecological Informatics
JF - Ecological Informatics
M1 - 101090
ER -