Improving the Predictive Analytics of Machine-Learning Pipelines for Bridge Infrastructure Asset Management Applications: An Upstream Data Workflow to Address Data Quality Issues in the National Bridge Inventory Database

Research output: Contribution to journal, Article, peer-review

Abstract

The increasing availability of bridge data from the National Bridge Inventory (NBI) offers a great opportunity to perform predictive analytics (such as bridge deterioration prediction) using machine learning (ML) pipelines to support bridge asset management. However, data quality issues (e.g., outliers and missing values) can significantly affect ML pipelines, so upstream tasks must be performed to ensure the validity, applicability, and generalizability of the pipelines. Among these tasks, outlier removal and missing value imputation are the most challenging because the process is highly laborious, data governance is lacking, and the data contain a mixture of heterogeneous data quality issues and data types. To address this challenge, this paper proposes an upstream workflow for enhancing the downstream predictive analytics of bridge-related ML pipelines. The proposed upstream workflow was developed using NBI data collected for all states in the United States, comprising a total of 617,084 observations (bridges). Existing bridge domain knowledge from multiple sources (such as the bridge design manual and regulations) was leveraged to remove outliers. This study then applied and compared 10 statistical and ML-based data imputation techniques to impute missing values. Statistical analysis and imputation evaluation of the NBI data indicated that: (1) 19 and 15 of the 38 frequently used features (variables) had outliers and missing values, respectively; (2) categorical features are generally more prone to data dropping due to inapplicable values, while numeric features are more subject to outliers; and (3) ML-based data imputation is more suitable than statistical imputation for both numeric and categorical features, especially for features with high missing rates. The proposed workflow was validated on its capability to improve downstream predictive analytics for bridge deck condition prediction, increasing the balanced accuracy by 6.85% to 9.76%.
This paper contributes to the body of knowledge by offering a novel upstream workflow that can serve as a benchmark to guide researchers and bridge engineering practitioners in handling NBI data quality issues, thereby improving predictive analytics performed with ML pipelines.
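The abstract describes a workflow of rule-based outlier removal followed by missing-value imputation, with the benefit measured downstream through balanced accuracy. As a minimal sketch only, assuming a small table of records held as Python dicts, those steps might look like the code below. Everything specific here is a hypothetical illustration and not the authors' implementation: the feature names (`year_built`, `max_span_m`, `material`), the range thresholds in `RULES`, the choice to flag an out-of-range value as missing (rather than drop the record), and the use of k-nearest-neighbour imputation as the ML-based example. The paper compares 10 techniques that the abstract does not name.

```python
from collections import Counter
from statistics import mean

# Hypothetical domain-knowledge rules: feature -> (min, max) plausible range.
# Illustrative thresholds only; they are NOT the rules used in the paper.
RULES = {"year_built": (1800, 2024), "max_span_m": (0.1, 1000.0)}
NUMERIC = ["year_built", "max_span_m"]
CATEGORICAL = ["material"]


def remove_outliers(rows, rules):
    """Flag values that violate a range rule as missing (None) so they can be imputed."""
    cleaned = []
    for row in rows:
        new = dict(row)
        for feat, (lo, hi) in rules.items():
            value = new.get(feat)
            if value is not None and not lo <= value <= hi:
                new[feat] = None
        cleaned.append(new)
    return cleaned


def impute_statistical(rows, numeric, categorical):
    """Statistical baseline: column mean for numeric, column mode for categorical."""
    fills = {f: mean(r[f] for r in rows if r[f] is not None) for f in numeric}
    for f in categorical:
        fills[f] = Counter(r[f] for r in rows if r[f] is not None).most_common(1)[0][0]
    return [{k: fills[k] if v is None and k in fills else v for k, v in r.items()} for r in rows]


def impute_knn(rows, numeric, categorical, k=2):
    """k-nearest-neighbour imputation, used here as one simple ML-based technique."""
    # Range-scale numeric features so that years and metres are comparable.
    scale = {}
    for f in numeric:
        vals = [r[f] for r in rows if r[f] is not None]
        scale[f] = (max(vals) - min(vals)) or 1.0

    def distance(a, b, target):
        # Mean scaled absolute difference over numeric features observed in both records.
        shared = [f for f in numeric
                  if f != target and a[f] is not None and b[f] is not None]
        if not shared:
            return float("inf")
        return sum(abs(a[f] - b[f]) / scale[f] for f in shared) / len(shared)

    imputed = []
    for r in rows:
        new = dict(r)
        for f in numeric + categorical:
            if r[f] is not None:
                continue
            donors = [d for d in rows if d is not r and d[f] is not None]
            if not donors:
                continue  # nothing to borrow from; leave the value missing
            donors.sort(key=lambda d: distance(r, d, f))
            vals = [d[f] for d in donors[:k]]
            # Numeric target: neighbour mean; categorical target: neighbour mode.
            new[f] = mean(vals) if f in numeric else Counter(vals).most_common(1)[0][0]
        imputed.append(new)
    return imputed


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; less sensitive to class imbalance than plain accuracy."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical toy records (not NBI data): one span outlier and two missing materials.
EXAMPLE_ROWS = [
    {"year_built": 1965, "max_span_m": 30.0, "material": "steel"},
    {"year_built": 1970, "max_span_m": 32.0, "material": "steel"},
    {"year_built": 1995, "max_span_m": 12.0, "material": "concrete"},
    {"year_built": 2000, "max_span_m": 10.0, "material": "concrete"},
    {"year_built": 1968, "max_span_m": 9999.0, "material": None},
    {"year_built": 1998, "max_span_m": None, "material": None},
]

if __name__ == "__main__":
    cleaned = remove_outliers(EXAMPLE_ROWS, RULES)
    print(impute_statistical(cleaned, NUMERIC, CATEGORICAL))
    print(impute_knn(cleaned, NUMERIC, CATEGORICAL, k=2))
    print(balanced_accuracy([0, 0, 0, 1], [0, 0, 0, 0]))  # 0.5, vs. plain accuracy 0.75
```

In this toy example the mean imputation fills both missing spans with the same column mean (21.0), whereas the neighbour-based version borrows from bridges of similar age (31.0 for the 1968 bridge, 11.0 for the 1998 one), which loosely illustrates why a model-based imputer can outperform a global statistic. Balanced accuracy averages recall over classes, so a classifier that always predicts the majority class scores 0.5 here instead of the 0.75 that plain accuracy would report, making it a reasonable metric for imbalanced targets.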

Original language: English (US)
Article number: 04023103
Journal: Journal of Bridge Engineering
Volume: 29
Issue number: 1
State: Published, Jan 1 2024

All Science Journal Classification (ASJC) codes

  • Civil and Structural Engineering
  • Building and Construction
