An improved data anonymization algorithm for incomplete medical dataset publishing

Wei Liu, Mengli Pei, Congcong Cheng, Wei She, Chase Q. Wu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Scopus citations

Abstract

To protect patients' sensitive information and prevent privacy leakage, medical datasets must be anonymized before they are published. Most existing anonymization techniques discard records with missing data, which causes large differences in data characteristics during anonymization and results in severe information loss. To solve this problem, we propose a novel data anonymization algorithm for incomplete medical datasets based on the L-diversity algorithm (DAIMDL). While preserving records with missing data, DAIMDL clusters the data using an improved k-member algorithm and, in the clustering stage, calculates distances from the information entropy generated by data generalization. The groups obtained by clustering are then generalized. Experimental results show that DAIMDL protects patients' sensitive attributes better, reduces information loss when anonymizing data with missing values, and improves the availability of the dataset.
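The pipeline outlined in the abstract (cluster while keeping incomplete records, measure distance by entropy, then generalize each group) can be illustrated with a minimal Python sketch. This is not the DAIMDL implementation. The abstract does not give the distance formula or the details of the improved k-member algorithm, so everything here is an assumption made for illustration: the `None` marker for missing values, the function names, the entropy-times-group-size loss, and the greedy cluster construction.

```python
import math
from collections import Counter

MISSING = None  # assumed marker for a missing quasi-identifier value


def entropy(values):
    """Shannon entropy (bits) of the observed, non-missing values."""
    observed = [v for v in values if v is not MISSING]
    if not observed:
        return 0.0
    n = len(observed)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(observed).values())


def group_loss(records, qi):
    """Hypothetical loss: group size times summed per-attribute entropy."""
    return len(records) * sum(entropy([r[a] for r in records]) for a in qi)


def distance(record, cluster, qi):
    """Extra loss incurred by adding `record` to `cluster` (an assumption)."""
    return group_loss(cluster + [record], qi) - group_loss(cluster, qi)


def cluster_k_member(records, qi, k):
    """Greedy k-member-style clustering that keeps incomplete records."""
    remaining = list(records)
    clusters = []
    while len(remaining) >= k:
        cluster = [remaining.pop(0)]
        while len(cluster) < k:
            # Pick the record whose addition increases the loss the least.
            i = min(range(len(remaining)),
                    key=lambda j: distance(remaining[j], cluster, qi))
            cluster.append(remaining.pop(i))
        clusters.append(cluster)
    if remaining and not clusters:
        return [remaining]  # fewer than k records in total
    for r in remaining:
        # Leftover records join the cluster where they add the least loss.
        min(clusters, key=lambda c: distance(r, c, qi)).append(r)
    return clusters


def is_l_diverse(records, sensitive, l):
    """Distinct l-diversity: at least l different sensitive values."""
    return len({r[sensitive] for r in records}) >= l


def generalize(cluster, qi):
    """Replace each quasi-identifier by the set of observed values, or '*'."""
    merged = {}
    for a in qi:
        observed = sorted({str(r[a]) for r in cluster if r[a] is not MISSING})
        merged[a] = "|".join(observed) if observed else "*"
    return [{**r, **merged} for r in cluster]
```

In this sketch a missing value adds nothing to the entropy, so an incomplete record can join a group without being discarded. Groups that fail `is_l_diverse` would need to be merged or extended before publishing; that step is left out here.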

Original language: English (US)
Title of host publication: Proceedings of the 2nd International Conference on Healthcare Science and Engineering
Editors: Xianxian Li, Chase Q. Wu, Ming-Chien Chyu, Jaime Lloret
Publisher: Springer Verlag
Pages: 115-128
Number of pages: 14
ISBN (Print): 9789811368363
State: Published - 2019
Event: 2nd International Conference on Healthcare Science and Engineering, Healthcare 2018 - Guilin, China
Duration: Jun 1 2018 → Jun 3 2018

Publication series

Name: Lecture Notes in Electrical Engineering
Volume: 536
ISSN (Print): 1876-1100
ISSN (Electronic): 1876-1119

Conference

Conference: 2nd International Conference on Healthcare Science and Engineering, Healthcare 2018
Country/Territory: China
City: Guilin
Period: 6/1/18 → 6/3/18

All Science Journal Classification (ASJC) codes

  • Industrial and Manufacturing Engineering

Keywords

  • Data anonymization
  • Incomplete medical dataset
  • L-diversity
  • Missing data
