Efficient and scalable data evolution with column oriented databases

Ziyang Liu, Bin He, Hui I. Hsiao, Yi Chen

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

5 Scopus citations

Abstract

Database evolution is the process of updating the schema of a database or data warehouse (schema evolution) and evolving the data to the updated schema (data evolution). It is often desired or necessitated when the data or the query workload changes, when the initial schema was not carefully designed, or when more is learned about the database and a better schema emerges. The Wikipedia database, for example, has had more than 170 versions in the past 5 years [8]. Unfortunately, although much research has been done on schema evolution, data evolution has long been a prohibitively expensive process, which essentially evolves the data by executing SQL queries and re-constructing indexes. This prevents databases from being changed flexibly and frequently as needs arise, and forces schema designers, who cannot afford mistakes, to be highly cautious. Techniques that enable efficient data evolution will undoubtedly make life much easier. In this paper, we study the efficiency of data evolution and discuss techniques for data evolution on column oriented databases, which store each attribute, rather than each tuple, contiguously. We show that column oriented databases have greater potential than traditional row oriented databases for supporting data evolution, and propose a novel data-level data evolution framework on column oriented databases. Our approach, as suggested by experimental evaluations on real and synthetic data, is much more efficient than query-level data evolution on both row and column oriented databases, which involves unnecessary access to irrelevant data, materializing intermediate results, and re-constructing indexes.
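The difference between the two storage layouts mentioned in the abstract can be illustrated with a small sketch. This is not the framework proposed in the paper; it is a toy example under assumed names (the `emp`, `dept`, `city` attributes and both functions are hypothetical) showing why a table decomposition can touch less data when each attribute is stored contiguously.

```python
# Toy illustration only; all names and data here are hypothetical.

# Row oriented layout: each tuple is stored contiguously.
rows = [
    ("alice", "db", "Uppsala"),
    ("bob", "db", "Uppsala"),
    ("carol", "ml", "Tempe"),
]

# Column oriented layout: each attribute is stored contiguously.
columns = {
    "emp": ["alice", "bob", "carol"],
    "dept": ["db", "db", "ml"],
    "city": ["Uppsala", "Uppsala", "Tempe"],
}


def decompose_rows(table):
    """Query-style evolution on rows: every tuple is read in full and
    both new tables S(emp, dept) and T(dept, city) are rebuilt."""
    s = [(emp, dept) for emp, dept, _ in table]
    t = sorted({(dept, city) for _, dept, city in table})
    return s, t


def decompose_columns(cols):
    """Column-level evolution: S reuses the existing column objects as-is
    (no copy); only the deduplicated columns of T are newly built."""
    s = {"emp": cols["emp"], "dept": cols["dept"]}
    pairs = sorted(set(zip(cols["dept"], cols["city"])))
    t = {"dept": [d for d, _ in pairs], "city": [c for _, c in pairs]}
    return s, t


s_rows, t_rows = decompose_rows(rows)
s_cols, t_cols = decompose_columns(columns)
```

In the column version, `s_cols["emp"]` is the very same list object as `columns["emp"]`, which stands in for reusing stored column data rather than rewriting it. A real system would also have to handle compression and indexes, which this sketch ignores.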

Original language: English (US)
Title of host publication: Advances in Database Technology - EDBT 2011
Subtitle of host publication: 14th International Conference on Extending Database Technology, Proceedings
Pages: 105-116
Number of pages: 12
State: Published - 2011
Externally published: Yes
Event: 14th International Conference on Extending Database Technology: Advances in Database Technology, EDBT 2011 - Uppsala, Sweden
Duration: Mar 22 2011 to Mar 24 2011

Publication series

Name: ACM International Conference Proceeding Series

Other

Other: 14th International Conference on Extending Database Technology: Advances in Database Technology, EDBT 2011
Country: Sweden
City: Uppsala
Period: 3/22/11 to 3/24/11

All Science Journal Classification (ASJC) codes

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications

Keywords

  • Bitmap index
  • Column oriented database
  • Data evolution
  • Schema
