TY - GEN
T1 - Efficient and scalable data evolution with column oriented databases
AU - Liu, Ziyang
AU - He, Bin
AU - Hsiao, Hui I.
AU - Chen, Yi
PY - 2011
Y1 - 2011
AB - Database evolution is the process of updating the schema of a database or data warehouse (schema evolution) and evolving the data to the updated schema (data evolution). It is often desired or necessary when the data or the query workload changes, when the initial schema was not carefully designed, or when more is learned about the database and a better schema is identified. The Wikipedia database, for example, has had more than 170 versions in the past 5 years [8]. Unfortunately, although much research has addressed schema evolution, data evolution has long been a prohibitively expensive process, which essentially evolves the data by executing SQL queries and reconstructing indexes. This prevents databases from being changed flexibly and frequently as needs arise, and forces schema designers, who cannot afford mistakes, to be highly cautious. Techniques that enable efficient data evolution would make database evolution much easier. In this paper, we study the efficiency of data evolution and discuss techniques for data evolution on column-oriented databases, which store each attribute, rather than each tuple, contiguously. We show that column-oriented databases have better potential than traditional row-oriented databases for supporting data evolution, and we propose a novel data-level data evolution framework on column-oriented databases. Experimental evaluations on real and synthetic data suggest that our approach is much more efficient than query-level data evolution on both row- and column-oriented databases, which involves unnecessary access to irrelevant data, materialization of intermediate results, and reconstruction of indexes.
KW - Bitmap index
KW - Column oriented database
KW - Data evolution
KW - Schema
UR - http://www.scopus.com/inward/record.url?scp=79953841370&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79953841370&partnerID=8YFLogxK
U2 - 10.1145/1951365.1951380
DO - 10.1145/1951365.1951380
M3 - Conference contribution
AN - SCOPUS:79953841370
SN - 9781450305280
T3 - ACM International Conference Proceeding Series
SP - 105
EP - 116
BT - Advances in Database Technology - EDBT 2011
PB - Association for Computing Machinery
T2 - 14th International Conference on Extending Database Technology: Advances in Database Technology, EDBT 2011
Y2 - 22 March 2011 through 24 March 2011
ER -