UMBC ebiquity

COVER Model Pivot Index for Flexible, Adaptable, and Agile Systems

Speaker: Fuesane Cheng

Start: Tuesday, November 30, 2010, 09:30AM

End: Tuesday, November 30, 2010, 12:00PM

Location: 325b ITE, UMBC

Abstract: To help businesses compete on speed to market in product and service development, generically modeled data structures have been used to build vertical application software systems and to store XML and RDF data, because of their flexibility, adaptability, and agility. However, generic data models require multiple self-joins on a single table holding a large volume of data, which slows down business intelligence (BI) applications. Conversely, traditional specific data models perform faster but lack the flexibility, adaptability, and agility needed for speed to market.
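The abstract does not describe the COVER schema itself. As a minimal sketch of the general self-join problem only, the example below assumes a hypothetical entity-attribute-value style table (`generic`, with columns `object_id`, `element`, `value`) in Python's built-in `sqlite3` module. Each attribute is a row rather than a column, so every extra search predicate costs another join of the table with itself.

```python
import sqlite3

# Hypothetical generic table: each attribute of an object is stored as its own
# row (object_id, element, value) instead of as its own column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE generic (object_id INTEGER, element TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO generic VALUES (?, ?, ?)",
    [
        (1, "color", "red"), (1, "size", "L"),
        (2, "color", "red"), (2, "size", "M"),
        (3, "color", "blue"), (3, "size", "L"),
    ],
)

# Two predicates (color and size) need two copies of the same table joined on
# object_id; a third predicate would need a third copy, and so on.
query = """
SELECT c.object_id
FROM generic AS c
JOIN generic AS s ON s.object_id = c.object_id
WHERE c.element = 'color' AND c.value = 'red'
  AND s.element = 'size' AND s.value = 'L'
"""
matches = [row[0] for row in conn.execute(query)]
print(matches)  # [1]
```

On a large table, each of these self-joins scans or probes the same rows again, which is the performance cost the abstract attributes to generic models.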

A generic data model named the Class Object Value Element Relationship (COVER) model was developed for storing node-oriented tree data; it is suited to automated pivot index generation and distributed data processing. This approach uses pivot views with appropriate metadata constructs to expose search predicate fields for indexing, which improves retrieval performance for queries on branches or leaves across multiple trees. It benefits production support and data retrieval that feeds business intelligence and data mining. A distributed COVER model, with two physical implementation variants, was also developed for distributed and parallel cloud computing platforms to take advantage of their scalability and performance.
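The general pivot-view idea can be illustrated with a rough sketch. This is not the COVER model's actual metadata constructs or pivot index generation, which the abstract does not specify. The table `generic`, the view `pivot_view`, the table `pivot`, and the index `idx_pivot_color_size` are all hypothetical names. Conditional aggregation turns selected elements into columns. Because SQLite cannot index a view directly, the sketch materializes the pivot into a table and indexes the exposed predicate columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical generic table: one row per (object, element, value) triple.
conn.execute("CREATE TABLE generic (object_id INTEGER, element TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO generic VALUES (?, ?, ?)",
    [
        (1, "color", "red"), (1, "size", "L"),
        (2, "color", "red"), (2, "size", "M"),
        (3, "color", "blue"), (3, "size", "L"),
    ],
)

# Pivot view: conditional aggregation exposes each chosen element as a column,
# so search predicates can target ordinary fields instead of self-joins.
conn.execute("""
CREATE VIEW pivot_view AS
SELECT object_id,
       MAX(CASE WHEN element = 'color' THEN value END) AS color,
       MAX(CASE WHEN element = 'size'  THEN value END) AS size
FROM generic
GROUP BY object_id
""")

# Assumption for this sketch: materialize the pivot, then index the exposed
# predicate columns (SQLite does not support indexes on views).
conn.execute("CREATE TABLE pivot AS SELECT * FROM pivot_view")
conn.execute("CREATE INDEX idx_pivot_color_size ON pivot (color, size)")

query = "SELECT object_id FROM pivot WHERE color = ? AND size = ?"
matches = [row[0] for row in conn.execute(query, ("red", "L"))]
# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail;
# its exact wording varies by SQLite version.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, ("red", "L"))]
print(matches)  # [1]
print(plan)
```

The same two-predicate query from the self-join case becomes a single indexed lookup on one table. A real system would also need to keep the materialized pivot in sync with the generic table. This sketch does not handle that.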

Benchmark experiments comparing the query performance of the COVER model against self-join and XPath/XQuery approaches on an RDBMS showed that the COVER model outperformed the other two on the same sets of test queries. Further benchmark experiments in a distributed cloud computing environment, using Hadoop HBase, compared the RDBMS COVER model with the distributed COVER models. The results showed that the distributed COVER models outperformed the RDBMS implementation, demonstrating that the distributed COVER model is a viable data storage approach for flexible, adaptable, agile, and scalable systems.