Query Relaxation for XML Databases

Project Award Number: 0219442

Principal Investigator

First Name Wesley
Middle Initial W.
Last Name Chu
Department: Computer Science Department
Institution: University of California, Los Angeles
City Los Angeles
State CA
Zip Code 90095
Phone Number (310) 825-2047
Fax Number (310) 825-7578
Email wwc@cs.ucla.edu
URL http://www.cs.ucla.edu/~wwc

Collaborator

First Name INEX (Initiative for the Evaluation of XML Retrieval)
Phone Number (+49) 203 379 3401
Fax Number (+49) 203 379 2549
Email malik@is.informatik.uni-duisburg.de

URL http://www.is.informatik.uni-duisburg.de/projects/inex03/

Keywords

XML Query Processing, XML Query Relaxation, Knowledge-based cooperative query answering

Project Summary

This research project focuses on the development and implementation of a search engine that supports XML query relaxation. We have developed a query language, RLXQuery (Relaxation-enabled XQuery), which is based on XQuery [1] and adds constructs for XML relaxation and relaxation control. A knowledge-based relaxation index, X-TAH (XML Type Abstraction Hierarchy), was developed to provide systematic and scalable XML query relaxation and to guide the relaxation process. X-TAH is a hierarchy of clusters in which similar objects are grouped together based on inter-object distance and inter-cluster distance. The inter-object distance is an XML-specific tree-to-tree distance metric [2] that incorporates XML structure characteristics as well as domain-specific semantic information provided by a knowledge base. With this inter-object distance metric and the inter-cluster distance metric used by the clustering technique [3] (e.g., the pairwise distance of objects in a group), X-TAH can be constructed automatically from XML data sources. Guided by the X-TAH and by the relaxation constructs specified in RLXQuery, the relaxation kernel processes the query and returns approximately matched answers.
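To make the two distances described above concrete, the following is a minimal sketch only. It does not implement the tree-to-tree metric of [2] or use any knowledge base; as a stand-in, it assumes a Jaccard distance over root-to-node tag paths for the inter-object distance and the average pairwise distance for the inter-cluster distance. The function names and sample fragments are hypothetical.

```python
import xml.etree.ElementTree as ET
from itertools import product


def label_paths(xml_text):
    """Return the set of root-to-node tag paths in an XML fragment."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def object_distance(a, b):
    """Stand-in inter-object distance: Jaccard distance on tag paths.

    Assumption: this replaces the tree-to-tree metric of [2] for illustration.
    """
    pa, pb = label_paths(a), label_paths(b)
    return 1.0 - len(pa & pb) / len(pa | pb)


def cluster_distance(c1, c2):
    """Inter-cluster distance: average pairwise distance between members."""
    pairs = list(product(c1, c2))
    return sum(object_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical sample fragments.
frag_a = "<article><title/><author/></article>"
frag_b = "<article><title/><year/></article>"
frag_c = "<book><isbn/></book>"

print(object_distance(frag_a, frag_b))               # structurally close
print(object_distance(frag_a, frag_c))               # structurally unrelated
print(cluster_distance([frag_a], [frag_b, frag_c]))  # average of the two above
```

A path-based distance ignores element content and ordering, so it is only a rough proxy for structural similarity; a semantic component would have to be added separately.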

Publications and Products

1. Dongwon Lee, Murali Mani, Wesley W. Chu. "Effective Schema Conversions between XML and Relational Models." In European Conf. on Artificial Intelligence (ECAI), Knowledge Transformation Workshop (ECAI-OT), Lyon, France, July 2002 (Invited).

2. Dongwon Lee, Murali Mani, Frank Chiu, Wesley W. Chu. "NeT & CoT: Translating Relational Schemas to XML Schemas using Semantic Constraints." In 11th ACM Int'l Conf. on Information and Knowledge Management (CIKM), McLean, VA, USA, November 2002.

Project Impact

As the number of data sources available on the web increases, it becomes common to share information among heterogeneous data sources, where the structures of the participating data sources may be different, although they are using the same ontology about the same contents. Our XML query relaxation technique enables users to query against differently structured data sources. The query can be relaxed structurally and is able to retrieve relevant approximate information from data sources with different structures.

Our XML query relaxation methodology is applicable to the medical domain as well. For ease in information exchange and presentation, an increasing number of medical documents (e.g. radiological documents or medical lab reports) are stored in XML format. Our query relaxation methodology will assist medical staff and patients in finding desired information over heterogeneous information sources. We are currently working with radiologists at UCLA and using their patient reports as a dataset for evaluating the effectiveness of our methodology.

Goals, Objectives and Targeted Activities

In this project, we focused on query relaxation for the XML model. Unlike the relational model, where the schema is simple and fixed, the schema in the XML data model is relatively complex and easily extensible. It is therefore unrealistic to expect users to understand the full schema and compose very complex queries. The goal of our project is to develop a methodology that automatically relaxes the user's query, according to the user's relaxation specifications, when the original query yields null or insufficient answers.
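The relaxation behavior described above can be illustrated with a minimal sketch. It assumes a hand-written cluster hierarchy and treats a minimum answer count as a stand-in for the user's relaxation specification; the node names, the `answers` field, and the `relax` function are hypothetical and are not RLXQuery constructs.

```python
def members(node):
    """Collect all answers stored at or below a hierarchy node."""
    if "children" not in node:
        return list(node["answers"])
    return [a for child in node["children"] for a in members(child)]


def find_path(node, target, path=()):
    """Return the chain of nodes from the root down to the node named target."""
    path = path + (node,)
    if node["name"] == target:
        return path
    for child in node.get("children", []):
        found = find_path(child, target, path)
        if found:
            return found
    return None


def relax(root, start, min_answers):
    """Climb from the exact-match node toward the root until enough answers exist."""
    path = find_path(root, start)
    if path is None:
        return []
    for node in reversed(path):  # most specific node first
        answers = members(node)
        if len(answers) >= min_answers:
            return answers
    return members(root)  # fully relaxed: return everything available


# Hypothetical hierarchy: each level up is a more general concept.
hierarchy = {
    "name": "publication",
    "children": [
        {"name": "article", "children": [
            {"name": "journal_article", "answers": ["a1"]},
            {"name": "conference_paper", "answers": ["a2", "a3"]},
        ]},
        {"name": "book", "answers": ["b1"]},
    ],
}

print(relax(hierarchy, "journal_article", 1))  # exact match is sufficient
print(relax(hierarchy, "journal_article", 2))  # relaxed one level, to "article"
```

Climbing one level at a time keeps the returned answers as close as possible to the original query, since each step only adds the nearest sibling clusters.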

To support XML query relaxation, we have developed a technique for automatically generating a relaxation index, X-TAH, in three steps: 1) for a given query structure pattern, derive XML fragments from the XML documents that are both structurally and semantically similar to the pattern; 2) cluster these XML fragments into groups; and 3) assign an object name to each internal cluster node that describes the content of the sub-cluster it represents, to facilitate the query relaxation process.
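Steps 2 and 3 above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: fragments are reduced to non-empty sets of tag names, plain average-link agglomerative clustering stands in for the clustering technique of [3], and each internal node is named by the tags shared by all of its members. All function and field names are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Distance between two non-empty tag sets (stand-in for the real metric)."""
    return 1.0 - len(a & b) / len(a | b)


def leaves(node):
    """Return the fragments stored at the leaves under a node."""
    if "fragment" in node:
        return [node["fragment"]]
    return [f for child in node["children"] for f in leaves(child)]


def average_link(n1, n2):
    """Average pairwise distance between the fragments of two clusters."""
    l1, l2 = leaves(n1), leaves(n2)
    total = sum(jaccard_distance(a, b) for a in l1 for b in l2)
    return total / (len(l1) * len(l2))


def build_hierarchy(fragments):
    """Step 2: merge the closest clusters. Step 3: name each internal node."""
    nodes = [{"fragment": frozenset(f)} for f in fragments]
    while len(nodes) > 1:
        i, j = min(combinations(range(len(nodes)), 2),
                   key=lambda p: average_link(nodes[p[0]], nodes[p[1]]))
        merged = {"children": [nodes[i], nodes[j]]}
        # Name the node by the tags common to every fragment beneath it.
        merged["label"] = sorted(frozenset.intersection(*leaves(merged)))
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0]


# Hypothetical fragments, already derived in step 1.
fragments = [
    {"article", "title", "author"},
    {"article", "title", "year"},
    {"book", "title", "isbn"},
]
root = build_hierarchy(fragments)
print(root["label"])  # tags shared by all three fragments
```

Naming a node by its shared tags means labels become more general toward the root, which matches the intent of relaxing a query by moving up the hierarchy.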

Our proposed RLXQuery language allows users to include query relaxation specifications. We plan to implement the RLXQuery relaxation kernel on top of an XML query engine to support RLXQuery.

Area Background

Database systems, knowledge-based systems, XML query answering and information retrieval

Area References

1. http://www.w3c.org/TR/xquery

2. Andrew Nierman and H. V. Jagadish. Evaluating Structural Similarity in XML Documents. Proceedings of the Fifth International Workshop on the Web and Databases, June 6-7, 2002, Madison, Wisconsin, USA.

3. Wesley W. Chu, Kuorong Chiang, Chih-Cheng Hsu, Henrick Yau. An Error-based Conceptual Clustering Method for Providing Approximate Query Answers. Communications of the ACM, 1996.

Potential Related Projects

1) Similarity queries over XML data, a project at the University of Bologna, Italy.

Project Websites
http://www.cobase.cs.ucla.edu

Description of the website

The website provides background on CoBase (a cooperative database system for query relaxation in the relational model), an introduction to the XML Query Relaxation Project, the motivation for XML query relaxation, and the methodology we use for XML query relaxation.