Pattern-Based Clustering for Database Attribute Values

Matthew Merzbacher, Wesley W. Chu
Computer Science Department
University of California at Los Angeles


We present a method for automatically clustering similar attribute values in a database system spanning multiple domains. The method constructs an attribute abstraction hierarchy for each attribute using rules that are derived from the database instance. Each rule has a confidence and a popularity, which combine to express the "usefulness" of the rule.

Attribute values are placed in the same cluster if they appear as the premises of rules with the same consequence. Applying the algorithm iteratively yields a hierarchy of clusters. The results can be improved by allowing a domain expert to supervise the clustering process. An example and experimental results from a large transportation database are included.
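
The clustering step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define confidence or popularity, so the sketch assumes that popularity is the number of rows in which the premise value occurs and that confidence is the fraction of those rows that carry the consequence value. The function names, thresholds, attribute names, and sample rows are all hypothetical.

```python
from collections import Counter, defaultdict


def derive_rules(rows, premise_attr, consequence_attr,
                 min_confidence=0.8, min_popularity=2):
    """Derive rules 'premise value -> consequence value' from the rows.

    Assumed definitions (not taken from the paper):
      popularity = number of rows containing the premise value
      confidence = share of those rows that have the consequence value
    Returns {premise_value: consequence_value} for rules passing both
    thresholds, keeping the highest-confidence rule per premise value.
    """
    premise_counts = Counter(row[premise_attr] for row in rows)
    pair_counts = Counter(
        (row[premise_attr], row[consequence_attr]) for row in rows
    )
    best = {}  # premise value -> (confidence, consequence value)
    for (premise, consequence), count in pair_counts.items():
        popularity = premise_counts[premise]
        confidence = count / popularity
        if confidence < min_confidence or popularity < min_popularity:
            continue
        if premise not in best or confidence > best[premise][0]:
            best[premise] = (confidence, consequence)
    return {premise: cons for premise, (_, cons) in best.items()}


def cluster_by_consequence(rules):
    """Group premise values whose rules share the same consequence."""
    clusters = defaultdict(set)
    for premise, consequence in rules.items():
        clusters[consequence].add(premise)
    return dict(clusters)


# Hypothetical sample data: vehicle types and the runway class they use.
rows = [
    {"type": "T1", "runway": "long"},
    {"type": "T1", "runway": "long"},
    {"type": "T2", "runway": "long"},
    {"type": "T2", "runway": "long"},
    {"type": "T3", "runway": "short"},
    {"type": "T3", "runway": "short"},
    {"type": "T3", "runway": "long"},   # lowers T3's confidence to 2/3
    {"type": "T4", "runway": "short"},
    {"type": "T4", "runway": "short"},
]

rules = derive_rules(rows, "type", "runway")
clusters = cluster_by_consequence(rules)
```

In this sample, T1 and T2 fall into one cluster because both imply the same runway class, while T3 is left unclustered because its best rule does not reach the assumed confidence threshold. A hierarchy could presumably be built by treating each cluster as a single value and repeating the procedure, although the abstract does not specify how the iteration is carried out.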

Tue Jun 21 15:40:43 PDT 1994