One reason for maintaining a database is to let the user find interesting patterns and trends in the data. For example, in a supermarket, the user can figure out which items are sold most frequently. But this is not the only type of ‘trend’ one can think of. The goal of database mining is to automate the process of finding interesting patterns and trends. Once this information is available, we can perhaps get rid of the original database: the output of the data-mining process should be a “summary” of the database. This goal is difficult to achieve because the term ‘interesting’ is vague. The solution is to define various types of trends and to look only for those trends in the database. One such type is the association rule.
In the rest of the discussion, we shall use the supermarket example, where each record or tuple consists of the items of a single purchase. However, the concepts apply to a large number of other situations.
In the present context, an association rule describes the association between two or more items. For example, in 80% of the cases in which people buy bread, they also buy milk. This tells us about the association between bread and milk. We represent it as:
bread => milk | 80%
This should be read as: “Bread implies milk, 80% of the time.” Here 80% is the “confidence factor” of the rule, that is, the fraction of purchases containing bread that also contain milk.
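To make the confidence factor concrete, here is a minimal Python sketch that computes it for a rule over a small list of purchases. The function name `confidence` and the sample data are invented for illustration and are not part of the original discussion. Each purchase is modelled as a set of item names, matching the assumption that a record holds the items of a single purchase.

```python
def confidence(transactions, antecedent, consequent):
    """Return the fraction of transactions containing `antecedent`
    that also contain `consequent`, or None if no transaction
    contains the antecedent at all."""
    antecedent = set(antecedent)
    consequent = set(consequent)

    # Purchases that contain every item on the left-hand side of the rule.
    with_antecedent = [t for t in transactions if antecedent <= set(t)]
    if not with_antecedent:
        return None  # The rule cannot be evaluated on this data.

    # Of those, the purchases that also contain the right-hand side.
    with_both = [t for t in with_antecedent if consequent <= set(t)]
    return len(with_both) / len(with_antecedent)


# Hypothetical sample data: six purchases, five of which include bread.
purchases = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "milk", "butter"},
    {"bread", "milk"},
    {"bread", "jam"},
    {"milk", "eggs"},
]

conf = confidence(purchases, {"bread"}, {"milk"})
print(f"bread => milk | {conf:.0%}")  # prints: bread => milk | 80%
```

In this sample, four of the five purchases that contain bread also contain milk, which gives the 80% figure. The purchase without bread does not affect the result, because the confidence factor only looks at purchases that satisfy the left-hand side of the rule.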