Data mining derives its name from the similarities between searching for valuable business information in a large database and mining a mountain for a vein of valuable ore. Both processes require either sifting through an immense amount of material or intelligently probing it to find exactly where the value resides.
Tasks solved by Data Mining
Predicting - A task of learning a pattern from examples and using the developed model to predict future values of the target variable.
Classification - A task of finding a function that maps an example into one of several discrete classes.
Detection of relations - A task of searching for the most influential independent variables for a selected target variable.
Explicit Modeling - A task of finding explicit formulas describing dependencies between various variables.
Clustering - A task of identifying a finite set of categories or clusters that describe data.
Deviation Detection - A task of determining the most significant changes in some key measures of data from previous or expected values.
Source: Megaputer Intelligence Ltd.
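As an illustration of the deviation detection task listed above, the following minimal sketch ranks key measures by the size of their relative change from a previous (or expected) value. The function name top_deviations, the use of relative change as the significance measure, and the sample figures are all hypothetical choices made for this example; real tools may use statistical tests or models of expected values instead.

```python
from typing import Dict, List, Tuple


def top_deviations(
    previous: Dict[str, float], current: Dict[str, float], k: int = 3
) -> List[Tuple[str, float]]:
    """Rank measures by absolute relative change from a baseline (hypothetical helper)."""
    changes: List[Tuple[str, float]] = []
    for name, old in previous.items():
        if name not in current or old == 0:
            # Skip measures missing from the current data or with no usable baseline.
            continue
        rel_change = (current[name] - old) / abs(old)
        changes.append((name, rel_change))
    # Largest absolute relative change first: these are the "most significant" deviations.
    changes.sort(key=lambda item: abs(item[1]), reverse=True)
    return changes[:k]


# Hypothetical monthly measures, for illustration only.
last_month = {"revenue": 120_000.0, "returns": 400.0, "new_customers": 850.0, "support_calls": 1_000.0}
this_month = {"revenue": 126_000.0, "returns": 700.0, "new_customers": 800.0, "support_calls": 1_020.0}

for name, change in top_deviations(last_month, this_month, k=2):
    print(f"{name}: {change:+.1%}")
```

With these sample figures, returns (up 75%) is reported first, followed by new_customers (down about 5.9%); the smaller shifts in revenue and support calls fall below the top two.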
Given databases of sufficient size and quality, data mining technology can generate new business opportunities by providing these capabilities:
Automated prediction of trends and behaviors: Data mining automates the process of finding predictive information in large databases. Questions that traditionally required extensive hands-on analysis can now be answered directly and quickly from the data. A typical example of a predictive problem is targeted marketing. Data mining uses data on past promotional mailings to identify the targets most likely to maximize return on investment in future mailings. Other predictive problems include forecasting bankruptcy and other forms of default, and identifying segments of a population likely to respond similarly to given events.
Automated discovery of previously unknown patterns: Data mining tools sweep through databases and identify previously hidden patterns in one step. An example of pattern discovery is the analysis of retail sales data to identify seemingly unrelated products that are often purchased together. Other pattern discovery problems include detecting fraudulent credit card transactions and identifying anomalous data that could represent data entry keying errors.
Databases can be larger in both depth and breadth: Databases can have more columns and rows. Because of time constraints, analysts often must limit the number of variables they examine during hands-on analysis. Yet variables that are discarded because they seem unimportant may carry information about unknown patterns. High-performance data mining allows users to explore the full depth of a database without preselecting a subset of variables.
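The retail sales example of pattern discovery mentioned above (finding products that are often purchased together) can be sketched by counting how often each pair of items appears in the same basket. This is a minimal sketch of the pair-counting step that association-rule methods such as Apriori build on, not a full association-rule miner. The function name frequent_pairs, the min_support threshold, and the sample baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set


def frequent_pairs(
    transactions: Iterable[Set[str]], min_support: float
) -> Dict[FrozenSet[str], float]:
    """Return item pairs whose support (share of baskets containing both) >= min_support."""
    baskets: List[Set[str]] = [set(t) for t in transactions]
    pair_counts: Counter = Counter()
    for basket in baskets:
        # frozenset keys make (a, b) and (b, a) count as the same pair.
        for pair in combinations(basket, 2):
            pair_counts[frozenset(pair)] += 1
    n = len(baskets)
    if n == 0:
        return {}
    return {pair: count / n for pair, count in pair_counts.items() if count / n >= min_support}


# Hypothetical retail baskets, for illustration only.
sales = [
    {"bread", "milk", "diapers"},
    {"beer", "diapers", "chips"},
    {"milk", "diapers", "beer"},
    {"bread", "milk"},
    {"diapers", "beer"},
]

for pair, support in sorted(frequent_pairs(sales, 0.4).items(), key=lambda kv: -kv[1]):
    print(sorted(pair), f"support={support:.0%}")
```

Here beer and diapers appear together in three of the five baskets (60% support), while bread with milk and diapers with milk each appear in two (40%). Counting only pairs keeps the sketch short; larger itemsets and rule confidence would be the next steps in a fuller analysis.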