KDD Process in Data Mining
Data Mining | KDD Process: In this tutorial, we will learn about the KDD (Knowledge Discovery in Databases) process in data mining.
By IncludeHelp. Last updated: April 17, 2023
Data Mining Knowledge
Data mining is the process of extracting useful patterns and knowledge from very large data sets, such as large databases. In other words, data mining is the process of mining knowledge from data. To find meaningful patterns, the data mining process relies on data compiled in the data warehousing stage.
For instance, just as gold mining extracts gold from rock or sand, data mining extracts knowledge from data.
Data mining may also refer to data analysis activity in general. It is the computer-supported process of analyzing huge data sets that have been compiled or downloaded into the computer from large data sources. In the data mining process, the computer analyzes the data and extracts key information from it, looking for hidden patterns and attempting to predict future behaviour within the data set.
Why is it important?
Data mining is important because it supports a wide range of applications. The most common application areas are:
- Market analysis
- Fraud detection
- Customer retention
- Production control
- Scientific exploration
In contrast to data analytics, where discovery goals are often not known or well defined at the outset, data mining efforts are usually driven by a specific information need that cannot be satisfied through standard data queries or reports. Data mining produces information from which predictive models can be derived and then tested, leading to a greater understanding of the marketplace.
Data mining has broad business applications, from pharmaceutical research to traffic pattern modelling. However, the classic use case is predicting customer behaviour to optimize sales and marketing activities. For example, retailers often use data mining to predict what their customers are likely to buy next.
Other terms used for data mining include:
- Knowledge mining
- Knowledge extraction
- Pattern analysis
- Data archaeology
- Data dredging
Data mining depends on effective data collection and warehousing as well as computer processing. It uses sophisticated mathematical algorithms to segment the data and evaluate the probability of future events. Data mining is often treated as a synonym for Knowledge Discovery in Databases (KDD), although strictly it is one step of the broader KDD process.
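Segmentation, one of the tasks named above, can be illustrated with a small example. The sketch below is a minimal illustration, assuming segmentation means grouping customers by a single numeric attribute; it uses a plain k-means loop, which is our choice of algorithm here and not one the article names. The function name `kmeans_1d` and the spend figures are hypothetical.

```python
from statistics import mean

def kmeans_1d(values, k, iterations=20):
    """Group numeric values into k segments with a basic k-means loop (k >= 2)."""
    ordered = sorted(values)
    # Deterministic start: k evenly spaced values from the sorted data.
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    groups = []
    for _ in range(iterations):
        # Assignment step: each value joins the segment with the nearest centroid.
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            groups[nearest].append(v)
        # Update step: move each centroid to its segment's mean (keep it if empty).
        new_centroids = [mean(g) if g else centroids[i] for i, g in enumerate(groups)]
        if new_centroids == centroids:
            break  # centroids stopped moving, so the loop has converged
        centroids = new_centroids
    return centroids, groups

# Hypothetical yearly spend per customer.
spend = [120, 135, 150, 900, 950, 1000, 4000, 4200]
centroids, segments = kmeans_1d(spend, k=3)
print(centroids)
print(segments)
```

With this data the loop separates low, medium, and high spenders. Real segmentation usually works on many attributes at once and would rely on a tested library rather than a hand-written loop.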
KDD and its Process
The term Knowledge Discovery in Databases, or KDD for short, refers to the broad process of discovering knowledge in data and emphasizes the "high-level" application of specific data mining methods. It is of interest to researchers in machine learning, pattern recognition, databases, statistics, artificial intelligence, knowledge acquisition for expert systems, and data visualization.
In the context of big databases, the unifying objective of the KDD process is to extract knowledge from data.
This is done by using data mining methods (algorithms) to extract (identify) what is considered knowledge according to specified measures and thresholds, using the database along with any required pre-processing, sub-sampling, and transformation of that database. The diagram below shows the KDD process and its steps.
Figure: KDD process
- Data selection: retrieve the data relevant to the analysis task.
- Data cleaning and pre-processing: remove noisy and inconsistent data.
- Data integration: combine data from multiple sources.
- Data transformation: transform the data into a form suitable for mining.
- Data mining: apply intelligent methods to extract data patterns.
- Pattern evaluation: identify the interesting patterns, for example by checking them against the chosen measures and thresholds.
- Knowledge representation: present the mined knowledge to the user.
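The steps of the KDD process can be traced end to end on a toy data set. The sketch below is a minimal illustration under stated assumptions: the two record sources, the field names, the function names (`select`, `clean`, `integrate`, `transform`, `mine`, `evaluate`, `present`), and the 60% threshold are all hypothetical, and the "mining" step is just item counting so that each stage stays easy to follow.

```python
from collections import Counter

# Hypothetical raw records from two sources (store tills and a web shop).
store_rows = [
    {"customer": "c1", "item": "bread", "qty": 2, "region": "north"},
    {"customer": "c2", "item": "milk", "qty": None, "region": "north"},  # missing value
    {"customer": "c3", "item": "bread", "qty": 1, "region": "south"},
]
web_rows = [
    {"customer": "c1", "item": "Milk ", "qty": 1, "region": "north"},  # inconsistent spelling
    {"customer": "c4", "item": "bread", "qty": 3, "region": "north"},
]

def select(rows, region):
    # Data selection: keep only the rows relevant to this analysis task.
    return [r for r in rows if r["region"] == region]

def clean(rows):
    # Cleaning: drop rows with missing quantities and normalise item names.
    return [{**r, "item": r["item"].strip().lower()} for r in rows if r["qty"] is not None]

def integrate(*sources):
    # Integration: combine the cleaned sources into one data set.
    return [r for source in sources for r in source]

def transform(rows):
    # Transformation: reshape rows into per-customer baskets (sets of items).
    baskets = {}
    for r in rows:
        baskets.setdefault(r["customer"], set()).add(r["item"])
    return list(baskets.values())

def mine(baskets):
    # Mining: count how many baskets contain each item.
    return Counter(item for basket in baskets for item in basket)

def evaluate(counts, n_baskets, min_share):
    # Evaluation: keep only items whose share of baskets meets the threshold.
    return {item: c / n_baskets for item, c in counts.items() if c / n_baskets >= min_share}

def present(patterns):
    # Presentation: report the surviving patterns to the user.
    for item, share in sorted(patterns.items()):
        print(f"{item}: in {share:.0%} of baskets")

north_store = clean(select(store_rows, "north"))
north_web = clean(select(web_rows, "north"))
baskets = transform(integrate(north_store, north_web))
patterns = evaluate(mine(baskets), len(baskets), min_share=0.6)
present(patterns)
```

Keeping each step as a separate function mirrors the list above: any stage, such as the cleaning rules or the evaluation threshold, can be changed without touching the others.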
Example of Data mining
Grocery stores are well-known users of data mining techniques. Many supermarkets offer customers free loyalty cards that give them access to reduced prices not available to non-members. The cards make it easy for stores to track who buys what, when they buy it, and at what price. After analyzing this data, stores can offer customers coupons tailored to their purchase habits and decide when to put items on sale or sell them at full price.
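The loyalty-card example can be made concrete with a small sketch that counts which items are bought together in the same visit. It keeps pairs that pass two common association-rule measures: support (the share of all visits containing both items) and confidence (the share of visits with the first item that also contain the second). The transactions, thresholds, and the function name `pair_rules` are hypothetical; a real store would work with far larger data and typically a dedicated algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Hypothetical loyalty-card transactions: (card_id, items bought in one visit).
transactions = [
    ("card-1", {"bread", "butter", "milk"}),
    ("card-2", {"bread", "butter"}),
    ("card-1", {"milk", "cereal"}),
    ("card-3", {"bread", "butter", "jam"}),
    ("card-2", {"milk"}),
]

def pair_rules(transactions, min_support, min_confidence):
    """Return rules (lhs, rhs, support, confidence) that pass both thresholds."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for _, items in transactions:
        item_counts.update(items)
        # sorted() gives each unordered pair one canonical key.
        pair_counts.update(combinations(sorted(items), 2))
    rules = []
    for (a, b), together in pair_counts.items():
        support = together / n
        if support < min_support:
            continue
        # Check both directions: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = together / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

rules = pair_rules(transactions, min_support=0.4, min_confidence=0.8)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support {support:.0%}, confidence {confidence:.0%}")
```

In this toy data only bread and butter are bought together often enough to pass both thresholds, which is the kind of pattern a store might use to target a coupon.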