The Technical Process of Data Mining
Viewed from the data itself, data mining usually involves eight steps: information collection, data integration, data reduction, data cleaning, data transformation, the data mining process itself, pattern evaluation, and knowledge representation.
(1) Information collection: Based on the chosen object of analysis, abstract the feature information the analysis needs, choose a suitable collection method, and store the collected information in a database. For massive data, choosing a suitable data warehouse for storage and management is crucial.
(2) Data integration: Bring together, logically or physically, data that differs in source, format, and characteristics, so that the enterprise has comprehensive shared data.
(3) Data reduction: Most data mining algorithms take a long time to run even on small amounts of data, and the data involved in mining business operations is often very large. Data reduction techniques produce a reduced representation of the data set that is much smaller yet still close to preserving the integrity of the original data, so that mining the reduced data gives the same, or almost the same, results as mining the original.
(4) Data cleaning: Some data in the database is incomplete (attributes of interest lack values), noisy (attributes contain incorrect values), or inconsistent (the same information is represented in different ways). The data must therefore be cleaned so that complete, correct, and consistent data is stored in the data warehouse; otherwise the mining results will be poor.
(5) Data transformation: Smoothing, aggregation, data generalization, normalization, and similar methods transform the data into a form suitable for mining. For some real-valued data, transformation through concept hierarchies and discretization is also an important step.
(6) Data mining process: Based on the data in the data warehouse, choose suitable analysis tools and apply statistical methods, case-based reasoning, decision trees, rule-based reasoning, fuzzy sets, or even neural networks and genetic algorithms to process the data and derive useful analytical information.
(7) Pattern evaluation: Industry experts verify the correctness of the data mining results from a business perspective.
(8) Knowledge representation: the analytical information obtained from data mining is presented to the user in a visual way, or stored in a knowledge base as new knowledge for other applications.
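Step (2), data integration, can be illustrated with a minimal sketch. It assumes two hypothetical sources (a CRM export and a billing export) that describe the same customers with different field names and value formats. The record layouts and the `integrate` helper are invented for illustration and are not taken from any particular tool.

```python
# Hypothetical source A: CRM export, keyed by an integer "customer_id".
crm_records = [
    {"customer_id": 1, "name": "Alice", "city": "Berlin"},
    {"customer_id": 2, "name": "Bob", "city": "Paris"},
]

# Hypothetical source B: billing export, keyed by a string "cust",
# with amounts stored as text.
billing_records = [
    {"cust": "1", "total_spent": "120.50"},
    {"cust": "3", "total_spent": "75.00"},
]


def integrate(crm, billing):
    """Merge both sources into one unified record per customer ID."""
    merged = {}
    for row in crm:
        merged[row["customer_id"]] = {
            "customer_id": row["customer_id"],
            "name": row["name"],
            "city": row["city"],
            "total_spent": None,  # unknown until billing data is seen
        }
    for row in billing:
        cid = int(row["cust"])  # unify the key type across sources
        entry = merged.setdefault(
            cid,
            {"customer_id": cid, "name": None, "city": None, "total_spent": None},
        )
        entry["total_spent"] = float(row["total_spent"])  # unify the format
    return [merged[cid] for cid in sorted(merged)]


unified = integrate(crm_records, billing_records)
for record in unified:
    print(record)
```

Customers that appear in only one source keep `None` in the fields the other source would have supplied, which leaves the gaps visible for the later cleaning step.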
The data mining process is an iterative cycle: if a step does not achieve its goal, you go back to earlier steps, readjust, and run them again. Not every data mining job needs every step listed here. For example, step (2), data integration, can be omitted when a job has only one data source.
Steps (3) data reduction, (4) data cleaning, and (5) data transformation are collectively known as data preprocessing. In data mining, at least 60% of the cost may be spent on step (1), information collection, while at least 60% of the effort and time is spent on data preprocessing.
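The three preprocessing steps can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the records, the `COUNTRY_ALIASES` table, and the helper names are hypothetical, random sampling stands in for reduction, dropping incomplete rows stands in for cleaning, and min-max normalization stands in for transformation. The sketch cleans before it samples so that the small sample is not used up by incomplete rows.

```python
import random

# Hypothetical raw records: some values are missing (None) and the same
# country is written in different ways.
raw = [
    {"age": 25, "income": 3000.0, "country": "US"},
    {"age": None, "income": 4200.0, "country": "usa"},
    {"age": 41, "income": None, "country": "U.S."},
    {"age": 33, "income": 5100.0, "country": "DE"},
    {"age": 58, "income": 8800.0, "country": "de"},
    {"age": 47, "income": 6100.0, "country": "US"},
]

# Hypothetical mapping from lowercase spellings to one canonical code.
COUNTRY_ALIASES = {"us": "US", "usa": "US", "u.s.": "US", "de": "DE"}


def clean(rows):
    """Drop incomplete rows and unify inconsistent country spellings."""
    cleaned = []
    for row in rows:
        if row["age"] is None or row["income"] is None:
            continue
        fixed = dict(row)
        fixed["country"] = COUNTRY_ALIASES[row["country"].lower()]
        cleaned.append(fixed)
    return cleaned


def min_max_normalize(rows, field):
    """Rescale a numeric field to the range [0, 1]."""
    values = [row[field] for row in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant fields
    return [{**row, field: (row[field] - lo) / span} for row in rows]


def reduce_data(rows, fraction, seed=0):
    """Data reduction by simple random sampling without replacement."""
    rng = random.Random(seed)
    k = max(1, round(len(rows) * fraction))
    return rng.sample(rows, k)


cleaned = clean(raw)
normalized = min_max_normalize(cleaned, "income")
sample = reduce_data(normalized, fraction=0.5)
print(normalized)
print(sample)
```

Real projects often fill in missing values instead of dropping rows, and use more careful reduction methods than uniform sampling; the choice depends on the data and the mining goal.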
What are the complete steps of data mining?
1. Understand the data and the source of the data (understanding).
2. Acquire relevant knowledge and techniques (acquisition).
3. Integrate and check the data (integration and checking).
4. Remove erroneous or inconsistent data (data cleaning).
5. Build models and develop hypotheses (model and hypothesis development).
6. Carry out the actual data mining work (data mining).
7. Test and verify the mining results (testing and verification).
8. Interpret and apply the results (interpretation and use).
Data Mining Basic Steps
If data mining is understood broadly as the process of obtaining useful information from data, it can be divided into several stages: data collection, data preprocessing, forming the target data, selecting mining methods, data mining processing, evaluating the mining results, and obtaining the results. If the results are not satisfactory, you can go back to any earlier step and redo it as the situation requires.
What are the main steps of the standard data mining process?
The standard process for data mining modeling, also known as the Cross-Industry Standard Process for Data Mining (CRISP-DM), divides data mining into six steps: business definition, data understanding, data preprocessing, model building, evaluation, and implementation. Each step is described below:
1. Business problem definition: The central value of data mining lies in business problems, so the first stage must build an in-depth understanding of the organization’s problems and needs and, after repeated discussion and confirmation with the organization, produce a detailed and achievable plan.
2. Data understanding: Define the required data, collect it completely, and perform a preliminary analysis of what was collected. This includes identifying data quality issues, making basic observations about the data, and removing noisy or incomplete data, which makes data preprocessing more efficient. Hypotheses are then set up.
3. Data preprocessing: Because the data comes from different sources, formats are often inconsistent, among other issues. The data must therefore be checked and corrected several times before the model is built, to ensure that it is complete and clean.
4. Model building: Select the data mining techniques best suited to the form of the data, and test the model on different data to optimize the predictive model. The more accurate the model, the higher its validity and reliability, and the better it helps decision makers make the right decision.
5. Evaluation and understanding: The results obtained in testing are meaningful only for that data. In practice, accuracy varies across data sets, so the most important purpose of this step is to find out whether the business problem has blind spots that have not yet been considered.
6. Implementation: The data mining process runs as a virtuous cycle, and the integrated model is finally applied to the business. Completing the model does not mean completing the project, however: the knowledge gained can also be applied through organizational, automated, and other mechanisms. This stage includes deployment planning, monitoring, maintenance, handover, and a final report on the results of the whole cycle.
What are the basic steps of data mining?
Operating environment for this article: Windows 10, ThinkPad T480.
The specific steps are as follows:
1. Defining the problem
The first and most important requirement before starting knowledge discovery is to understand the data and the business problem. The goal must be defined clearly and unambiguously; that is, you must decide exactly what you want to do. For example, if you want to improve the use of e-mail, the goal might be “increase user utilization” or “increase the value of each user session.” The models built to solve these two problems are almost entirely different, so a decision must be made.
2. Establishing a data mining library
Establishing a data mining library consists of the following steps: data collection, data description, selection, data quality assessment and data cleaning, merging and integrating, constructing metadata, loading the data mining library, and maintaining the data mining library.
3. Analyzing the data
The purpose of analysis is to find the data fields that have the greatest impact on the predicted output and to decide whether derived fields need to be defined. If the data set contains hundreds of fields, browsing and analyzing the data is time-consuming and tiring, so you need a tool with a good interface and powerful features to help with the work.
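One simple way to look for the fields with the greatest impact is to rank them by the strength of their linear correlation with the target. The sketch below assumes a hypothetical data set with a numeric target field `spend`; Pearson correlation is only one possible measure, chosen here for illustration, and it misses nonlinear relationships.

```python
import math

# Hypothetical data set: two candidate input fields and a target ("spend").
rows = [
    {"visits": 1, "age": 52, "spend": 10.0},
    {"visits": 3, "age": 23, "spend": 31.0},
    {"visits": 5, "age": 44, "spend": 48.0},
    {"visits": 7, "age": 31, "spend": 72.0},
    {"visits": 9, "age": 38, "spend": 88.0},
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        return 0.0  # a constant field carries no linear information
    return cov / (dev_x * dev_y)


def rank_fields(data, target):
    """Rank input fields by absolute correlation with the target field."""
    targets = [row[target] for row in data]
    fields = [name for name in data[0] if name != target]
    scores = [(name, pearson([row[name] for row in data], targets))
              for name in fields]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


ranking = rank_fields(rows, "spend")
for name, score in ranking:
    print(f"{name}: {score:+.3f}")
```

With hundreds of fields, a ranking like this can help decide where to look first, but it does not replace inspecting the data.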
4. Preparing the data
This is the last data preparation step before the model is built. It can be divided into four parts: selecting variables, selecting records, creating new variables, and converting variables.
5. Building the model
Modeling is an iterative process. Different models need to be examined carefully to determine which is most useful for the business problem at hand. Training and testing a data mining model requires splitting the data into at least two parts: one part is used to build the model, and the rest is used to test and validate the resulting model. Sometimes a third data set, called the validation set, is also needed: because the test set may be influenced by the characteristics of the model, a separate data set is required to verify the model’s accuracy.
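The split described above can be sketched as follows. The 60/20/20 proportions, the fixed seed, and the `split_dataset` helper are assumptions made for illustration; the text does not prescribe particular fractions.

```python
import random


def split_dataset(rows, train=0.6, validation=0.2, seed=42):
    """Shuffle rows and split them into train, validation, and test parts."""
    if not 0 < train < 1 or not 0 <= validation < 1 or train + validation >= 1:
        raise ValueError("fractions must leave room for a test set")
    shuffled = list(rows)  # copy so the caller's data is not reordered
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


records = list(range(100))  # stand-in for 100 data records
train_set, validation_set, test_set = split_dataset(records)
print(len(train_set), len(validation_set), len(test_set))
```

Shuffling before splitting avoids a split that merely follows the order in which the records were stored; passing `validation=0` gives the plain two-way split.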
6. Evaluating the model
After the model is built, its results must be evaluated and the value of the model explained. The accuracy obtained from the test set is meaningful only for the data used to build the model. In practice, you also need to know what types of errors occur and what costs they incur. Experience shows that a valid model is not necessarily a correct one, mainly because of the assumptions implicit in model building, so it is important to test the model directly in the real world. Apply it on a small scale first, collect test data, and roll it out more widely only once the results are satisfactory.
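To see why accuracy alone is not enough, the sketch below counts the two error types of a binary classifier separately and weights them by cost. The labels, predictions, and cost figures are made up for illustration; in a real project the costs come from the business problem.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 or 0)."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if a and p:
            counts["tp"] += 1
        elif not a and p:
            counts["fp"] += 1  # false alarm
        elif a and not p:
            counts["fn"] += 1  # missed case
        else:
            counts["tn"] += 1
    return counts


def total_cost(counts, cost_fp, cost_fn):
    """Total cost of the errors, given a cost per error type."""
    return counts["fp"] * cost_fp + counts["fn"] * cost_fn


# Hypothetical test-set labels and model predictions.
actual = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 1, 0, 0, 0, 0]

counts = confusion_counts(actual, predicted)
accuracy = (counts["tp"] + counts["tn"]) / len(actual)
# Assumed costs: a missed case is ten times as expensive as a false alarm.
cost = total_cost(counts, cost_fp=1.0, cost_fn=10.0)
print(counts, accuracy, cost)
```

Two models with the same accuracy can have very different total costs if one makes mostly false alarms and the other mostly misses.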
7. Implementing the model
After the model has been built and validated, it can be used in two main ways: it can be given to analysts for reference, or it can be applied to different data sets.
What is data mining, and what is the data mining process?
1.1 The Rise of Data Mining
1.1.1 Data Abundance and Knowledge Scarcity
Reprocessing information through deeper inductive analysis reveals the patterns in it and yields more useful information, namely knowledge. Once a large amount of knowledge has accumulated, principles and laws can be summarized from it, forming what is called wisdom.
The current awkward situation: “rich data” but “poor knowledge.”
1.1.2 From Data to Knowledge
Formation of the data warehouse: As data volumes grew, incompatible data formats across data sources became a problem. To make the information needed for decisions easy to access, the whole organization’s data had to be integrated and stored together in a unified form, which gave rise to the data warehouse (DW).
OLAP (Online Analytical Processing) tools: In response to the accelerating pace of change in the marketplace, OLAP was proposed as an analysis tool that can perform real-time analysis and generate the corresponding reports. OLAP lets users interactively browse the contents of the data warehouse and perform multidimensional analysis of the data in it.
OLAP analysis presupposes that the user already has preconceptions and assumptions about some knowledge hidden in the data; it is a user-directed process of information analysis and knowledge discovery.
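Multidimensional analysis can be pictured as summing a measure over different combinations of dimensions that the user chooses. The sketch below is a toy illustration in plain Python, not how an OLAP engine is implemented; the `sales` fact table and the `aggregate` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact table: each row is one sale described by three
# dimensions (region, product, quarter) and one measure (amount).
sales = [
    {"region": "North", "product": "A", "quarter": "Q1", "amount": 100},
    {"region": "North", "product": "B", "quarter": "Q1", "amount": 150},
    {"region": "South", "product": "A", "quarter": "Q1", "amount": 80},
    {"region": "South", "product": "A", "quarter": "Q2", "amount": 120},
    {"region": "North", "product": "A", "quarter": "Q2", "amount": 90},
]


def aggregate(facts, dimensions, measure="amount"):
    """Sum the measure over every combination of the chosen dimensions."""
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


# Roll up to one dimension, then drill down by adding a second one.
by_region = aggregate(sales, ["region"])
by_region_quarter = aggregate(sales, ["region", "quarter"])
# Slice: restrict to one region, then view that slice by product.
north_only = aggregate([r for r in sales if r["region"] == "North"], ["product"])

print(by_region)
print(by_region_quarter)
print(north_only)
```

In each call the user decides which dimensions to look at, which matches the user-directed character of OLAP described above.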
Intelligent automatic analysis tools: To adapt to a rapidly changing market environment, intelligent automatic tools based on computing and information technology are needed to help mine the knowledge hidden in data. Such tools can generate a variety of hypotheses on their own, test or verify them against the data in the data warehouse (or in large databases), and then return the most valuable results to the user.
In addition, such tools should be able to cope with the many characteristics of real-world data (large volume, noise, incompleteness, dynamism, sparsity, heterogeneity, nonlinearity, and so on).
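The generate-and-test idea can be shown with a very small example: the tool proposes every rule of the form "customers who buy X also buy Y", tests each rule against the data, and returns only the strongest ones. Association rules with support and confidence thresholds are used here as one familiar instance of the idea; the transactions, thresholds, and `mine_rules` helper are assumptions for illustration.

```python
from itertools import permutations

# Hypothetical shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def mine_rules(baskets, min_support=0.4, min_confidence=0.7):
    """Generate every single-item rule lhs -> rhs and keep those that pass."""
    items = sorted(set().union(*baskets))
    n = len(baskets)
    rules = []
    for lhs, rhs in permutations(items, 2):  # hypothesis: lhs implies rhs
        both = sum(1 for b in baskets if lhs in b and rhs in b)
        lhs_count = sum(1 for b in baskets if lhs in b)
        support = both / n  # share of baskets containing both items
        confidence = both / lhs_count if lhs_count else 0.0
        if support >= min_support and confidence >= min_confidence:
            rules.append((lhs, rhs, support, confidence))
    # Most valuable first: highest confidence at the top.
    return sorted(rules, key=lambda rule: rule[3], reverse=True)


rules = mine_rules(transactions)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Unlike the OLAP case, the user supplies no hypothesis here; the candidate rules come from the tool itself and the data decides which survive.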
1.1.3 The emergence of data mining (DM)
In 1995, the concept of data mining (DM) was proposed at an annual computing conference in the United States.
The whole knowledge discovery process consists of several important steps, of which data mining is only one:
1) Data Cleaning: removing data noise and data that are obviously irrelevant to the mining topic
2) Data Integration: combining relevant data from multiple data sources into a single data source
3) Data Mining
6) Knowledge Representation: using visualization and knowledge representation techniques to present the mined knowledge to users
1.1.4 Business Problems Solved by Data Mining (Cases)
Customer Behavior Analysis
Customer Loss Analysis
Market and Trend Analysis