Cluster Analysis Algorithm Paper
Cluster analysis, also known as group analysis, is a statistical method for studying classification problems (of samples or of indicators) and an important data mining algorithm. The following paper on cluster analysis algorithms is shared here for reference.
A cluster analysis algorithm takes n vectors in an m-dimensional space R^m and assigns each vector to one of k clusters so that the distance between each vector and the center of its cluster is minimized. Clustering can be understood as maximizing similarity within classes and minimizing similarity between classes. As an unsupervised learning problem, clustering aims to uncover intrinsic patterns in the data by dividing the original set of objects into groups, or clusters, of similar objects. The basic idea of cluster analysis is to use multivariate statistics to quantify how closely the objects are related to one another, taking into account the links between objects and the dominant role of multiple factors, and then to group the objects according to their degree of affinity, so that the classification reflects reality more objectively and reveals the inherent connections between them. In other words, cluster analysis treats the objects of study as points in a multidimensional space and divides them into a reasonable number of classes; it groups variables or regions step by step according to their similarity and can therefore objectively reflect the intrinsic relationships among them.

The salt mining system is a large, multi-level, complex system involving many fuzzy and uncertain factors. The economic classification of the Pingdingshan salt mining area takes all salt mining areas in Pingdingshan City as the object of study and each salt mining area as the basic unit. Centered on the economy, it carries out economic zoning in support of the development strategy and a rational layout.
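The definition at the start of the paragraph above is the objective that k-means minimizes. As a minimal sketch, assuming Euclidean distance and the standard Lloyd iteration (the paper does not say which partitioning algorithm it has in mind), the Python code below alternates between assigning each vector to its nearest center and moving each center to the mean of its members. The function names `kmeans` and `within_cluster_ss` and the sample points are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means on a list of coordinate tuples; returns (labels, centers)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centers = [tuple(p) for p in rng.sample(points, k)]
    labels = None

    def nearest(p):
        # Index of the center closest to p (Euclidean distance).
        return min(range(k), key=lambda j: math.dist(p, centers[j]))

    for _ in range(iters):
        new_labels = [nearest(p) for p in points]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        # Move each center to the mean of the vectors assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers


def within_cluster_ss(points, labels, centers):
    """Sum of squared distances from each vector to its cluster center."""
    return sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centers = kmeans(pts, k=2)
print(labels, within_cluster_ss(pts, labels, centers))
```

Lloyd's iteration only guarantees a local minimum of the objective, so in practice it is usually run from several random starts.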
The basic principles of the zoning are: relative consistency in how the salt resources of the Pingdingshan mining areas are developed and utilized; consistency of natural, economic, and social conditions; and the relative stability of administrative and geographical units. The current administrative division of the Pingdingshan salt mining area does not reflect what the mining areas have in common. It is therefore necessary to apply fuzzy cluster analysis to the actual economic situation, grouping and analyzing similar salt mining areas and identifying how conditions differ between them, so as to provide a basis for formulating targeted development countermeasures.
II. Establishment of the Indicator System
1. Determining the classification indicators. The division of economic zones should consider indicators covering a variety of factors. Rock salt resource reserves should be the main consideration, but due weight should also be given to rock salt quality, the exploration stage, and the state of development and utilization. Both direct and indirect indicators should be included, and the indicators should reflect not only the current state of development of each mining area but also its development history and future direction. Drawing on the relevant literature and expert opinion, we determined the indicators for the economic zoning of the Pingdingshan salt mining area, as shown in Table 1. The table lists the specific indicators and the raw data for each indicator (source: the 2006 Mineral Reserves Summary Table of Henan Province).

Table 1 Salt mining area economic division indicator system and indicator data

Note: N in the table indicates missing data. Exploration stage codes 1, 2, and 3 denote preliminary survey, detailed survey, and detailed exploration, respectively. Utilization status codes 1 to 7 denote, respectively: not suitable for further work in the near term; available for further work; difficult to utilize in the near term; recommended for near-term utilization; planned for near-term utilization; mine under construction; producing mine.
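Before any distance can be computed, the coded indicators and the "N" marker in the note to Table 1 have to be turned into numbers. The sketch below shows one hypothetical way to do this in Python. The helper names, the sample cell values, and the choice to represent missing data as `None` are assumptions; the paper does not say how missing values were handled in the analysis.

```python
# Ordinal codes taken from the note to Table 1.
EXPLORATION_STAGE = {
    1: "preliminary survey",
    2: "detailed survey",
    3: "detailed exploration",
}
UTILIZATION_STATUS = {
    1: "not suitable for further work in the near term",
    2: "available for further work",
    3: "difficult to utilize in the near term",
    4: "recommended for near-term utilization",
    5: "planned for near-term utilization",
    6: "mine under construction",
    7: "producing mine",
}


def parse_cell(raw):
    """Convert one raw Table 1 cell to a float; 'N' (missing data) becomes None."""
    text = raw.strip()
    return None if text.upper() == "N" else float(text)


def check_code(code, table):
    """Return the label for an ordinal code, raising if the code is undocumented."""
    if code not in table:
        raise ValueError(f"unknown code {code!r}; expected one of {sorted(table)}")
    return table[code]


# Hypothetical raw cells, not taken from Table 1.
row = [parse_cell(c) for c in ["12.5", "N", "3"]]
print(row, check_code(int(row[2]), EXPLORATION_STAGE))
```

Any record containing `None` still needs an explicit policy (dropping the indicator, imputing a value, or excluding the case) before it can enter the cluster analysis.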
2. Transforming the indicator data. Because the variables differ in dimension and order of magnitude, the data must be transformed to make them comparable. Three data processing methods are commonly used: standardization, range standardization, and regularization (min-max normalization). To allow a more intuitive comparison of the values of the same indicator across mining areas, we used regularization, defined as follows.

For convenience, let X_i (i = 1, 2, 3, …, 21) be the value of the i-th evaluation indicator in the specific indicator layer and P_i (i = 1, 2, 3, …, 21) its value after regularization, with 0 ≤ P_i ≤ 1. X_max is the maximum value, X_min is the minimum value, and X_{s,i} = X_max - X_min is the standard value of the i-th evaluation indicator.

(1) For an indicator where higher is better:

$$P_i = \begin{cases} 1, & X_i \ge X_{\max} \\ \dfrac{X_i - X_{\min}}{X_{s,i}}, & X_{\min} < X_i < X_{\max} \\ 0, & X_i \le X_{\min} \end{cases}$$

(2) For an indicator where lower is better:

$$P_i = \begin{cases} 1, & X_i \le X_{\min} \\ \dfrac{X_{\max} - X_i}{X_{s,i}}, & X_{\min} < X_i < X_{\max} \\ 0, & X_i \ge X_{\max} \end{cases}$$

The regularized data for all indicators used in the cluster analysis are shown in Table 2.
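The piecewise regularization formulas in the paragraph above can be sketched as a single Python function applied to one indicator column. This is an illustration under stated assumptions: `regularize` is a hypothetical name, X_max and X_min default to the column's own extremes (in which case only the endpoint branches give exactly 0 or 1), and optional reference bounds are accepted to show values falling outside [X_min, X_max].

```python
def regularize(values, higher_is_better=True, x_min=None, x_max=None):
    """Map one indicator column onto [0, 1] with the piecewise regularization formulas.

    x_min / x_max default to the column's own minimum and maximum; passing
    reference bounds (a hypothetical option) exercises the clamping branches.
    """
    lo = min(values) if x_min is None else x_min
    hi = max(values) if x_max is None else x_max
    x_s = hi - lo  # the standard value X_{s,i} = X_max - X_min
    if x_s == 0:
        raise ValueError("X_max equals X_min; the indicator cannot be regularized")
    result = []
    for x in values:
        if x >= hi:
            p = 1.0 if higher_is_better else 0.0
        elif x <= lo:
            p = 0.0 if higher_is_better else 1.0
        elif higher_is_better:
            p = (x - lo) / x_s
        else:
            p = (hi - x) / x_s
        result.append(p)
    return result


print(regularize([2, 4, 6]))                          # [0.0, 0.5, 1.0]
print(regularize([2, 4, 6], higher_is_better=False))  # [1.0, 0.5, 0.0]
print(regularize([0, 5, 20], x_min=2, x_max=10))      # [0.0, 0.375, 1.0]
```

Each indicator is transformed separately, with its own direction (higher or lower is better), before the columns are combined into the data matrix used for clustering.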
III. Cluster Analysis
1. Clustering steps (Stage). The steps are numbered in the order in which the merges occur; five cases require four merges, so the steps run from 1 to 4.
2. Cluster combined (Cluster Combined). The cases or classes merged at each step. For example, at the first step case 1 (Tianzhuang salt section, Ye County) and case 2 (Mazhuang salt section, Ye County) are merged; the new class is then labeled with the case number of the first item.
3. Coefficients (Coefficients). The value of the proximity measure at which each merge takes place. By the basic principle of cluster analysis, the closest cases are merged first. With a similarity coefficient, the most similar pair (coefficient closest to 1) merges first and the coefficients decrease from step to step; with a distance measure, as in the rescaled-distance dendrogram below, the nearest pair merges first and the coefficients increase from small to large in the order of the clustering steps.
4. Stage cluster first appears (Stage Cluster First Appears). For each of the two items merged at a step, if the item is a class formed earlier (that is, two or more cases already merged into one class), this column gives the step at which that class was formed. For example, a value of 1 in the first column at step 3 means that the first of the two items merged at step 3 is the class formed at step 1. A value of 0 means the corresponding item is still a single case rather than a class.
5. Next stage (Next Stage). The step at which the class formed at the current step is next merged with another case or class. For example, a value of k in the first row means that the class formed at step 1 is merged with another case or class at step k.
6. Dendrogram (Dendrogram using Average Linkage (Between Groups), Rescaled Distance Cluster Combine). The clustering dendrogram (method: average linkage between groups) shows the whole clustering process. It rescales the actual distances to the range 0 to 25 and joins cases or classes of similar nature step by step until all cases form a single class. To read off a classification, choose a distance value on the scale at the top of the figure, coarse or fine as needed, and draw a vertical line at that value. The number of horizontal lines it crosses is the number of classes, and the cases connected to each crossed line form one class. For example, at a scale value of 5 the cases fall into 3 classes: Tianzhuang salt section and Mazhuang salt section in Ye County form one class; Louzhuang salt mine and Wulibao salt section in Ye County form another; and Yaozhai salt mine in Ye County forms the third. At a scale value of 10 they fall into 2 classes: Tianzhuang salt section and Mazhuang salt section form one class, and Louzhuang salt mine, Wulibao salt section, and Yaozhai salt mine form the other.
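The agglomeration schedule and dendrogram described in items 1 to 6 can be reproduced in outline with a short Python sketch of between-groups average linkage. Several assumptions apply: distances are Euclidean (so the coefficients increase from step to step), the rescaling to 0 to 25 is taken to be a simple linear map of [0, largest coefficient] (SPSS's exact rescaling is not given in the text), and the five one-indicator data points are made up to mimic the pattern described, not taken from Table 2. The names `average_linkage` and `cut` are hypothetical.

```python
import itertools
import math


def average_linkage(points):
    """Agglomerative clustering with between-groups average linkage.

    Returns SPSS-style agglomeration schedule rows; cases are numbered from 1.
    """
    n = len(points)
    dist = [[math.dist(a, b) for b in points] for a in points]
    members = {c: [c - 1] for c in range(1, n + 1)}  # class label -> case indices
    formed_at = {c: 0 for c in range(1, n + 1)}      # label -> step formed (0 = single case)
    schedule = []
    for stage in range(1, n):
        best = None
        # Pairs come from a sorted list, so a < b in every pair.
        for a, b in itertools.combinations(sorted(members), 2):
            pairs = [dist[i][j] for i in members[a] for j in members[b]]
            avg = sum(pairs) / len(pairs)  # mean distance between the two groups
            if best is None or avg < best[0]:
                best = (avg, a, b)
        coef, a, b = best
        for earlier in (formed_at[a], formed_at[b]):
            if earlier:  # the class formed at that step is merged again now
                schedule[earlier - 1]["next_stage"] = stage
        schedule.append({
            "stage": stage,
            "combined": (a, b),
            "coefficient": coef,
            "first_appears": (formed_at[a], formed_at[b]),
            "next_stage": 0,
        })
        members[a] += members.pop(b)  # the new class keeps the first (lower) label
        formed_at[a] = stage
        del formed_at[b]
    return schedule


def cut(schedule, n, threshold, scale=25.0):
    """Classes obtained by drawing a vertical line at `threshold` on a 0..scale axis."""
    top = max(row["coefficient"] for row in schedule) or 1.0
    groups = {c: [c] for c in range(1, n + 1)}
    for row in schedule:
        if row["coefficient"] * scale / top <= threshold:
            a, b = row["combined"]
            groups[a] += groups.pop(b)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical one-indicator data for five cases (not the paper's Table 2).
pts = [(0.0,), (0.1,), (1.0,), (1.2,), (1.5,)]
sched = average_linkage(pts)
for row in sched:
    print(row["stage"], row["combined"], round(row["coefficient"], 3),
          row["first_appears"], row["next_stage"])
print(cut(sched, 5, 5))   # [[1, 2], [3, 4], [5]]
print(cut(sched, 5, 10))  # [[1, 2], [3, 4, 5]]
```

Because average linkage never produces a merge at a smaller coefficient than an earlier one, cutting the schedule at a threshold gives the same classes as reading the dendrogram with a vertical line.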
IV. Conclusion
In the economic zoning of the five salt mining areas in Pingdingshan City, the number of districts must be chosen appropriately: more is not necessarily better, and neither is fewer. The purpose of economic zoning is to guide economic activity by category, according to differences in the resource characteristics, exploration, and development of the salt mining economic zones. This lets economic activity better fit local conditions, allows each zone to make full use of its advantages, play to its strengths and avoid its weaknesses, and achieve greater output with less investment, producing good economic and social benefits. With too many zones, zoning loses its significance; with too few, targeted guidance by category becomes difficult. From the cluster analysis results above, three schemes can be derived, two of which are more appropriate and can be considered.

Scheme I (scale value 5, 3 classes): Tianzhuang salt section and Mazhuang salt section in Ye County form one class; Louzhuang salt mine and Wulibao salt section in Ye County form another; and Yaozhai salt mine in Ye County forms the third.

[Figure: Pingdingshan City salt mining area classification map from cluster analysis, Scheme I]

Scheme II (scale value 10, 2 classes): Tianzhuang salt section and Mazhuang salt section in Ye County form one class; Louzhuang salt mine, Wulibao salt section, and Yaozhai salt mine in Ye County form the other.

[Figure: Pingdingshan City salt mining area classification map from cluster analysis, Scheme II]
Cluster analysis groups together mining areas that are identical or similar in ore quality, resource reserves, exploration stage, and utilization status, and its results are easy to interpret visually. Taking into account the actual administrative division of Pingdingshan City and the characteristics of the mining enterprises, we adjusted the division of the salt mining zones so that theory and practice are more closely combined and can better guide practice.
1. Tianzhuang salt section and Mazhuang salt section in Ye County form one class: their deposits are of comparable size, their resource reserves are close, they are at similar stages of exploration and development, and their degree of utilization is comparable, so they can be grouped together.
2. Louzhuang salt mine and Wulibao salt section in Ye County form one class: both are at the same stage of exploration and development.
3. Yaozhai salt mine in Ye County forms a class of its own: it has larger reserves and a higher salt grade, so its exploration and mining planning differs from that of the other two classes.

Overall, the cluster analysis was largely successful, and most of the classification agrees with reality. Taking all of the above into account, the division of the salt mining areas is given in the following table.

Cluster analysis, of course, has both advantages and disadvantages. (1) Advantages: the model is intuitive and its conclusions are concise. (2) Disadvantages: when the sample size is large, it is difficult to reach a clustering conclusion. Moreover, similarity coefficients are computed from indicators chosen to reflect the internal connections between the objects of study, yet in practice the data sometimes show a close relationship between objects that have no intrinsic connection. In such cases, classifying the objects on the basis of distance or similarity coefficients is clearly inappropriate, but the cluster analysis model itself cannot recognize this type of error.
What are the advantages of fuzzy cluster analysis over conventional cluster analysis?
Fuzzy c-means (FCM) clustering is a cluster analysis method that improves on K-means by incorporating ideas from fuzzy mathematics. Hard partitioning algorithms such as K-means divide data into disjoint classes: after computation, each data point belongs to one and only one cluster. In the real world, however, many clustering problems have no clear-cut boundaries. Fuzzy clustering extends the idea of traditional clustering: FCM takes into account that an object near the boundary between two classes may be only slightly closer to one of them. Each object is given a weight for each class indicating the degree to which it belongs to that cluster (its membership degree), and through these memberships every data point is associated with all of the clusters. Unlike traditional clustering, fuzzy clustering allows each data point to belong to several clusters, with a membership degree for each one, so the clustering result can be expressed as a fuzzy matrix. In effect, it is an improvement that makes clustering-based classification more effective.
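A minimal sketch of the standard fuzzy c-means iteration follows, assuming Euclidean distance, fuzzifier m = 2, and initialization from randomly chosen data points (one common choice among several). Each row of the returned matrix `U` holds one object's memberships in all clusters and sums to 1, which is the fuzzy matrix described in the paragraph above. The function names, parameters, and sample points are hypothetical.

```python
import math
import random


def _memberships(p, centers, m):
    """Membership of p in each cluster: u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1))."""
    d = [math.dist(p, c) for c in centers]
    if 0.0 in d:  # p sits exactly on a center: share membership among those centers
        hits = [1.0 if di == 0.0 else 0.0 for di in d]
        return [h / sum(hits) for h in hits]
    e = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** e for dk in d) for di in d]


def _centers(points, U, m):
    """Each center is the mean of all points weighted by membership ** m."""
    dim = len(points[0])
    centers = []
    for i in range(len(U[0])):
        w = [row[i] ** m for row in U]
        total = sum(w)
        centers.append(tuple(sum(wj * p[k] for wj, p in zip(w, points)) / total
                             for k in range(dim)))
    return centers


def fcm(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Fuzzy c-means; returns (U, centers), U[j][i] = membership of point j in cluster i."""
    rng = random.Random(seed)
    centers = [tuple(p) for p in rng.sample(points, c)]  # start at random data points
    U = None
    for _ in range(iters):
        new_U = [_memberships(p, centers, m) for p in points]
        shift = None if U is None else max(
            abs(a - b) for old, new in zip(U, new_U) for a, b in zip(old, new))
        U = new_U
        centers = _centers(points, U, m)
        if shift is not None and shift < tol:  # memberships have stabilized
            break
    return U, centers


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
U, centers = fcm(pts, c=2)
for p, row in zip(pts, U):
    print(p, [round(u, 3) for u in row])
```

Taking the largest membership in each row recovers a hard partition like the one K-means would produce, while the full rows keep the information about how strongly each object leans toward each cluster.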
In addition, cluster analysis in general has the advantage that, once the analysis perspective is established, it divides the data automatically and spares the effort of manual division. Moreover, since an object is described by many different attributes, classification by clustering provides a reference for decision-making.