What are the 9 common types of data analysis in big data development?
Data analysis is the process of extracting valuable information from data. It involves processing and categorizing data in many ways, and only by mastering the right classification methods and processing modes can you get twice the result with half the effort. Below are the 9 essential data analysis mindsets for data analysts, as introduced by Beijing North Blue Bird:
1. Classification
Classification is a basic method of data analysis. Data objects are divided into different parts and types according to their characteristics and then analyzed further, which helps uncover the essence of the matter.
2. Regression
Regression is a widely used statistical method for quantifying how a dependent variable relates to one or more independent variables. You specify the dependent and independent variables, build a regression model, and estimate the model's parameters from measured data. You then evaluate how well the model fits the measured data; if the fit is good, the model can be used to make predictions from the independent variables.
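The fit-evaluate-predict steps described above can be sketched in a few lines of Python. This is a minimal illustration only: it assumes a single independent variable and ordinary least squares, and the `ad_spend` and `sales` figures and the 0.9 fit threshold are made up for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (intercept, slope, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 measures how much of the variation in y the line explains.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return intercept, slope, 1 - ss_res / ss_tot


# Hypothetical measured data: independent variable x, dependent variable y.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope, r_squared = fit_line(ad_spend, sales)

predicted = None
if r_squared > 0.9:  # arbitrary threshold for "fits well"
    predicted = intercept + slope * 6.0  # predict for a new x value

print(f"slope={slope:.2f} intercept={intercept:.2f} R^2={r_squared:.4f}")
print("prediction for x=6:", predicted)
```

A good fit only shows that the line describes the data well; on its own it does not show that one variable causes the other.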
3. Clustering
Clustering divides data into groups (clusters) based on the data's intrinsic properties, so that elements within a cluster are as similar as possible and different clusters differ from each other as much as possible. Unlike classification, the classes are not known in advance, which is why cluster analysis is also called unguided or unsupervised learning.
Data clustering is a technique for analyzing static data and is widely used in many fields, including machine learning, data mining, pattern recognition, image analysis, and bioinformatics.
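As a rough illustration of grouping data without known classes, here is a small k-means sketch for one-dimensional values. K-means is just one clustering technique, chosen here for brevity; the data and the starting centers are invented for the example.

```python
def k_means_1d(values, centers, iterations=10):
    """Plain k-means on 1-D values, starting from the given centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical measurements with two visible groups.
values = [1.0, 1.2, 0.8, 8.0, 8.3, 7.9]
centers, clusters = k_means_1d(values, centers=[0.0, 10.0])

print("centers:", centers)
print("clusters:", clusters)
```

No labels are supplied anywhere: the two groups emerge from the distances between the values alone.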
4. Similarity Matching
Similarity matching calculates how similar two pieces of data are, usually expressed as a percentage. Similarity matching algorithms are used in many computing scenarios, including data cleaning, user input error correction, recommendation statistics, plagiarism detection systems, automated grading systems, web search, and DNA sequence matching.
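One simple way to get a percentage similarity between two strings is to normalize the edit (Levenshtein) distance by the longer string's length. The sketch below assumes that particular measure; many others exist (cosine similarity, Jaccard, and so on), and the text does not prescribe one.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def similarity_percent(a, b):
    """Similarity as a percentage: 100 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - edit_distance(a, b) / longest)


print(similarity_percent("analysis", "analysys"))  # one substitution in 8 chars
print(similarity_percent("kitten", "sitting"))
```

For user input correction, for instance, the candidate with the highest percentage against a dictionary of known terms would be suggested.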
5. Frequent itemset
A frequent itemset is a set of items that often occur together in the data, such as beer and diapers. The Apriori algorithm is a frequent-itemset algorithm for mining association rules. Its core idea is to find frequent itemsets in two phases: generating candidate sets, then pruning them with a downward-closure check (every subset of a frequent itemset must itself be frequent). It is now widely used in business, network security, and other fields.
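The two phases of Apriori can be sketched as follows. This is a compact, unoptimized illustration; the shopping baskets and the minimum support count of 3 are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support_count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    candidates = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Phase 1, candidate generation: join frequent (k-1)-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Phase 2, downward closure: every (k-1)-subset must be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical shopping baskets.
baskets = [
    {"beer", "diapers", "milk"},
    {"beer", "diapers"},
    {"beer", "bread"},
    {"diapers", "milk"},
    {"beer", "diapers", "bread"},
]

frequent = apriori(baskets, min_support=3)
for itemset, count in sorted(frequent.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), count)
```

The pruning step is what keeps the search tractable: an itemset containing an infrequent item such as milk is never counted at all.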
6. Statistical description
Statistical description uses statistical indicators and indicator systems, chosen according to the characteristics of the data, to summarize the information the data conveys. It is the basic processing step of data analysis. The main methods include calculating indicators such as the mean and variance, and representing the distribution of the data graphically.
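A minimal sketch of these indicators using Python's standard `statistics` module, with a text histogram standing in for a chart. The `scores` values are invented sample data, and the population (not sample) variance is used.

```python
import statistics
from collections import Counter

# Hypothetical sample data.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(scores)
variance = statistics.pvariance(scores)  # population variance
std_dev = statistics.pstdev(scores)      # population standard deviation

print("mean:", mean, "variance:", variance, "std dev:", std_dev)

# A crude graphical view of the distribution: one '#' per occurrence.
distribution = Counter(scores)
for value in sorted(distribution):
    print(f"{value:>2} | {'#' * distribution[value]}")
```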
7. Link prediction
Link prediction is a method of predicting relationships that should exist between data items. It can be divided into prediction based on node attributes and prediction based on network structure. Link prediction based on node attributes analyzes the attributes of individual nodes and the attributes of the relationships between nodes, and uses node information, knowledge sets, and similarity between nodes to uncover hidden relationships between nodes. Compared with node attribute data, network structure data is easier to obtain. A major viewpoint in the field of complex networks holds that the traits of individuals in a network matter less than the relationships between individuals. As a result, link prediction based on network structure has received increasing attention.
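As a sketch of structure-based link prediction, the code below scores every pair of nodes that is not yet linked by the Jaccard coefficient of their neighbor sets. Jaccard is one of several common structural scores and is an assumption here, as is the small example graph.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected graph as {node: set of neighbors}."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def jaccard_scores(adjacency):
    """Score each unlinked node pair by shared neighbors / all neighbors."""
    scores = {}
    for u, v in combinations(sorted(adjacency), 2):
        if v in adjacency[u]:
            continue  # already linked, nothing to predict
        common = adjacency[u] & adjacency[v]
        union = adjacency[u] | adjacency[v]
        scores[(u, v)] = len(common) / len(union) if union else 0.0
    return scores


# Hypothetical network, e.g. friendships between users A..E.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
scores = jaccard_scores(build_adjacency(edges))

for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

Only the edge list is needed as input, which reflects the point that structural data is easier to obtain than node attributes.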
8. Data Compression
Data compression is a technique that reduces the amount of data without losing useful information, in order to save storage space and improve the efficiency of transmission, storage, and processing. Put another way, it reorganizes data according to some algorithm to reduce redundancy and storage space. Data compression is divided into lossy compression and lossless compression.
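The difference between the two kinds can be illustrated briefly. The lossless part uses Python's standard `zlib` module; the lossy part is only a toy stand-in (rounding sensor readings), not a real lossy codec. All data values are made up.

```python
import zlib

# Lossless: highly redundant data shrinks a lot and is restored exactly.
original = b"beer,diapers;" * 200
compressed = zlib.compress(original, 9)  # 9 = strongest compression level
restored = zlib.decompress(compressed)

print(len(original), "bytes ->", len(compressed), "bytes")
print("restored exactly:", restored == original)

# Lossy (toy illustration): rounding discards detail that cannot be recovered.
readings = [20.13, 20.17, 20.21]
quantized = [round(r, 1) for r in readings]
print(readings, "->", quantized)
```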
9. Causal analysis
Causal analysis is a forecasting method that uses the cause-and-effect relationships in how things develop and change. When causal analysis is used for market forecasting, regression analysis is the main tool; economic model calculations and input-output analysis are also commonly used.
What are the methods of data analysis?
What are the common methods of data analysis?
1. Trend Analysis
When there is a large amount of data and we want to find information in it faster and more easily, charts help. Charting here simply means using Excel or another plotting tool to draw graphs.
Trend analysis is often used to track core metrics such as click-through rate, GMV, and active users over time. Often a simple trend chart is produced but never analyzed. The chart needs to be interpreted: what trends does the data show, is there a cycle, are there inflection points, and what are the reasons behind them, internal or external? The best outputs of trend analysis are ratios: the chain ratio (period-over-period), the year-over-year ratio, and the fixed-base ratio. For example, how much GDP increased in April 2017 compared with March 2017 is the chain ratio; it reflects the recent change in trend but is affected by seasonality. To eliminate the effect of seasonality, year-over-year data is introduced, e.g., how much GDP increased in April 2017 compared with April 2016. The fixed-base ratio fixes a reference point: for example, with January 2017 as the reference point, the fixed-base ratio compares May 2017 data with January 2017 data.
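The three ratios are the same growth formula applied to different base periods. A small sketch, with invented monthly values (not real GDP figures):

```python
def growth_percent(current, base):
    """Percentage change of `current` relative to `base`."""
    return (current - base) / base * 100.0


# Hypothetical monthly values of some metric.
monthly = {
    "2016-04": 80.0,
    "2017-01": 100.0,
    "2017-03": 110.0,
    "2017-04": 121.0,
    "2017-05": 125.0,
}

# Chain ratio: April 2017 vs the previous month.
chain = growth_percent(monthly["2017-04"], monthly["2017-03"])
# Year-over-year: April 2017 vs April 2016.
year_over_year = growth_percent(monthly["2017-04"], monthly["2016-04"])
# Fixed base: May 2017 vs the fixed reference point, January 2017.
fixed_base = growth_percent(monthly["2017-05"], monthly["2017-01"])

print(f"chain: {chain:.2f}%")
print(f"year-over-year: {year_over_year:.2f}%")
print(f"fixed base: {fixed_base:.2f}%")
```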
2. Comparative Analysis
Horizontal comparison: this is comparison with yourself. The most common cases are comparing a metric with its target value to see whether the goal was met, and comparing this month with the previous month to see how much it grew.
Longitudinal comparison: simply put, this is comparison with others. We must compare ourselves with our competitors to understand our share and position in the market.
Many of you may say comparative analysis sounds simple. Let me give you an example. An e-commerce company has a login page, and yesterday's PV was 5,000. How do you feel about that number? You won't feel anything. If the average PV of this login page is 10,000, there was a major problem yesterday. If the average PV is 2,000, there was a jump yesterday. Data only becomes meaningful through comparison.
3. Quadrant Analysis
Quadrant analysis compares items along two dimensions, which splits them into four quadrants. Taking IQ and EQ as the two dimensions, for example, gives four quadrants, and each person falls into one of them. Generally speaking, IQ sets one's lower limit and EQ raises one's upper limit.
Here is an example of quadrant analysis used in practice. Registered users of P2P products usually come mainly from third-party channels. You can divide the channels into four quadrants by the quality and quantity of their traffic, then pick a fixed point in time and compare the cost-effectiveness of each channel's traffic; total retention can serve as the criterion for the quality dimension. Keep investing in channels with high quality and high quantity, bring in more volume from channels with high quality and low quantity, pass on channels with low quality and low quantity, and try adjusting the acquisition strategy and requirements for channels with low quality and high quantity. Quadrant analysis of this kind gives intuitive and fast results when comparing.
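A rough sketch of placing channels into the four quadrants by traffic quality and quantity. The channel names, the retention and registration figures, and both cutoffs are hypothetical; in practice the cutoffs might be medians or business targets.

```python
def quadrant(quality, quantity, quality_cutoff, quantity_cutoff):
    """Label a channel by which side of each cutoff it falls on."""
    quality_label = "high quality" if quality >= quality_cutoff else "low quality"
    quantity_label = "high quantity" if quantity >= quantity_cutoff else "low quantity"
    return f"{quality_label} / {quantity_label}"


# Hypothetical channels: (retention rate as quality, registrations as quantity).
channels = {
    "channel_a": (0.45, 12000),
    "channel_b": (0.50, 1500),
    "channel_c": (0.10, 900),
    "channel_d": (0.08, 20000),
}

# Assumed cutoffs separating "high" from "low" on each dimension.
QUALITY_CUTOFF = 0.30
QUANTITY_CUTOFF = 5000

placement = {
    name: quadrant(quality, quantity, QUALITY_CUTOFF, QUANTITY_CUTOFF)
    for name, (quality, quantity) in channels.items()
}

for name, label in placement.items():
    print(name, "->", label)
```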
4. Cross-analysis
Comparative analysis includes both horizontal and longitudinal (vertical) comparisons. If you want to compare in both directions at the same time, you can use cross-analysis. Cross-analysis displays data across multiple dimensions at once and analyzes the combinations from multiple perspectives.
When analyzing app data, for example, the data is usually split into iOS and Android.
The main function of cross-analysis is to break down the data from multiple dimensions and find the most relevant dimensions to explore why the data changed.
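Breaking a metric down by two dimensions at once amounts to building a cross-tabulation. The sketch below assumes platform and acquisition channel as the two dimensions; the channel dimension and all the numbers are hypothetical.

```python
from collections import defaultdict


def cross_tab(records, row_key, col_key, value_key):
    """Sum `value_key` for every (row, column) combination."""
    table = defaultdict(lambda: defaultdict(int))
    for record in records:
        table[record[row_key]][record[col_key]] += record[value_key]
    return {row: dict(cols) for row, cols in table.items()}


# Hypothetical app data: active users by platform and channel.
records = [
    {"platform": "iOS", "channel": "search", "active_users": 120},
    {"platform": "iOS", "channel": "ads", "active_users": 80},
    {"platform": "Android", "channel": "search", "active_users": 200},
    {"platform": "Android", "channel": "ads", "active_users": 30},
    {"platform": "Android", "channel": "ads", "active_users": 20},
]

table = cross_tab(records, "platform", "channel", "active_users")
for platform, by_channel in table.items():
    print(platform, by_channel)
```

Reading across a row and down a column together shows which combination moved, for example whether a drop came from Android as a whole or only from Android users acquired through ads.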