# An Introduction to Data Analysis Methods

### 5 Common Data Analysis Methods

The formula method takes a given indicator and uses formulas to break down, layer by layer, the factors that affect it.

Example: to analyze why a product's sales have fallen, break its sales down using the formula method.

The comparison method, which compares two or more sets of data, is the most commonly used method.

Isolated data is meaningless on its own; only comparison reveals differences.

Comparing variables that directly describe things, such as length, quantity, height, and width, yields indicators such as ratios, growth rates, efficiency, and effectiveness, which are the measures most commonly used in data analysis.

Examples include year-on-year and period-on-period comparisons, growth rates, and fixed-base ratios in the time dimension, as well as comparisons with competitors, between categories, and of characteristics and attributes.

The comparison method reveals patterns of change in the data. It is used frequently, often in combination with other methods.

The quadrant method divides data along two or more dimensions and uses coordinates to express the values of interest. Because it maps values directly to strategies, it is easy to put into practice. The quadrant method is a strategy-driven way of thinking, often used in product analysis, market analysis, customer management, merchandise management, and so on.

The 80/20 rule, also called the Pareto principle, is a classic rule of thumb. In terms of personal wealth, for example, it is often said that 20% of the world's people hold 80% of its wealth. In data analysis, it can be understood as 20% of the data producing 80% of the effect, so the analysis should dig into that 20%.

The funnel method uses a funnel diagram, somewhat like an inverted pyramid. It is a process-oriented way of thinking, commonly used to analyze multi-step processes whose volumes change from step to step, such as new-user acquisition and shopping conversion rates.

### 16 Common Data Analysis Methods – Correlation Analysis

Correlation analysis examines whether phenomena depend on each other and, for phenomena that do, explores the direction and degree of the correlation.

Correlation analysis is a simple way to measure the relationship between quantitative variables. It shows whether variables are related and how strong or weak the relationship is.

For example: the correlation between height and weight; the correlation between precipitation and river levels; the correlation between work stress and mental health.

Types of correlations

Relationships between objective phenomena fall broadly into two main types:

I. Functional relationship

A functional relationship exists when a function uniquely determines the value of one variable from the value of the other.

For example, the relationship between sales and sales volume can be represented by the function y=px (y denotes sales, p denotes unit price, and x denotes sales volume). Therefore, there is a functional relationship between sales volume and sales.

This type of relationship is not the focus of our attention.

II. Statistical Relationship

A statistical relationship is a correspondence between two things that is not one-to-one: when variable x takes a certain value, variable y is not uniquely determined, but varies within a certain range according to some pattern.

For example, the relationship between children's height and their parents' height, or between advertising spend and sales, cannot be described uniquely by a function, yet a relationship clearly exists between these variables. In most cases, the taller the parents, the taller the children; and the more money spent on advertising, the higher sales tend to be.

This relationship, then, is called a statistical relationship.

Depending on how the correlation manifests itself, statistical relationships can be divided into different types of correlation, as detailed in the following figure:

Ways of describing correlation

There are 3 common ways of describing whether two variables are correlated:

1. Correlation plots (typically scatter plots, contingency tables, etc.)

2. Correlation coefficients

3. Statistical significance

Scatter plots are the most common way to visualize these different correlations, as follows:

Steps in Correlation Analysis

Step 1: Before computing anything, get a general idea of the relationship between the variables from a scatter plot.

If the variables are unrelated, the points on the scatter plot will appear randomly scattered. If there is some correlation, most points will cluster fairly densely along a trend.

The graph above, for example, shows the relationship between regular grades and aptitude test scores: as X increases, Y increases markedly, indicating a positive correlation between X and Y.

Step 2: Calculate the correlation coefficient

Scatter plots show the relationship between the variables, but not precisely. A correlation coefficient, obtained through correlation analysis, expresses the degree of correlation accurately as a number.

There are three common types of correlation coefficients:

1. Pearson correlation coefficient

2. Spearman rank correlation coefficient

3. Kendall correlation coefficient

The Pearson coefficient is the most commonly used. When the data do not satisfy normality, the Spearman coefficient is used instead. The Kendall coefficient is used to measure the agreement between rankings, such as judges' scores.
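The three coefficients can be sketched in plain Python, with no third-party libraries. This is an illustration only and does not claim to match how any particular statistics package computes them. Spearman's coefficient is taken as Pearson's r computed on average ranks. Kendall's coefficient uses the tau-b variant, which adjusts for ties; other variants exist. The function names are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson r: co-variation divided by the product of the spreads."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rho: Pearson r computed on the ranks of x and y."""
    return pearson(ranks(x), ranks(y))


def kendall_tau_b(x, y):
    """Kendall tau-b: (concordant - discordant) pairs, adjusted for ties."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both variables: excluded from both terms
        if dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    n_untied = concordant + discordant
    return (concordant - discordant) / math.sqrt((n_untied + ties_x) * (n_untied + ties_y))
```

In `ranks`, tied values share the average of their positions, which keeps Spearman's coefficient well defined when scores repeat. On a strictly increasing but curved relationship such as y = x², Spearman and Kendall both give 1, while Pearson's r stays below 1.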

Correlation analysis case

The dataset contains basic information about a company's employees in 3 columns: gender, age, and salary.

Analysis goal: understand the relationship between employee age and salary level (of interest to readers in corporate HR departments).

As shown in the figure, a scatter plot is first used to look at the relationship between the 2 variables.

The scatter plot suggests some correlation between the 2 variables. To reach a more reliable conclusion, the next step is a formal correlation analysis to verify it.

1. Menu operation: Analysis – Correlation – Bivariate

2. Interpretation of results

Null hypothesis: there is no correlation between wage and age.

The result is sig = 0.002, below the usual 0.05 and 0.01 thresholds, so the null hypothesis is rejected. In practical terms, there is a highly significant correlation between age and wage level: here, wages decrease gradually as age increases.
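The text does not say which test produced sig = 0.002; for Pearson's r, a t-test is the usual choice. As a standard-library-only stand-in, the sketch below tests the same null hypothesis with a permutation test. This is a different technique and will not reproduce that exact value. The source gives no employee data, so none is used here. `permutation_p_value` and its parameters are hypothetical names, and Pearson's r is re-implemented so the block stands alone.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_permutations=10_000, seed=0):
    """Two-sided permutation test of H0: x and y are not correlated.

    Shuffling y destroys any pairing with x, so the shuffled correlations
    show how large |r| gets by chance when H0 is true.
    """
    rng = random.Random(seed)  # fixed seed for reproducible results
    observed = abs(pearson(x, y))
    shuffled = list(y)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            extreme += 1
    # Add-one correction: the observed pairing counts as one permutation.
    return (extreme + 1) / (n_permutations + 1)
```

Called as `permutation_p_value(ages, salaries)` on real columns (hypothetical variable names), a small result such as one below 0.01 would likewise lead to rejecting the null hypothesis. The sign of `pearson(ages, salaries)` then gives the direction of the relationship.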

### [Week 1] 5 Common Analytical Methods for Data Analysis

It's like solving a quadratic equation in middle school: you can use the quadratic formula, completing the square, taking square roots directly, or factorization.

Data analysis likewise has techniques that can be applied quickly in common scenarios, and they help when building analytical models later on.

The formula method takes a given indicator and uses formulas to break down, layer by layer, the factors that affect it.

Example: to analyze why a product's sales are low, break sales down using the formula method:

Product sales = sales volume × unit price

Sales volume = channel A sales volume + channel B sales volume + channel C sales volume + …

Channel sales volume = number of users who clicked × order rate

Number of users who clicked = exposure × click-through rate

Breaking sales down layer by layer makes both evaluation and analysis more fine-grained.

The formula decomposition method analyzes a problem in cascading levels, breaking down and peeling off its factors layer by layer.
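The sales breakdown above can be written as nested functions, so a drop in total sales can be traced to a specific channel and layer. This is a minimal sketch. The `Channel` fields and function names are hypothetical, and it assumes every channel sells the product at the same unit price, as the top-level formula implies.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    # Hypothetical per-channel inputs; field names are illustrative.
    name: str
    exposure: int       # impressions shown to users
    click_rate: float   # clicks / impressions
    order_rate: float   # orders / users who clicked


def clicked_users(ch: Channel) -> float:
    # Number of users who clicked = exposure × click-through rate
    return ch.exposure * ch.click_rate


def channel_volume(ch: Channel) -> float:
    # Channel sales volume = number of users who clicked × order rate
    return clicked_users(ch) * ch.order_rate


def product_sales(channels, unit_price: float) -> float:
    # Sales volume = sum over channels; product sales = volume × unit price
    volume = sum(channel_volume(ch) for ch in channels)
    return volume * unit_price


def breakdown(channels, unit_price: float) -> dict:
    """Per-channel values at every layer, for tracing where a drop happened."""
    return {
        ch.name: {
            "clicked_users": clicked_users(ch),
            "sales_volume": channel_volume(ch),
            "sales": channel_volume(ch) * unit_price,
        }
        for ch in channels
    }
```

Comparing the `breakdown` output for two periods shows which layer moved (exposure, click-through rate, or order rate) and in which channel.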

The comparison method, which compares two or more sets of data, is the most commonly used method.

Isolated data is meaningless on its own; only comparison reveals differences. Comparing variables that directly describe things, such as length, number, height, and width, yields indicators such as ratios, growth rates, efficiency, and effectiveness. These are the measures most commonly used when analyzing data.

For example, it is used for year-on-year and period-on-period (chain) comparisons, growth rates, and fixed-base ratios in the time dimension. It is also used for comparisons with competitors, between categories, and of characteristics and attributes. The comparison method reveals patterns of change in the data; it is used frequently, often together with other methods.

The quadrant method divides data along two or more dimensions and uses coordinates to express the values of interest. Because it maps values directly to strategies, it is easy to put into practice.

The quadrant method is a strategy-driven way of thinking, often used in product analysis, market analysis, customer management, merchandise management, and so on.

For example, the chart below shows the four-quadrant distribution of clicks on an ad. The X-axis runs from low to high (left to right) and the Y-axis from low to high (bottom to top).

There’s also the classic RFM model, which divides customers into eight quadrants based on the dimensions of most recent purchase (Recency), frequency of purchase (Frequency), and amount spent (Monetary).

1. Finding the common cause of the problem

Quadrant analysis groups events with the same characteristics so that their common causes can be identified and summarized. In the advertising example above, the events in the first quadrant point to effective promotion channels and strategies. The third and fourth quadrants help rule out some ineffective channels.

2. Establishing grouping optimization strategies

Based on where items fall, quadrant analysis can be used to set an optimization strategy for each quadrant, as in RFM customer management. The RFM model, for example, assigns customers to types by quadrant, such as key development customers, key retention customers, general development customers, and general retention customers.
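A sketch of the quadrant idea in Python, assuming each item (an ad campaign, a customer) is a dict of metric values. Each item is labeled high or low on every chosen dimension. Two dimensions give four quadrants, and R, F, M give the 2 × 2 × 2 = 8 RFM cells. Using the mean as the default cut-off is an assumption; medians or business targets are common alternatives. Function and key names are hypothetical. Mapping cells to names such as key retention customers is left to the business.

```python
from collections import defaultdict
from statistics import mean


def quadrant_labels(items, dims, thresholds=None):
    """Label each item "high" or "low" on every dimension in `dims`.

    items: list of dicts mapping metric names to numbers.
    thresholds: optional {dim: cut-off}; defaults to each dimension's mean
    (an assumption, not a rule from the text).
    If recency is stored as "days since last purchase", smaller is better,
    so negate it before calling to make "high" mean "recent".
    """
    if thresholds is None:
        thresholds = {d: mean(item[d] for item in items) for d in dims}
    return [
        tuple("high" if item[d] >= thresholds[d] else "low" for d in dims)
        for item in items
    ]


def group_by_cell(items, dims, key="name", thresholds=None):
    """Collect item identifiers per quadrant/cell so each group gets its own strategy."""
    groups = defaultdict(list)
    for item, cell in zip(items, quadrant_labels(items, dims, thresholds)):
        groups[cell].append(item[key])
    return dict(groups)
```

With `dims=["ctr", "cvr"]` this reproduces a click-through-rate versus conversion-rate chart. With three RFM columns it yields the eight cells.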

The 80/20 rule, also known as the Pareto principle, is a classic rule of thumb. In data analysis, it can be understood as 20% of the data producing 80% of the effect, so the analysis should dig into that 20%.

The 80/20 method is about grasping the focus of the analysis, and it applies to any industry. Find the focus and identify its characteristics. Then think about how to convert more of the remaining 80% toward that 20% to improve results.

It is generally used in product categorization to build an ABC model. For example, if a retailer has 500 SKUs and the sales for each, deciding which SKUs matter most is a question of prioritization in business operations.

The common practice is to use product SKU as the dimension and sales as the base metric, and to sort the SKUs by sales in descending order. Then, for each SKU, calculate the cumulative sales up to and including that SKU as a percentage of total sales.

SKUs with a cumulative percentage of 70% or less are classified as Category A.

SKUs with a cumulative percentage above 70% and up to 90% (inclusive) are classified as Category B.

SKUs with a cumulative percentage above 90% and up to 100% (inclusive) are classified as Category C.
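The ABC rule above can be sketched as follows. It applies the stated cut-offs literally: a cumulative share up to 70% is A, above 70% and up to 90% is B, and the rest is C. The function name and parameters are hypothetical.

```python
def abc_classify(sales_by_sku, a_cut=70, b_cut=90):
    """Assign ABC classes by cumulative share of total sales.

    sales_by_sku: {sku: sales}. a_cut and b_cut are percentages.
    The test `cumulative * 100 <= cut * total` avoids dividing, so exact
    boundary values (such as exactly 70%) are not lost to rounding.
    """
    total = sum(sales_by_sku.values())
    # Sort SKUs by sales, largest first.
    ranked = sorted(sales_by_sku.items(), key=lambda kv: kv[1], reverse=True)
    classes = {}
    cumulative = 0
    for sku, sales in ranked:
        cumulative += sales  # cumulative sales up to and including this SKU
        if cumulative * 100 <= a_cut * total:
            classes[sku] = "A"
        elif cumulative * 100 <= b_cut * total:
            classes[sku] = "B"
        else:
            classes[sku] = "C"
    return classes
```

Taken literally, the rule puts a single SKU whose own sales exceed 70% of the total into B or C, because its cumulative share already passes the A cut-off. Some teams instead classify by the cumulative share before adding the SKU. This sketch sticks to the text.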

The ABC model can segment not only products by sales but also customers by transactions. For example, which customers contribute 80% of the business's profit, and what share of all customers are they? If the answer is 20%, then with limited resources the business knows to focus on retaining that 20% of customers.

The funnel method is a process-oriented way of thinking. It is often used to analyze multi-step processes with changing volumes, such as new-user development and shopping conversion rates.

The core idea of the funnel model comes down to decomposition and quantification. To analyze e-commerce conversion, for example, we monitor how users convert at each stage and look for points that can be optimized at each one. For users who do not follow the expected process, we map their conversion path separately and shorten it to improve the user experience.
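Monitoring conversion at each level comes down to computing two rates per stage: conversion from the previous stage, and conversion from the top of the funnel. A minimal sketch; the function names are hypothetical, and so are the stage names in the example below.

```python
def funnel_report(stages):
    """Step-by-step and overall conversion for an ordered funnel.

    stages: list of (stage_name, user_count), from top to bottom.
    Returns rows of (stage, count, step_rate, overall_rate), where step_rate
    is relative to the previous stage and overall_rate to the first stage.
    """
    if not stages or stages[0][1] <= 0:
        raise ValueError("the first stage must have a positive count")
    top = stages[0][1]
    rows = []
    prev = top
    for name, count in stages:
        step = count / prev if prev else 0.0
        rows.append((name, count, step, count / top))
        prev = count
    return rows


def weakest_step(rows):
    """Stage whose step conversion is lowest: a first place to look for fixes."""
    return min(rows[1:], key=lambda row: row[2])[0]
```

With hypothetical counts of 1000 product views, 400 add-to-carts, 200 checkouts, and 150 payments, the step conversions are 40%, 50%, and 75%. `weakest_step` then points to the add-to-cart stage.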

AARRR model: Acquisition, Activation, Retention, Revenue, and Referral

A single funnel analysis on its own yields little; it has to be combined with other analyses, such as comparison with historical data.

### What are the methods of data analysis?

The methods of data analysis include comparative analysis, grouping analysis, predictive analysis, funnel analysis, A/B test analysis, quadrant analysis, formula decomposition, feasible domain analysis, 80/20 analysis, and hypothetical analysis.

1. Comparative analysis: comparative analysis compares indicators to reflect quantitative changes in things. It is one of the methods most commonly used in statistical analysis. Common comparisons are horizontal and vertical.

Horizontal comparison compares different things at the same point in time. Examples are the prices of goods bought by users of different tiers over the same period, or the sales volume and profit margin of different goods over the same period.

Vertical comparison tracks the same thing over time, as in period-on-period (chain), year-on-year, and fixed-base comparisons. Respectively, these compare this month's sales with last month's, this year's January sales with last year's January sales, and each month's sales this year with last year's average.

Comparative analysis supports effective judgment and evaluation of data in terms of magnitude, level, speed of change, and so on.

2. Grouping analysis: based on the nature and characteristics of the data, grouping analysis divides the data into parts according to certain indicators. It then analyzes their internal structure and interrelationships to understand how things develop. By the nature of the indicators, grouping is divided into grouping by attribute indicators and grouping by quantitative indicators. Attribute indicators represent the nature or characteristics of things, such as name, gender, and education level, and cannot be used in calculations. Quantitative indicators represent data that can be calculated, such as age, wages, and income. Grouping analysis is generally used together with comparative analysis.

3. Predictive analysis: predictive analysis uses current data to judge and predict how data will change in the future. It generally takes one of two forms. One is time-series prediction, for example, predicting sales for the next three months from past sales performance. The other is regression-type prediction, which predicts from the causal relationships among mutually influencing indicators. An example is predicting which goods a user may buy from their web browsing behavior.

4. Funnel analysis: funnel analysis, also known as process analysis, focuses on the conversion rate of an event across its important steps. It is especially common in the Internet industry. In a credit card application, for example, users go from browsing card information to filling in their details and submitting the application. The bank then reviews and approves it, and finally the user activates and uses the card. There are many important steps along the way, and fewer users remain at each one, which forms a funnel. Funnel analysis lets the business monitor and manage the conversion rate at each step. When a step's conversion rate is abnormal, the process can be optimized in a targeted way and appropriate measures taken to improve business metrics.

5. A/B test analysis: A/B test analysis is a kind of comparative analysis. It focuses on comparing two groups of samples with similar structure and analyzing their differences based on the values of sample indicators. For example, for the same feature of an app, two pages with different visual styles and layouts are designed and randomly assigned to users. The designs are then evaluated by each page's browsing conversion rate, to understand user preferences and further optimize the product.

In addition, to do data analysis well, readers also need certain mathematical fundamentals. These include basic statistics (mean, variance, mode, median, etc.) and measures of dispersion and variability (range, quartiles, interquartile range, percentiles, etc.). They also include data distributions (geometric distribution, binomial distribution, etc.) and the basics of probability theory, sampling, confidence intervals, and hypothesis testing. Applying these metrics and concepts makes data analysis results more professional.

6. Quadrant analysis: put click-through rate on the X-axis (low to high, left to right) and conversion rate on the Y-axis (low to high, bottom to top). The four quadrants formed this way are what we mean by quadrant analysis.

Plot each marketing campaign as a point by its click-through rate and conversion rate, then assign the campaign's effect to a quadrant. Each of the 4 quadrants represents a different evaluation of effectiveness.

7. Formula decomposition method: for a given indicator, the formula decomposition method expresses the factors that influence the indicator as a formula. For example, daily sales are influenced by the sales of each commodity. To find the root causes, you then decompose the factors that influence those factors in turn.

8. Feasible domain analysis: feasible domain analysis is essentially a self-built analysis model. The range of the feasible domain is continually revised and adjusted according to the actual data, so that business indicators can be evaluated effectively.

9. 80/20 analysis: the 80/20 rule is the counterpart of long-tail theory. The 80/20 rule tells us to pay attention to the head: the 20% of users or products that produce 80% of revenue. Long-tail theory tells us to pay attention to the long tail: the remaining 80% of users or products, which produce the other 20% of revenue.

10. Hypothetical analysis: put simply, the hypothetical method starts from a known result and assumes quantitative values for the variables that affect it. It then works backward to derive the process.
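Item 5 above does not say how to decide whether a difference in conversion rate between the two page styles is real or just noise. One common choice, assumed here, is a pooled two-proportion z-test. The sketch uses only the standard library (`statistics.NormalDist`, Python 3.8+), and the function name is hypothetical. It relies on a normal approximation, so it presumes users were randomly assigned and both samples are reasonably large.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare the conversion rates of variants A and B.

    conv_*: number of converting users; n_*: number of users shown the variant.
    Returns (rate_a, rate_b, z, two_sided_p).
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Under H0 (no difference) both groups share one pooled rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_a, p_b, z, p_value
```

A small `p_value`, for example one below 0.05, suggests the two styles really differ. A large one means this much data could not tell them apart.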

Data analysis methods are very widely used in statistics. There are many specific methods, and exactly which ones to use varies from person to person.
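Several of the fundamentals mentioned in the paragraph on mathematical basics are available in Python's standard `statistics` module: mean, variance, mode, median, range, quartiles, and interquartile range. A minimal sketch; `describe` is a hypothetical helper name, and `statistics.quantiles` needs Python 3.8+.

```python
import statistics


def describe(values):
    """Basic and dispersion statistics for a list of at least 2 numbers.

    variance() is the sample variance (n - 1 denominator). quantiles() uses
    its default "exclusive" method, which is one of several quartile
    conventions, so results can differ slightly from Excel or other tools.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": statistics.variance(values),
        "range": max(values) - min(values),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,  # interquartile range
    }
```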

### What are some common data analysis methods?


1. Trend Analysis

When there is a large amount of data, we want to extract information from it quickly and easily. That calls for charting, that is, drawing graphs with Excel or other plotting tools.

Trend analysis is often used to track core metrics such as click-through rate, GMV, and active users over time. Often, only a simple trend chart is produced, with no analysis. The analysis should go further, as in the chart above. It should ask what trend changes the data shows, whether they are cyclical, whether there are inflection points, and what internal or external reasons lie behind them.

The most useful outputs of trend analysis are ratios: period-on-period (chain), year-on-year, and fixed-base ratios. For example, how much GDP grew in April 2017 compared with March 2017 is the chain ratio. It reflects the recent trend but is affected by seasonality. To eliminate the seasonal effect, year-on-year data is introduced: how much GDP grew in April 2017 compared with April 2016. The fixed-base ratio fixes a reference point. For example, with January 2017 as the reference, the fixed-base ratio for May 2017 compares May 2017 data with January 2017 data.
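The three ratios are relative changes measured against different base periods. This sketch assumes monthly data stored in a dict keyed by (year, month), and the helper names are hypothetical. Each function returns a growth rate (current / base - 1), so 0.25 means +25%; some reports quote the plain ratio current / base instead.

```python
def growth(current, base):
    """Relative change of `current` versus `base` (0.25 means +25%)."""
    if base == 0:
        raise ZeroDivisionError("base value is zero; growth rate is undefined")
    return current / base - 1


def previous_month(year, month):
    """The (year, month) immediately before the given one."""
    return (year, month - 1) if month > 1 else (year - 1, 12)


def chain_ratio(series, year, month):
    # Period-on-period: compare with the immediately preceding month.
    return growth(series[(year, month)], series[previous_month(year, month)])


def year_on_year(series, year, month):
    # Same month one year earlier, which removes the seasonal effect.
    return growth(series[(year, month)], series[(year - 1, month)])


def fixed_base(series, year, month, base):
    # Fixed-base ratio: every period is compared with one reference period,
    # e.g. base=(2017, 1) for January 2017.
    return growth(series[(year, month)], series[base])
```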

2. Comparative Analysis

Horizontal comparison: horizontal comparison means comparing with yourself. The most common cases are comparing a metric with its target value, to see whether we have met our goals, and with the previous month, to see how much we have grown month over month.

Longitudinal Comparison: simply put, it is comparison with others. We must compare ourselves with our competitors to understand our share and position in the market.

Many of you may say that comparative analysis sounds simple, so here is an example. An e-commerce company has a sign-in page, and yesterday's PV (page views) was 5,000. How do you feel about that number? Probably nothing. If the page's average PV is 10,000, something went badly wrong yesterday. If its average PV is 2,000, there was a jump yesterday. Data only becomes meaningful through comparison.