# Basic statistics concepts

The z value (or z-score) measures how many standard deviations an observed value lies from the mean. For example, z = +1.8 indicates that the observed value is 1.8 standard deviations above the mean.

A p-value represents a probability. Both terms are tied to the standard normal distribution: the Z-table lists the probability associated with each z-value.
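As a small illustration, the Z-table lookup for the z = +1.8 example can be reproduced with Python's standard library. This is only a sketch: the variable names are made up for the example, and the upper-tail probability is one of several ways a p-value can be defined.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

z = 1.8  # z-value from the example in the text

# Cumulative probability P(Z <= z), the number a typical Z-table lists.
cumulative = standard_normal.cdf(z)

# Upper-tail probability P(Z >= z), i.e. a one-sided p-value.
upper_tail = 1.0 - cumulative

print(f"P(Z <= {z}) = {cumulative:.4f}")
print(f"P(Z >= {z}) = {upper_tail:.4f}")
```

For z = 1.8 the cumulative probability is about 0.9641, so roughly 3.6% of values lie further above the mean.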


The z value is calculated as follows:

$$z = \frac{x - \mu}{\sigma}$$

Here, x is the observed value (a point on the curve), μ (mu) is the population mean, and σ (sigma) is the population standard deviation.

The Central Limit Theorem is one of the most important theorems in statistics. We’ll explain it using an example rather than a definition.

Consider the following example. We have data on 1000 tenth-grade students and their total marks. Summary metrics were computed for this population; its mean score is 48.4. Let’s choose a random sample of 40 students from this group. How many such samples can we get from this population?

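If the samples must not overlap, we can form 25 of them (1000/40 = 25). If students may appear in more than one sample, the number of possible samples is far larger.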

Can you guarantee that each sample will have the same average score as the population (48.4)?


Ideally it would, but it is very unlikely that every sample will have exactly the same average.

We drew 1000 samples of 40 students each. Examine the frequency distribution of these 1000 sample averages, along with their other summary statistics. Is this the same kind of distribution as the one we looked at earlier?

Yes, the sample averages are also approximately normally distributed. Working through this exercise leads to the following findings:

1. The mean of the sample means (1000 sample means) is very close to the population mean.

2. The standard deviation of the sampling distribution of the mean, known as the standard error of the mean, equals the population standard deviation divided by the square root of the sample size n.

3. Regardless of the shape of the underlying population distribution, the distribution of sample means is approximately normal when the sample size is large enough. This is the Central Limit Theorem.
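The three findings above can be checked with a short simulation. The sketch below does not use the article's student data; it builds a hypothetical population of 1000 marks drawn uniformly from 0 to 100 (deliberately not normal) and samples with replacement, which is an assumption made so that the σ/√n formula applies without a finite-population correction.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical population: 1000 marks, uniform on 0..100 (not normal).
population = [random.randint(0, 100) for _ in range(1000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

SAMPLE_SIZE = 40
NUM_SAMPLES = 1000

# Each sample is drawn with replacement (an assumption of this sketch).
sample_means = [
    statistics.fmean(random.choices(population, k=SAMPLE_SIZE))
    for _ in range(NUM_SAMPLES)
]

# Finding 1: the mean of the sample means is close to the population mean.
mean_of_means = statistics.fmean(sample_means)

# Finding 2: the spread of the sample means is close to sigma / sqrt(n).
observed_se = statistics.stdev(sample_means)
expected_se = pop_sd / math.sqrt(SAMPLE_SIZE)

# Finding 3: if the sample means are roughly normal, about 95% of them
# fall within 1.96 standard errors of the population mean.
share_within = sum(
    abs(m - pop_mean) <= 1.96 * expected_se for m in sample_means
) / NUM_SAMPLES

print(f"population mean      = {pop_mean:.2f}")
print(f"mean of sample means = {mean_of_means:.2f}")
print(f"observed std error   = {observed_se:.2f}")
print(f"sigma / sqrt(n)      = {expected_se:.2f}")
print(f"share within 1.96 SE = {share_within:.3f}")
```

Swapping the uniform population for a more skewed one is a simple way to see how the approximation behaves with other shapes.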

This is a powerful result. In our first example, the XY School students, we compared a sample mean with the population mean.

We looked at the distribution of sample means and measured how far the sample mean was from the population mean. In such situations, provided the sample is large enough, you can work with a normal distribution without worrying about the shape of the population distribution.

Based on the findings above, you can compute the mean and standard error of the sample mean, and from them the z-score and p-value. As a rule of thumb, the sample size should be at least 30 for the Central Limit Theorem approximation to be reliable.
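A minimal sketch of that calculation follows. The function name `z_test_for_mean` is hypothetical, the p-value is taken as two-sided (an assumption), and apart from the population mean of 48.4 the input numbers are invented for illustration.

```python
import math
from statistics import NormalDist


def z_test_for_mean(sample_mean, pop_mean, pop_sd, n):
    """Return (z, two-sided p-value) for a sample mean, using the CLT."""
    # Standard error of the mean: sigma / sqrt(n).
    standard_error = pop_sd / math.sqrt(n)
    # How many standard errors the sample mean lies from the population mean.
    z = (sample_mean - pop_mean) / standard_error
    # Probability of a deviation at least this large in either direction.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# 48.4 is the population mean from the text; 52.0, 12.0 and 40 are made up.
z, p = z_test_for_mean(sample_mean=52.0, pop_mean=48.4, pop_sd=12.0, n=40)
print(f"z = {z:.3f}, p-value = {p:.4f}")
```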


Suppose we have worked out the probability that a result arose by random chance. If it turns out to be 40%, should we conclude that the result is due to chance, or that the difference is real? The “significance level” helps us make this decision.

### What is the significance level?

We assumed that the probability of observing a sample mean of 95 is 40%. That is a high probability, which implies the result is more likely due to randomness than to a behavioral difference.

Had the probability been 7% instead, chance would be a much less convincing explanation. When the probability is low, a real difference in behavior becomes more plausible.

A high probability points to randomness as the explanation, while a low probability points to a behavioral difference.

What criteria do we use to determine what is a high probability event and what is a low probability event?

To be honest, this is a subjective call. In some business settings, 90 percent is regarded as a high probability, while in others only 99 percent qualifies.

In general, a cutoff of 5% is the most widely used convention. This 5% is known as the significance level, or alpha level (symbolized as α). It means that if the probability of the result arising by random chance is less than 5%, we conclude that the two populations behave differently. (1 − significance level) is known as the confidence level; for example, a 5% significance level corresponds to a 95% confidence level.
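To connect the significance level back to z-values, the sketch below converts α into a confidence level and a critical z-value. It assumes a two-sided test, which the text does not specify, and the variable names are illustrative.

```python
from statistics import NormalDist

alpha = 0.05  # significance level

# Confidence level is the complement of the significance level.
confidence_level = 1.0 - alpha

# Two-sided critical value: alpha is split evenly between both tails,
# so we look up the z-value with 1 - alpha/2 of the area below it.
critical_z = NormalDist().inv_cdf(1.0 - alpha / 2.0)

print(f"confidence level = {confidence_level:.2f}")
print(f"critical |z|     = {critical_z:.2f}")
```

With α = 0.05 the critical value is about 1.96, so a z-value beyond ±1.96 has a two-sided probability below 5%.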

So far, we’ve looked at the techniques for determining if a sample mean differs from the population mean or if the difference is due to random chance.

Let’s take a look at how to conduct a hypothesis test, and then we’ll walk through it with an example.

### What are the steps involved in conducting hypothesis testing?

Set up the hypotheses (null and alternative): In the XY School example, we were in fact testing a hypothesis: that the difference between the sample mean and the population mean is due to chance.

The null hypothesis states that there is no difference between the sample and the population. It is denoted by H0. Keep in mind that we usually test the null hypothesis because we suspect it is false.

The alternative hypothesis for the XY School example is that there is a significant difference in behavior between the sample and the population. It is denoted by H1.


Set the decision criteria: The decision criterion for a test is its level of significance. Common choices are 5% and 1%.

Based on the level of significance, we decide whether to retain the null hypothesis or reject it in favor of the alternative. For example, a p-value of 0.03 leads us to retain the null hypothesis at a 1% level of significance but reject it at a 5% level. The right level depends on the needs of the business; the most commonly used level is 0.05.

Calculate the random chance probability: The test statistic is used to compute the random chance probability (the p-value). The higher this probability, the weaker the evidence against the null hypothesis.

Make a decision: We compare the p-value with the chosen significance level. If the p-value is less than the significance level, we reject the null hypothesis; otherwise, we fail to reject it.
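The decision rule can be sketched in a few lines. The helper `decide` is a hypothetical name; the p-value of 0.03 and the 5% and 1% levels are the ones used in the steps above.

```python
def decide(p_value, alpha=0.05):
    """Compare a p-value with the significance level alpha."""
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"


p_value = 0.03  # example p-value from the text

# The same p-value gives different decisions at different levels.
for alpha in (0.05, 0.01):
    print(f"alpha = {alpha:.2f}: {decide(p_value, alpha)}")
```

The same evidence leads to rejection at the 5% level but not at the 1% level, which is why the level should be fixed before looking at the data.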

Because we are looking at a sample rather than the complete population, we may make a mistake when deciding whether to retain or reject the null hypothesis. Depending on whether the null hypothesis is actually true, the decision has four possible outcomes:

1. The decision to retain the null hypothesis is correct.

2. The decision to retain the null hypothesis is incorrect. This is a Type II error.

3. The decision to reject the null hypothesis is correct.

4. The decision to reject the null hypothesis is incorrect. This is a Type I error.
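The four outcomes can be made concrete with a small simulation. Everything here is an assumption for illustration: the function names, the normally distributed marks, the null mean of 50, the standard deviation of 10, the sample size of 40, and the true mean of 53 used for the "null is false" case.

```python
import math
import random
from statistics import NormalDist, fmean


def classify(null_is_true, rejected):
    """Name the outcome of a test decision."""
    if null_is_true:
        return "Type I error" if rejected else "correct (H0 retained)"
    return "correct (H0 rejected)" if rejected else "Type II error"


def rejection_rate(true_mean, null_mean=50.0, sd=10.0, n=40,
                   alpha=0.05, trials=2000, seed=1):
    """Share of simulated two-sided z-tests that reject H0: mean == null_mean."""
    rng = random.Random(seed)
    standard_error = sd / math.sqrt(n)
    rejections = 0
    for _ in range(trials):
        sample_mean = fmean([rng.gauss(true_mean, sd) for _ in range(n)])
        z = (sample_mean - null_mean) / standard_error
        p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
        if p_value < alpha:
            rejections += 1
    return rejections / trials


# H0 is true here, so every rejection is a Type I error (rate near alpha).
type_i_rate = rejection_rate(true_mean=50.0)

# H0 is false here, so every non-rejection is a Type II error.
type_ii_rate = 1.0 - rejection_rate(true_mean=53.0)

print(f"Type I error rate  = {type_i_rate:.3f}")
print(f"Type II error rate = {type_ii_rate:.3f}")
```

Lowering `alpha` in this sketch reduces the Type I rate but raises the Type II rate, which is the trade-off behind choosing a significance level.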