# Machine Learning Statistics

Statistics are tools to get answers to questions about data:

- What is **Common**?
- What is **Expected**?
- What is **Normal**?
- What is the **Probability**?

## Inferential Statistics

**Inferential statistics** are methods for quantifying properties of a population
from a small **Sample**:

You take data from a sample and make a prediction about the whole population.

For example, you can stand in a shop and ask a **sample of 100 people** if they like chocolate.

From your research, using inferential statistics, you could predict that 91% of **all shoppers** like chocolate.
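As a rough illustration, the sketch below turns a sample result into an estimate for the whole population. It assumes a hypothetical survey where 91 of 100 sampled shoppers said yes, and it uses the normal-approximation (Wald) 95% confidence interval. That is one common technique among several, chosen here for simplicity, not the only way to do the inference.

```python
import math


def estimate_proportion(yes: int, n: int, z: float = 1.96) -> tuple[float, float, float]:
    """Estimate a population proportion from a sample of size n.

    Returns (point_estimate, lower, upper) using the normal-approximation
    (Wald) confidence interval: p +/- z * sqrt(p * (1 - p) / n).
    z = 1.96 corresponds to roughly 95% confidence.
    """
    p = yes / n                                   # share of "yes" answers in the sample
    margin = z * math.sqrt(p * (1 - p) / n)       # uncertainty shrinks as n grows
    return p, max(0.0, p - margin), min(1.0, p + margin)


# Hypothetical survey result: 91 of 100 sampled shoppers like chocolate.
p, low, high = estimate_proportion(91, 100)
print(f"Estimate: {p:.0%} of all shoppers (95% CI {low:.1%} to {high:.1%})")
```

The interval is the important part: a sample of 100 people supports "about 91%", not "exactly 91%", of all shoppers.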

## Incredible Chocolate Facts

Nine out of ten people love chocolate.

50% of the US population cannot go a day without chocolate.

You use **Inferential Statistics** to predict whole populations from small samples of data.

## Descriptive Statistics

**Descriptive Statistics** summarize (describe) observations from a set of data.

Since we register every newborn baby, we can tell that about 51 out of every 100 are boys.

From these collected numbers, we can predict a 51% chance that a new baby will be a boy.
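A minimal sketch of this kind of summary: count each outcome in the raw observations and divide by the total. The birth register below is hypothetical (51 boys and 49 girls), and `relative_frequencies` is an illustrative helper name, not a library function.

```python
from collections import Counter


def relative_frequencies(observations):
    """Summarize raw observations as the share of each outcome."""
    counts = Counter(observations)                # outcome -> number of occurrences
    total = sum(counts.values())
    return {outcome: count / total for outcome, count in counts.items()}


# Hypothetical birth register: 51 boys and 49 girls out of 100 newborns.
births = ["boy"] * 51 + ["girl"] * 49
shares = relative_frequencies(births)
print(f"Chance the next baby is a boy: {shares['boy']:.0%}")  # prints 51%
```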

It is a mystery that the ratio is not 50%, as basic biology would predict. We only know that this tilted sex ratio has been observed since the 17th century.

## Note

Raw observations are only data. They are not real knowledge.

You use **Descriptive Statistics** to transform raw observations into information that you can understand.

## Descriptive Statistics Measurements

Descriptive statistics are broken down into different measures:

**Tendency** (Measures of the Center)

- The Mean (the average value)
- The Median (the midpoint value)
- The Mode (the most common value)
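The three measures of the center can be computed with Python's standard `statistics` module. The speed values below are made-up example data.

```python
import statistics

# Hypothetical example data: car speeds in km/h.
speeds = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

mean = statistics.mean(speeds)      # sum of the values divided by their count
median = statistics.median(speeds)  # middle value after sorting
mode = statistics.mode(speeds)      # most frequent value

print(f"mean={mean:.2f} median={median} mode={mode}")
```

For this data the mean is about 89.77, the median is 87, and the mode is 86. The mean is pulled upward by a few high speeds, while the median and mode are not.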

**Spread** (Measures of Variability)

- Min and Max
- Standard Deviation
- Variance
- Skewness
- Kurtosis
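A sketch of the measures of spread, assuming population (moment-based) formulas: variance is the average squared deviation, skewness is the standardized third moment, and kurtosis is reported as *excess* kurtosis (0 for a normal distribution). Libraries that use sample or bias-corrected versions will give slightly different numbers. `spread_summary` is a hypothetical helper written for this example.

```python
import math
import statistics


def spread_summary(data):
    """Measures of variability using population (moment-based) formulas."""
    n = len(data)
    mean = statistics.fmean(data)
    variance = statistics.pvariance(data)   # average squared deviation from the mean
    if variance == 0:
        raise ValueError("skewness and kurtosis are undefined when all values are equal")
    std_dev = math.sqrt(variance)           # same units as the data
    # Third and fourth central moments.
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skewness = m3 / std_dev ** 3            # > 0: longer tail on the right
    kurtosis = m4 / variance ** 2 - 3       # excess kurtosis: 0 for a normal distribution
    return {
        "min": min(data),
        "max": max(data),
        "std_dev": std_dev,
        "variance": variance,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }


# Hypothetical example data: car speeds in km/h.
speeds = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]
for name, value in spread_summary(speeds).items():
    print(f"{name}: {value:.2f}")
```

Standard deviation is usually easier to read than variance because it is expressed in the same units as the data (km/h here), while variance is in squared units.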