Understanding Statistics - Research Methods Part-4

Continuation of Part 3.

Central Tendency

Given a histogram of salaries, how would you choose a number (or numbers) that accurately represents the typical salary?
- The value at which the frequency is highest is called the mode. It describes the distribution well because it is the most common value.
- The value in the middle of the sorted data is called the median, and this also works.
- The average (mean) is a statistic that sits at the balance point of the distribution.
Mode
- The mode is the value that occurs with the highest frequency.
- So what is the mode of [2, 5, 5, 9, 8, 3]?
- Answer = 5, because it occurs twice in the dataset and every other value occurs only once.
- When a histogram holds thousands of data points, the mode is the bin (range of values) with the highest frequency, because we cannot see the individual values but we can see which bin is tallest.
- When every bar of the histogram is at the same level, it is called a uniform distribution; such distributions have no mode.
- A distribution with two clear, distinct peaks has two modes and is called bimodal; one with more than two peaks is multimodal.
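
As a minimal sketch of the cases above, Python's standard `statistics.multimode` returns every value tied for the highest frequency. The lists `two_peaks` and `flat` are made-up illustrations, not data from this series.

```python
from statistics import multimode

# The example from the text: 5 occurs twice, everything else once.
scores = [2, 5, 5, 9, 8, 3]
modes = multimode(scores)
print(modes)

# Made-up data with two values tied for the highest frequency (bimodal).
two_peaks = [1, 1, 2, 3, 3]
print(multimode(two_peaks))

# Made-up data where every value occurs once: all values tie,
# so no single value stands out as "the" mode.
flat = [1, 2, 3, 4]
print(multimode(flat))
```

When `multimode` returns every distinct value, as it does for `flat`, that is the small-data analogue of a uniform distribution with no meaningful mode.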
Mode Explained
- Mode can be used to describe any type of data we have, whether it is numerical or categorical.
- Not every score in the dataset affects the mode; only the most frequently repeated values do.
- If we take many samples from the same population, the mode can differ from sample to sample.
- The mode changes when the bin size changes.
- There is no equation for the mode. There is a procedure for finding it, but it cannot be written as a formula, because the result depends on how we present the data.
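
To illustrate how the mode depends on bin size, here is a small sketch. The helper `modal_bin` and the values in `data` are hypothetical, chosen only to make the effect visible; bins are assumed to start at multiples of the bin width.

```python
from collections import Counter

def modal_bin(values, width):
    """Return (start, end) of the bin with the most values.

    Bins are half-open intervals [start, start + width) that begin
    at multiples of `width` (an assumption made for this sketch).
    """
    counts = Counter((v // width) * width for v in values)
    start, _ = counts.most_common(1)[0]
    return (start, start + width)

data = [1, 2, 2, 6, 7, 8, 9]  # made-up values

print(modal_bin(data, 1))  # narrow bins: the repeated value 2 wins
print(modal_bin(data, 5))  # wide bins: the cluster from 6 to 9 wins
```

With a width of 1 the modal bin is the one holding the repeated 2s, while a width of 5 moves it to the bin covering 5 up to 10, even though the data never changed.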
Mean
- The sum of all the values divided by the number of values
- If the data is [1, 2, 3, 4, 5] then mean = (1 + 2 + 3 + 4 + 5)/5 = 3
- For a sample we say x bar = Sigma of x divided by n (small n, the sample size)
- For a population we say mu = Sigma of x divided by N (capital N, the population size)
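
The calculation above can be sketched in Python, both by hand and with the standard `statistics.mean`. The arithmetic is the same for a sample and a population; only the symbol (x bar or mu) and what the data represents differ.

```python
from statistics import mean

data = [1, 2, 3, 4, 5]

# Sum of the values divided by how many values there are.
manual_mean = sum(data) / len(data)
print(manual_mean)

# The standard library gives the same result.
print(mean(data))
```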
Properties of Mean
- All scores in the distribution affect the mean.
  - Think of the mean as a pivot keeping a scale balanced: if we add or remove a score, the scale goes off balance and the mean has to be recalculated to re-balance it.
- The mean can be described with a formula.
- Many samples from the same population will have roughly similar means.
- The mean of a sample can be used to make inferences about the population it came from.
- The mean will change if we add an extreme value to the dataset.
  - Such a value is known as an outlier: a value that is unexpectedly different from the other observed values.
  - Outliers skew the distribution and pull the mean towards themselves, which makes the mean a misleading description of the typical value.
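
A quick sketch of how a single outlier moves the mean. The salary figures are invented for illustration (in thousands) and do not come from any dataset in this series.

```python
from statistics import mean

# Hypothetical salaries, in thousands.
salaries = [30, 32, 35, 38, 40]
mean_before = mean(salaries)

# Add one extreme value (an outlier) and recompute.
with_outlier = salaries + [500]
mean_after = mean(with_outlier)

print(mean_before)
print(mean_after)
```

Here the mean jumps from 35 to 112.5, a value larger than five of the six salaries, which is why a mean computed on data with outliers can misrepresent the typical case.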
Median
- Sort the data
- Find the middle value of the sorted data
- When the number of values is even, the median is calculated by
  - First sorting them
  - Then selecting the two middle values
  - Taking the average of these two middle values
- When the data has an outlier, the median is not affected much by it. A statistic that resists such departures from the norm is called robust.
Median Formula
- Here $X_k$ is the value at position $k$ in the sorted data and $n$ is the number of values.
- For an even number of values: $\left(X_{n/2} + X_{n/2+1}\right)/2$, that is, find the two middle values and then take the average of those two values.
- For an odd number of values: $X_{(n+1)/2}$
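
The procedure and formula above can be sketched as a small function. The name `median_of` is chosen for this example; Python's standard `statistics.median` does the same job. Note that the formula uses 1-based positions while Python lists are 0-based.

```python
def median_of(values):
    """Median following the sort-then-pick-the-middle procedure."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: position (n + 1) / 2 in 1-based terms is index mid.
        return ordered[mid]
    # Even n: average positions n/2 and n/2 + 1 (indices mid - 1 and mid).
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_of([1, 2, 3, 4, 5]))            # odd number of values
print(median_of([2, 5, 5, 9, 8, 3]))         # even number of values
print(median_of([30, 32, 35, 38, 40, 500]))  # made-up data with an outlier
```

In the last call the outlier 500 barely matters: the median is 36.5, the average of the two middle values 35 and 38.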
Positively Skewed (High frequency towards Left)
- The mode will be towards the left, because that is where the tallest bars are.
- The mean will be pulled towards the right, because the few large values in the long right tail have a strong effect on it.
- The median will usually lie between the mode and the mean.
- So the mode is less than the median, which is less than the mean (Mode < Median < Mean)
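
As a rough check of this ordering, here is a sketch with a small made-up dataset that has most values on the left and a long right tail.

```python
from statistics import mean, median, mode

# Made-up positively skewed data: many small values, a few large ones.
skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

skew_mode = mode(skewed)
skew_median = median(skewed)
skew_mean = mean(skewed)

print(skew_mode, skew_median, skew_mean)
print(skew_mode < skew_median < skew_mean)
```

For this data the mode is 2, the median is 3 and the mean is 4.6, so the ordering Mode < Median < Mean holds.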
Normally Distributed (frequency in centre)
- The mean will be equal to the median, which will be equal to the mode (Mean = Median = Mode)
- The mode will occur in the centre bin, where the frequency is the highest.
- Since the distribution is symmetrical, the mean and the median will also both fall right in the centre.
Measure Of Centre
- Mean
  - Mean has a simple equation
  - Mean will always change if any data value changes
  - Mean is not affected by a change in bin size; it stays the same no matter how we visualize the data with the histogram
  - Mean is affected severely by outliers
  - Mean is not easy to find just by looking at the histogram
- Median
  - Median does not have a simple equation
  - Median will not always change if any data value changes
  - Median is not affected by a change in bin size
  - Median is not affected severely by outliers
  - Median is not easy to find just by looking at the histogram
- Mode
  - Mode does not have an equation
  - Mode will not always change if any data value changes
  - Mode is affected by a change in bin size
  - Mode is not affected severely by outliers
  - Mode is easy to find just by looking at the histogram, because it is the bin with the highest frequency
  - Mode can be used to describe categorical data, such as gender or country of origin
In an introductory statistics course, the same number of students scored below 75% as above 75% on the final exam. What shape(s) could the distribution of final exam scores have?
- This is another way of saying that 75% was the median score on the exam. All of these distributions can have a median of 75%:
  - Uniform
  - Normal
  - Bimodal
  - Positively Skewed
  - Negatively Skewed
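
To see why the median alone does not pin down the shape, here is a sketch with invented exam scores. Each list is a tiny, hypothetical stand-in for one shape, not real course data.

```python
from statistics import median

# Hypothetical score sets, each loosely imitating one shape.
score_sets = {
    "evenly spread": [55, 65, 75, 85, 95],
    "bimodal": [60, 60, 75, 90, 90],
    "positively skewed": [70, 72, 75, 90, 100],
    "negatively skewed": [40, 60, 75, 78, 80],
}

medians = {name: median(scores) for name, scores in score_sets.items()}
for name, value in medians.items():
    print(name, value)
```

Every set has two scores below 75 and two above it, so each median is 75 even though the sets are spread out very differently.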
