
# Understanding Statistics - Research Methods Part-4

### Central Tendency

• Given a histogram of salaries, how would you choose a number (or numbers) that accurately represents the typical salary?

• The value at which the frequency is highest is called the mode. It is the most common value, so it certainly works for describing the distribution.

• The value in the middle of the sorted data is called the median, and this also works.

• The average (mean) is a statistic that rests at a specific spot in the middle of the distribution: the point at which the data balances.

• Mode

• Mode occurs with the highest frequency
• So what is the mode of [2, 5, 5, 9, 8, 3]
• Answer = 5 because it occurs twice in the dataset
• When a histogram holds thousands of data points, the mode is the range (bin) with the highest frequency. We cannot see the individual values, but we can see which bin is tallest.
• When every bar of the histogram is at the same level, the distribution is called a uniform distribution. Such distributions have no mode.
• A distribution with two distinct peaks has two modes and is called bimodal. With more than two peaks it is called multimodal.
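The procedure for finding the mode can be sketched in a few lines of Python. This is only an illustration: the helper name `find_modes` is hypothetical, and returning an empty list when every value is equally frequent is an assumption made here to mirror the "uniform distributions have no mode" rule in these notes.

```python
from collections import Counter


def find_modes(values):
    """Return the most frequent value(s); an empty list means 'no mode'."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    # Assumption: if all distinct values are equally frequent, report no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return [value for value, c in counts.items() if c == highest]


print(find_modes([2, 5, 5, 9, 8, 3]))      # [5]
print(find_modes([1, 1, 2, 3, 3]))         # [1, 3]  (bimodal)
print(find_modes([4, 6, 7, 9]))            # []      (uniform, no mode)
print(find_modes(["red", "blue", "red"]))  # ['red'] (categorical data)
```

The last call shows that the same counting procedure applies to categorical data, where a mean or median would not make sense.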
• Mode Explained

• Mode can be used to describe any type of data, whether numerical or categorical.
• Not every score in the dataset affects the mode. Only the most frequently repeated values do.
• If we take many samples from the same population, the mode can differ from sample to sample.
• For binned data, the mode changes when the bin size changes.
• There is no equation for the mode. There is a procedure to find it, but we cannot describe it with an equation, because the result depends on how we present the data.
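To see how the modal bin depends on the bin size, here is a small sketch. The data values and the helper name `modal_bin` are made up for illustration, and bins are assumed to start at multiples of the bin width.

```python
from collections import Counter


def modal_bin(values, width):
    """Return (start, end) of the bin with the highest frequency.

    Assumes bins start at multiples of `width`; if two bins tie,
    the one encountered first in the data wins.
    """
    counts = Counter((v // width) * width for v in values)
    start, _ = max(counts.items(), key=lambda item: item[1])
    return (start, start + width)


values = [8, 9, 11, 12, 25, 26, 27]  # hypothetical data

print(modal_bin(values, 10))  # (20, 30)
print(modal_bin(values, 20))  # (0, 20)
```

With a bin width of 10 the tallest bar covers 20 to 30, but with a width of 20 it covers 0 to 20, even though the underlying data has not changed.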
• Mean
• The sum of all the values divided by the number of values
• If the data is [1, 2, 3, 4, 5], then mean = (1 + 2 + 3 + 4 + 5)/5 = 3
• For a sample we write $\bar{x} = \frac{\sum x}{n}$, where $n$ (lowercase) is the sample size
• For a population we write $\mu = \frac{\sum x}{N}$, where $N$ (uppercase) is the population size
• Properties of Mean
• All scores in the distribution affect the mean.
• Think of the mean as the pivot that keeps a scale balanced. If we add or remove a score, the scale goes off balance and the mean has to be recalculated to re-balance it.
• The mean can be described with a formula.
• Many samples from the same population will have roughly similar means.
• The mean of a sample can be used to make inferences about the population it came from.
• The mean will change noticeably if we add an extreme value to the dataset.
• Such a value is known as an outlier: a value that is unexpectedly different from the other observed values.
• Outliers skew the distribution and pull the mean towards themselves, which makes the mean a misleading description of the typical value.
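The balance-point idea and the effect of an outlier can be checked with a quick sketch using Python's standard `statistics` module. The outlier value of 100 is an arbitrary choice for illustration.

```python
from statistics import mean

scores = [1, 2, 3, 4, 5]
center = mean(scores)
print(center)  # 3

# Balance property: deviations from the mean sum to zero.
print(sum(x - center for x in scores))  # 0

# Adding a single extreme value (a hypothetical outlier) drags the mean upward.
with_outlier = scores + [100]
print(mean(with_outlier))  # 115 / 6, roughly 19.17
```

Five of the six values are 5 or less, yet the mean is now above 19, which is why the mean can mislead when outliers are present.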
• Median
• Sort the data
• Find the middle value of the data
• When there is an even number of values, the median is calculated by
• First sorting them
• Then we select the two middle values
• Take average of these two middle values
• When the data has an outlier, the median is not affected much by such departures from the norm. This property makes the median a robust statistic.
• Median formula, where $x_{(k)}$ is the value at position $k$ in the sorted data and $n$ is the number of values
• For even $n$: $\text{median} = \dfrac{x_{(n/2)} + x_{(n/2+1)}}{2}$, that is, find the two middle values and take their average
• For odd $n$: $\text{median} = x_{((n+1)/2)}$
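The sort-then-pick-the-middle procedure can be written as a short function. This is a minimal sketch: the name `median_of` is hypothetical, and the standard library's `statistics.median` does the same job. Note that Python lists are 0-indexed, so the 1-based positions n/2 and n/2 + 1 become indices `mid - 1` and `mid`.

```python
def median_of(values):
    """Median via sorting: middle value, or average of the two middle values."""
    if not values:
        raise ValueError("median of empty data is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd n: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the two


print(median_of([1, 2, 3, 4, 5]))       # 3
print(median_of([2, 5, 5, 9, 8, 3]))    # 5.0
print(median_of([1, 2, 3, 4, 5, 100]))  # 3.5
```

The last call illustrates robustness: adding an extreme value of 100 moves the median only from 3 to 3.5.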

• Positively Skewed (high frequencies on the left, long tail to the right)
• The mode will be towards the left, because that is where the highest frequency is
• The mean will be pulled towards the right, because of the few large, infrequent values in the right tail
• The median will lie between the mode and the mean
• So the mode is less than the median, which is less than the mean (Mode < Median < Mean)
• Normally Distributed (highest frequency in the centre, symmetrical)
• The mean will be equal to the median, which will be equal to the mode (Mean = Median = Mode)
• The mode will occur in the centre bin, where the frequency is the highest.
• Since the distribution is symmetrical, the mean and the median will both also fall pretty much right in the centre.
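These orderings can be checked on small made-up datasets with Python's `statistics` module. Both lists below are hypothetical, and the second is merely symmetric rather than truly normal, which is enough to illustrate the point.

```python
from statistics import mean, median, mode

# Hypothetical positively skewed data: most values are small, one is large.
skewed = [1, 1, 1, 2, 2, 3, 4, 10]
print(mode(skewed), median(skewed), mean(skewed))
print(mode(skewed) < median(skewed) < mean(skewed))  # True

# Hypothetical symmetric data with a single peak in the centre.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
print(mode(symmetric), median(symmetric), mean(symmetric))
print(mode(symmetric) == median(symmetric) == mean(symmetric))  # True
```

In the skewed list the single value 10 pulls the mean to the right of the median, while the mode stays at the most common small value.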
• Measure Of Centre
• Mean:
• Mean has a simple equation
• Mean will always change if any data value changes
• Mean is not affected by a change in bin size. It will always be the same, no matter how we visualize the data with the histogram
• Mean is affected severely by outliers
• Mean is not easy to find just by looking at the histogram
• Median
• Median does not have a simple equation
• Median will not always change if any data value changes
• Median is not affected by change in bin size
• Median is not affected severely by outliers
• Median is not easy to find just by looking at the histogram
• Mode
• Mode does not have an equation
• Mode will not always change if any data value changes
• Mode is affected by a change in bin size
• Mode is not affected severely by outliers
• Mode is easy to find just by looking at the histogram, because it is the bin with the highest frequency
• Mode can be used to describe categorical data, such as gender or country of origin
• In an introductory statistics course, the same number of students scored below 75% as above 75% on the final exam. What shape(s) could the distribution of final exam scores have?
• This is another way of saying that 75% was the median score on the exam. All of these distributions can have a median of 75%:
• Uniform
• Normal
• Bimodal
• Positively Skewed
• Negatively Skewed