
Understanding Statistics - Research Methods Part-4

Continuation of Part 3.

Central Tendency

  • Given a histogram of salaries, how would you choose a number (or numbers) that accurately represents the typical salary?

    • The value at which the frequency is highest is called the mode. It is the most common value, so it certainly works for describing the distribution.

    • The value in the middle of the sorted data is called the median, and this will also work.

    • The average (mean) is the balance point of the distribution: the sum of all the values divided by how many values there are.

  • Mode

    • The mode is the value that occurs with the highest frequency.
    • So what is the mode of [2, 5, 5, 9, 8, 3]?
    • Answer = 5, because it occurs twice in the dataset and every other value occurs once.
    • When a histogram summarizes thousands of data points, the mode is the bin (range of values) with the highest frequency, because we cannot see the individual values but we can see which bin is the tallest.
    • When every bar of the histogram is at the same height, it is called a Uniform Distribution; such distributions have no mode.
    • A distribution with two distinct peaks has two modes and is called bimodal; one with more than two peaks is called multimodal.
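The rules above can be sketched in a few lines of Python. This is only an illustration: the helper name `find_modes` is made up for this post, and it follows the convention used here that a dataset in which every value occurs equally often has no mode.

```python
from collections import Counter


def find_modes(values):
    """Return a sorted list of modes; an empty list means 'no mode'."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    # Every distinct value occurs equally often: treat as uniform, so no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(find_modes([2, 5, 5, 9, 8, 3]))  # [5]
print(find_modes([1, 1, 2, 3, 3]))     # [1, 3]  (bimodal)
print(find_modes([1, 2, 3, 4]))        # []      (uniform, no mode)
```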
  • Mode Explained

    • Mode can be used to describe any type of data we have, whether it is numerical or categorical.
    • Not all scores in the dataset affect the mode; only the most frequently repeated ones determine it.
    • If we take a lot of samples from the same population, the mode can be different for each sample.
    • The mode of a histogram changes with the bin size.
    • There is no equation for the mode. There is a procedure to find it, but we cannot describe it with an equation, since it really depends on how we present the data.
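To see how the bin size can move the mode, here is a small sketch. The data values and the helper `modal_bin` are made up for illustration; ties between bins are broken in favour of the lower bin, which is an arbitrary choice.

```python
from collections import Counter


def modal_bin(values, bin_width):
    """Return (start, end) of the bin holding the most values."""
    # Map each value to the lower edge of its bin, then count per bin.
    counts = Counter((v // bin_width) * bin_width for v in values)
    start, _ = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))
    return (start, start + bin_width)


data = [8, 9, 10, 11, 12, 25, 26, 27, 28]  # hypothetical values

print(modal_bin(data, 10))  # (20, 30)
print(modal_bin(data, 20))  # (0, 20)
```

The same nine values give a modal bin of 20 to 30 with a width of 10, but 0 to 20 with a width of 20, which is why the mode depends on how the data is presented.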
  • Mean
    • Sum of all the values divided by the number of values
    • If data is [1, 2, 3, 4, 5] then mean = (1 + 2 + 3 + 4 + 5)/5 = 3
    • For a sample we say x bar = Sigma of x divided by n (small n, the sample size)
    • For a population we say mu = Sigma of x divided by N (capital N, the population size)
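As a minimal sketch, the definition translates directly into code. The function name `mean_of` is just an illustrative choice; Python's standard library also offers `statistics.mean`.

```python
def mean_of(values):
    """Sum of all the values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)


print(mean_of([1, 2, 3, 4, 5]))  # 3.0
```

The calculation is the same for a sample and for a population; only the notation (x bar and n versus mu and N) differs.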
  • Properties of Mean
    • All scores in the distribution affect the mean.
      • Think of the mean as the pivot that keeps a scale balanced: if we add or remove a score, the scale goes off balance and the mean has to be recalculated to re-balance it.
    • The mean can be described with a formula.
    • Many samples from the same population will have roughly similar means.
    • The mean of the sample can be used to make inferences about the population it came from.
    • The mean will change noticeably if we add an extreme value to the dataset.
      • Such a value is known as an outlier: a value that is unexpectedly different from the other observed values.
      • Outliers create skewed distributions by pulling the mean towards themselves, which makes the mean a misleading summary of the typical value.
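A quick sketch of the balance idea and of the pull of an outlier, using made-up salary figures (in thousands):

```python
from statistics import mean

salaries = [30, 32, 35, 38, 40]  # hypothetical salaries, in thousands

mean_before = mean(salaries)
# Deviations from the mean cancel out, which is the "balanced scale" idea.
balance = sum(x - mean_before for x in salaries)

# Add one extreme value (an outlier) and recompute the mean.
mean_after = mean(salaries + [500])

print(mean_before)  # 35
print(balance)      # 0
print(mean_after)   # 112.5
```

One extreme salary moves the mean from 35 to 112.5, a figure that is far above five of the six salaries.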
  • Median
    • Sort the data
    • Find the middle value of the data
    • The median of an even number of values is calculated by
      • First sorting them
      • Then we select the two middle values
      • Take average of these two middle values
    • When the data has outliers, the median is not affected much by these departures from the norm; this property of the median is called robustness
  • Median Formula (x_i is the i-th value of the sorted data and n is the number of values)
    • For even n: median = (x_(n/2) + x_(n/2 + 1)) / 2, that is, find the two middle values and then take the average of those two values
    • For odd n: median = x_((n + 1)/2)
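The procedure can be sketched as follows. The name `median_of` is illustrative, and the formula's positions are 1-based while Python lists are 0-based, hence the index shift. The last call reuses made-up salary figures to show the robustness to an outlier.

```python
def median_of(values):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    if not values:
        raise ValueError("the median of an empty dataset is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: position (n + 1) / 2 in 1-based terms is index n // 2.
        return ordered[mid]
    # Even n: positions n / 2 and n / 2 + 1 are indices mid - 1 and mid.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([1, 2, 3, 4, 5]))              # 3
print(median_of([2, 5, 5, 9, 8, 3]))           # 5.0
print(median_of([30, 32, 35, 38, 40, 500]))    # 36.5
```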

  • Positively Skewed (High frequency towards Left)
    • The mode will be towards the left, because that is where the highest frequency is
    • The mean will be pulled towards the right by the few large, infrequent values in the right tail
    • The median will be between the mode and the mean
    • So typically the mode is less than the median, which is less than the mean (Mode < Median < Mean)
  • Normally Distributed (frequency in centre)
    • Mean will be equal to Median which will be equal to Mode (Mean = Median = Mode)
    • Mode will occur in the centre bin where the frequency is the highest.
    • Since the distribution is also symmetrical, the Mean and the Median will both occur pretty much right in the centre.
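Both shapes can be checked on small made-up datasets with Python's `statistics` module. The numbers are chosen only for illustration, and the ordering for skewed data is a common pattern rather than a guarantee for every dataset.

```python
from statistics import mean, median, mode


def summarize(values):
    """Return (mode, median, mean) for a dataset."""
    return (mode(values), median(values), mean(values))


positively_skewed = [2, 2, 2, 3, 3, 4, 5, 9, 15]  # long tail to the right
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]           # peak in the centre

print(summarize(positively_skewed))  # (2, 3, 5)  -> Mode < Median < Mean
print(summarize(symmetric))          # (3, 3, 3)  -> Mean = Median = Mode
```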
  • Measure Of Centre
    • Mean
      • Mean has a simple equation
      • Mean will always change if any data value changes
      • Mean is not affected by change in bin size; it will always be the same, no matter how we visualize the data with the histogram
      • Mean is affected severely by outliers
      • Mean is not easy to find just by looking at the histogram
    • Median
      • Median does not have a simple equation
      • Median will not always change if any data value changes
      • Median is not affected by change in bin size
      • Median is not affected severely by outliers
      • Median is not easy to find just by looking at the histogram
    • Mode
      • Mode does not have an equation
      • Mode will not always change if any data value changes
      • Mode is affected by change in bin size
      • Mode is not affected severely by outliers
      • Mode is easy to find just by looking at the histogram, because it is the tallest bar
      • It can be used to describe categorical data, such as gender or country of origin
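For categorical data the mode is simply the most common label, as in this small sketch with a made-up list of countries. A mean or median would not be meaningful here, because the labels cannot be added or ordered.

```python
from collections import Counter

# Hypothetical survey answers for "country of origin".
countries = ["India", "Brazil", "India", "Kenya", "Brazil", "India"]

top_country, top_count = Counter(countries).most_common(1)[0]

print(top_country, top_count)  # India 3
```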
  • In an introductory statistics course, the same number of students scored below 75% as above 75% on the final exam. What shape(s) could the distribution of final exam scores have?
    • This is another way of saying that 75% was the median score on the exam. All of these distributions can have a median of 75%:
      • Uniform
      • Normal
      • Bimodal
      • Positively Skewed
      • Negatively Skewed
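As a rough check, here are five tiny made-up sets of exam scores, one per shape. They are illustrative stand-ins rather than real exam data, and each has a median of 75 with as many scores below it as above it.

```python
from statistics import median

# Hypothetical exam scores (in %), one small dataset per shape.
exam_scores = {
    "uniform": [55, 65, 75, 85, 95],
    "normal-like": [65, 70, 75, 75, 75, 80, 85],
    "bimodal": [60, 60, 60, 75, 90, 90, 90],
    "positively skewed": [70, 70, 72, 75, 80, 90, 100],
    "negatively skewed": [40, 55, 70, 75, 78, 80, 80],
}

for shape, scores in exam_scores.items():
    below = sum(1 for s in scores if s < 75)
    above = sum(1 for s in scores if s > 75)
    print(shape, median(scores), below, above)
```

The median alone therefore says nothing about the shape of the distribution.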
This post is licensed under CC BY 4.0 by the author.