Mean and Median - Central Tendency
The mean is the sum of all values divided by their count, and it is the most commonly used representative value. However, it can be distorted by extreme values. The median is the middle value of the sorted data (or the average of the two middle values when the count is even), which makes it a more stable center that is largely unaffected by outliers. For example, the mean of 1, 2, 3, 4, 100 is 22, but the median is 3, which better reflects where most of the values lie.
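The example in the paragraph above can be reproduced with a minimal sketch using Python's standard `statistics` module. The variable names are chosen here for illustration only.

```python
from statistics import mean, median

# The sample from the text: four small values and one extreme value.
data = [1, 2, 3, 4, 100]

data_mean = mean(data)      # sum of values / count = 110 / 5
data_median = median(data)  # middle value of the sorted data

print(data_mean)    # 22
print(data_median)  # 3
```

Removing the value 100 would change the mean from 22 to 2.5 but move the median only from 3 to 2.5, which illustrates why the median is the more stable of the two.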
Variance and Standard Deviation - Data Spread
Variance is the average of the squared deviations of each data point from the mean. Standard deviation is the square root of the variance, so it has the same unit as the data and is easier to interpret. A large standard deviation means the data is widely spread around the mean; a small one means it is clustered near it. Standard deviation is used to measure volatility in finance and product consistency in quality control.
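As a rough illustration, the sketch below computes the variance and standard deviation of a small made-up sample with the standard `statistics` module. The text does not say whether the population form (divide by n) or the sample form (divide by n - 1) is intended, so both are shown; the data values are an assumption for the example.

```python
from statistics import pvariance, pstdev, variance, stdev

# Hypothetical sample whose mean is exactly 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Population form: squared deviations averaged over n.
pop_var = pvariance(data)
pop_sd = pstdev(data)

# Sample form: squared deviations divided by n - 1.
samp_var = variance(data)
samp_sd = stdev(data)

print(pop_var, pop_sd)                        # 4 2.0
print(round(samp_var, 3), round(samp_sd, 3))
```

The squared deviations sum to 32, so the population variance is 32 / 8 = 4 and the sample variance is 32 / 7, which is slightly larger.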
Quartiles and IQR - Understanding Distribution
Quartiles are the three cut points that divide sorted data into four equal parts: Q1 (25th percentile), Q2 (50th percentile, the median), and Q3 (75th percentile). The IQR (Interquartile Range) is Q3 - Q1 and shows how spread out the middle 50% of the data is. The IQR is also used for outlier detection: values below Q1 - 1.5×IQR or above Q3 + 1.5×IQR are considered outliers.
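A minimal sketch of the quartile and IQR calculation, using `statistics.quantiles` from the Python standard library. Several interpolation conventions for quartiles exist and they can give slightly different results; the `"inclusive"` method used here is an assumption, not necessarily the convention this document has in mind.

```python
from statistics import quantiles

# Hypothetical sample; quantiles() sorts the data internally.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(q1, q2, q3)  # Q1, median, Q3
print(iqr)         # spread of the middle 50% of the data
```

For this sample the cut points fall exactly on the values 3, 5, and 7, so the IQR is 4.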
Outlier Detection - 1.5×IQR Rule
Outliers are values that differ significantly from the rest of the data, possibly due to measurement errors or special events. The most common detection method is the 1.5×IQR rule: values smaller than Q1 - 1.5×IQR or larger than Q3 + 1.5×IQR are considered outliers. Identifying outliers helps catch data-quality problems and can reveal unusual patterns worth investigating.
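The 1.5×IQR rule can be sketched as a small function. The name `iqr_outliers` and the choice of the `"inclusive"` quartile method are assumptions made for this example; a different quartile convention may place the fences slightly differently.

```python
from statistics import quantiles

def iqr_outliers(data, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the k×IQR rule."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # A value is an outlier if it falls strictly outside the fences.
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outliers

lower, upper, outliers = iqr_outliers([1, 2, 3, 4, 100])
print(lower, upper)  # fences computed from Q1 = 2 and Q3 = 4
print(outliers)      # [100]
```

Here Q1 is 2 and Q3 is 4, so the IQR is 2 and the fences sit at -1 and 7; only the value 100 falls outside them.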
Histogram - Visualizing Distribution
A histogram divides the data range into bins and displays the frequency of each bin as a bar. It reveals the shape of the distribution (for example normality, skewness, and kurtosis) at a glance. A symmetric bell shape suggests an approximately normal distribution; a lopsided shape with a longer tail on one side indicates a skewed distribution. Histograms make it easy to see the modal interval, the overall distribution pattern, and potential outliers.
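The binning step behind a histogram can be sketched in plain Python. The function name `histogram_counts`, the equal-width bins, and the rule that the maximum value goes into the last bin are all assumptions of this example; plotting libraries may choose bin edges differently.

```python
def histogram_counts(data, bins):
    """Count how many values fall into each of `bins` equal-width intervals."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        if width == 0:
            index = 0  # all values identical: everything lands in the first bin
        else:
            # The maximum value would compute index == bins, so clamp it
            # into the last bin.
            index = min(int((x - lo) / width), bins - 1)
        counts[index] += 1
    return counts

data = [1, 2, 2, 3, 3, 3, 4, 4, 5]
counts = histogram_counts(data, bins=4)

# Draw a crude text histogram, one row per bin.
for i, count in enumerate(counts):
    print(f"bin {i}: {'#' * count}")
```

With four bins over the range 1 to 5, each bin is one unit wide and the counts are 1, 2, 3, and 3, where the last bin holds the values 4, 4, and 5.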
Box Plot - Five Key Statistics
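A box plot displays five statistics in one graph: minimum, Q1, median (Q2), Q3, and maximum. The box spans the IQR (Q1 to Q3), the line inside it marks the median, and the whiskers show the normal range, typically extending to the most extreme values that lie within 1.5×IQR of the box. Points outside the whiskers are drawn individually as outliers, so when outliers are present the whiskers end before the true minimum or maximum. Box plots are very useful for comparing multiple groups and for judging the symmetry of a distribution.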
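The numbers a box plot is drawn from can be sketched as follows. The function name `box_plot_stats`, the dictionary keys, the `"inclusive"` quartile method, and the convention that whiskers stop at the most extreme values within 1.5×IQR of the box are assumptions of this example; charting tools differ in these details.

```python
from statistics import quantiles

def box_plot_stats(data, k=1.5):
    """Compute the five-number summary plus whisker ends and outliers."""
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Values inside the fences define where the whiskers end.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "min": min(data),
        "q1": q1,
        "median": med,
        "q3": q3,
        "max": max(data),
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }

stats = box_plot_stats([1, 2, 3, 4, 100])
for name, value in stats.items():
    print(f"{name}: {value}")
```

For this sample the maximum is 100, but the upper whisker ends at 4 and the value 100 is reported separately as an outlier.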