Key Terms
Goal
Turn a raw data dump into something you can actually interpret.
Example
72 has stem 7, leaf 2. Value 143 has stem 14, leaf 3.
Side-by-side stemplot
Two data sets share the same stems; one set's leaves go left, the other goes right. Good for direct comparison.
OUTLIER
A data point that doesn't fit the pattern. May be a data entry error or a genuinely unusual value.
How to build one (continuous data)
1. Find smallest and largest values
Shortcut for number of bars
Take the square root of n, round to nearest whole.
Main use
Spotting trends over time that would be invisible in a static summary.
To find quartiles
1. Order data smallest to largest 2.
Location of median
(n + 1) / 2
Symbol for sample mean
X-bar Symbol for population mean: mu (Greek letter)
Use median when
Data has extreme values or outliers, values are missing, there is an open-ended category (like "6 or more"), or data is
Use mode when
Data is categorical (nominal scale). Mode is the only measure of center that works for non-numerical data (example: the
SYMMETRIC
Left and right sides are mirror images. Mean = median = mode (approximately).
RIGHT (POSITIVE) SKEW
Tail stretches toward higher values. A few high outliers pull the mean right.
LEFT (NEGATIVE) SKEW
Tail stretches toward lower values. A few low outliers pull the mean left.