8 Statistics for One Variable

PDS video 6.1 does an excellent job of discussing statistics for one variable, also known as univariate statistics or descriptive statistics. This chapter will provide some additional help along with methods in R/RStudio to easily gather univariate statistics.

In the last chapter, we saw how to graph a single variable. With these graphs, we can get a sense of all three aspects of a variable discussed in the video.

Center: also known as measures of central tendency
Spread: also known as measures of dispersion, such as the range, interquartile range, and standard deviation
Shape: also known as skewness
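As a rough sketch of how these three ideas map onto base R functions, the block below uses a small made-up vector (`scores` is hypothetical and does not come from any of the course data sets). The skewness line uses one common moment-based formula; packages that report skewness may use a slightly different version.

```r
# Hypothetical data for illustration only
scores <- c(2, 4, 4, 5, 7, 9, 25)

# Center: mean and median
center <- c(mean = mean(scores), median = median(scores))

# Spread: range, interquartile range, standard deviation
spread <- c(
  range = diff(range(scores)),  # maximum minus minimum
  iqr   = IQR(scores),          # 75th percentile minus 25th percentile
  sd    = sd(scores)            # sample standard deviation
)

# Shape: moment-based skewness (positive means a longer right tail)
z <- scores - mean(scores)
skewness <- mean(z^3) / mean(z^2)^1.5

center
spread
skewness
```

Here the single large value (25) pulls the mean above the median and gives a positive skewness, which is the pattern a right-skewed histogram would show.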

The excellent OpenStax statistics textbook gives clear explanations of each of these three ideas. You may also want to view the DATA tab videos on measures of central tendency and dispersion.

8.1 Viewing univariate statistics

Base R and many packages provide tools for looking at this data. I am going to show you two.

8.1.1 `tbl_summary` for categorical data

For categorical data, the best presentation gives counts and percentages for each of the categories. tbl_summary from the gtsummary package presents this data clearly and efficiently.

library(gtsummary)  # provides tbl_summary()
anes_2020_smaller |>
  tbl_summary()

Characteristic	N = 8,280¹
PRE: Favor or oppose ending birthright citizenship
1. Favor	2,217 (27%)
2. Oppose	3,734 (45%)
3. Neither favor nor oppose	2,280 (28%)
Unknown	49
PRE: 7pt scale liberal-conservative: Democratic Presidential candidate
1. Extremely liberal	1,605 (20%)
2. Liberal	2,516 (31%)
3. Slightly liberal	1,588 (19%)
4. Moderate; middle of the road	1,623 (20%)
5. Slightly conservative	352 (4.3%)
6. Conservative	314 (3.8%)
7. Extremely conservative	161 (2.0%)
Unknown	121
V202173	85 (70, 100)
Unknown	913
PRE: SUMMARY: Respondent 5 Category level of education
1. Less than high school credential	376 (4.6%)
2. High school credential	1,336 (16%)
3. Some post-high school, no bachelor's degree	2,790 (34%)
4. Bachelor's degree	2,055 (25%)
5. Graduate degree	1,592 (20%)
Unknown	131
V200010a	0.74 (0.45, 1.24)
V202164	50 (40, 75)
Unknown	956
¹ n (%); Median (Q1, Q3)
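To see where the counts and percentages come from, here is a minimal base R sketch that rebuilds the birthright citizenship variable from the counts in the table above. The vector name `citizenship` is made up for the example, and I am assuming that the percentages are taken over the non-missing responses, which is my understanding of the tbl_summary default.

```r
# Rebuild the variable from the counts shown in the table (49 are Unknown)
citizenship <- factor(
  rep(c("1. Favor", "2. Oppose", "3. Neither favor nor oppose", NA),
      times = c(2217, 3734, 2280, 49))
)

counts  <- table(citizenship)               # NA values are dropped by default
percent <- round(100 * prop.table(counts))  # share of non-missing responses
unknown <- sum(is.na(citizenship))          # reported separately as "Unknown"

data.frame(n = as.vector(counts),
           percent = as.vector(percent),
           row.names = names(counts))
unknown
```

The three categories plus the unknown cases add up to the 8,280 respondents in the table header.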

8.1.2 `skim` in `skimr` for continuous data

The video and the OpenStax textbook explain the key measures of central tendency and dispersion. They are easy to view in R, and there are many tools for doing so. The most complete is the skim command in the skimr package; tbl_summary presents some of the same information.

library(skimr)  # provides skim() and yank()
anes_pilot_small |>
  skim() |>
  yank("numeric")  # keep only the summary of the numeric variables

skim provides a lot of information in a small space. For continuous data (numerics in R), skim provides:

variable name (skim_variable),
missing values (n_missing),
the share of observations that are not missing (complete_rate),
the mean (mean),
standard deviation (sd),
the minimum value (p0),
the 25th percentile (p25),
the 50th percentile, which is the median (p50),
the 75th percentile (p75),
the maximum value (p100),
and a small histogram (noted as hist).

For categorical data (factors in R) skim provides a count of each category.

It also lists character data.
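If you want to check what the numeric columns mean, the sketch below computes the same quantities by hand in base R. The `rating` vector is hypothetical, and this only mirrors the column names in skim output; it is not how skimr is implemented.

```r
# Hypothetical 0-100 ratings with two missing values
rating <- c(0, 15, 30, 45, 50, 60, 75, 90, 100, NA, NA)

skim_like <- data.frame(
  n_missing     = sum(is.na(rating)),            # number of NA values
  complete_rate = mean(!is.na(rating)),          # share of non-missing values
  mean          = mean(rating, na.rm = TRUE),
  sd            = sd(rating, na.rm = TRUE),
  p0            = min(rating, na.rm = TRUE),     # minimum
  p25           = unname(quantile(rating, 0.25, na.rm = TRUE)),
  p50           = median(rating, na.rm = TRUE),  # median
  p75           = unname(quantile(rating, 0.75, na.rm = TRUE)),
  p100          = max(rating, na.rm = TRUE)      # maximum
)
skim_like
```

Note that every statistic needs `na.rm = TRUE` once a variable has missing values; without it, base R returns NA.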

8.2 Viewing single variables

The frq command from the sjmisc package does a great job of providing frequencies for individual categorical variables.

frq(ethiopia_small$Q253)

Respect for individual human rights nowadays (x) <categorical> 
# total N=1230 valid N=1215 mean=2.51 sd=0.93

Value                   |   N | Raw % | Valid % | Cum. %
--------------------------------------------------------
A great deal of respect | 169 | 13.74 |   13.91 |  13.91
Fairly much  respect    | 468 | 38.05 |   38.52 |  52.43
Not much respect        | 371 | 30.16 |   30.53 |  82.96
No respect at all       | 207 | 16.83 |   17.04 | 100.00
<NA>                    |  15 |  1.22 |    <NA> |   <NA>

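The difference between the Raw %, Valid %, and Cum. % columns is easier to see when you compute them yourself. This base R sketch rebuilds the variable from the counts in the frq output above; the object names (`q253`, `raw_pct`, and so on) are made up for the example, and the label spacing is normalized.

```r
# Rebuild Q253 from the counts in the frq output (15 cases are missing)
labels <- c("A great deal of respect", "Fairly much respect",
            "Not much respect", "No respect at all")
q253 <- factor(rep(c(labels, NA), times = c(169, 468, 371, 207, 15)),
               levels = labels)

n_all   <- table(q253, useNA = "ifany")  # counts, with NA as the last entry
n_valid <- table(q253)                   # counts without NA

raw_pct   <- round(100 * n_all / length(q253), 2)            # out of all 1230 cases
valid_pct <- round(100 * n_valid / sum(n_valid), 2)          # out of the 1215 valid cases
cum_pct   <- round(cumsum(100 * n_valid / sum(n_valid)), 2)  # running total of valid %

raw_pct
valid_pct
cum_pct
```

Raw % divides by every case, including the missing ones, while Valid % and Cum. % divide only by the cases that have an answer.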
You can also run skim and summary on a single quantitative variable.

anes_pilot_small |>
  skim(ftscotus)

Data summary
Name	anes_pilot_small
Number of rows	1585
Number of columns	11
_______________________
Column type frequency:
numeric	1
________________________
Group variables	None

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
ftscotus	0	1	50.44	29.05	0	30	51	74	100	▆▅▇▆▆
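The summary command mentioned above is part of base R and gives a shorter version of the same information. Here is a small sketch using a made-up vector; `ft` is hypothetical and does not contain the real ftscotus values.

```r
# Hypothetical feeling-thermometer scores (0-100)
ft <- c(0, 30, 40, 51, 60, 74, 100)

s <- summary(ft)  # minimum, 1st quartile, median, mean, 3rd quartile, maximum
s

s[["Median"]]     # individual entries can be pulled out by name
```

With the real data, the equivalent call would be along the lines of `summary(anes_pilot_small$ftscotus)`.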
