PDS video 6.1 does an excellent job of discussing statistics for one variable, also known as univariate statistics or descriptive statistics. This chapter will provide some additional help along with methods in R/RStudio to easily gather univariate statistics.
In the last chapter, we saw how to graph a single variable. With these graphs, we can get a sense of all three aspects of a variable discussed in the video.
Base R and many packages provide tools for looking at this data. I am going to show you two.
8.1.1 tbl_summary for categorical data
For categorical data, the best presentation gives counts and percentages for each category. tbl_summary from the gtsummary package presents this data clearly and efficiently.
anes_2020_smaller |> tbl_summary()
Characteristic | N = 8,280¹
------------------------------------------------------------------------
PRE: Favor or oppose ending birthright citizenship |
PRE: SUMMARY: Respondent 5 Category level of education |
    1. Less than high school credential | 376 (4.6%)
    2. High school credential | 1,336 (16%)
    3. Some post-high school, no bachelor's degree | 2,790 (34%)
    4. Bachelor's degree | 2,055 (25%)
    5. Graduate degree | 1,592 (20%)
    Unknown | 131
V200010a | 0.74 (0.45, 1.24)
V202164 | 50 (40, 75)
    Unknown | 956
¹ n (%); Median (Q1, Q3)
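The counts and percentages in a table like this can be reproduced by hand, which is a useful check. Here is a minimal sketch in base R. The data frame `toy` and its `education` column are made up for illustration and are not the ANES data. The tbl_summary call at the end only runs if gtsummary is installed.

```r
# Hypothetical toy data (NOT the ANES data used in this chapter).
toy <- data.frame(
  education = factor(c("High school", "Bachelor's", "Bachelor's",
                       "Graduate", NA, "High school", "Bachelor's"))
)

# Counts per category; table() drops NA by default.
counts <- table(toy$education)

# Percentages of the non-missing cases, rounded to one decimal.
pct <- round(100 * prop.table(counts), 1)

# Missing values are counted separately, like the "Unknown" row.
n_unknown <- sum(is.na(toy$education))

print(data.frame(category = names(counts),
                 n = as.vector(counts),
                 percent = as.vector(pct)))
cat("Unknown:", n_unknown, "\n")

# The same summary via gtsummary, if the package is available.
if (requireNamespace("gtsummary", quietly = TRUE)) {
  toy_tbl <- gtsummary::tbl_summary(toy)  # print toy_tbl to render the table
}
```

Note that the percentages use the non-missing cases as the denominator, which matches the table above: 376 out of 8,280 - 131 = 8,149 valid cases is about 4.6%.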
8.1.2 skim in skimr for continuous data
The video and the OpenStax textbook explain the key measures of dispersion and central tendency. They are easy to view in R, and there are many tools for doing so. The most complete is the skim command in the skimr package; tbl_summary also presents some of these statistics.
# anes_pilot_small |>
#   skim() |>
#   yank("numeric")
skim packs a lot of information into a small space. For continuous data (numerics in R), skim provides:
variable name (skim_variable),
missing values (n_missing),
the share of observations that are not missing (complete_rate),
the mean (mean),
standard deviation (sd),
the minimum value (p0),
the 25th percentile (p25),
the 50th percentile, which is the median (p50),
the 75th percentile (p75),
the maximum value (p100),
and a small histogram (noted as hist).
For categorical data (factors in R) skim provides a count of each category.
It also lists character data.
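To see where these numbers come from, the sketch below computes roughly the same quantities in base R. The vector `x` is hypothetical, not data from this chapter, and the quantiles use R's default method, which I am assuming matches what skim reports. The skim call at the end only runs if skimr is installed.

```r
# Hypothetical numeric vector with one missing value (not chapter data).
x <- c(2, 4, 4, 5, 7, 9, NA)

n_missing     <- sum(is.na(x))          # number of missing values
complete_rate <- mean(!is.na(x))        # share of non-missing values
x_mean        <- mean(x, na.rm = TRUE)  # mean of the non-missing values
x_sd          <- sd(x, na.rm = TRUE)    # standard deviation

# p0, p25, p50, p75, p100 using R's default quantile method.
x_quantiles <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)

print(c(n_missing = n_missing, complete_rate = complete_rate,
        mean = x_mean, sd = x_sd))
print(x_quantiles)

# Compare with skim, if skimr is installed.
if (requireNamespace("skimr", quietly = TRUE)) {
  print(skimr::skim(data.frame(x = x)))
}
```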
8.2 Viewing single variables
The frq command does a great job of providing frequencies for individual categorical variables.
frq(ethiopia_small$Q253)
Respect for individual human rights nowadays (x) <categorical>
# total N=1230 valid N=1215 mean=2.51 sd=0.93
Value | N | Raw % | Valid % | Cum. %
--------------------------------------------------------
A great deal of respect | 169 | 13.74 | 13.91 | 13.91
Fairly much respect | 468 | 38.05 | 38.52 | 52.43
Not much respect | 371 | 30.16 | 30.53 | 82.96
No respect at all | 207 | 16.83 | 17.04 | 100.00
<NA> | 15 | 1.22 | <NA> | <NA>
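The three percentage columns differ only in their denominators. As a check, this sketch recomputes them in base R from the counts printed above. The object names are my own, and coding the categories 1 to 4 to recover the mean is an assumption about how frq treats the variable.

```r
# Counts copied from the frq output above.
labels <- c("A great deal of respect", "Fairly much respect",
            "Not much respect", "No respect at all")
counts <- c(169, 468, 371, 207)
n_na   <- 15

valid_n <- sum(counts)     # cases with an answer
total_n <- valid_n + n_na  # all cases, including <NA>

raw_pct   <- round(100 * counts / total_n, 2)          # Raw %: share of all cases
valid_pct <- round(100 * counts / valid_n, 2)          # Valid %: share of valid cases
cum_pct   <- round(100 * cumsum(counts) / valid_n, 2)  # Cum. %: running valid share

print(data.frame(value = labels, n = counts,
                 raw_pct, valid_pct, cum_pct))

# Assumption: categories are coded 1 to 4 in the order shown.
coded_mean <- round(sum(counts * 1:4) / valid_n, 2)
print(coded_mean)
```

Raw % divides by all 1,230 cases, while Valid % and Cum. % divide by the 1,215 cases that have an answer.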
You can also run skim and summary on a single quantitative variable.
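As a small sketch of what that looks like, the example below uses a made-up data frame `toy` with a hypothetical `age` column. summary is part of base R, and the skim call only runs if skimr is installed.

```r
# Hypothetical data frame; `age` stands in for any quantitative variable.
toy <- data.frame(age = c(23, 35, 41, 52, 67, NA))

# `$` pulls a single column out of the data frame as a vector.
age_summary <- summary(toy$age)
print(age_summary)

# skim accepts column names after the data frame, so this skims only `age`.
if (requireNamespace("skimr", quietly = TRUE)) {
  print(skimr::skim(toy, age))
}
```

summary reports the minimum, quartiles, median, mean, maximum, and the number of NA values. skim adds the standard deviation and a small histogram.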