8  Statistics for One Variable

PDS video 6.1 does an excellent job of discussing statistics for one variable, also known as univariate statistics or descriptive statistics. This chapter will provide some additional help along with methods in R/RStudio to easily gather univariate statistics.

In the last chapter, we saw how to graph a single variable. With these graphs, we can get a sense of all three aspects of a variable discussed in the video.

The excellent OpenStax statistics textbook gives clear explanations of each of these three ideas; the links are above. You may also want to view the DATA tab videos on measures of central tendency and dispersion.

8.1 Viewing univariate statistics

Base R and many packages provide tools for looking at this data. I am going to show you two.

8.1.1 tbl_summary for categorical data

For categorical data, the best presentation gives counts and percentages for each category. tbl_summary from the gtsummary package presents this data clearly and efficiently.

library(gtsummary)  # provides tbl_summary()

anes_2020_smaller |>
  tbl_summary()
Characteristic    N = 8,280¹
PRE: Favor or oppose ending birthright citizenship
    1. Favor 2,217 (27%)
    2. Oppose 3,734 (45%)
    3. Neither favor nor oppose 2,280 (28%)
    Unknown 49
PRE: 7pt scale liberal-conservative: Democratic Presidential candidate
    1. Extremely liberal 1,605 (20%)
    2. Liberal 2,516 (31%)
    3. Slightly liberal 1,588 (19%)
    4. Moderate; middle of the road 1,623 (20%)
    5. Slightly conservative 352 (4.3%)
    6. Conservative 314 (3.8%)
    7. Extremely conservative 161 (2.0%)
    Unknown 121
V202173 85 (70, 100)
    Unknown 913
PRE: SUMMARY: Respondent 5 Category level of education
    1. Less than high school credential 376 (4.6%)
    2. High school credential 1,336 (16%)
    3. Some post-high school, no bachelor's degree 2,790 (34%)
    4. Bachelor's degree 2,055 (25%)
    5. Graduate degree 1,592 (20%)
    Unknown 131
V200010a 0.74 (0.45, 1.24)
V202164 50 (40, 75)
    Unknown 956
¹ n (%); Median (Q1, Q3)
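
To see where the counts and percentages come from, here is a minimal base R sketch that rebuilds the first variable from the counts printed above. The vector name birthright is my own, and I am assuming the percentages are taken out of the non-missing responses, with the "Unknown" row counted separately; the rounded values match the table under that assumption.

```r
# Rebuild the birthright-citizenship variable from the counts in the table.
# The name `birthright` is made up for this illustration.
birthright <- factor(rep(
  c("1. Favor", "2. Oppose", "3. Neither favor nor oppose", NA),
  times = c(2217, 3734, 2280, 49)
))

counts <- table(birthright)               # table() drops NA by default
pct <- round(100 * prop.table(counts))    # percent of non-missing responses
n_unknown <- sum(is.na(birthright))       # the "Unknown" row

print(cbind(n = counts, percent = pct))
print(n_unknown)
```

tbl_summary does this for every categorical variable in the data frame at once, which is why it is so convenient.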

8.1.2 skim in skimr for continuous data

The video and the OpenStax textbook explain the key measures of dispersion and central tendency. They are easy to view in R, and there are many tools for doing so. The most complete is the skim command in the skimr package; tbl_summary presents only some of this data.

library(skimr)  # provides skim() and yank()

anes_pilot_small |>
  skim() |>
  yank("numeric")  # keep only the summary of the numeric variables

skim provides a lot of information in a small space. For continuous data (numerics in R), skim provides:

  • variable name (variable),
  • missing values (n_missing),
  • the share of observations that are not missing (complete_rate),
  • the mean (mean),
  • standard deviation (sd),
  • the minimum value (p0),
  • the 25th percentile (p25),
  • the 50th percentile, which is the median (p50),
  • the 75th percentile (p75),
  • the maximum value (p100),
  • and a small histogram (noted as hist).
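
If you want to check what each of these columns means, you can compute them yourself in base R. This is only a sketch on a small made-up vector (scores is hypothetical, not one of our datasets), and I am assuming skim uses R's default quantile method for the percentiles.

```r
# A made-up numeric variable with one missing value.
scores <- c(0, 30, 51, 74, 100, NA)

# Percentiles, named the way skim labels them.
q <- quantile(scores, probs = c(0, 0.25, 0.5, 0.75, 1),
              na.rm = TRUE, names = FALSE)
names(q) <- c("p0", "p25", "p50", "p75", "p100")

stats <- c(
  n_missing     = sum(is.na(scores)),      # how many values are NA
  complete_rate = mean(!is.na(scores)),    # share of values that are not NA
  mean          = mean(scores, na.rm = TRUE),
  sd            = sd(scores, na.rm = TRUE),
  q
)

print(round(stats, 2))
```

Note that every function needs na.rm = TRUE here; skim handles missing values for you.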

For categorical data (factors in R) skim provides a count of each category.

It also lists character data.

8.2 Viewing single variables

The frq command from the sjmisc package does a great job of providing frequencies for individual categorical variables.

frq(ethiopia_small$Q253)
Respect for individual human rights nowadays (x) <categorical> 
# total N=1230 valid N=1215 mean=2.51 sd=0.93

Value                   |   N | Raw % | Valid % | Cum. %
--------------------------------------------------------
A great deal of respect | 169 | 13.74 |   13.91 |  13.91
Fairly much  respect    | 468 | 38.05 |   38.52 |  52.43
Not much respect        | 371 | 30.16 |   30.53 |  82.96
No respect at all       | 207 | 16.83 |   17.04 | 100.00
<NA>                    |  15 |  1.22 |    <NA> |   <NA>
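
The difference between Raw % and Valid % is the denominator: raw percentages divide by all 1230 cases, while valid percentages divide by the 1215 cases that are not missing. Here is a short base R sketch that reproduces the three percentage columns from the counts in the table above. The object names (counts, freq_table, and so on) are my own and are not part of frq.

```r
# Counts copied from the frq output above.
counts <- c(
  "A great deal of respect" = 169,
  "Fairly much respect"     = 468,
  "Not much respect"        = 371,
  "No respect at all"       = 207
)
n_missing <- 15

valid_n <- sum(counts)            # cases with an answer
total_n <- valid_n + n_missing    # all cases, including NA

freq_table <- data.frame(
  N         = counts,
  raw_pct   = round(100 * counts / total_n, 2),          # out of all cases
  valid_pct = round(100 * counts / valid_n, 2),          # out of valid cases
  cum_pct   = round(100 * cumsum(counts) / valid_n, 2)   # running valid total
)

print(freq_table)
```

When few values are missing, as here, the two percentages are close. With a lot of missing data they can differ substantially, so be clear about which one you report.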

You can also run skim and summary on a single quantitative variable.

anes_pilot_small |>
  skim(ftscotus)
Data summary
Name anes_pilot_small
Number of rows 1585
Number of columns 11
_______________________
Column type frequency:
numeric 1
________________________
Group variables None

Variable type: numeric

skim_variable  n_missing  complete_rate   mean     sd  p0  p25  p50  p75  p100  hist
ftscotus               0              1  50.44  29.05   0   30   51   74   100  ▆▅▇▆▆
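
Base R's summary command gives a shorter version of the same information. A minimal sketch on a made-up vector (ratings is hypothetical; with our data you would pass a column such as anes_pilot_small$ftscotus instead):

```r
# A made-up set of thermometer-style ratings.
ratings <- c(15, 40, 50, 60, 85, 100)

s <- summary(ratings)   # Min., 1st Qu., Median, Mean, 3rd Qu., Max.
print(s)

# The interquartile range is the distance between the 1st and 3rd quartiles.
iqr <- IQR(ratings)
print(iqr)
```

summary does not report the standard deviation or the number of complete cases, which is one reason I usually reach for skim.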