Warm tip: This article is reproduced from serverfault.com, please click

is there a better way with e.g., dplyr?

发布于 2020-12-02 11:06:40

I am trying to create a basic table with frequencies of a categorical variable (Relationship_type) grouped by another variable (Country), preferably using dplyr library (or anything else that is easier to export as a .csv file than table()).

head(d)
Country                        Relationship_type
1 Algeria                                      2
2 Bulgaria                                     1
3 USA                                          2
4 Algeria                                      3
5 Germany                                      2
6 USA                                          1

I want this to look like an output from basic table(d$Country, d$Relationship_type) function:

                    2   3   4
  Algeria         141  47 137
  Australia       128  27 103
  Austria          97   5  17
  Belgium         172  16  71
  Brazil          104   6  70
  CHILE            54   4  46

Tried several combinations of tally(), group_by, count() etc. but can't figure it out.

Could you please help a bit??

All the best, R_beginner

Questioner
chillos
Viewed
0
Len Greski 2020-12-02 19:59:10

Another approach is to use tables::tabular() as follows.

textData <- "id Country                        Relationship_type
1 Algeria                                      2
2 Bulgaria                                     1
3 USA                                          2
4 Algeria                                      3
5 Germany                                      2
6 USA                                          1
7 Algeria                                      1
8 Bulgaria                                     3
9 USA                                          2
10 Algeria                                     2
11 Germany                                     1
12 USA                                         3"

df <- read.table(text=textData,header=TRUE)
library(tables)
tabular(Factor(Country) ~ Factor(Relationship_type),data=df)

...and the output:

          Relationship_type    
 Country  1                 2 3
 Algeria  1                 2 1
 Bulgaria 1                 0 1
 Germany  1                 1 0
 USA      1                 2 1

Still another approach is to recast the output from table() as a data frame, and pivot it wider with tidyr::pivot_wider().

# another approach: recast table output as data.frame
tableData <- data.frame(table(df$Country,df$Relationship_type))
library(dplyr)
library(tidyr)
tableData %>% 
     pivot_wider(id_cols = Var1,
                 names_from = Var2,
                 values_from = Freq)

...and the output:

> tableData %>% 
+      pivot_wider(id_cols = Var1,
+                  names_from = Var2,
+                  values_from = Freq)
# A tibble: 4 x 4
  Var1       `1`   `2`   `3`
  <fct>    <int> <int> <int>
1 Algeria      1     2     1
2 Bulgaria     1     0     1
3 Germany      1     1     0
4 USA          1     2     1

If we add a dplyr::rename() to the pipeline, we can rename the Var1 column to Country.

tableData %>% 
     pivot_wider(id_cols = Var1,
                 names_from = Var2,
                 values_from = Freq) %>%
     rename(Country = Var1)

As usual, there are many ways in R to accomplish this task. Depending on the reason why the desired output is a CSV file, there are a variety of approaches that could fit the requirements. If the ultimate goal is to create presentation quality tables, then it's worth a look at this summary of packages that create presentation quality tables: How gt fits with other packages that create display tables.