I am trying to create a basic table with frequencies of a categorical variable (Relationship_type) grouped by another variable (Country), preferably using dplyr library (or anything else that is easier to export as a .csv file than table()).
head(d)
Country Relationship_type
1 Algeria 2
2 Bulgaria 1
3 USA 2
4 Algeria 3
5 Germany 2
6 USA 1
I want this to look like an output from basic table(d$Country, d$Relationship_type) function:
2 3 4
Algeria 141 47 137
Australia 128 27 103
Austria 97 5 17
Belgium 172 16 71
Brazil 104 6 70
CHILE 54 4 46
Tried several combinations of tally(), group_by, count() etc. but can't figure it out.
Could you please help a bit??
All the best, R_beginner
Another approach is to use tables::tabular()
as follows.
textData <- "id Country Relationship_type
1 Algeria 2
2 Bulgaria 1
3 USA 2
4 Algeria 3
5 Germany 2
6 USA 1
7 Algeria 1
8 Bulgaria 3
9 USA 2
10 Algeria 2
11 Germany 1
12 USA 3"
df <- read.table(text=textData,header=TRUE)
library(tables)
tabular(Factor(Country) ~ Factor(Relationship_type),data=df)
...and the output:
Relationship_type
Country 1 2 3
Algeria 1 2 1
Bulgaria 1 0 1
Germany 1 1 0
USA 1 2 1
Still another approach is to recast the output from table()
as a data frame, and pivot it wider with tidyr::pivot_wider()
.
# another approach: recast table output as data.frame
tableData <- data.frame(table(df$Country,df$Relationship_type))
library(dplyr)
library(tidyr)
tableData %>%
pivot_wider(id_cols = Var1,
names_from = Var2,
values_from = Freq)
...and the output:
> tableData %>%
+ pivot_wider(id_cols = Var1,
+ names_from = Var2,
+ values_from = Freq)
# A tibble: 4 x 4
Var1 `1` `2` `3`
<fct> <int> <int> <int>
1 Algeria 1 2 1
2 Bulgaria 1 0 1
3 Germany 1 1 0
4 USA 1 2 1
If we add a dplyr::rename()
to the pipeline, we can rename the Var1
column to Country
.
tableData %>%
pivot_wider(id_cols = Var1,
names_from = Var2,
values_from = Freq) %>%
rename(Country = Var1)
As usual, there are many ways in R to accomplish this task. Depending on the reason why the desired output is a CSV file, there are a variety of approaches that could fit the requirements. If the ultimate goal is to create presentation quality tables, then it's worth a look at this summary of packages that create presentation quality tables: How gt fits with other packages that create display tables.
Thanks you for the answer! I will definitely check the gt:) The tabular gave me only one's in each of the Relationship_type column, as I have numerous participants from the given Country as was shown with the table() output I added in the first post (and not the binary yes or no situation), so I don't think it improves the current situation anyhow. ooo! Haven't noticed your pivot suggestion. That's what I have been looking for, THANK YOU!
@chillos - thanks for the feedback. I've updated my answer. If you remove the *(N=1)
tabular()
will automatically count the number of observations. I also added test data to demonstrate that the updated solution works, as you can see from the output where both solutions now have a total of 12 counts across the table cells, and some of the counts are >1.