Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
521 views
in Technique[技术] by (71.8m points)

tidyverse - R function for collapsing multiple ranges of different columns from wide to long format?

I've a dataset with multiple different ranges of columns in each row (each row corresponds to one individual), as below. Each instance of the different column types have 3 levels (0,1 and 2).

id  col1_0 col1_1 col1_2  col2_0  col2_1 col2_2  col3_0 col3_1 col3_2
1       0      1      3       2       2      3       3      4      5
2       1      1      2       2       4      7       4      5      5
.
.
etc. 

What I would need is to collapse all col1 into one column, all col2 into another and all col3's into another, for each id. As below.

id  x  col1 col2 col4
1   0     0    2    3       
1   1     1    2    4
1   2     3    3    5
2   0     1    2    4
2   1     1    4    5
2   2     1    7    5
.
.
etc.

In addition, I would also need to create an x-column with values 0,1 and 2, for each id. However, I only manage to collapse the first range of columns (col1) with the code below.

library(tidyverse)

longer_data <- dataframe %>%
  group_by(id) %>%
  pivot_longer(col1_0:col1_2, names_to = "x1", values_to = "col1")

x1 here creates a column with the original column names. So I would create need an additional x-column that only keeps the last numbers of the original column names.

Is there a way to achieve this? Many thanks in advance!


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

We don't need any group_by. It can be directly done with pivot_longer by specifying the names_sep and the .value in names_to. Note the order of .value and x. It implies the values of that column should go into the each of those prefixes before the _ and the new column with suffix stub goes into 'x'

library(dplyr)
library(tidyr)
df1 %>%
   pivot_longer(cols = -id, names_to = c('.value', 'x'), names_sep = "_")

-output

# A tibble: 6 x 5
#     id x      col1  col2  col3
#  <int> <chr> <int> <int> <int>
#1     1 0         0     2     3
#2     1 1         1     2     4
#3     1 2         3     3     5
#4     2 0         1     2     4
#5     2 1         1     4     5
#6     2 2         2     7     5

data

df1 <- structure(list(id = 1:2, col1_0 = 0:1, col1_1 = c(1L, 1L), col1_2 = 3:2, 
    col2_0 = c(2L, 2L), col2_1 = c(2L, 4L), col2_2 = c(3L, 7L
    ), col3_0 = 3:4, col3_1 = 4:5, col3_2 = c(5L, 5L)), 
    class = "data.frame", row.names = c(NA, 
-2L))

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...