我仍在学习R,我有这个数据集,它有5列,第一列是tracking_id,接下来的四列具有四个组的值。
首先,我要过滤具有等于或大于1的值的行,然后要基于对最后三列(“ CD44hi_CD69low_rep”,“ CD44hi_CD69hi_CD103low_rep”,“ CD44hi_CD69hi_CD103hi_rep”)进行比较的8倍或4来比较行与列(“ CD44low_rep”)相比折叠低。
输出应具有5列,其值等于或大于1,这比第二列高8倍或比最后三列低4倍。
我应该得到这样的东西:
为了过滤等于或大于1的行,我尝试了以下操作:
df1 %>% select_if(is.numeric) %>% filter_all(all_vars(. >= 1))
然后要过滤高8倍或少4倍,我尝试过(由于@akrun):
nm1 <- c("CD44hi_CD69low_rep", "CD44hi_CD69hi_CD103low_rep",
"CD44hi_CD69hi_CD103hi_rep")
i1 <- (rowSums(df1[nm1] >= (df1$CD44low_rep * 8)) == 3) &
(rowSums(df1[nm1] <= (df1$CD44low_rep * 4)) == 3)
但是,我没有任何输入。
我正在按照以下步骤操作:
Analysis and graphic display of RNA-Seq data. A total of 9,085 genes for which
the maximum fragments per kilobase of exon per million mapped reads value in all
samples was ≥1.0 were subjected to further analyses. A principal component analysis
was performed using R (https://www.r-project.org/). Clustering was performed using
APCluster (an R Package for Affinity Propagation Clustering). The transcriptional
signatures of CD44hiCD69lo, CD44hiCD69hiCD103lo and CD44hiCD69hiCD103hi CD4+
T cells were defined with genes for which the expression was eightfold higher or
fourfold lower than that in CD44loCD69lo CD4+ T cells.
For the visualization of the co-regulation network, the 500 genes in the CD44hi
CD4+ T cell groups that showed the greatest variation compared with the naive
(CD44loCD69lo) CD4+ T cell group were subjected to further analyses. The first-
neighbor genes were determined using the following two criteria: (1) a correlation
of >0.8; and (2) a ratio of norm of 0.8–1.25. The network graph of 483 genes was
visualized using Cytoscape (http://www.cytoscape.org/).
我感兴趣的ID是:
values <- c('S100a10', 'Esm1', 'Itgb1', 'Anxa2', 'Hist1h1b',
'Il2rb', 'Lgals1', 'Mki67', 'Rora', 'S100a4',
'S100a6', 'Adam8', 'Areg', 'Bcl2l1', 'Calca',
'Capg', 'Ccr2', 'Cd44', 'Csda', 'Ehd1',
'Id2', 'Il10', 'Il1rl1', 'Il2ra', 'Lmna',
'Maf', 'Penk', 'Podnl1', 'Tiam1', 'Vim',
'Ern1', 'Furin', 'Ifng', 'Igfbp7', 'Il13',
'Il4', 'Il5', 'Nrp1', 'Ptprs', 'Rbpj',
'Spry1', 'Tnfsf11', 'Vdr', 'Xcl1', 'Bmpr2',
'Csf1', 'Dst', 'Foxp3', 'Itgav', 'Itgb8',
'Lamc1', 'Myo1e', 'Pmaip1', 'Prdm1', 'Ptpn5',
'Ramp1', 'Sdc4')
在应用@RonakShah(谢谢!)之后,我得到的是21而不是57:
library(dplyr)
df09 <- read.csv('https://raw.githubusercontent.com/learnseq/learning/main/dfpilot.csv')
filtertrial <- df09 %>%
#Keep rows where all the values are greater than 1
filter(across(where(is.numeric), ~. >= 1)) %>%
#Rows where any value is higher than 8 times CD44low_rep
#Or 4 times less than CD44low_rep
filter(Reduce(`|`, across(CD44hi_CD69low_rep:CD44hi_CD69hi_CD103hi_rep,
~. >= CD44low_rep*8 | . <= CD44low_rep/4)))
values <- c('S100a10', 'Esm1', 'Itgb1', 'Anxa2', 'Hist1h1b',
'Il2rb', 'Lgals1', 'Mki67', 'Rora', 'S100a4',
'S100a6', 'Adam8', 'Areg', 'Bcl2l1', 'Calca',
'Capg', 'Ccr2', 'Cd44', 'Csda', 'Ehd1',
'Id2', 'Il10', 'Il1rl1', 'Il2ra', 'Lmna',
'Maf', 'Penk', 'Podnl1', 'Tiam1', 'Vim',
'Ern1', 'Furin', 'Ifng', 'Igfbp7', 'Il13',
'Il4', 'Il5', 'Nrp1', 'Ptprs', 'Rbpj',
'Spry1', 'Tnfsf11', 'Vdr', 'Xcl1', 'Bmpr2',
'Csf1', 'Dst', 'Foxp3', 'Itgav', 'Itgb8',
'Lamc1', 'Myo1e', 'Pmaip1', 'Prdm1', 'Ptpn5',
'Ramp1', 'Sdc4')
#Make sure the sorting won't change by using match function and reverse it to get the right order as
#shown in the original plot.
dfgll <- filtertrial %>% slice(match(rev(values), tracking_id))
dfgll
如何实现呢?
你可以尝试以下方法:
library(dplyr)
df <- read.csv('https://raw.githubusercontent.com/learnseq/learning/main/dfpilot.csv')
df %>%
#Keep rows where all the values are greater than 1
filter(across(where(is.numeric), ~. > 1)) %>%
#Rows where any value is higher than 8 times CD44low_rep
#Or 4 times less than CD44low_rep
filter(Reduce(`|`, across(CD44hi_CD69low_rep:CD44hi_CD69hi_CD103hi_rep,
~. > CD44low_rep*8 | . < CD44low_rep/4))) -> result
head(result)
# tracking_id CD44low_rep CD44hi_CD69low_rep CD44hi_CD69hi_CD103low_rep
#1 42624 1.91 6.68 17.50
#2 A930005H10Rik 9.41 4.63 1.48
#3 Actn1 42.01 21.77 1.71
#4 Adora2a 1.31 7.05 15.51
#5 Ahnak 12.09 152.43 362.87
#6 Als2cl 11.17 1.98 1.01
# CD44hi_CD69hi_CD103hi_rep
#1 22.51
#2 1.55
#3 1.22
#4 13.31
#5 299.07
#6 1.26
谢谢@RonakShah,我删除了另一个问题,但是由于某种原因输出似乎不正确,在比较了57个选定的ID之后,我只得到了21行而不是57行,我什至将>更改为> =,也将<到<=,但仍然只有21行,有输入吗?
对于您共享的数据,我得到197行,并运行与上面相同的代码。您如何获得21行?
我的确得到197,但是当我过滤我感兴趣的ID时,我只会得到21而不是57!我用在代码之后使用的代码更新了问题。@罗纳尔·沙(RonalShah)
是的,因此只有21个ID满足您提到的条件。请再次检查您的条件。您的第一个条件是仅保留所有值均大于1的行
Esm1
。最终输出中缺少的ID之一是。如果您检查它的值,df09 %>% filter(tracking_id == 'Esm1')
您会发现它CD44low_rep
是0.1740859
正确的。因此,由于它小于1,因此代码将其从列表中删除。我敢肯定,在最终输出中不存在其余ID的情况也是如此。谢谢@RonakShah,我同意,我不知道他们为什么要在准则中删除小于1的值,这可能意味着在bash而不是R中进行数据预处理。