My dataset is very strange. When I create the Seurat object and load the metadata for it, all of the values in the nCount_RNA are decimal values instead of integers. How should I interpret this? Is there an issue with the data itself or something I can do to work around this? I ask because later on in my analysis, the functions can't seem to find the nCount_RNA object, and I believe the decimal values are the reason why.
Here is the code I used to create this object:
#Loading in the data ----------------------------------------------------------
filePaths = getGEOSuppFiles("GSE124395")
tarF <- list.files(path = "./GSE124395/", pattern = "*.tar", full.names = TRUE)
untar(tarF, exdir = "./GSE124395/")
gzipF <- list.files(path = "./GSE124395/", pattern = "*.gz", full.names = TRUE)
ldply(.data = gzipF, .fun = gunzip)
# Creating the matrix -----------------------------------------------------------
P301_3_matrix <- read.delim(file = './GSE124395//GSM3531672_P301_3_CRYOMIXED11.coutt.csv')
P301_3_matrix <- data.frame(P301_3_matrix[,-1], row.names=P301_3_matrix[,1])
P301_3_matrix <- as.matrix(P301_3_matrix) #<- makes the excel file into a matrix
P301_3_colname <- read.table(file = './GSE124395//GSE124395_celseq_barcodes.192.txt', header = FALSE, row.names = 1)
P301_3_colname <- data.frame(P301_3_colname[,-1], col=P301_3_colname[,1])
P301_3_colname <- as.matrix(P301_3_colname)
colnames(P301_3_matrix) <- P301_3_colname[,1]
colnames(P301_3_matrix) <- paste(colnames(P301_3_matrix), "CryoMixed11", sep = "_")
P301_3_pdat <- data.frame("samples" = colnames(P301_3_matrix), "treatment" = "CryoMixed")
#Creating the Seurat object ----------------------------------------------------
sobj<- CreateSeuratObject(counts = P301_3_matrix, min.cells = 0, min.features=1, project = "Liver_Cell_Atlas")
sobj <- saveRDS(sobj,file="JoinedMatrixNoFilters.rds")
Hopefully this isn't too vague, and thanks for reading!
The file you read in, it is normalized somehow, and is definitely not the count data:
P301_3_matrix = read.delim('GSM3531672_P301_3_CRYOMIXED11.coutt.csv.gz',row.names=1)
head(colSums(P301_3_matrix))
X1 X2 X3 X4 X5 X6
205.2744 22457.6142 1232.4626 14193.6406 15372.4642 18808.8838
If you read the details on processing, it writes:
Based on binomial statistics, the number of observed UMIs were converted into transcript counts (Grün et al., 2014)
So most likely you have to ask the authors for the count table or simply convert to integer, proceed and hope nothing goes wrong.
Thank you so much for your reply!