colnames(somaTD)<-c("ID","1D","1T","2D","2T","3D","3T","4D","4T","5D","5T","6D","6T","7D","7T","8D","8T")
df2<- df1[c(3:859), ]
So I will clean up the entire dataset. First I need to eliminate the duplicate values, which is easier said than done. I can flag the duplicates based on the gene ID using this (wrapping it in sum() gives the count):
duplicated(somaTD[,1])
and make a list of them using this:
somaTD[duplicated(somaTD[,1]),1]
But in reality I have to decide which duplicates to keep in the set, or rename when possible, so I did this manually in Excel. I reimported the data and want to select only the T and D columns, like this:
somaTD <- subset(somdataT, select= c(2,6,7,9,10,12,13,15,16,18,19,21,22,24,25,27,28))
somaTD <- somaTD[2:859, ]  # comma needed: keeps rows 2 to 859; without it R tries to select columns 2 to 859
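For next time, the duplicate decisions described above could probably be made in R instead of Excel. This is only a sketch on made-up data: the data frame toy and its gene IDs are placeholders, not my real set. It shows counting the duplicates, keeping the first occurrence of each ID, and renaming repeats with make.unique().

```r
# Toy data standing in for somaTD; the IDs and values are made up.
toy <- data.frame(ID = c("geneA", "geneB", "geneA", "geneC", "geneB"),
                  D1 = c(10, 20, 30, 40, 50),
                  stringsAsFactors = FALSE)

# How many rows repeat an ID already seen in an earlier row?
n_dup <- sum(duplicated(toy$ID))

# Option 1: keep only the first occurrence of each ID.
toy_first <- toy[!duplicated(toy$ID), ]

# Option 2: keep every row but rename the repeats (geneA, geneA.1, ...).
toy_renamed <- toy
toy_renamed$ID <- make.unique(toy$ID)
```

Option 1 silently discards data, so it only makes sense when the repeated rows really are redundant. Option 2 keeps everything and leaves the choice for later.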
Now I have a complete, clean, non-duplicated set. Next I can eliminate background-level genes by keeping only those whose row mean is more than 1 standard deviation above background. First make a matrix with the row means added:
df7.means<-rowMeans(df7)#eliminated the following genes-
df7.m<-cbind(df7,df7.means)
df8 <- subset(df7.m, df7.means>450)
df8.bkgd <- subset(df7.m, df7.means<=450)  # <= so a mean of exactly 450 is not dropped from both sets
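The cutoff could also be computed rather than typed in. A minimal sketch, assuming the background readings are available as a numeric vector: bkgd and the toy intensities here are invented for illustration, and the result is not meant to reproduce the 450 above.

```r
# Toy intensities; rows are genes, columns are samples (values are made up).
toy <- data.frame(s1 = c(100, 900, 450, 2000),
                  s2 = c(200, 1100, 450, 1800))
rownames(toy) <- c("g1", "g2", "g3", "g4")

# Hypothetical background readings, e.g. from blank or negative-control spots.
bkgd <- c(300, 350, 400, 450, 500)

# Cutoff = background mean + 1 standard deviation.
cutoff <- mean(bkgd) + sd(bkgd)

# rowMeans() runs before the new column exists, so only s1 and s2 are averaged.
toy$means <- rowMeans(toy)

above <- subset(toy, means > cutoff)    # genes kept for analysis
below <- subset(toy, means <= cutoff)   # genes treated as background
```

Splitting on > and <= means every gene lands in exactly one of the two sets.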
Next up is learning the mathematical functions to create ratios and normalize to a single gene.
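A first guess at what that might look like, on made-up numbers. The reference gene name refGene is hypothetical, and dividing each sample by its reference-gene value is just one common way to normalize, not necessarily the one I will end up using. sweep() divides each column by that column's reference value, and the T/D ratio is then taken within each numbered pair.

```r
# Toy matrix: rows are genes, columns follow the 1D/1T naming (values are made up).
toy <- matrix(c(100, 200,  50, 100,
                400, 400, 200, 100,
                 10,  40,  20,  20),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("refGene", "geneA", "geneB"),
                              c("1D", "1T", "2D", "2T")))

# Normalize every sample to a single reference gene:
# divide each column by that column's refGene value.
norm <- sweep(toy, 2, toy["refGene", ], "/")

# T/D ratio within each pair, computed on the normalized values.
d_cols <- grep("D$", colnames(norm))
t_cols <- grep("T$", colnames(norm))
ratio <- norm[, t_cols] / norm[, d_cols]
colnames(ratio) <- sub("T$", "", colnames(norm)[t_cols])  # label columns by pair number
```

After normalization the refGene row is all 1s, which is a quick sanity check that the division went down the right margin.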

