Diary of R: Learning R with lung cancer dataset

In the first paper the protein expression data was used to look for global patterns in tumor expression versus normal tissue. Now we have additional genetic sequencing data and can look for more specific differences, rather than population-level patterns or potential biomarkers. In the first paper much of the data was analyzed as a tumor/distant ratio (T/D). Here we can instead look for a normalization factor closer to a housekeeping gene. IGF2R (X3482) was selected because its expression level was close to the mean RFU and it had the smallest standard deviation. There is some evidence suggesting IGF1R may be important in cancer signalling networks, but IGF2R binds more specifically to IGF2 and lacks an intracellular kinase binding domain, making it less likely to participate in the signalling cascade. However, little data is available to confirm that it is expressed in similar amounts in tumor and normal tissue. GAPDH is present in our dataset but sits at the top of the expression range, making it a poor choice for a normalization factor.
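To make the selection step concrete, here is a minimal sketch in R of how a reference protein could be picked and used for normalization. The matrix `rfu` is made-up toy data, `pick_reference` is a hypothetical helper, and combining the two criteria by summing ranks is my own assumption, not necessarily how IGF2R was actually chosen.

```r
# Toy RFU matrix: rows are samples, columns are proteins (made-up values).
set.seed(1)
rfu <- matrix(rlnorm(6 * 5, meanlog = 8, sdlog = 1), nrow = 6,
              dimnames = list(paste0("S", 1:6), paste0("X", 1:5)))

# Hypothetical helper: prefer a protein whose mean is close to the overall
# mean RFU and whose standard deviation across samples is small.
# Summing the two ranks is an assumption made for this sketch.
pick_reference <- function(rfu) {
  prot_mean <- colMeans(rfu)
  prot_sd <- apply(rfu, 2, sd)
  score <- rank(abs(prot_mean - mean(rfu))) + rank(prot_sd)
  names(which.min(score))
}

ref <- pick_reference(rfu)

# Divide every sample (row) by its own value for the reference protein.
normalized <- sweep(rfu, 1, rfu[, ref], "/")
```

After this step the reference column of `normalized` is 1 for every sample, which is a quick sanity check that the division ran along the right margin.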

Then I looked at the log2 of the normalized data. I selected proteins that showed variability between tumor and distant tissue, as well as variability between patients. This narrowed the pool from 850 to ~130. Now I want to plot the selected data.
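The filtering step might look something like the sketch below. Everything here is simulated: `norm_tumor` and `norm_distant` stand in for the normalized tumor and distant matrices (patients in rows, proteins in columns), and the cutoffs of 1 on the log2 scale are hypothetical, as is the choice to keep a protein if it passes either criterion.

```r
# Simulated normalized data: 8 patients, 20 proteins (made-up values).
set.seed(2)
n_pat <- 8
n_prot <- 20
dims <- list(paste0("P", 1:n_pat), paste0("X", 1:n_prot))
log_t <- matrix(rnorm(n_pat * n_prot, sd = 0.1), n_pat, dimnames = dims)
log_d <- matrix(rnorm(n_pat * n_prot, sd = 0.1), n_pat, dimnames = dims)
log_t[, 1:3] <- log_t[, 1:3] + 2                         # consistent tumor shift
log_t[, 4:5] <- log_t[, 4:5] + rep(c(-2, 2), n_pat / 2)  # differs by patient
norm_tumor <- 2^log_t
norm_distant <- 2^log_d

# log2 of the normalized data, then the paired tumor-minus-distant difference.
log_ratio <- log2(norm_tumor) - log2(norm_distant)

mean_shift <- colMeans(log_ratio)        # tumor vs distant, averaged over patients
patient_sd <- apply(log_ratio, 2, sd)    # spread of that difference across patients

# Hypothetical cutoffs: keep a protein if either measure exceeds 1.
keep <- abs(mean_shift) > 1 | patient_sd > 1
selected <- colnames(log_ratio)[keep]
```

With the toy data only X1 to X5 survive the filter, since those are the columns that were deliberately shifted. On the real data the cutoffs would need tuning to land near the ~130 proteins mentioned above.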

I loaded the gplots package to use both Venn diagrams and heatmaps. There is a vignette for making Venn diagrams that I will have to explore. I also want to use hierarchical clustering for the heatmaps, but at first glance it does not look like the standard heatmap can do that part.
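For what it is worth, both heatmap() in base R and heatmap.2() in gplots accept distfun and hclustfun arguments, and both take a dendrogram for Rowv and Colv, so the clustering can be done explicitly and handed over. Below is a minimal sketch on made-up data; the matrix `mat`, the average linkage, and the color palette are all assumptions for illustration. The plot is only drawn if gplots is installed.

```r
# Toy log2 matrix: 10 proteins (rows) by 6 patients (columns), made-up values.
set.seed(3)
mat <- matrix(rnorm(10 * 6, sd = 0.1), nrow = 10,
              dimnames = list(paste0("X", 1:10), paste0("P", 1:6)))
mat[1:5, ] <- mat[1:5, ] + 3   # first five proteins form a distinct group

# Hierarchical clustering of rows and columns on Euclidean distance.
row_hc <- hclust(dist(mat), method = "average")
col_hc <- hclust(dist(t(mat)), method = "average")

# Cut the row tree into two clusters to see which proteins group together.
groups <- cutree(row_hc, k = 2)

# Draw the heatmap with the precomputed dendrograms, if gplots is available.
if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(mat,
                    Rowv = as.dendrogram(row_hc),
                    Colv = as.dendrogram(col_hc),
                    dendrogram = "both", scale = "none", trace = "none",
                    col = gplots::bluered(50))
}
```

Computing the trees separately makes it easy to swap the distance or linkage method and to reuse the cluster assignments in `groups` elsewhere.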

Here is a cute description of drawing heatmaps with heatmap.2 from the package:
http://mannheimiagoesprogramming.blogspot.com/2012/06/drawing-heatmaps-in-r-with-heatmap2.html

But this description is more useful for clustering approaches (and includes R code):
https://www.biostars.org/p/14156/

Wednesday, January 21, 2015