Saturday, October 22, 2011

Really handy R knowledge

R. It's big, it's great, it's horribly under-documented.

If you don't want to pay for a book that decidedly lacks Ctrl+F, and you've tried to teach yourself R, chances are you've been very, very frustrated at least once.

Here, I'm literally just going to dump handy R-knowledge that was not as easy to find as it should have been.

Jan 25th 2015 note: As I have recently returned to using R after a break, I'm revisiting and updating this list. I'm much less peculiar today than I was when I first wrote this post, so I hope you'll forgive the comparatively subdued tone. I think I picked a very strange list of 'handy' functions.

1. Double brackets vs. single brackets

The first headache-cure you learn. Single brackets return an object of the same type as the thing being indexed into (index a list, get a smaller list back), while double brackets return the element itself! The one you want!

example:
> moo = list("v1" = c(1,2,3),"v2" = c(2,3,4))
> moo
$v1
[1] 1 2 3
$v2
[1] 2 3 4
> moo[1]
$v1
[1] 1 2 3
# ^ That was a list! asdfawefsadf
> moo[[1]]
[1] 1 2 3
# ^ That was a vector! yay!
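
If you'd rather check the difference than trust the printout, here is a small sketch using the same moo list. It only uses base R (class, identical, sum), and the comments show what I'd expect each line to give.

```r
# Same list as in the example above
moo <- list(v1 = c(1, 2, 3), v2 = c(2, 3, 4))

class(moo[1])     # "list": single brackets keep the container type
class(moo[[1]])   # "numeric": double brackets pull out the element itself

# $ is shorthand for double brackets with a name
identical(moo[["v1"]], moo$v1)   # TRUE

# Arithmetic works on the extracted vector; sum(moo[1]) would be an error
total <- sum(moo[[1]])
total             # 6
```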

2. Unlist

Sheesh. For solving the same headache as above.
> unlist(moo)
v11 v12 v13 v21 v22 v23
  1   2   3   2   3   4
And THAT, children, is how to flatten a list into a single plain vector.
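
Two things that may be worth knowing about unlist, sketched below under the assumption that you're on base R: the use.names argument drops the generated names (v11, v12, ...), and a list with mixed types gets coerced to one common type. The variable names flat and mixed are just for illustration.

```r
moo <- list(v1 = c(1, 2, 3), v2 = c(2, 3, 4))

# Drop the auto-generated names if you only want the values
flat <- unlist(moo, use.names = FALSE)
flat            # 1 2 3 2 3 4

# A vector holds one type only, so mixed lists are coerced (here to character)
mixed <- unlist(list(1, "a", TRUE))
class(mixed)    # "character"
```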

3. Append
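Adds items to a vector. Can also be used to shove items into a vector at places other than the end: the optional third argument is the position after which to insert (0 means the very front).
> hi = c(1,2,3)
> hi = append(hi,4)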
> hi = append(hi,-1,0)
> hi
[1] -1 1 2 3 4
> hi = append(hi,0,1)
> hi
[1] -1 0 1 2 3 4
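
A short sketch of the same idea with the third argument spelled out by name (it is called after in base R). The variable names middle and front are made up for this example. Note that append returns a new vector, so nothing changes unless you assign the result.

```r
hi <- c(1, 2, 3)

# Insert two values after the second element
middle <- append(hi, c(10, 20), after = 2)
middle   # 1 2 10 20 3

# after = 0 puts the new value at the front
front <- append(hi, 0, after = 0)
front    # 0 1 2 3

# The original is untouched because we never reassigned it
hi       # 1 2 3
```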

4. Cbind
Lets you turn a plain vector into a one-column matrix (R vectors don't really have a row or column orientation until you do this). Handy if you're trying to build up a data frame. Oh god data frames....ok, one thing at a time.
Using hi from the previous example:
> cbind(hi)
     hi
[1,] -1
[2,]  0
[3,]  1
[4,]  2
[5,]  3
[6,]  4
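
Here is a rough sketch of what cbind gives you and how it relates to building a data frame. The second column (squared) is something I made up for the example; dim, colnames and data.frame are all base R.

```r
hi <- c(-1, 0, 1, 2, 3, 4)

# One vector becomes a 6 x 1 matrix
m <- cbind(hi)
dim(m)            # 6 1

# Several vectors of the same length are glued together side by side
both <- cbind(hi, squared = hi^2)
dim(both)         # 6 2
colnames(both)    # "hi" "squared"

# If the columns will have different types, data.frame is the safer choice,
# since a matrix can only hold one type
df <- data.frame(hi = hi, squared = hi^2)
```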

Jan 25th 2015: yes, let's talk about those data frames.
4.1. Reading a table into a data frame
> x <- read.table("/path/to/file/fun/fact/R/tab/completes.txt", header=TRUE, as.is=TRUE)
The as.is=TRUE tells R to treat strings as actual strings, rather than as these peculiar things called "factors". I don't know who thought it would be a good idea to default to reading strings as factors. Maybe they enjoy being mean to novices. It sounds like factors could in theory be useful for filtering, but in bioinformatics your strings are usually things like gene names or chromosomal coordinates, and very rarely a filtering keyword...
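
To see the difference without needing a file on disk, here is a sketch that feeds read.table a string through its text argument. The table contents (gene names, scores) are invented for the example. One caveat: strings-as-factors was the default in older R; from R 4.0.0 on, strings stay strings unless you ask for factors with stringsAsFactors=TRUE.

```r
# Invented toy table, whitespace-separated with a header row
txt <- "gene chrom score
AHR chr7 0.8
TP53 chr17 0.3"

# as.is = TRUE keeps character columns as plain strings
x <- read.table(text = txt, header = TRUE, as.is = TRUE)
class(x$gene)    # "character"

# Explicitly asking for factors shows what the old default did
y <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)
class(y$gene)    # "factor"
```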

4.2. Subset
Really handy tool. Right now, I have on my workspace a data frame called "correlations". It has 124 rows, and, uh, I want to say 264 columns, though R thinks of a data frame as a list of columns, which is why names() gives you the column names.
> length(names(correlations))
[1] 264
> length(row.names(correlations))
[1] 124
To apply a selection condition, the first argument is the data frame and the second argument is a condition involving the data frame. The syntax is:
> hi = subset(correlations,correlations[1] > 0.5)
> length(row.names(hi))
[1] 28
Here, I selected all rows where the value in the first column was > 0.5. Another way to do that would have been to use the name of the first column:
> names(correlations[1])
[1] "AHR"
> hi2 = subset(correlations,AHR > 0.5)
> length(row.names(hi2))
[1] 28
Incidentally, while in the example I used the column correlations[1] and placed a logical condition on it, I could technically have used any vector of 124 logical (TRUE/FALSE) values as my "condition", and the syntax would have worked:
> coolvector = correlations[1] > 0.5
> coolvector[0] # this gives me information about the type of object
logical(0)
> length(coolvector)
[1] 124
> hi = subset(correlations,coolvector)
> length(row.names(hi))
[1] 28
Also, if you have row names, the syntax for using row names to extract information about a particular row from a data frame looks like this (here, I have a row named "AP2_Q6" in the data frame hi2, and I'm accessing the first column in that row):
> hi2["AP2_Q6",1]
[1] 0.7946332
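
Since you don't have my "correlations" data frame, here is a sketch on a tiny made-up one so you can try the same moves. Everything in toy (the values, the OTHER column, the row names m1 to m3) is invented; only the AHR column name is borrowed from above. It also shows the plain bracket form, which as far as I can tell selects the same rows as subset.

```r
# Made-up stand-in for the 'correlations' data frame
toy <- data.frame(
  AHR   = c(0.9, 0.2, 0.7),
  OTHER = c(0.1, 0.6, 0.8),
  row.names = c("m1", "m2", "m3")
)

dim(toy)                  # rows first, then columns: 3 2

# Keep rows where AHR > 0.5, written two ways
by_subset  <- subset(toy, AHR > 0.5)
by_bracket <- toy[toy$AHR > 0.5, ]
rownames(by_subset)       # "m1" "m3"

# subset can also pick columns with its select argument
only_other <- subset(toy, AHR > 0.5, select = OTHER)

# Row name plus column name pulls out a single cell
cell <- toy["m3", "OTHER"]
cell                      # 0.8
```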

5. eval

Handy, handy, handy for making your code flexible. Lets you interpret a string as a command (parse turns the string into an expression, and eval runs it). Let me demonstrate:
> cutecondition = "happycolumn <- correlations$AHR > 0.5"
> eval(parse(text=cutecondition))
> hi = subset(correlations,happycolumn)
> length(row.names(hi))
[1] 28
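
A sketch of two variations, again on an invented toy data frame (the values and the names keep, colname and cutoff are mine). The first evaluates the condition string inside the data frame, so the string can refer to columns directly. The second skips eval entirely, which may be the better choice when only the column name or the cutoff varies, since eval will happily run whatever code is in the string.

```r
# Made-up data; only the AHR column name comes from the example above
toy <- data.frame(AHR = c(0.9, 0.2, 0.7), OTHER = c(0.1, 0.6, 0.8))

# Variation 1: evaluate the string with the data frame as the environment
cutecondition <- "AHR > 0.5"
keep <- eval(parse(text = cutecondition), envir = toy)
keep                      # TRUE FALSE TRUE
hi <- toy[keep, ]

# Variation 2: no eval, just a column name and a cutoff held in variables
colname <- "AHR"
cutoff  <- 0.5
hi2 <- toy[toy[[colname]] > cutoff, ]

nrow(hi)                  # 2
```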

Yay! I feel like I've done a service to humanity! Now I need to get back to work.

Peace love and bunnies!
- Av