Commit 181d9d1

committed
added R specific content
1 parent a2974c4 commit 181d9d1

1 file changed

+330
-1
lines changed

README.md

Lines changed: 330 additions & 1 deletion
@@ -821,13 +821,342 @@ For instance, to get the help associated with the function ```factor```:
```
Additional information is often provided on the website of a package. Short tutorials, called *vignettes*, may be distributed with the package as well.


**Resources:**

- [Quick-R website](https://www.statmethods.net/index.html)
- [R-bloggers website](https://www.r-bloggers.com)

### Useful packages

- ggplot2
- dplyr
- tidyr
829-
- XCMS - MS import and preprocessing.
835+
- XCMS

## Data types

The basic data types available in R are:

- scalar
- vector
- character
- string
- matrix
- array
- data frame
- list
- factor

A **scalar** is a single number. It can be an integer, a real number (stored in double precision), or a complex number. A scalar can be assigned to a variable (here named ```x```) with the following command:
```
# R

x <- 3.1415926535897932384626433832795028841971693993751
```
The symbol ```<-``` is the assignment operator. The symbol ```=``` can be used as well, but ```<-``` is preferred.
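
The kinds of scalar mentioned above can be told apart with the base function ```class```. The following is a minimal sketch; the variable names are invented for this illustration only:
```
# R

# Hypothetical variable names, chosen only for this illustration
myInteger <- 3L        # the suffix L marks an integer
myReal    <- 3.14      # real numbers are stored in double precision
myComplex <- 1 + 2i    # complex number

class(myInteger)       # "integer"
class(myReal)          # "numeric"
class(myComplex)       # "complex"

# A scalar is stored as a vector with a single element
length(myReal)         # 1
```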
A **vector** is an ordered collection of scalars.
For instance, a vector of 4 elements can be defined using the command ```c()```:
```
# R

X <- c(2, 3, 4, 5)
```
The elements of a vector can be selected by passing the index of the element of interest. Indices start at 1. For instance, to read the second element of a vector:
```
# R

myVector <- c(4, 5, 6, 7)
secElement <- myVector[2] # index 2 selects the second element, so now secElement is equal to 5
```
The index can also be a scalar variable. The following gives the same result:
```
# R

myVector <- c(4, 5, 6, 7)
myIndex <- 2
secElement <- myVector[myIndex] # myIndex is 2, so now secElement is equal to 5
```
The length of a vector is given by the command ```length```:
```
# R

myVector <- c(4, 5, 6, 7)
length(myVector)
```
A **character** variable can be defined using the symbols ```'``` or ```"```:
```
# R

myChar <- "A"
myChar <- 'A' # Identical results
```
A **string** is a sequence of characters:
```
myString <- 'Hello, World!'
```
Unlike the elements of a vector, the characters of a string cannot be accessed by their index. Specific functions are available to work with strings.
Note that a string is considered a 1-element object, unlike a character vector of length *n*, which is a collection of *n* character variables.
The length of a string is given by the command ```nchar```:
```
# R

myString <- 'Hello, World!'
nchar(myString)
```
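As a sketch of the string functions mentioned above, the base R functions ```substr```, ```strsplit``` and ```paste``` can extract, split and join strings. The variable names below are invented for this illustration:
```
# R

myString <- 'Hello, World!'

# Extract the characters from position 1 to position 5
firstWord <- substr(myString, 1, 5)             # "Hello"

# Split the string into a vector of single characters
myChars <- strsplit(myString, '')[[1]]
myChars[2]                                      # "e"

# Join strings together
myGreeting <- paste('Hello', 'R', sep = ', ')   # "Hello, R"
```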

A **factor** is a special vector of labelled elements. Its elements usually take a limited number of discrete values, which can be either strings or scalars:
```
# R

myFactor <- factor(c('This', 'is', 'my', 'factor'))
myFactor2 <- factor(c('Y', 'Y', 'Y', 'N', 'N', 'Y'))
myFactor3 <- factor(c(2, 2, 4, 4, 1, 2, 1, 3))
```
As shown above, a factor is generated by passing a vector to the function ```factor```.
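
The labels of a factor can be inspected with the base functions ```levels```, ```nlevels``` and ```table```. A minimal sketch, reusing the second factor defined above:
```
# R

myFactor2 <- factor(c('Y', 'Y', 'Y', 'N', 'N', 'Y'))

levels(myFactor2)        # The distinct labels, sorted: "N" "Y"
nlevels(myFactor2)       # 2
table(myFactor2)         # Counts per level: N = 2, Y = 4
as.integer(myFactor2)    # Internal integer codes: 2 2 2 1 1 2
```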

A numeric **matrix** can be defined by the command ```matrix```. The first argument of this function is the full vector of values that will be used as matrix elements (filled column by column). The second and third arguments represent the number of rows and columns, respectively. The number of values should be equal to the product of the matrix dimensions. For instance, a randomly sampled vector of 20 scalars can be used to fill a 4x5 matrix:
```
# R

matElements <- sample(20)
Xmat <- matrix(matElements, 4, 5)
```
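To make the column-by-column filling visible, the following sketch uses the ordered values 1 to 6 instead of a random sample. The argument ```byrow``` of ```matrix``` switches to row-by-row filling; the variable names are invented for this illustration:
```
# R

# Default: the values fill the matrix column by column
byColumn <- matrix(1:6, 2, 3)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# With byrow = TRUE the values fill the matrix row by row
byRow <- matrix(1:6, 2, 3, byrow = TRUE)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
```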
The dimensions of a matrix are given by the following commands:
```
# R

# Define a matrix
matElements <- sample(20)
Xmat <- matrix(matElements, 4, 5)

# Number of rows
nrow(Xmat)
# Number of columns
ncol(Xmat)
# Both
dim(Xmat)
```
Matrix dimensions can be named, using the commands `dimnames`, `rownames`, or `colnames`.
Names can also be assigned when the matrix is defined:
```
# R

Xmat <- matrix(sample(20), 4, 5)

# Assign the row names
rownames(Xmat) <- c(1:4)

# Assign the column names
colnames(Xmat) <- c(1:5)

# Read the row names and column names
rownames(Xmat)
colnames(Xmat)

# Assign using dimnames
dimnames(Xmat) <- list(c(1:4), c(1:5)) # Notice that in this case we need a list

# Assign at the definition
Xmat <- matrix(sample(20), 4, 5, dimnames = list(c(1:4), c(1:5))) # Same as dimnames command
```

An **array** is the extension of a matrix to more than 2 dimensions. For instance, the following command will assign a 3-dimensional array of dimensions (5 x 6 x 10) to the variable ```myArray```:
```
# R

myArray <- array(sample(300), c(5, 6, 10))
```
Elements of arrays can be accessed in the same way as those of vectors and matrices:
```
# R

myElement <- myArray[1, 3, 2] # myElement corresponds to the element (1, 3, 2) of myArray
```
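Leaving an index empty selects the whole corresponding dimension. The sketch below uses a small array filled with the ordered values 1 to 24, rather than a random sample, so that the results are predictable:
```
# R

myArray <- array(1:24, c(2, 3, 4))   # values 1 to 24 instead of a random sample

dim(myArray)                 # 2 3 4
mySlice <- myArray[, , 1]    # the first 2 x 3 "layer" of the array
dim(mySlice)                 # 2 3
myArray[2, 3, 4]             # the last element: 24
```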
A **list** is a more complex data structure. It can be seen as a vector whose elements can be of different types or dimensions. For instance, a list containing a vector and a matrix can be defined as follows:
```
# R

# Element-by-element assignment
myList <- list() # Empty list
myList[[1]] <- sample(20) # vector
myList[[2]] <- matrix(sample(100, 20), 4, 5) # matrix

# Direct assignment: the first element will be named 'myVector',
# and the second element 'myMatrix'
myList <- list(myVector = sample(20),
               myMatrix = matrix(sample(100, 20), 4, 5))
```
The elements of a list can be accessed by passing their index or their name, as defined in the list. Using the previous example:
```
# R

X <- myList[[1]] # X is now equal to myVector NOTE: [[ ]] instead of [ ]
X <- myList$myVector # Access by name through the operator $
```
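The difference between ```[ ]``` and ```[[ ]]``` noted in the comment above can be checked directly: single brackets return a shorter list, double brackets return the element itself. A minimal sketch, with fixed values instead of random samples:
```
# R

myList <- list(myVector = c(4, 5, 6), myMatrix = matrix(1:6, 2, 3))

subList <- myList[1]       # single brackets: a list of length 1
element <- myList[[1]]     # double brackets: the vector itself

class(subList)             # "list"
class(element)             # "numeric"
names(myList)              # "myVector" "myMatrix"
myList[['myMatrix']]       # access by name, equivalent to myList$myMatrix
```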

Finally, a **data frame** is a matrix-like structure (columns of the same length), whose columns can be vectors of different data types. For instance, a character vector and a numeric vector can be joined to form a data frame:
```
# R

myCharVector <- c('A', 'B', 'C', 'D')
myNumVector <- c(1, 2, 3, 4)
myDataFrame <- data.frame(Letters = myCharVector, Numbers = myNumVector)
```
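Columns and rows of a data frame can be read by name or by index. The sketch below rebuilds the same data frame; the argument ```stringsAsFactors = FALSE``` is an addition made here so that the letters stay character strings in every R version:
```
# R

myDataFrame <- data.frame(Letters = c('A', 'B', 'C', 'D'),
                          Numbers = c(1, 2, 3, 4),
                          stringsAsFactors = FALSE)

dim(myDataFrame)                 # 4 rows, 2 columns
myDataFrame$Numbers              # column by name: 1 2 3 4
myDataFrame[2, ]                 # the second row
# Letters of the rows where Numbers is greater than 2: "C" "D"
bigLetters <- myDataFrame[myDataFrame$Numbers > 2, 'Letters']
```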

## Indexing
As seen in the previous section, elements of vectors, arrays, etc. can be accessed by their indices.
A single element can be accessed by the value of its index (which can also be stored in an integer variable). Multiple elements can be accessed as well, using the following commands:
```
# R

# Define a matrix
myMatrix <- matrix(sample(30), 5, 6)

# Read the 4th row
myMatrix[4, ]

# Read the 2nd column
myMatrix[, 2]

# Read the first 3 elements of the 4th column
myMatrix[1:3, 4]
```
The expression ```a:b``` is equivalent to ```c(a, a+1, a+2, a+3, ..., b-2, b-1, b)```.
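
The same ideas apply to vectors. The sketch below also shows two further forms of indexing available in base R, negative indices and logical conditions; the values are invented for this illustration:
```
# R

myVector <- c(10, 20, 30, 40, 50)

myVector[2:4]              # elements 2 to 4: 20 30 40
myVector[c(1, 5)]          # first and last elements: 10 50
myVector[-1]               # all but the first element: 20 30 40 50
myVector[myVector > 25]    # logical indexing: 30 40 50
```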

## Functions

Repeated operations can be assembled into **functions**.
Functions are often exported by packages, or can be defined by the user.
User-defined functions follow the structure:
```
# R

myFunction <- function(argument1, argument2, ...) {
    # Operations go here
    ...
    return(returnValue)
}
```
The function can then be called by its name:
```
# R

myValue <- myFunction(x, y, ...)
```
Note that the function ends with the command ```return```, which defines the value returned by the function. This value can be of any data type.
For instance, a function that calculates the factorial of a non-negative integer can be defined as follows:
```
# R

myFactorial <- function(n) {

    # Check that the argument is a single non-negative whole number.
    # (is.integer(n) would reject a call such as myFactorial(25),
    # because 25 is stored as a double.)
    stopifnot(is.numeric(n), length(n) == 1, n >= 0, n == round(n))

    # Calculate 1 * 2 * ... * (n-1) * n
    # seq_len(n) is empty for n = 0, so that 0! = 1
    f <- 1
    for (i in seq_len(n))
        f <- f * i

    # Then return the value
    return(f)
}
```
Then, the factorial of an integer can be calculated by calling the function:
```
# R

myFactorial(25) # Returns the value of 25!
```
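Since the returned value can be of any data type, a function can return several results at once by wrapping them in a list. The function below is hypothetical and only meant as a sketch; it also shows an argument with a default value:
```
# R

# Hypothetical function: summarise a numeric vector.
# The argument 'digits' has a default value, so it can be omitted in the call.
mySummary <- function(x, digits = 2) {
    stopifnot(is.numeric(x))
    return(list(mean  = round(mean(x), digits),
                range = range(x)))
}

result <- mySummary(c(1, 2, 4))
result$mean     # 2.33
result$range    # 1 4
```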

**Resources:**

- [Examples of builtin functions](https://www.statmethods.net/management/functions.html)
- [Practice on writing R functions](https://www.datacamp.com/courses/writing-functions-in-r)

## For loops, apply, sapply, lapply

In R, repeated operations (iterations) can be modelled in different ways. The canonical *for loops* can be run in this way:
```
# R

for (iterator in firstValue:lastValue)
{
    # Perform some operations
    doSomething(iterator)
}
```
In this example, the third power of ```x``` is calculated using a for loop:
```
# R

# A very inefficient power calculation (use x^3 in real life)
x <- 2   # example value
y <- x
for (i in 1:2)
{
    y <- y * x   # multiply by x twice: y becomes x^3
}
```
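A for loop can also run over the positions of a vector. The sketch below adds up the elements of a vector; ```seq_along``` is a base R function that returns the indices 1, 2, ... of its argument, and the variable names are invented for this illustration:
```
# R

myVector <- c(4, 2, 10)
total <- 0

# seq_along(myVector) gives 1, 2, 3 and also works for empty vectors
for (i in seq_along(myVector))
{
    total <- total + myVector[i]
}
total    # 16, the same as sum(myVector)
```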
R also allows iterations to be run with the commands ```apply```, ```sapply```, ```lapply```.
The function ```apply``` returns the values of a function calculated over one dimension (margin) of a variable (e.g. the columns of a matrix). For instance, to calculate the sum of the elements of each column of a matrix:
```
# R

apply(myMatrix, 2, sum) # 2 defines the calculation over columns (1 for rows)
```
To apply more complex operations, we can define a function on the elements:
```
# R

# Calculate the sum of squares of the elements of each column
apply(myMatrix, 2, function(x) sum(x^2))
```
The functions ```sapply``` and ```lapply``` behave similarly, but they apply a function to each element of a vector or a list. ```lapply``` always returns a list, whereas ```sapply``` simplifies the result to a vector or matrix when possible:
```
# R

# Use sapply to avoid a for loop. Calculate the square of each element of a vector
myVector <- c(4, 2, 10)
sapply(myVector, function(x) x^2)

# Example of lapply

myList <- list(x = 'my', y = 'list', z = 'is', w = 'cool')
# Calculate the number of characters of each element of myList
lapply(myList, nchar) # This will give a list with 4 numbers: 2, 4, 2, 4
```
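The two functions differ mainly in the type of the result. The sketch below runs both on the same list; ```vapply``` is a related base R function that additionally checks the type of each result, and the variable names are invented for this illustration:
```
# R

myList <- list(x = 'my', y = 'list', z = 'is', w = 'cool')

asList   <- lapply(myList, nchar)   # a list with 4 elements
asVector <- sapply(myList, nchar)   # a named integer vector

class(asList)      # "list"
asVector           # x = 2, y = 4, z = 2, w = 4
asVector[['y']]    # 4

# vapply also checks that each result is a single integer
checked <- vapply(myList, nchar, integer(1))
```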
**Resources:**

- [Using apply, sapply, lapply in R](https://www.r-bloggers.com/using-apply-sapply-lapply-in-r/)

## If-else
Like most programming languages, R provides conditional statements (if-else):
```
# R

if (conditionIsTrue)
{
    doSomething
} else
{
    doSomethingElse
}
```
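A concrete instance of the template above, with an additional ```else if``` branch, could look as follows. The function ```mySign``` is hypothetical and only meant as a sketch:
```
# R

# Hypothetical function that describes the sign of a number
mySign <- function(x) {
    if (x > 0)
    {
        result <- 'positive'
    } else if (x < 0)
    {
        result <- 'negative'
    } else
    {
        result <- 'zero'
    }
    return(result)
}

mySign(-3)    # "negative"
```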
The basic logical operators are:
```
# R

x == y       # Returns TRUE if x is equal to y
x != y       # Returns TRUE if x is not equal to y
x <- TRUE    # x contains the logical value TRUE
x <- FALSE   # x contains the logical value FALSE
!x           # Only if x is a logical variable, returns its negation
x && y       # Logical AND (for single logical values)
x || y       # Logical OR (for single logical values)
```
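For logical vectors, base R also provides the element-wise operators ```&``` and ```|```, together with ```any``` and ```all```. A minimal sketch, with values invented for this illustration:
```
# R

x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)

x & y          # element-wise AND: TRUE FALSE FALSE
x | y          # element-wise OR:  TRUE TRUE TRUE
any(x & y)     # TRUE if at least one element is TRUE
all(x | y)     # TRUE if all the elements are TRUE

# && and || are meant for single values, as in an if condition
a <- 5
inRange <- a > 0 && a < 10    # TRUE
```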

## Plotting
Besides the built-in plotting functions, several packages are designed to produce high-quality graphs. The best known of these is probably ```ggplot2```.
Nice examples of graphs generated using ```ggplot2``` can be found here:
[R Graphs](http://www.cookbook-r.com/Graphs/)
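
As a minimal sketch, the same scatter plot can be drawn with the built-in function ```plot``` and with ```ggplot2```. The data are invented for this illustration, and the ```ggplot2``` part runs only if the package is installed:
```
# R

# Toy data, invented for this illustration
myData <- data.frame(x = 1:10, y = (1:10)^2)

# Base R scatter plot (no additional package required)
plot(myData$x, myData$y, xlab = 'x', ylab = 'y')

# The same plot with ggplot2, if the package is installed
if (requireNamespace('ggplot2', quietly = TRUE))
{
    library(ggplot2)
    myPlot <- ggplot(myData, aes(x = x, y = y)) + geom_point()
    print(myPlot)
}
```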
# Python
"Python is an interpreted high-level programming language for general-purpose programming."
- [Python Website](https://www.python.org/)
