An object is something you are working with in R. For example:
pi
## [1] 3.141593
R can be used as a calculator:
3+3
## [1] 6
3*3
## [1] 9
As with a calculator, you can store numbers in memory. You can give names to those objects, ALWAYS STARTING WITH A LETTER!
side=3
side
## [1] 3
R is case-sensitive for object names:
Side=8
side
## [1] 3
Side
## [1] 8
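Names must always begin with a letter (not a number) and must not contain the symbol "#", because it is used to start comments:
1side=1 # error: a name cannot start with a number
side#1=1 # R only reads "side": everything from the first # onward is a comment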
#This is a comment, something just for you, that R will not try to interpret
## Error: <text>:1:2: unexpected symbol
## 1: 1side
## ^
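If you are unsure whether a name is valid, base R's make.names() shows how R would repair it into a syntactically valid name. A small sketch (the vector of candidate names is only for illustration):

```r
# Candidate object names: the first two are not valid R names
candidates <- c("1side", "side#1", "side")
# make.names() returns syntactically valid versions:
# "X" is prepended to names starting with a digit,
# and invalid characters such as "#" are replaced by "."
fixed <- make.names(candidates)
fixed  # "X1side" "side.1" "side"
```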
Assigning a new value replaces the previous one:
Side=5
Side
## [1] 5
You can now make calculations involving objects stored in memory. For example, calculate the area of the square:
side*side
## [1] 9
This shows the result, but does not store it in memory. To keep it, we must create an object:
area=side*side
area
## [1] 9
Objects can be defined using the "=" symbol:
side1 = 3
side2 = 4
But... imagine you want both sides to be equal to 3:
side1 = side2 # "=" only copies from right to left, so side1 becomes 4, not 3
The usual symbol to make assignments is an arrow ("<-" or "->"), which points in the direction the value flows:
side1 <- 3
side2 <- 4
side1 -> side2 # copies the value of side1 (3) into side2, so both are now 3
R is quite tolerant of "unexpected spaces" in code, but be careful with arrows! Depending on the spacing, the lines below mean quite different things:
a <- 3
a<5
## [1] TRUE
a-5
## [1] -2
a< -5
## [1] FALSE
a<(-5)
## [1] FALSE
a <- 5 # no space inside "<-": this assigns 5 to a (and so does a<-5)
The same happens with a few other two-character operators (such as !=, ==, ...).
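Use parentheses whenever you are not sure how R will read an expression. Imagine you want "8 divided by 4 times 2, plus 1"; depending on the parentheses, R computes quite different things:
f1 <- 1+8/4*2 # 1 + (8/4)*2 = 5
f2 <- 1+8/(4*2) # 1 + 8/8 = 2
f3 <- (1+8)/4*2 # ((1+8)/4)*2 = 4.5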
f1
## [1] 5
f2
## [1] 2
f3
## [1] 4.5
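Two more precedence rules are worth checking with parentheses: exponentiation (^) binds more tightly than a leading minus sign, and a chain of ^ is evaluated from right to left. A minimal sketch using only base arithmetic:

```r
# ^ binds more tightly than the unary minus
r1 <- -2^2     # same as -(2^2): -4
r2 <- (-2)^2   # 4
# a chain of ^ is evaluated from right to left
r3 <- 2^3^2    # same as 2^(3^2) = 2^9: 512
r4 <- (2^3)^2  # 8^2: 64
c(r1, r2, r3, r4)
```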
Apart from that, R is quite flexible about spaces in its input:
3*-3
## [1] -9
3 * -3
## [1] -9
3* -3
## [1] -9
3*- 3
## [1] -9
So far, all the objects we have defined are numbers. You can list the names of all the objects in memory with the function ls(), like this:
ls()
## [1] "a" "area" "f1" "f2" "f3" "FILES" "i" "side" "Side"
## [10] "side1" "side2"
====== NOTE =======
We will see later that functions are a particular type of object. To use a function, call it by its name and put the values of its arguments within parentheses, like this:
function_name(argument1=value1, argument2=value2, ...)
In some cases you HAVE TO set the value of an argument for the function to run. In other cases, a default value is used if the user does not provide one. A few functions can run without any argument; ls() is one of them.
===================
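As an illustration of the calling rules in the note, here is a short sketch with two base R functions, round() and seq(), chosen only as examples. round() needs the number to round (x) and has an optional digits argument whose default is 0:

```r
# Named arguments: x is the number, digits the decimal places
p2 <- round(x = pi, digits = 2)   # 3.14
# Same call with arguments matched by position
p2b <- round(pi, 2)
# digits not given: the default (0) is used
p0 <- round(pi)                   # 3
# seq() with three named arguments
s <- seq(from = 1, to = 10, by = 3)  # 1 4 7 10
```

Naming the arguments is longer to type but makes the call readable even if you do not remember their order.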
So far we have defined only numeric variables. These variables belong to the "numeric" class. You can get information on their mode, class and storage as follows:
mode(f1) # determines the mode (object structure)
## [1] "numeric"
class(f1) # determines the class (object interaction with functions)
## [1] "numeric"
typeof(f1) # determines the R internal type of storage for the object
## [1] "double"
str(f1) # provides the internal structure (more like a summary)
## num 5
Or you can ask R about specific classes using is.XXXX():
is.numeric(f1)
## [1] TRUE
is.double(f1)
## [1] TRUE
To interact with data stored in memory, R provides different specialized data structures called "objects", which are referred to and manipulated through symbols (variable names). Objects are described at different hierarchical levels, depending on their structure, their interaction with functions and their storage:
MODE = object structure
CLASS = object interaction with functions
TYPE = R internal storage type for the object
MODES (basic structure)
1. ATOMIC OBJECTS
  1.1 NUMERIC
  1.2 COMPLEX
  1.3 CHARACTER OR STRING
  1.4 LOGICAL
  1.5 RAW
2. RECURSIVE OBJECTS
  2.1 LIST
  2.2 FUNCTION
  2.3 (...)
CLASSES are defined by object interaction with functions, and can be basic or virtual:
BASIC: "character" "complex" "double" "expression" "integer" "list" "logical" "numeric" "single" "raw" "NULL" "function" "externalptr" "ANY" "VIRTUAL" "missing" "namedList"
VIRTUAL (extended by all the above): "vector" "S4" "language" "function" "call"
NUMERIC objects are "numbers" and accept the following arithmetic operations:
Arithmetic Operators:
### + addition
### - subtraction
### * multiplication
### / division
### ^ or ** exponentiation
### x %% y modulus (remainder after division) 5%%2 is 1
### x %/% y integer division 5%/%2 is 2
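A quick check of the less familiar operators in the list. The last line is a sketch of the relation that links %/% and %%; the values 17 and 5 are arbitrary:

```r
m <- 5 %% 2    # remainder: 1
d <- 5 %/% 2   # integer division: 2
e1 <- 2^3      # 8
e2 <- 2**3     # also 8: R reads ** as ^
# %/% and %% always fit together as x == (x %/% y) * y + x %% y
x <- 17
y <- 5
(x %/% y) * y + x %% y == x   # TRUE
```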
That's why you can use R as a calculator and create numeric objects:
f1 <- 3*(2-1)
You can get information on its mode, class and storage as follows (each result could also be saved into an object, e.g. m <- mode(f1)):
mode(f1) # determines the mode (object structure)
## [1] "numeric"
class(f1) # determines the class (object interaction with functions)
## [1] "numeric"
typeof(f1) # determines the R internal type of storage for the object
## [1] "double"
str(f1) # provides the internal structure[class] (more like a summary)
## num 3
Several storage types are grouped together under the "numeric" mode; the two most common are double (for double-precision floating point numbers) and integer.
A double is the equivalent of a C "double". A simple way of representing any numeric value is to use two numbers, in a way similar to scientific notation: significand × base^exponent. Thus, 12.345 is 12345 × 10^(-3), and because the base is the same for all numbers in a given computer, only two numbers need to be stored (12345 and -3) => "double". (Computers actually use base 2 rather than base 10, but the idea is the same.)
If a numeric value has no decimal part, it can be stored as an "integer". The exponent is 0 for integers, so it does not need to be stored, and the number requires less space in memory. However, all numbers are stored as double by default, even when they are whole numbers:
typeof(3)
## [1] "double"
typeof(as.integer(3))
## [1] "integer"
R will automatically convert between the numeric classes when needed, so you usually do not need to worry about whether the number 3 is currently stored as an integer or as a double. Most maths is done in double precision, so "double" is the default storage.
However, if you specifically want to store a number as an integer, you can:
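The automatic conversion can be watched with typeof(). This sketch also uses the L suffix (as in 3L), which writes an integer directly without calling as.integer():

```r
t1 <- typeof(3L)         # "integer": the L suffix creates an integer
t2 <- typeof(3L + 1L)    # "integer": integer + integer stays integer
t3 <- typeof(3L + 1)     # "double": mixing with a double gives a double
t4 <- typeof(6L / 2L)    # "double": "/" always returns a double
t5 <- typeof(7L %/% 2L)  # "integer": integer division of integers
c(t1, t2, t3, t4, t5)
```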
f2 <- as.integer(f1)
mode(f2) # determines the mode (object structure)
## [1] "numeric"
class(f2) # determines the class (object interaction with functions)
## [1] "integer"
typeof(f2) # determines the R internal type of storage for the object
## [1] "integer"
str(f2) # provides the internal structure[class] (more like a summary)
## int 3
It looks the same to you, but it is different for R:
is.integer(f1)
## [1] FALSE
is.integer(f2)
## [1] TRUE
mode(f1)
## [1] "numeric"
mode(f2)
## [1] "numeric"
class(f1)
## [1] "numeric"
class(f2)
## [1] "integer"
Forcing double/integer types is useful if you call C or Fortran code. For more conventional use, you will not need to control storage options.
Remember that you can ask R about specific classes, or force class changes, using is.XXX or as.XXX:
is.numeric(f1)
## [1] TRUE
is.double(f1)
## [1] TRUE
as.numeric(f1)
## [1] 3
as.double(f1)
## [1] 3
R can also work with letters instead of numbers. Let's create an object containing the letter "B":
letterB <- B
## Error in eval(expr, envir, enclos): object 'B' not found
It doesn't work because R looks for an object in memory stored under the symbol B. Because it doesn't exist, R yields an error. To create a character string, it must be written within quotes (" or ').
letterB <- "B"
letterB
## [1] "B"
In this case, R does not look for B in memory, but returns exactly what you typed within the quotes.
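Both quote styles create exactly the same string; a sketch of why having two is handy (the sentence is only an example): one style can hold the other inside the string.

```r
letterB1 <- "B"
letterB2 <- 'B'
same <- identical(letterB1, letterB2)  # TRUE: same string either way
# Single quotes outside let you use double quotes inside
quote1 <- 'He said "hello"'
n <- nchar(quote1)  # 15 characters, inner double quotes included
```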
sentence1 <- "This R session bores me! @#%&!!"
sentence1
## [1] "This R session bores me! @#%&!!"
str(sentence1)
## chr "This R session bores me! @#%&!!"
mode(sentence1) # determines the mode (object structure)
## [1] "character"
class(sentence1) # determines the class (object interaction with functions)
## [1] "character"
typeof(sentence1) # determines the R internal type of storage for the object
## [1] "character"
There are a few built-in 'special strings':
LETTERS # Provides uppercase letters
## [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
## [20] "T" "U" "V" "W" "X" "Y" "Z"
letters # Provides lowercase letters
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
## [20] "t" "u" "v" "w" "x" "y" "z"
month.name # Provides months
## [1] "January" "February" "March" "April" "May" "June"
## [7] "July" "August" "September" "October" "November" "December"
month.abb # Provides months in short
## [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
You can also store a number as a character:
letter3 <- "3"
number3 <- 3
They belong to different classes...
class(letter3)
## [1] "character"
class(number3)
## [1] "numeric"
# ...and accept different functions:
number3*3
## [1] 9
letter3*3
## Error in letter3 * 3: non-numeric argument to binary operator
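To do arithmetic with a number stored as text, convert it first with as.numeric(). A short sketch; the string "three" is just an example of text R cannot read as a number:

```r
letter3 <- "3"
r <- as.numeric(letter3) * 3  # 9: converted to a number first
# Text that does not look like a number becomes NA (R also warns)
bad <- suppressWarnings(as.numeric("three"))
bad  # NA
```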
LOGICAL objects can only take the values TRUE or FALSE (T and F are also accepted). As with numerics, they have specific operators to perform operations:
Logical Operators
### < less than
### <= less than or equal to
### > greater than
### >= greater than or equal to
### == exactly equal to
### != not equal to
### !x Not x (e.g., !TRUE is FALSE)
### x | y x OR y
### x & y x AND y
Important! One = to define, two to compare!!!
f1==2 # two "=": compares f1 (currently 3) with 2
## [1] FALSE
f1=2 # one "=": assigns the value 2 to f1
Let's store the result of a comparison as a logical object:
l1 <- f1==2
mode(l1) # determines the mode (object structure)
## [1] "logical"
class(l1) # determines the class (object interaction with functions)
## [1] "logical"
typeof(l1) # determines the R internal type of storage for the object
## [1] "logical"
str(l1) # provides the internal structure (more like a summary)
## logi TRUE
Some examples of operations:
f1>f2
## [1] FALSE
f1>100
## [1] FALSE
"1"!=1
## [1] FALSE
double(1.0) == 1 # double(1) creates a numeric vector of length 1 filled with 0, so this compares 0 with 1
## [1] FALSE
!TRUE
## [1] FALSE
6<=3
## [1] FALSE
45>40
## [1] TRUE
You can combine several logicals into one using AND (&): the result is TRUE only if all the elements are TRUE. You can also combine them using OR (|): the result is TRUE if at least one of the elements is TRUE:
6<=3|45>40
## [1] TRUE
3<=6&45>40
## [1] TRUE
This is useful for subsetting large datasets (for example, getting the data for individual 23 on the trait "height").
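A minimal sketch of that idea, assuming a small hypothetical data frame called heights (the ids and values are invented for illustration): a comparison produces a logical vector, and that vector selects the matching rows.

```r
# Hypothetical data: four individuals and their heights (cm)
heights <- data.frame(id = c(21, 22, 23, 24),
                      height = c(170, 182, 165, 190))
# Logical vector: TRUE only where id is 23
is23 <- heights$id == 23          # FALSE FALSE TRUE FALSE
# Height of individual 23
h23 <- heights$height[is23]       # 165
# Combine conditions with &: heights between 168 and 185
mid_rows <- heights[heights$height > 168 & heights$height < 185, ]
mid_rows
```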
Factors are used to describe items that can take a finite number of discrete values or categories (gender, country, month, ...), called "levels".
factor3 <- factor(3)
factor3*3
## Warning in Ops.factor(factor3, 3): '*' not meaningful for factors
## [1] NA
is.numeric(factor3)
## [1] FALSE
str(factor3)
## Factor w/ 1 level "3": 1
mode(factor3) # determines the mode (object structure)
## [1] "numeric"
class(factor3) # determines the class (object interaction with functions)
## [1] "factor"
typeof(factor3) # determines the R internal type of storage for the object
## [1] "integer"
factorB <- factor("B")
str(factorB)
## Factor w/ 1 level "B": 1
mode(factorB) # determines the mode (object structure)
## [1] "numeric"
class(factorB) # determines the class (object interaction with functions)
## [1] "factor"
typeof(factorB) # determines the R internal type of storage for the object
## [1] "integer"
A factor is always of numeric mode!! Internally, it stores integer codes that point to its levels. This is important to understand the output of as.XX functions on factors:
as.character(factor3)
## [1] "3"
as.numeric(factor3)
## [1] 1
as.numeric(as.character(factor(3)))
## [1] 3
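The difference between codes and levels is easier to see with several levels. In this sketch (the values are only an example) the levels are sorted as text, so the integer codes do not match the numbers you typed:

```r
f <- factor(c("10", "5", "10", "20"))
lev <- levels(f)          # "10" "20" "5": sorted as text, not as numbers
codes <- as.integer(f)    # 1 3 1 2: the internal codes for each level
vals1 <- as.numeric(as.character(f))  # 10 5 10 20: the values you wanted
vals2 <- as.numeric(levels(f))[f]     # 10 5 10 20: converts each level once
```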
as.numeric(FALSE)
## [1] 0
as.numeric("FALSE")
## Warning: NAs introduced by coercion
## [1] NA
as.numeric(as.logical("FALSE"))
## [1] 0
as.logical("hello")
## [1] NA
as.logical(0)
## [1] FALSE
as.logical(1)
## [1] TRUE
as.logical(7.8)
## [1] TRUE
as.logical(-7.8)
## [1] TRUE
as.factor(LETTERS)
## [1] A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
## Levels: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
In R, missing values are represented by the symbol NA (not available). Undefined results (e.g., 0/0) are represented by the symbol NaN (not a number), while dividing a non-zero number by zero gives Inf or -Inf.
Just to show that there are many modes, some of which you may never use (do you remember complex numbers?):
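A short sketch of how these special values behave (the vector x is just an example). Note that most calculations involving NA return NA unless you ask R to drop the missing values:

```r
x <- c(1, NA, 3)
miss <- is.na(x)                 # FALSE TRUE FALSE
m1 <- mean(x)                    # NA: the missing value propagates
m2 <- mean(x, na.rm = TRUE)      # 2: NA removed before averaging
z <- 0/0                         # NaN
inf <- 1/0                       # Inf
ninf <- -1/0                     # -Inf
is.na(NaN)                       # TRUE: NaN also counts as missing
is.nan(NA)                       # FALSE: NA is not NaN
```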
comp1 <- 2+3i
str(comp1)
## cplx 2+3i
mode(comp1) # determines the mode (object structure)
## [1] "complex"
class(comp1) # determines the class (object interaction with functions)
## [1] "complex"
typeof(comp1) # determines the R internal type of storage for the object
## [1] "complex"
is.numeric(comp1)
## [1] FALSE
is.complex(comp1)
## [1] TRUE
Remember that R has some built-in "special" constants:
pi # (like LETTERS, letters, month.name and month.abb seen above)
## [1] 3.141593
e # e is not predefined as Euler's number. For 2.7182... you must use exp(1):
## Error in eval(expr, envir, enclos): object 'e' not found
exp(1)
## [1] 2.718282
Let's define some complex numbers using these values:
complex1 <- pi+3i
complex2 <- 3+pii # It doesn't work because R is looking for an object called "pii"
## Error in eval(expr, envir, enclos): object 'pii' not found
Use the help!
?complex
In this case you should use the complex() function:
complex2 <- complex(real=3,imaginary=pi)
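Once a complex number exists, base R can take it apart. A sketch using Re(), Im(), Mod() and Conj() (3+4i is just a convenient example):

```r
complex2 <- complex(real = 3, imaginary = pi)
re <- Re(complex2)   # 3: the real part
im <- Im(complex2)   # 3.141593: the imaginary part (pi)
md <- Mod(3+4i)      # 5: the modulus, sqrt(3^2 + 4^2)
cj <- Conj(3+4i)     # 3-4i: the complex conjugate
```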
Let's check Euler's identity (e^(i*pi)+1=0):
exp(complex(real=0,imaginary=pi))+1
## [1] 0+1.224647e-16i
And you will not get 0 but a very, very small number. The reason is that computers work in binary. The only numbers that can be represented exactly in R’s numeric type are integers and fractions whose denominator is a power of 2. Other numbers have to be rounded to (typically) 53 binary digits of accuracy. As a result, two floating point numbers will not reliably be equal unless they have been computed by the same algorithm, and not always even then. http://blog.revolutionanalytics.com/2009/03/when-is-a-zero-not-a-zero.html
a <- sqrt(2)
a*a==2
## [1] FALSE
# To avoid this problem, compare the two values with a small tolerance using all.equal():
all.equal(a*a,2)
## [1] TRUE
# Sometimes, though, these errors accumulate:
j <- 0.1
x <- j+j+j+j+j+j+j+j+j+j
x-1
## [1] -1.110223e-16
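A sketch of safe ways to compare the accumulated sum with 1. The tolerance 1e-8 in the explicit version is an arbitrary choice for illustration. The last two lines check the power-of-2 rule from above: 0.5 and 0.25 are exact in binary, while 0.1, 0.2 and 0.3 are not.

```r
j <- 0.1
x <- j+j+j+j+j+j+j+j+j+j
exact <- x == 1                     # FALSE: exact comparison fails
close1 <- isTRUE(all.equal(x, 1))   # TRUE: equal within a tolerance
close2 <- abs(x - 1) < 1e-8         # TRUE: same idea, explicit tolerance
# When values differ, all.equal() returns a text description, not FALSE,
# so wrap it in isTRUE() before using it in if()
msg <- all.equal(1, 2)              # a character string describing the difference
# Fractions with power-of-2 denominators are exact...
p2 <- 0.5 + 0.25 == 0.75            # TRUE
# ...but most decimal fractions are not
p10 <- 0.1 + 0.2 == 0.3             # FALSE
```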