# Difference between assignment operators in R

First published on **The blog of Kun Ren** and kindly contributed to R-bloggers.


For R beginners, the first operator they use is probably the *assignment operator* `<-`. Google's R Style Guide recommends `<-` rather than `=`, even though the equal sign is also allowed in R and does the same thing when we assign a value to a variable. However, you might find `<-` inconvenient because you need to type two characters to represent one symbol, unlike in many other programming languages.

As a result, many users ask: *Why should we use `<-` as the assignment operator?*

Here I provide a simple explanation of the subtle difference between `<-` and `=` in R.

First, let's look at an example.

    x <- rnorm(100)
    y <- 2*x + rnorm(100)
    lm(formula=y~x)

The above code uses both the `<-` and `=` symbols, but they do different work. `<-` in the first two lines is used as the **assignment operator**, while `=` in the third line does not serve as an assignment operator; it specifies the named parameter `formula` of the `lm` function.

In other words, `<-` evaluates the expression on its right side (`rnorm(100)`) and assigns the evaluated value to the symbol (variable) on the left side (`x`) in the current environment. `=` evaluates the expression on its right side (`y~x`) and passes the evaluated value to the parameter named on the left side (`formula`) of a certain function.
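
To make the distinction concrete, here is a minimal sketch that runs both forms inside a scratch environment, so the variables each one creates can be listed. The environment `env` and the use of `mean` are illustrative choices, not part of the original example.

```r
# Scratch environment so we can see exactly which variables get created
env <- new.env()

# Assignment: creates the variable x inside env
evalq(x <- c(1, 2, 3), env)
after_assign <- ls(env)

# Named argument: `x =` only names mean()'s parameter; env's x is untouched
m <- evalq(mean(x = c(4, 5, 6)), env)
x_unchanged <- identical(get("x", envir = env), c(1, 2, 3))

print(after_assign)
print(m)
print(x_unchanged)
```

The `x` in `mean(x = ...)` lives only in the function call, so the variable `x` stored in `env` keeps its original value.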

We know that `<-` and `=` are equivalent when they are used as assignment operators in simple top-level statements like these.

Therefore, the above code is equivalent to the following code:

    x = rnorm(100)
    y = 2*x + rnorm(100)
    lm(formula=y~x)

Here, we only use `=`, but for two different purposes: in the first and second lines we use `=` as the assignment operator, and in the third line we use `=` as the specifier of a named parameter.
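
One way to check this equivalence is to fix the random seed and compare the two versions. This is only a sketch: the seed value and the variable names `x1`, `y1`, `x2`, `y2` are assumptions made so both versions can live side by side.

```r
# Version 1: `<-` for assignment
set.seed(42)
x1 <- rnorm(100)
y1 <- 2 * x1 + rnorm(100)
fit_arrow <- lm(formula = y1 ~ x1)

# Version 2: `=` for assignment, same seed so the same numbers are drawn
set.seed(42)
x2 = rnorm(100)
y2 = 2 * x2 + rnorm(100)
fit_equal <- lm(formula = y2 ~ x2)

# Compare the data and the fitted coefficients (names differ, values should not)
same_data <- identical(x1, x2) && identical(y1, y2)
same_coef <- all.equal(unname(coef(fit_arrow)), unname(coef(fit_equal)))
print(same_data)
print(same_coef)
```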

Now let's see what happens if we change all `=` symbols to `<-`.

    x <- rnorm(100)
    y <- 2*x + rnorm(100)
    lm(formula <- y~x)

If you run this code, you will find that the output is similar. But if you inspect the environment, you will observe a difference: a new variable `formula`, whose value is `y~x`, is defined in the environment. So what happened?

Actually, two things happen in the third line. First, we introduce a new symbol (variable) `formula` to the environment and assign it the formula-typed value `y~x`. Then the value of `formula` is passed to the **first parameter** of `lm` by position rather than, strictly speaking, to the **parameter named formula**, although in this case they happen to be the same parameter.
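
The positional behaviour is easier to see with a toy function whose two parameters are deliberately passed in the "wrong" order. The function `describe_args` and the environment `env` below are hypothetical, introduced only for this sketch.

```r
# Hypothetical two-parameter function, used only for illustration
describe_args <- function(first, second) {
  c(first = first, second = second)
}

env <- new.env()

# `=` matches arguments by name, so the order does not matter
by_name <- evalq(describe_args(second = "b", first = "a"), env)
vars_after_named <- ls(env)          # nothing was created in env

# `<-` assigns in env, then the values are matched by position
by_position <- evalq(describe_args(second <- "b", first <- "a"), env)
vars_after_arrow <- sort(ls(env))    # variables created as a side effect

print(by_name)
print(by_position)
print(vars_after_arrow)
```

With `<-`, the parameter `first` receives `"b"` because it is simply the first value supplied, while `env` gains variables named `first` and `second` that have nothing to do with the function's parameters.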

To test it, we conduct an experiment. This time we first prepare the data.

    x <- rnorm(100)
    y <- 2*x+rnorm(100)
    z <- 3*x+rnorm(100)
    data <- data.frame(z,x,y)
    rm(x,y,z)

Basically, we did the same as before, except that we store all vectors in a data frame and remove the numeric vectors from the environment. We know that the `lm` function accepts a data frame as the data source when a formula is specified.

Standard usage:

    lm(formula=z~x+y,data=data)

Working alternative where two named parameters are reordered:

    lm(data=data,formula=z~x+y)

Working alternative with the side effect that two new variables are defined:

    lm(formula <- z~x+y, data <- data)

Nonworking example:

    lm(data <- data, formula <- z~x+y)

The reason is exactly what I mentioned previously. We reassign `data` to `data` and pass its value to the first argument (`formula`) of `lm`, which only accepts a formula-typed value. We also assign `z~x+y` to a new variable `formula` and pass it to the second argument (`data`) of `lm`, which only accepts a data-frame-typed value. Both arguments we provide to `lm` have the wrong type, so we receive the message:

    Error in as.data.frame.default(data) : cannot coerce class ""formula"" to a data.frame
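
The whole experiment can be reproduced in one script. The sketch below assumes a fixed seed and runs the `<-` variants inside `local()` so the stray variables do not leak into the global environment. The names `standard`, `side_effect` and `broken` are illustrative.

```r
# Prepare the data as in the experiment (the seed is an added assumption)
set.seed(1)
x <- rnorm(100)
y <- 2 * x + rnorm(100)
z <- 3 * x + rnorm(100)
data <- data.frame(z, x, y)
rm(x, y, z)

# Standard usage with named parameters
standard <- lm(formula = z ~ x + y, data = data)

# `<-` in the "right" positional order: works, but creates variables
side_effect <- local({
  fit <- lm(formula <- z ~ x + y, data <- data)
  list(coef = coef(fit), new_vars = setdiff(ls(), "fit"))
})

# `<-` in the swapped order: values are matched by position, so lm() fails
broken <- local({
  tryCatch(lm(data <- data, formula <- z ~ x + y), error = function(e) e)
})

print(all.equal(coef(standard), side_effect$coef))
print(side_effect$new_vars)
print(inherits(broken, "error"))
```

Capturing the error with `tryCatch` keeps the script running, and `conditionMessage(broken)` can be inspected to see the message for the installed R version.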

From the above examples and experiments, the bottom line becomes clear: to reduce ambiguity, we should use either `<-` or `=` as the assignment operator, and use `=` only as the named-parameter specifier for functions.

In conclusion, for better readability of R code, I suggest that we only use `<-` for assignment and `=` for specifying named parameters.
