R in 5 Minutes

Application-based Learning
R
Author

Max Sands

Published

November 23, 2022

Intro

The purpose of this article is 3-fold:

  1. to demonstrate the basics of R as concisely as possible so that you can get up and running on your own projects, even if you’ve had no exposure to coding.

  2. to act as a basic guide for the non-technical readers interested in following my Research Articles at a more granular level.

  3. to familiarize myself with the process of writing and explaining topics before I publish my research (and to make sure that my website is working…)

Quick Note

I would quickly like to explain my background and why I think it is important to have a basic knowledge of ‘coding’:

I am a Business & Investment Analyst, and 9 months ago I had absolutely no knowledge of ‘coding’; my technical ability was comparable to that of your average dog. I can now tell you 9 months in that understanding the basics of ‘coding’ goes a very long way.

Firstly, as long as you do a task correctly the first time in code, you can then automate away that task (and its different variations). Whether its performing the same calculations in an Excel file that your boss sends you every morning, or publishing your company’s quarterly financial statements, the same principle applies.

Secondly, we are living in a world where data is everywhere, and the ability to code allows one to dig into the data and draw valuable insights from it. For anyone in an analytical position (whether Financial Analyst, Medical Researcher, or CEO), this is extremely important and allows you to stand on the shoulders of giants.

Thirdly, you can leverage tools that others have built. There is so much free code on the web and someone else may have already built a tool or completed a task that you are trying to do. This is extremely helpful.

Lastly, a word of caution: coding is not everything. You can be the world’s greatest coder, but if you lack the ability to build a logical, easily-explainable narrative from data, then your value is limited to the tools that you can build for others. In other words, true value comes from the ability to not only work with data, but also derive meaning from it and think originally.

Ok, that’s all; let’s get into it!

Learning R

Before you can use R, you need to install it along with RStudio on your computer. Next, run install.packages("tidyverse"). The tidyverse is an R package that someone created which makes working with data easy.

Next, we need to load this package by running library(tidyverse).

Code
library(tidyverse)

You are all set - now we can begin.

The Basics of Data

Data is simply a spreadsheet of values, and we would like our data to be in a ‘tidy’ format.

Tidy Data

Data is considered tidy when each column represents a variable and each row consists of an observation. Consider the following dataset (and feel free to inspect the code and guess what each line means):

Code
diamonds %>% 
    head()
carat cut color clarity depth table price x y z
0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
0.29 Premium I VS2 62.4 58 334 4.20 4.23 2.63
0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48

Notice how this data is tidy; each column represents a variable (price, color, etc.) and each row is an observed diamond. Your goal should be to have your data in this format because it is easy to manipulate.

Gathering Data

Data is typically gathered from an API, a database, or simply an Excel/csv spreadsheet that you may have. For now, we will use a built-in R dataset called diamonds.

Manipulating Data

As long as data is in a tidy format, there are only a few actions that we need to do when manipulating data:

filter filter data according to certain conditions
summarize summarize the data (e.g. finding the average)
group group similar observations
pivot ‘pivoting’ the data in different ways
select select relevant information
mutate changing the data in some fashion

Filtering

Let’s pretend that we only want to consider diamonds with a carat greater than .7 and a depth greater than 63: (click on the “Code” section)

Code
diamonds %>% 
    filter(carat > .7 & depth > 63) %>% 
    head()
carat cut color clarity depth table price x y z
0.78 Very Good G SI2 63.8 56 2759 5.81 5.85 3.72
0.96 Fair F SI2 66.3 62 2759 6.27 5.95 4.07
0.75 Very Good D SI1 63.2 56 2760 5.80 5.75 3.65
0.91 Fair H SI2 64.4 57 2763 6.11 6.09 3.93
0.91 Fair H SI2 65.7 60 2763 6.03 5.99 3.95
0.71 Very Good D SI1 63.6 58 2764 5.64 5.68 3.60

Let’s continue to filter down and consider only the subset with a cut of “Very Good”:

Code
diamonds %>% 
    filter(carat > .7 & depth > 63) %>% 
    filter(cut == "Very Good") %>% 
    head()
carat cut color clarity depth table price x y z
0.78 Very Good G SI2 63.8 56.0 2759 5.81 5.85 3.72
0.75 Very Good D SI1 63.2 56.0 2760 5.80 5.75 3.65
0.71 Very Good D SI1 63.6 58.0 2764 5.64 5.68 3.60
0.71 Very Good G VS1 63.3 59.0 2768 5.52 5.61 3.52
0.72 Very Good G VS2 63.7 56.4 2776 5.62 5.69 3.61
0.75 Very Good D SI2 63.1 58.0 2782 5.78 5.73 3.63

You will now see that we have from our original 53,940 diamonds, we have filtered down to 1,550 that adhere to our conditions.

At this point you may have three questions:

  1. What is the %>%?

This is called a pipe and you can translate it to “and then”. It allows us to perform several operations consecutively. So if we look at the code, we first start with the diamonds dataset by typing diamonds, and then we filter according to carat and depth, and then we filter according to cut. The pipe is extremely useful and it is native to R.

  1. What does the head() function do?

It prints only the first 6 observations, that way you don’t have a table with 50,000 rows on your screen.

  1. What if I want to filter down to several different cuts, not just “Very Good”

Great question, here’s what you would do:

Code
diamonds %>% 
    filter(cut %in% c("Ideal", "Premium")) %>% 
    head()
carat cut color clarity depth table price x y z
0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
0.29 Premium I VS2 62.4 58 334 4.20 4.23 2.63
0.23 Ideal J VS1 62.8 56 340 3.93 3.90 2.46
0.22 Premium F SI1 60.4 61 342 3.88 3.84 2.33
0.31 Ideal J SI2 62.2 54 344 4.35 4.37 2.71

We tell R to filter down to the observations where cut matches one of the strings in the vector c("Ideal", "Premium"). The c() function creates a vector.

Summarizing

Let’s say we want to summarize the data and find the average diamond price, along with its standard deviation:

Code
diamonds %>% 
    summarize(avg_price = mean(price),
              st_dev    = sd(price))
avg_price st_dev
3932.8 3989.44

Notice that we can take our 50,000+ diamonds and summarize the data down to an average price…

You will notice that in the summarize function I start by naming the column I want (avg_price) and then I tell R what to do (find the mean of the price variable/column. The mean() & sd() functions calculate mean and standard deviation respectively). I could just as easily call the columns “thing1” & “thing2”:

Code
diamonds %>% 
    summarize(thing1 = mean(price),
              thing2    = sd(price))
thing1 thing2
3932.8 3989.44

Grouping

Summarizing the entire data is important, but let’s say we want to find the average diamond price within each color group…

Code
diamonds %>% 
    group_by(color) %>% 
    summarize(avg_price = mean(price)) %>% 
    ungroup()
color avg_price
D 3169.954
E 3076.752
F 3724.886
G 3999.136
H 4486.669
I 5091.875
J 5323.818

We can take things a step further and group by color and cut…

Code
diamonds %>% 
    group_by(color, cut) %>% 
    summarize(avg_price = mean(price)) %>% 
    ungroup() %>% 
    slice(1:10)
color cut avg_price
D Fair 4291.061
D Good 3405.382
D Very Good 3470.467
D Premium 3631.293
D Ideal 2629.095
E Fair 3682.312
E Good 3423.644
E Very Good 3214.652
E Premium 3538.914
E Ideal 2597.550

You will notice that we now have average price for each color and cut. I also only showed the first 10 rows of output by using the slice() function.

Pivoting

Pivoting is probably the most complicated of the broad actions I am showing you, but the previous segment allows for a great transition. I decided to show only the first 10 rows of output rather than inundate you with 35 rows, but there must be a better way of showing the output, right? I mean we have letters repeating in the color column. This would make more sense:

Code
diamonds %>% 
    group_by(color, cut) %>% 
    summarize(avg_price = mean(price)) %>% 
    ungroup() %>% 
    pivot_wider(
        names_from  = cut,
        values_from = avg_price
    )
color Fair Good Very Good Premium Ideal
D 4291.061 3405.382 3470.467 3631.293 2629.095
E 3682.312 3423.644 3214.652 3538.914 2597.550
F 3827.003 3495.750 3778.820 4324.890 3374.939
G 4239.255 4123.482 3872.754 4500.742 3720.706
H 5135.683 4276.255 4535.390 5216.707 3889.335
I 4685.446 5078.533 5255.880 5946.181 4451.970
J 4975.655 4574.173 5103.513 6294.592 4918.186

We tell R to take our 35 row table, and pivot it so that we have a color column followed by columns with the different cuts, wherein each value is the average price.

The names_from argument asks us what variable to we want to pivot on (we said ‘cut’ and therefore R took all of the cut values and made them columns). The values_from argument asks us which variable we would like to R to occupy the new columns with (we said ‘avg_price’ and therefore R occupied all of the ‘cells’ in our pivot table with the corresponding values from the avg_price column).

Quick Tip: hitting the tab key when your cursor is inside of a function’s parentheses will show all of the function’s available arguments (2 of which are names_from and values_from for the pivot_longer() function.)

Important Note: You will notice that now we have violated the premise of tidy data. The columns Fair:Ideal are not variables. They are types of “cut” (cut is the variable). For the purposes of coding, and data manipulation, we want our data to be in a tidy format. However, for the purposes of presentation, we typically want our data to be in a ‘wide’ format (hence pivot_wider).

We can do the opposite and revert our table back into a ‘long’ format with pivot_longer() :

Code
diamonds %>% 
    group_by(color, cut) %>% 
    summarize(avg_price = mean(price)) %>% 
    ungroup() %>% 
    pivot_wider(
        names_from  = cut,
        values_from = avg_price
    ) %>% 
    pivot_longer(
        cols = Fair:Ideal
    ) %>% 
    slice(1:10)
color name value
D Fair 4291.061
D Good 3405.382
D Very Good 3470.467
D Premium 3631.293
D Ideal 2629.095
E Fair 3682.312
E Good 3423.644
E Very Good 3214.652
E Premium 3538.914
E Ideal 2597.550

We can also rename the columns back to their original names within the pivot_longer() function:

Code
diamonds %>% 
    group_by(color, cut) %>% 
    summarize(avg_price = mean(price)) %>% 
    ungroup() %>% 
    pivot_wider(
        names_from  = cut,
        values_from = avg_price
    ) %>% 
    pivot_longer(
        cols      = Fair:Ideal,
        names_to  = "cut",
        values_to = "avg_price"
    ) %>% 
    slice(1:10)
color cut avg_price
D Fair 4291.061
D Good 3405.382
D Very Good 3470.467
D Premium 3631.293
D Ideal 2629.095
E Fair 3682.312
E Good 3423.644
E Very Good 3214.652
E Premium 3538.914
E Ideal 2597.550

That’s on pivoting…

Selecting

Selecting is straightforward. Here are the first 6 rows of our original dataset:

Code
diamonds %>% 
    head()
carat cut color clarity depth table price x y z
0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
0.29 Premium I VS2 62.4 58 334 4.20 4.23 2.63
0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48

Let’s say we are about to investigate something but we only need price, carat, and cut… then it is best practice to select those variables/columns first (imagine we have thousands of variables/columns…):

Code
diamonds %>% 
    select(price, carat, cut) %>% 
    head()
price carat cut
326 0.23 Ideal
326 0.21 Premium
327 0.23 Good
334 0.29 Premium
335 0.31 Good
336 0.24 Very Good

We can also select by omission:

Code
diamonds %>% 
    select(-x, -y, -z) %>% 
    head()
carat cut color clarity depth table price
0.23 Ideal E SI2 61.5 55 326
0.21 Premium E SI1 59.8 61 326
0.23 Good E VS1 56.9 65 327
0.29 Premium I VS2 62.4 58 334
0.31 Good J SI2 63.3 58 335
0.24 Very Good J VVS2 62.8 57 336

We can select variables carat through clarity:

Code
diamonds %>% 
    select(carat:clarity) %>% 
    head()
carat cut color clarity
0.23 Ideal E SI2
0.21 Premium E SI1
0.23 Good E VS1
0.29 Premium I VS2
0.31 Good J SI2
0.24 Very Good J VVS2

And again by omission:

Code
diamonds %>% 
    select(-carat:-clarity) %>% 
    head()
depth table price x y z
61.5 55 326 3.95 3.98 2.43
59.8 61 326 3.89 3.84 2.31
56.9 65 327 4.05 4.07 2.31
62.4 58 334 4.20 4.23 2.63
63.3 58 335 4.34 4.35 2.75
62.8 57 336 3.94 3.96 2.48

Very simple.

Mutating

What if we want to perform some sort of calculation or change the data in some way? This is the purpose of mutating…

In our dataset, we have the variables x, y, z which represent the length, width, and height of the diamond. If we pretend all the diamonds are cubes, we can calculate the cubic volume of each diamond by multiplying the dimensions of each diamond. Let’s do this:

Code
diamonds %>% 
    select(x:z) %>% 
    mutate(volume = x * y * z) %>% 
    head()
x y z volume
3.95 3.98 2.43 38.20203
3.89 3.84 2.31 34.50586
4.05 4.07 2.31 38.07688
4.20 4.23 2.63 46.72458
4.34 4.35 2.75 51.91725
3.94 3.96 2.48 38.69395

Notice how mutate() is similar in structure to summarize(); first we tell R what we would like name our new variable/column (“volume”), and then we tell R how to calculate it.

Mutate can also change a current column:

Code
diamonds %>% 
    mutate(carat = "Hello World") %>% 
    head()
carat cut color clarity depth table price x y z
Hello World Ideal E SI2 61.5 55 326 3.95 3.98 2.43
Hello World Premium E SI1 59.8 61 326 3.89 3.84 2.31
Hello World Good E VS1 56.9 65 327 4.05 4.07 2.31
Hello World Premium I VS2 62.4 58 334 4.20 4.23 2.63
Hello World Good J SI2 63.3 58 335 4.34 4.35 2.75
Hello World Very Good J VVS2 62.8 57 336 3.94 3.96 2.48

Now, all observations of carat are “Hello World”.

Basic Modeling

We will build a linear model to explain diamond prices. In R, the function to create a linear model is lm():

Code
diamonds %>% 
    lm(formula = price ~ carat) %>% 
    summary()

Call:
lm(formula = price ~ carat, data = .)

Residuals:
     Min       1Q   Median       3Q      Max 
-18585.3   -804.8    -18.9    537.4  12731.7 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -2256.36      13.06  -172.8   <2e-16 ***
carat        7756.43      14.07   551.4   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1549 on 53938 degrees of freedom
Multiple R-squared:  0.8493,    Adjusted R-squared:  0.8493 
F-statistic: 3.041e+05 on 1 and 53938 DF,  p-value: < 2.2e-16

We just built a linear model that regressed carat on diamond price. As you can see, we can use a diamond’s caratage to explain 85% of price variation. Our model also tells us that for every 1 unit increase in caratage, diamond prices increases by $7,756 on average.

However, I’m sure you will agree that the output is not visually pleasing. Moreover, it is not easy to manipulate since it is not in a tabular format.

Let’s, once again, stand on the shoulders of giants and utilize a tool that someone else has built to clean up the output. Just like you installed tidyverse, install the broom package by running install.packages("broom"). Then, load the package by running library(broom).

Code
library(broom)

This time let’s regress price on all other variables and use the tidy() function from the broom package to tidy the output:

Code
diamonds %>% 
    lm(formula = price ~ .) %>% 
    summary() %>% 
    tidy()
term estimate std.error statistic p.value
(Intercept) 5753.761857 396.629824 14.5066294 0.0000000
carat 11256.978307 48.627509 231.4940348 0.0000000
cut.L 584.457278 22.478150 26.0011290 0.0000000
cut.Q -301.908158 17.993919 -16.7783441 0.0000000
cut.C 148.034703 15.483328 9.5609097 0.0000000
cut^4 -20.793893 12.376508 -1.6801098 0.0929418
color.L -1952.160010 17.341767 -112.5698421 0.0000000
color.Q -672.053621 15.776995 -42.5970601 0.0000000
color.C -165.282926 14.724927 -11.2247022 0.0000000
color^4 38.195186 13.526539 2.8237221 0.0047487
color^5 -95.792932 12.776114 -7.4978145 0.0000000
color^6 -48.466440 11.613917 -4.1731348 0.0000301
clarity.L 4097.431318 30.258596 135.4137965 0.0000000
clarity.Q -1925.004097 28.227228 -68.1967102 0.0000000
clarity.C 982.204550 24.151516 40.6684433 0.0000000
clarity^4 -364.918493 19.285011 -18.9223900 0.0000000
clarity^5 233.563110 15.751700 14.8278029 0.0000000
clarity^6 6.883492 13.715100 0.5018915 0.6157459
clarity^7 90.639737 12.103482 7.4887321 0.0000000
depth -63.806100 4.534554 -14.0710870 0.0000000
table -26.474085 2.911655 -9.0924516 0.0000000
x -1008.261098 32.897748 -30.6483316 0.0000000
y 9.608887 19.332896 0.4970226 0.6191751
z -50.118891 33.486301 -1.4966983 0.1344776

You will notice that I used ‘.’ to tell R ‘all other variables’ rather than type each of them out. More importantly, the output is much cleaner and easier to manipulate.

However, we cannot see the model’s accuracy. For this, we need to use the glance() function from broom:

Code
diamonds %>% 
    lm(formula = price ~ .) %>% 
    summary() %>% 
    glance()
r.squared adj.r.squared sigma statistic p.value df df.residual nobs
0.9197915 0.9197573 1130.094 26881.83 0 23 53916 53940

Now we have accuracy metrics in a nice format.

Lastly, if we would like to see the model’s fit for each observation, we can use the augment() function from broom (scroll to the right):

Code
diamonds %>% 
    lm(formula = price ~ .) %>% 
    augment() %>% 
    head()
price carat cut color clarity depth table x y z .fitted .resid .hat .sigma .cooksd .std.resid
326 0.23 Ideal E SI2 61.5 55 3.95 3.98 2.43 -1346.3643 1672.3643 0.0003742 1130.082 0.0000342 1.4801217
326 0.21 Premium E SI1 59.8 61 3.89 3.84 2.31 -664.5954 990.5954 0.0004133 1130.097 0.0000132 0.8767411
327 0.23 Good E VS1 56.9 65 4.05 4.07 2.31 211.1071 115.8929 0.0009098 1130.105 0.0000004 0.1025982
334 0.29 Premium I VS2 62.4 58 4.20 4.23 2.63 -830.7372 1164.7372 0.0004062 1130.094 0.0000180 1.0308641
335 0.31 Good J SI2 63.3 58 4.34 4.35 2.75 -3459.2242 3794.2242 0.0007715 1129.987 0.0003629 3.3587358
336 0.24 Very Good J VVS2 62.8 57 3.94 3.96 2.48 -1380.4876 1716.4876 0.0007230 1130.081 0.0000696 1.5194380

The broom package is so useful because it cleans up model output, but more importantly, it can be used with many other (more complex) models.

Visualizing Data

Being able to visualize data is essential for understanding it; the famous saying “a picture is worth a thousands words” is doubly true in today’s age.

Let’s start out by plotting diamond price against caratage.

Creating a Canvas

First we need to create a canvas with the ggplot() function:

Code
diamonds %>% 
    ggplot(aes(x = carat, y = price))

Notice that we start with the diamonds dataset and then we create a canvas with the ggplot() function. The aes() function stands for aesthetic and allows us to pick which variables/columns we want to use in our plot. In this case we tell R that we want to plot carat on the x-axis and price on the y-axis.

Adding Geoms

In our plot we would like to add dots that represent each data point. In R adding these elements are called geometries (i.e. geoms):

Code
diamonds %>% 
    ggplot(aes(x = carat, y = price)) +
    geom_point()

Notice how when creating plots with ggplot, we can no longer use the pipe (%>%). Instead, we use a + sign to add layers to the plot.

From our plot we can tell that there is a clear positive relationship between price and caratage.

Modifying Geoms

Our plot contains so many points and it is overwhelming; let’s modify the plot so that the points are more transparent with the alpha argument of geom_point().

Code
diamonds %>% 
    ggplot(aes(x = carat, y = price)) +
    geom_point(alpha = .15, color = "midnightblue") +
    geom_smooth()

You will notice that the points are more transparent and that we also modified their color. We also included a smoother line with geom_smooth().

Adding Aesthetics

Up to now our plot has had only 2 aesthetics (x and y). But, all of the arguments that can be passed to geoms (alpha, color, etc.) are actually aesthetics that can be passed in the main aes() function. This probably sounds confusing but the following code will make much more sense:

Code
diamonds %>% 
    ggplot(aes(x = carat, y = price, color = cut)) +
    geom_point(alpha = .15) +
    geom_smooth()

You will notice that instead of locally changing the color argument in the geom_point() function, we have put in the main aes() function wherein we set it equal to cut. By doing this, we are telling R that the color of each geometry should be defined by the cut variable/column.

Faceting

Our plot is overwhelming with all the different colors on one canvas so lets create a faceted canvas… Rather than explain in words, the following code should be self evident:

Code
diamonds %>% 
    ggplot(aes(x = carat, y = price, color = cut)) +
    geom_point(alpha = .15) +
    geom_smooth() +
    facet_wrap(~cut)

This is called a faceted plot because we have created facets according to the cut variable/column. You will note that we need to put a ~ before the specified variable; this is just how the facet_wrap function works.

We can also decide to facet according to some other variables, like so:

Code
diamonds %>% 
    ggplot(aes(x = carat, y = price, color = cut)) +
    geom_point(alpha = .15) +
    geom_smooth() +
    facet_wrap(~clarity, scales = "free")

You will notice that I also supplied the scales argument within the facet_wrap() function which allows each faceted plot to have different x and y scales that fit accordingly. Compare the x and y axes of the ‘VS1’ plot with those of the ‘VVS2’. They have different scales.

Adding Labels

Let’s add labels to our plot…

Code
diamonds %>% 
    ggplot(aes(x = carat, y = price, color = cut)) +
    geom_point(alpha = .15) +
    geom_smooth() +
    facet_wrap(~cut) +
    labs(
        title = "Price vs. Carat",
        subtitle = "ggplot makes plotting so easy...",
        y = "Price (in $)",
        x = "Carat",
        caption = "This is a great-looking plot"
    )

Changing Theme

R has some preset plotting themes…

Code
diamonds %>% 
    ggplot(aes(x = carat, y = price, color = cut)) +
    geom_point(alpha = .15) +
    geom_smooth() +
    facet_wrap(~cut) +
    labs(
        title = "Price vs. Carat",
        subtitle = "ggplot makes plotting so easy...",
        y = "Price (in $)",
        x = "Carat",
        caption = "This is a great-looking plot"
    ) +
    theme_bw()

Code
diamonds %>% 
    ggplot(aes(x = carat, y = price, color = cut)) +
    geom_point(alpha = .15) +
    geom_smooth() +
    facet_wrap(~cut) +
    labs(
        title = "Price vs. Carat",
        subtitle = "ggplot makes plotting so easy...",
        y = "Price (in $)",
        x = "Carat",
        caption = "This is a great-looking plot"
    ) +
    theme_linedraw()

…there are several others.

Modifying Scales

Code
diamonds %>% 
    ggplot(aes(x = carat, y = price, color = cut)) +
    geom_point(alpha = .15) +
    geom_smooth() +
    facet_wrap(~cut) +
    labs(
        title = "Price vs. Carat",
        subtitle = "ggplot makes plotting so easy...",
        y = "Price (in $)",
        x = "Carat",
        caption = "This is a great-looking plot"
    ) +
    theme_bw() +
    scale_y_continuous(labels = scales::dollar_format())

Notice we converted the axis/scale on the plot to a dollar format…

Example of More Plots

With these basic tools, you now have the ability to create so many different types of plots to gain insights from your data.

Here are a few more plots with code to give you a flavor…

Code
diamonds %>% 
    ggplot(aes(price, fill = cut)) +
    geom_histogram() +
    theme_bw()

Code
diamonds %>% 
    ggplot(aes(price, fill = cut)) +
    geom_histogram(position = "dodge") +
    theme_bw() +
    scale_fill_brewer()

Code
diamonds %>% 
    ggplot(aes(price, fill = cut)) +
    geom_density() +
    theme_bw() +
    scale_fill_brewer() +
    facet_wrap(~cut)

There are other packages that help with creating nice plots… install and load ggridges.

Code
library(ggridges)
diamonds %>% 
    ggplot(aes(x = price, y = cut, fill = stat(x))) +
    geom_density_ridges_gradient(scale = 2) +
    scale_fill_viridis_c(name = "Price (in $)", option = "C") +
    theme_minimal() +
    scale_x_continuous(labels = scales::dollar_format())

Code
diamonds %>% 
    ggplot(aes(x = price, y = cut, fill = factor(stat(quantile)))) +
    stat_density_ridges(
        geom = "density_ridges_gradient", calc_ecdf = TRUE,
        quantiles = 4, quantile_lines = TRUE
    ) +
    scale_fill_brewer() +
    theme_linedraw() +
    scale_x_continuous(labels = scales::dollar_format())

Creating Interactive Plots

We also have the ability to create interactive plots with the help of a package called plotly. This is another example of the power of open-source coding, which gives us the ability to leverage code that others have built (that we may not have the expertise to create ourselves…). Like we did with the tidyverse, run install.packages("plotly") and then load it into your environment with library(plotly). All we have to do to make a plot interactive, is to save it into our environment using the assignment operator - <-. I am going to save my plot as g and then we have to run ggplotly(g).

Look at the code below:

Code
library(plotly)
g <- diamonds %>% 
    ggplot(aes(price, fill = cut)) +
    geom_histogram() +
    theme_bw()

ggplotly(g)

This is just a taste of the plots that can be generated…

Closing Remarks

The above is by no means a comprehensive introduction to R, but it does cover the basics and will allow you to get started on your own projects.

Cheers.