Intro to Programming in R: Unit 1 – Intro to R and RStudio

R and RStudio are essential tools for data analysis and statistical computing. R offers a wide range of functions for data manipulation, visualization, and modeling, while RStudio provides a user-friendly interface for writing and executing R code. This introduction covers the basics of R and RStudio, including installation, syntax, data types, and common data structures. It also explores data import, manipulation, and visualization techniques, setting the foundation for more advanced statistical analysis and programming in R.

What's R and Why Use It?

  • R is a programming language and environment for statistical computing and graphics
  • Provides a wide variety of statistical and graphical techniques (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering)
  • Highly extensible through functions and packages which extend its capabilities
  • R is an interpreted language, so code can be run interactively, one line at a time, without a separate compilation step
  • R is open-source and freely available, making it accessible to a wide range of users
  • Widely used in academia and industry for data analysis, statistical modeling, and data visualization
  • Offers powerful tools for data manipulation, making it easy to clean, transform, and reshape data
  • Supports reproducible research through tools like R Markdown and, via the IRkernel package, Jupyter Notebooks

Getting Started with R and RStudio

  • RStudio is an integrated development environment (IDE) for R that provides a user-friendly interface
  • To start using R, first download and install R from the official CRAN (Comprehensive R Archive Network) website
  • Next, download and install RStudio Desktop from the Posit website (Posit is the company formerly known as RStudio)
  • Launch RStudio and familiarize yourself with the interface, which includes:
    • Console: where you enter commands and see output
    • Script editor: where you write and save R code
    • Environment: shows objects currently in memory
    • Plots, Packages, Help, and Viewer panes
  • Set your working directory using setwd() to specify where R will look for files and save output
  • Install packages using install.packages() to extend R's functionality
  • Load packages using library() to make their functions available for use in your current session
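
The setup steps above can be sketched as a short script. This is only an illustration under stated assumptions: tempdir() stands in for a real project folder, and the stats package (which ships with R) stands in for whatever package you actually need.

```r
# Remember the current working directory so it can be restored at the end
old_wd <- getwd()

# Assumption: tempdir() is a placeholder for your own project folder
setwd(tempdir())
current_wd <- getwd()   # where R now looks for files and saves output

# Install a package only if it is not already available.
# "stats" ships with R, so nothing is downloaded in this sketch.
pkg <- "stats"
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Attach the package so its functions can be called in this session
library(stats)
loaded <- "package:stats" %in% search()

# Restore the original working directory
setwd(old_wd)
```

A package only needs to be installed once per machine, but library() has to be called again in every new R session.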

R Basics: Syntax and Data Types

  • R is case-sensitive, so myVariable and myvariable are treated as different objects
  • Comments start with # and are used to explain code or disable lines of code
  • R has several basic data types, including:
    • Numeric: real numbers, stored as double-precision values (e.g., 3.14)
    • Integer: whole numbers, written with an L suffix (e.g., 42L)
    • Character: text strings (e.g., "hello")
    • Logical: boolean values (TRUE or FALSE)
  • R uses the <- operator for assignment (e.g., x <- 42), although = can also be used
  • Mathematical operations follow the usual order of precedence (PEMDAS)
  • Comparison operators (<, >, <=, >=, ==, !=) are used to compare values and return logical values
  • Logical operators (&, |, !) are used to combine or negate logical values
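
The syntax rules above can be tried out directly in the console. The following is a small sketch; all variable names other than myVariable and myvariable are made up for illustration.

```r
# Names are case-sensitive: these are two separate objects
myVariable <- 1
myvariable <- 2

# Basic data types
price <- 3.14        # numeric (double)
count <- 42L         # integer, marked by the L suffix
greeting <- "hello"  # character
is_ready <- TRUE     # logical

# class() reports the type of each object
types <- c(class(price), class(count), class(greeting), class(is_ready))
# "numeric" "integer" "character" "logical"

# Operator precedence: exponent, then multiplication, then addition
total <- 2 + 3 * 4^2      # 2 + 3 * 16 = 50
grouped <- (2 + 3) * 4^2  # 5 * 16 = 80

# Comparison operators return logical values
is_equal <- count == 42                   # TRUE
is_different <- myVariable != myvariable  # TRUE

# Logical operators combine or negate logical values
both <- (price > 3) & is_ready    # TRUE
either <- (price > 4) | is_ready  # TRUE
negated <- !is_ready              # FALSE
```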

Working with Variables and Functions

  • Variables are used to store values and are created using the assignment operator (<- or =)
  • Variable names should be descriptive and follow a consistent naming convention (e.g., snake_case or camelCase)
  • Functions are reusable pieces of code that perform a specific task
  • R has many built-in functions (e.g., mean(), sum(), plot()) and users can also define their own functions
  • Functions are called using the syntax function_name(argument1, argument2, ...)
  • Arguments are values passed to a function; some are mandatory, while others are optional because they have default values
  • Functions can return a value explicitly with return(); otherwise the value of the last evaluated expression is returned automatically
  • R uses lexical scoping: a function looks up variables it does not define itself in the environment where the function was created
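
The points above about arguments, return values, and scoping can be sketched as follows. The helper functions scale_by, safe_mean, and make_adder are hypothetical names invented for this example; only mean() is a built-in.

```r
# Call a built-in function; na.rm is an optional argument (default FALSE)
values <- c(2, 4, NA, 6)
avg <- mean(values, na.rm = TRUE)   # 4

# Hypothetical helper: the last evaluated expression is returned automatically
scale_by <- function(x, multiplier = 10) {
  x * multiplier
}
scaled_default <- scale_by(c(1, 2, 3))                 # uses multiplier = 10
scaled_custom <- scale_by(c(1, 2, 3), multiplier = 2)  # overrides the default

# Hypothetical helper: return() exits early with an explicit value
safe_mean <- function(x) {
  if (length(x) == 0) {
    return(NA_real_)
  }
  mean(x)
}

# Lexical scoping: the inner function finds n in the environment
# where it was created, not where it is called
make_adder <- function(n) {
  function(x) x + n
}
add_five <- make_adder(5)
result <- add_five(10)   # 15
```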

Data Structures in R

  • R has several built-in data structures for storing collections of values:
    • Vectors: one-dimensional arrays that hold elements of the same data type
    • Lists: one-dimensional arrays that can hold elements of different data types
    • Matrices: two-dimensional arrays that hold elements of the same data type
    • Data frames: two-dimensional structures that can hold elements of different data types (like a table)
  • Vectors are created using the c() function (e.g., my_vector <- c(1, 2, 3))
    • Elements in a vector are accessed using square brackets and an index, counting from 1 (e.g., my_vector[1] is the first element)
    • Vectors can be used in arithmetic operations, which are applied element-wise
  • Lists are created using the list() function (e.g., my_list <- list(1, "a", TRUE))
    • Elements in a list are accessed using double square brackets (e.g., my_list[[1]]) or, when the elements are named, using $ (e.g., my_list$element_name)
  • Matrices are created using the matrix() function (e.g., my_matrix <- matrix(1:6, nrow = 2, ncol = 3))
    • Elements in a matrix are accessed using square brackets and row/column indices (e.g., my_matrix[1, 2])
  • Data frames are created using the data.frame() function (e.g., my_df <- data.frame(x = 1:3, y = c("a", "b", "c")))
    • Elements in a data frame are accessed using $ or square brackets (e.g., my_df$x or my_df[, "x"])
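
The four structures described above can be created and indexed as in this short sketch. The list element names (id, label, flag) are assumptions added so that $ access has something to refer to.

```r
# Vector: same type throughout, 1-based indexing, element-wise arithmetic
my_vector <- c(1, 2, 3)
first <- my_vector[1]        # 1
doubled <- my_vector * 2     # 2 4 6

# List: mixed types; the names (assumed here) allow access with $
my_list <- list(id = 1, label = "a", flag = TRUE)
second <- my_list[[2]]       # "a"
label <- my_list$label       # "a"

# Matrix: filled column by column by default
my_matrix <- matrix(1:6, nrow = 2, ncol = 3)
cell <- my_matrix[1, 2]      # row 1, column 2 holds 3

# Data frame: columns can have different types
my_df <- data.frame(x = 1:3, y = c("a", "b", "c"))
x_col <- my_df$x             # 1 2 3
same_col <- my_df[, "x"]     # same column via square brackets
```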

Importing and Manipulating Data

  • R can import data from many sources, including CSV files, Excel workbooks, and SQL databases
  • The read.csv() function is used to read CSV files (e.g., my_data <- read.csv("data.csv"))
  • The readxl package provides functions for reading Excel files (e.g., read_excel())
  • The DBI and RMySQL/RPostgreSQL packages allow for connecting to and querying SQL databases
  • The dplyr package provides a set of functions for data manipulation, including:
    • filter(): subset rows based on conditions
    • select(): subset columns by name
    • mutate(): create new columns or modify existing ones
    • group_by() and summarize(): aggregate data by groups and calculate summary statistics
  • The tidyr package provides functions for reshaping data, such as pivot_longer() and pivot_wider() for converting between long and wide formats
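
As a rough sketch of the import and manipulation steps above, the example below writes a small, made-up table to a temporary CSV file, reads it back with read.csv(), and then runs a dplyr pipeline. The data, column names, and file path are all invented for illustration, and the requireNamespace() guard is only there so the sketch still runs where dplyr is not installed.

```r
# Made-up example data, written to a temporary CSV file
scores <- data.frame(
  student = c("Ana", "Ben", "Cy", "Dee"),
  group   = c("a", "a", "b", "b"),
  score   = c(90, 70, 85, 55)
)
csv_path <- tempfile(fileext = ".csv")
write.csv(scores, csv_path, row.names = FALSE)

# Import the CSV file into a data frame
my_data <- read.csv(csv_path)

# Manipulate the data with dplyr, if it is installed
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)
  group_summary <- my_data %>%
    filter(score >= 60) %>%                 # keep rows meeting a condition
    select(group, score) %>%                # keep columns by name
    mutate(score_pct = score / 100) %>%     # add a derived column
    group_by(group) %>%                     # define the groups
    summarize(mean_score = mean(score), n = n())  # one row per group
}
```

With this data, group a keeps two rows (mean score 80) and group b keeps one (mean score 85) after the filter step.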

Visualizing Data with R

  • R provides powerful tools for creating a wide range of visualizations, from simple scatter plots to complex interactive dashboards
  • The base R plotting system includes functions like plot(), hist(), and boxplot() for creating basic graphs
  • The ggplot2 package provides a flexible and expressive framework for creating more advanced visualizations
    • Graphs are built up in layers, starting with the ggplot() function and adding components like geometric objects (geom_point(), geom_line(), etc.), scales, and facets
    • Aesthetics (e.g., color, size, shape) are used to map variables to visual properties of the graph
  • Other packages for specific types of visualizations include:
    • plotly for interactive web-based graphs
    • leaflet for interactive maps
    • networkD3 for network graphs
  • R Markdown and Shiny are tools for creating reproducible reports and interactive web applications that incorporate visualizations
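
The base and ggplot2 approaches described above might look like the following sketch. It uses the built-in mtcars dataset and writes the base plots to a temporary PDF file so it also runs without a graphics window; the choice of variables is arbitrary, and the ggplot2 part is guarded so the sketch still runs where ggplot2 is not installed.

```r
# Base R: draw three basic graphs into a temporary PDF file
plot_path <- tempfile(fileext = ".pdf")
pdf(plot_path)
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "Fuel economy vs. weight")     # scatter plot
hist(mtcars$mpg, main = "Distribution of mpg", xlab = "Miles per gallon")
boxplot(mpg ~ cyl, data = mtcars,
        xlab = "Cylinders", ylab = "Miles per gallon")
invisible(dev.off())                       # close the device to finish the file

# ggplot2: build a graph in layers, if the package is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +  # aesthetics
    geom_point() +                                                  # geometric layer
    facet_wrap(~ gear)                                              # one panel per gear count
  # print(p) would draw the plot on the active graphics device
}
```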

Helpful Resources and Next Steps

  • The official R documentation and help files provide detailed information on functions and packages
  • Online resources like Stack Overflow, R-bloggers, and the Posit Community (formerly RStudio Community) are great places to find answers to questions and learn from other users
  • Books like "R for Data Science" by Hadley Wickham and Garrett Grolemund and "Advanced R" by Hadley Wickham provide in-depth coverage of R programming and best practices
  • Online courses on platforms like Coursera, DataCamp, and edX offer structured learning paths for R and data science
  • Participating in local R user groups or attending conferences like useR! and posit::conf (formerly RStudio Conference) is a great way to network and learn from the R community
  • As you continue learning R, focus on developing your skills in:
    • Data wrangling and manipulation with dplyr and tidyr
    • Data visualization with ggplot2 and other packages
    • Statistical modeling and machine learning with functions like lm() and glm() and packages like caret
    • Creating reproducible reports and applications with R Markdown and Shiny
  • Consider working on personal projects or contributing to open-source packages to apply your skills and build your portfolio


© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.
