Data manipulation is a crucial skill in R programming, enabling you to transform raw data into meaningful insights. This unit covers essential techniques, from basic operations on vectors and data frames to advanced methods using the dplyr package. You'll learn how to subset, filter, and merge data, work with dates and times, and handle missing values. The unit also explores practical applications and common pitfalls, equipping you with the tools to efficiently wrangle data in real-world scenarios.
- Core dplyr functions: select(), filter(), mutate(), group_by(), and summarize()
- Pipe operator (%>%): a tool available in dplyr (provided by the magrittr package) that lets you chain multiple operations together in a readable manner
- Create vectors with the c() function (concatenate)
- Create matrices with the matrix() function
- Create data frames with the data.frame() function
- Create lists with the list() function
- Create factors with the factor() function
- Subset with [] for vectors, matrices, and data frames
- Extract elements with [[]] or $ for lists
- Use comparison and logical operators (>, <, ==, !=, &, |) to create conditions
- Use the order() function to generate a sorting index
- Use the merge() function to perform inner, left, right, or full joins
- Use the reshape2 package functions melt() and dcast() for reshaping data
- select(): choose columns from a data frame by name or position
- filter(): subset rows based on a logical condition
- mutate(): create new columns or modify existing ones using expressions
- group_by(): split a data frame into groups based on one or more variables
- summarize(): compute summary statistics for each group
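The base R operations listed above (creating structures, subsetting, building conditions, sorting with order(), and joining with merge()) can be sketched as follows. This is a minimal illustration, not a reference implementation: the objects `scores`, `grid`, `info`, `students`, and `clubs` and all of their values are invented for the example.

```r
# Hypothetical example data; every name and value here is invented.
scores <- c(88, 92, 75, 64)                        # numeric vector via c()
grid   <- matrix(1:6, nrow = 2, ncol = 3)          # 2 x 3 matrix, filled column by column
info   <- list(course = "R basics", year = 2024)   # list with named elements
level  <- factor(c("high", "high", "mid", "low"),
                 levels = c("low", "mid", "high"))  # factor with explicit level order

students <- data.frame(
  id    = 1:4,
  name  = c("Ana", "Ben", "Cy", "Dee"),
  score = scores,
  level = level
)

# Subsetting with [] (vectors, matrices, data frames)
middle_scores <- scores[2:3]   # elements 2 and 3
cell          <- grid[1, 3]    # row 1, column 3 -> 5

# Conditions built from comparison and logical operators
passing <- students[students$score > 70 & students$level != "low",
                    c("name", "score")]

# Extracting list elements with [[ ]] or $
course <- info[["course"]]
year   <- info$year

# Sorting: order() returns row positions, which are then used as an index
by_score <- students[order(students$score, decreasing = TRUE), ]

# Joining with merge(): the all, all.x, and all.y arguments select the join type
clubs <- data.frame(id = c(1L, 3L, 5L),
                    club = c("chess", "robotics", "drama"))
inner <- merge(students, clubs, by = "id")                # ids present in both
left  <- merge(students, clubs, by = "id", all.x = TRUE)  # keep every student
full  <- merge(students, clubs, by = "id", all = TRUE)    # keep every id
```

Note that order() does not sort the data itself; it returns positions, so the same index can reorder any object with matching rows.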
- summarize() is often combined with group_by() to aggregate data
- arrange(): sort a data frame by one or more columns
- Join functions: combine data frames based on a common variable
- Available joins: inner_join(), left_join(), right_join(), full_join(), semi_join(), anti_join()
- Chain these operations with the pipe operator (%>%)
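A short sketch of how the dplyr verbs and joins above fit together in a pipeline may help. It assumes dplyr 1.0.0 or later is installed (for the `.groups` argument of summarize()); the `orders` and `customers` data frames, their columns, and their values are hypothetical.

```r
library(dplyr)

# Hypothetical data, invented for illustration.
orders <- data.frame(
  order_id = 1:6,
  customer = c("a", "b", "a", "c", "b", "a"),
  amount   = c(20, 35, 15, 50, 10, 40),
  shipped  = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE)
)
customers <- data.frame(
  customer = c("a", "b", "d"),
  region   = c("north", "south", "east")
)

# Each verb takes a data frame and returns one, so %>% can chain them.
spend <- orders %>%
  select(customer, amount, shipped) %>%   # keep only the columns needed
  filter(shipped) %>%                     # keep rows where shipped is TRUE
  mutate(large = amount >= 30) %>%        # add a derived column
  group_by(customer) %>%                  # one group per customer
  summarize(
    n_orders = n(),                       # rows in each group
    total    = sum(amount),
    n_large  = sum(large),
    .groups  = "drop"                     # return an ungrouped result
  ) %>%
  arrange(desc(total))                    # largest total first

# Mutating joins add columns from the second table.
inner_j <- inner_join(spend, customers, by = "customer")  # matches only
left_j  <- left_join(spend, customers, by = "customer")   # all rows of spend
full_j  <- full_join(spend, customers, by = "customer")   # all rows of both

# Filtering joins keep or drop rows of the first table without adding columns.
semi_j <- semi_join(spend, customers, by = "customer")    # spend rows with a match
anti_j <- anti_join(spend, customers, by = "customer")    # spend rows without a match
```

In this sketch, customer "c" has orders but no entry in `customers`, so it appears in `left_j` with a missing region and is the only row of `anti_j`.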
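- Date and time classes: Date, POSIXct, POSIXlt
- Convert strings to dates and times with as.Date(), as.POSIXct(), and strptime()
- Format dates for display with the format() function
- Compute time differences with difftime()
- Missing values are represented by NA
- Detect missing values with is.na()
- Remove missing values with na.omit() or complete.cases()
- Use ifelse() or replace() to conditionally replace values
- Use the na.rm argument in functions like mean(), sum(), and max() to exclude missing values from calculations
- Convert between data types with as.numeric(), as.character(), or as.Date()
- Use <- instead of = for assignment
- Apply arrange() or group_by() before calculations that depend on row order or grouping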
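The date and missing-value tools above can be tried out with base R alone. The following is a minimal sketch; the dates, the `temps` vector, and the `readings` data frame are invented for the example, and the time zone is fixed to UTC only to keep the results reproducible.

```r
# --- Dates and times (hypothetical values) ---
d1     <- as.Date("2024-01-15")                            # ISO format needs no format string
d2     <- as.Date("03/01/2024", format = "%m/%d/%Y")       # March 1, 2024
stamp  <- as.POSIXct("2024-01-15 14:30:00", tz = "UTC")    # seconds since the epoch
parsed <- strptime("2024-01-15 14:30:00",
                   format = "%Y-%m-%d %H:%M:%S", tz = "UTC")  # returns POSIXlt

label <- format(d1, "%Y/%m/%d")              # "2024/01/15"
gap   <- difftime(d2, d1, units = "days")    # 46 days (2024 is a leap year)

# --- Missing values ---
temps <- c(21, NA, 19, 23, NA)

n_missing  <- sum(is.na(temps))              # 2
naive_mean <- mean(temps)                    # NA, because NA propagates
clean_mean <- mean(temps, na.rm = TRUE)      # 21

# Conditional replacement: two equivalent styles
filled <- ifelse(is.na(temps), clean_mean, temps)  # NA -> mean of observed values
zeroed <- replace(temps, is.na(temps), 0)          # NA -> 0

# Dropping incomplete rows from a data frame
readings  <- data.frame(day = 1:5, temp = temps)
complete1 <- na.omit(readings)
complete2 <- readings[complete.cases(readings), ]

# --- Type conversion ---
as_num  <- as.numeric("42") + 1              # 43
as_text <- as.character(42)                  # "42"
```

Whether to drop, impute, or zero-fill missing values depends on the analysis; the replacements shown here only illustrate the mechanics.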