Distributed computing revolutionized big data processing. Hadoop and Spark, two powerful frameworks, tackle massive datasets by dividing tasks across computer clusters. They offer scalability, fault tolerance, and high processing speed, making them essential tools in modern data science.

Hadoop excels in batch processing of huge datasets, while Spark shines in real-time analytics and machine learning. Both use commodity hardware and open-source software, providing cost-effective solutions for organizations dealing with ever-growing data volumes and complex computations.

Hadoop Ecosystem Architecture

Core Components of Hadoop

  • The Hadoop Distributed File System (HDFS) stores large datasets reliably and streams them at high bandwidth to user applications (see the sketch after this list)
  • YARN (Yet Another Resource Negotiator) manages system resources and schedules tasks across the cluster
  • The MapReduce programming model processes vast amounts of data in parallel on large clusters
  • Hadoop Common provides utilities and libraries supporting other Hadoop modules
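As a small illustration of how an application can talk to HDFS from Python, the sketch below assumes the third-party hdfs (HdfsCLI) package and a WebHDFS endpoint; the host, port, user, and paths are hypothetical and would need to match your cluster.

```python
from hdfs import InsecureClient  # third-party HdfsCLI package (pip install hdfs)

# Hypothetical WebHDFS endpoint and user; adjust to your cluster.
client = InsecureClient("http://namenode:9870", user="hadoop")

# Write a small file into HDFS, list the directory, then read the file back.
client.write("/tmp/example/greeting.txt", data="hello hdfs\n",
             encoding="utf-8", overwrite=True)
print(client.list("/tmp/example"))

with client.read("/tmp/example/greeting.txt", encoding="utf-8") as reader:
    print(reader.read())
```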

Extended Hadoop Ecosystem

  • ZooKeeper maintains configuration information, naming, distributed synchronization, and group services
  • The Hive data warehousing tool facilitates querying and managing large datasets stored in distributed storage
  • The Pig high-level data flow language simplifies the creation of MapReduce programs
  • The HBase non-relational distributed database provides real-time read/write access to large datasets

Distributed Computing with Hadoop and Spark

Fundamental Principles

  • Distributed computing divides problems into tasks solved by multiple computers over a network
  • Data locality moves computation to the data, minimizing network transfer of large datasets
  • Fault tolerance ensures job completion despite individual node failures in the cluster
  • Scalability allows addition of commodity hardware to increase processing power and storage

Comparative Strengths

  • Hadoop excels in batch processing of large datasets (terabytes to petabytes)
  • Spark specializes in real-time stream processing and iterative algorithms using in-memory computing
  • Both frameworks provide cost-effective solutions utilizing commodity hardware and open-source software
  • Spark offers a more flexible programming model supporting multiple languages (Java, Scala, Python, R)

Data Processing with Hadoop and Spark

Hadoop MapReduce Implementation

  • MapReduce jobs typically use Java, defining Map and Reduce functions for key-value pair processing (a Python equivalent via Hadoop Streaming is sketched after this list)
  • Mapper processes input key-value pairs to generate intermediate key-value pairs
  • Reducer merges all intermediate values associated with the same intermediate key
  • Supports various input/output formats (text files, sequence files, database connections)
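Although production MapReduce jobs are usually written in Java, the Map and Reduce contract can be sketched in Python and run through Hadoop Streaming. Below is a minimal word-count sketch; the file names mapper.py and reducer.py are chosen for illustration.

```python
#!/usr/bin/env python3
# mapper.py -- reads raw text from stdin and emits one "word<TAB>1" pair per
# word, the intermediate key-value format Hadoop Streaming expects.
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")
```

```python
#!/usr/bin/env python3
# reducer.py -- sums the counts for each word; Hadoop Streaming delivers mapper
# output sorted by key, so identical words arrive on consecutive lines.
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t", 1)
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print(f"{current_word}\t{current_count}")
        current_word, current_count = word, int(count)

if current_word is not None:
    print(f"{current_word}\t{current_count}")
```

A job like this would typically be submitted with the hadoop-streaming JAR, passing these scripts as the -mapper and -reducer arguments along with the input and output paths.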

Spark Data Processing

  • Primary programming abstraction uses Resilient Distributed Datasets (RDDs)
  • DataFrames and Dataset APIs offer user-friendly interfaces for structured/semi-structured data
  • Spark SQL integrates SQL queries with Spark programs for seamless data manipulation
  • The MLlib library simplifies implementation of machine learning algorithms
  • Supports multiple input/output formats similar to Hadoop (a short PySpark sketch covering RDDs, DataFrames, and Spark SQL follows this list)
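To make those abstractions concrete, here is a minimal PySpark sketch that moves from an RDD to a DataFrame and then queries it with Spark SQL; the data, column names, and view name are purely illustrative.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-abstractions-sketch").getOrCreate()

# RDD: a low-level distributed collection with functional transformations.
rdd = spark.sparkContext.parallelize([("alice", 3), ("bob", 5), ("alice", 2)])
totals = rdd.reduceByKey(lambda a, b: a + b)

# DataFrame: the same data with named columns, optimized by Spark's planner.
df = totals.toDF(["user", "events"])

# Spark SQL: register the DataFrame as a temporary view and query it with SQL.
df.createOrReplaceTempView("activity")
spark.sql("SELECT user, events FROM activity WHERE events > 3").show()

spark.stop()
```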

Hadoop vs Spark: Performance and Use Cases

Performance Comparison

  • Spark outperforms Hadoop in processing speed, especially for iterative algorithms and interactive analysis
  • Hadoop better handles very large datasets that don't fit in memory
  • Spark's in-memory computing accelerates data processing tasks (see the caching sketch after this list)
  • HDFS provides robust, scalable storage for extremely large datasets
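The in-memory advantage is easiest to see with caching. The sketch below, using made-up data, keeps an RDD in memory so repeated passes (as an iterative algorithm would make) avoid recomputing or re-reading the input; a disk-based MapReduce pipeline would pay that cost on every pass.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("in-memory-sketch").getOrCreate()
sc = spark.sparkContext

# Toy dataset pinned in cluster memory after its first materialization.
points = sc.parallelize(range(1_000_000)).map(float).cache()

# Several full passes over the same cached data, as iterative algorithms make.
for i in range(5):
    scaled_sum = points.map(lambda x: x * (i + 1)).sum()
    print(f"pass {i}: {scaled_sum}")

spark.stop()
```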

Suitability for Different Use Cases

  • Hadoop suits batch processing of massive datasets (log processing, data warehousing)
  • Spark excels in real-time processing, machine learning, and interactive data exploration
  • Hadoop preferred for organizations with legacy systems or strict data governance requirements
  • Spark favored for agile and diverse data processing needs (stream processing, graph computations)

Factors Influencing Choice

  • Existing infrastructure and team expertise impact framework selection
  • Data size and processing requirements guide decision-making
  • Budget constraints affect choice between Hadoop and Spark implementations
  • Spark's user-friendly API and multi-language support ease adoption for developers

Key Terms to Review (27)

Batch processing: Batch processing refers to the execution of a series of jobs in a program on a computer without manual intervention. This method allows for large volumes of data to be processed efficiently in groups or batches, which is particularly useful for tasks like data analysis and processing transactions. In the context of distributed computing, batch processing enables systems like Hadoop and Spark to handle big data workloads by breaking them into manageable chunks that can be processed in parallel, improving speed and resource utilization.
Commodity hardware: Commodity hardware refers to the standard, widely available computer components and systems that are inexpensive and easily replaceable, as opposed to specialized or high-end equipment. This type of hardware is essential for building scalable distributed computing systems like Hadoop and Spark, as it allows organizations to utilize cost-effective resources without sacrificing performance. By leveraging commodity hardware, companies can create clusters that handle large datasets efficiently, making big data processing more accessible and affordable.
Data locality: Data locality refers to the principle of keeping data close to the computing resources that process it. This concept is especially important in distributed computing environments like Hadoop and Spark, where minimizing data movement between nodes can lead to improved performance and efficiency. When data is located near the processing power, it reduces latency and increases the speed of data processing tasks.
Dataframe: A dataframe is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns). It is a fundamental data structure in data manipulation and analysis, allowing users to organize and manipulate data efficiently, similar to how one would work with a spreadsheet or SQL table.
Dataset APIs: Dataset APIs are interfaces that allow users to interact programmatically with datasets, enabling operations like retrieval, manipulation, and analysis of data. These APIs provide a way to integrate data from various sources seamlessly into applications and workflows, making data processing more efficient, especially in distributed computing environments like Hadoop and Spark.
Distributed computing: Distributed computing is a model where multiple computer systems work together to complete tasks, sharing resources and processing power across a network. This approach allows for greater scalability, fault tolerance, and efficiency by breaking down complex problems into smaller, manageable pieces that can be processed simultaneously. In the context of data processing frameworks, it enables large-scale data analysis and manipulation across clusters of machines.
Fault tolerance: Fault tolerance is the ability of a system to continue operating properly in the event of a failure of some of its components. This characteristic is essential for maintaining reliability and availability, especially in distributed computing environments where failures can occur due to hardware issues, network problems, or software bugs. By implementing fault tolerance, systems like Hadoop and Spark can ensure that data processing tasks are resilient and can recover from errors without significant downtime or data loss.
Graph computations: Graph computations refer to the processes and algorithms used to analyze and manipulate graph structures, which consist of nodes (or vertices) connected by edges. These computations are essential for various applications, such as social network analysis, recommendation systems, and large-scale data processing. In distributed computing environments like Hadoop and Spark, graph computations leverage parallel processing to handle massive datasets efficiently, allowing for quicker insights and more effective data manipulation.
Hadoop: Hadoop is an open-source framework that enables the distributed processing of large datasets across clusters of computers using simple programming models. It is designed to scale up from a single server to thousands of machines, each offering local computation and storage, making it a crucial technology in handling big data challenges effectively and efficiently.
Hadoop Distributed File System (HDFS): Hadoop Distributed File System (HDFS) is a scalable and distributed file system designed to store and manage large datasets across multiple machines in a cluster. It is a core component of the Hadoop framework, allowing for high-throughput access to application data while ensuring fault tolerance through data replication. HDFS works by splitting large files into smaller blocks, distributing them across various nodes in the cluster, which enhances both data processing speed and reliability.
HBase: HBase is a distributed, scalable, NoSQL database that runs on top of the Hadoop ecosystem, designed to provide real-time read and write access to large datasets. It is modeled after Google’s Bigtable and allows for the storage of structured data in a sparse manner, making it suitable for handling vast amounts of data across clusters of commodity hardware. HBase integrates seamlessly with Hadoop's MapReduce framework, enabling efficient data processing and analytics.
Hive: Hive is a data warehouse software that facilitates reading, writing, and managing large datasets residing in distributed storage using SQL-like queries. It provides an abstraction layer over Hadoop, allowing users to perform data analysis without needing to know the complexities of the underlying infrastructure.
Input/output formats: Input/output formats refer to the structured ways in which data is received (input) and sent (output) by distributed computing systems like Hadoop and Spark. These formats play a crucial role in data processing, ensuring that data can be efficiently read from various sources, transformed, and then outputted in a way that is usable for further analysis or storage. Proper management of input/output formats enhances performance, scalability, and interoperability in big data applications.
Iterative algorithms: Iterative algorithms are computational procedures that repeatedly refine a solution until a desired level of accuracy is achieved or a stopping condition is met. These algorithms are particularly useful in solving complex problems where finding an exact solution is impractical or impossible. In the context of distributed computing, they allow for efficient processing of large datasets by breaking down computations into smaller, manageable tasks that can be performed across multiple nodes.
MapReduce: MapReduce is a programming model and processing technique designed for distributed computing, enabling the efficient handling of large data sets across clusters of computers. It breaks down a task into two main functions: the 'Map' function that processes input data and converts it into key-value pairs, and the 'Reduce' function that aggregates and summarizes those key-value pairs to produce the final output. This model is crucial for harnessing the power of parallel processing in big data frameworks like Hadoop and Spark.
MLlib: MLlib is Apache Spark's scalable machine learning library that provides a range of tools for implementing machine learning algorithms. It allows developers to perform tasks such as classification, regression, clustering, and collaborative filtering at scale. By leveraging the distributed computing capabilities of Spark, MLlib can process large datasets efficiently, making it a vital tool in the realm of big data analytics.
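A minimal sketch using the DataFrame-based API (pyspark.ml, part of the MLlib umbrella) fits a logistic regression on a tiny hand-made dataset; the numbers and parameters are illustrative only.

```python
from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()

# Toy training data: (label, feature vector) rows.
train = spark.createDataFrame(
    [(0.0, Vectors.dense([0.0, 1.1])),
     (1.0, Vectors.dense([2.0, 1.0])),
     (0.0, Vectors.dense([0.1, 1.3])),
     (1.0, Vectors.dense([1.9, 0.8]))],
    ["label", "features"])

# Fit the model across the cluster and inspect predictions on the training set.
lr = LogisticRegression(maxIter=10, regParam=0.01)
model = lr.fit(train)
model.transform(train).select("label", "prediction").show()

spark.stop()
```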
Open-source software: Open-source software is software that is released with its source code made available for anyone to view, modify, and distribute. This model encourages collaborative development and innovation, as it allows users to improve the software, fix bugs, and adapt it to their needs. Open-source software is integral to many big data technologies, enabling developers to leverage powerful tools like Hadoop and Spark for distributed computing.
Pig: Pig is a high-level platform designed for processing large data sets using a Hadoop environment. It provides a simple scripting language known as Pig Latin, which allows users to express data transformations and analysis in a way that is easier to understand and manage compared to lower-level programming languages. This abstraction makes it accessible for users who may not have a strong programming background while enabling efficient data handling within distributed computing systems.
Processing Speed: Processing speed refers to the rate at which a computer or system can execute instructions and perform calculations. It is a crucial factor in the performance of distributed computing frameworks, where multiple nodes work together to process large datasets efficiently. High processing speed allows for faster data analysis, real-time processing, and improved scalability, all of which are essential for handling big data applications effectively.
Real-time processing: Real-time processing is the ability to process data and provide immediate results as soon as the data is received. This capability is crucial for applications that require instantaneous feedback, such as online transactions, live data analysis, and responsive systems. The effectiveness of real-time processing is often enhanced by distributed computing frameworks, which allow for efficient data handling across multiple nodes, making it a key feature in systems like Hadoop and Spark.
Resilient Distributed Datasets (RDD): Resilient Distributed Datasets (RDD) are a fundamental data structure in Apache Spark that allow for distributed data processing across multiple nodes in a cluster. RDDs are designed to be fault-tolerant, enabling them to recover quickly from failures by maintaining lineage information about how they were created. This makes RDDs highly efficient for handling large datasets and performing complex computations in a distributed computing environment.
Scalability: Scalability refers to the capability of a system to handle a growing amount of work or its potential to accommodate growth. It is essential for systems that need to process increasing volumes of data efficiently and without performance loss. The concept is especially relevant in data science and big data applications, where the ability to scale up or out directly impacts the effectiveness of methods like association rule mining and distributed computing frameworks.
Spark: Spark is an open-source, distributed computing system designed for big data processing and analytics. It allows for high-speed data processing and offers APIs for various programming languages, making it versatile for data scientists and engineers. Spark is particularly known for its ability to handle both batch and stream processing efficiently, which addresses the challenges associated with large datasets and real-time data analysis.
Spark SQL: Spark SQL is a component of Apache Spark that provides a programming interface for working with structured and semi-structured data using SQL queries. It allows users to execute SQL queries alongside data processing tasks in Spark, making it easier to integrate big data processing with traditional database querying techniques.
Stream processing: Stream processing is a method of handling real-time data by continuously inputting, processing, and analyzing data streams as they occur. This approach enables organizations to derive immediate insights from data in motion, allowing for faster decision-making and timely responses to events. Stream processing is particularly beneficial in scenarios where latency is critical, such as financial transactions, online gaming, or monitoring IoT devices.
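As one way to see stream processing in practice, here is a minimal Spark Structured Streaming sketch that maintains a running word count over text arriving on a local socket (for example one fed by `nc -lk 9999`); the host and port are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("stream-sketch").getOrCreate()

# Unbounded stream of text lines read from a socket source.
lines = (spark.readStream
         .format("socket")
         .option("host", "localhost")
         .option("port", 9999)
         .load())

# Running word count, updated continuously as new lines arrive.
words = lines.select(explode(split(lines.value, " ")).alias("word"))
counts = words.groupBy("word").count()

# Print the full result table to the console after each micro-batch.
query = (counts.writeStream
         .outputMode("complete")
         .format("console")
         .start())
query.awaitTermination()
```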
YARN: YARN, which stands for Yet Another Resource Negotiator, is a resource management framework used in Hadoop that allows for the efficient allocation and management of computational resources across a cluster. It separates resource management and job scheduling from the actual processing of data, enabling better scalability and flexibility when running applications in a distributed computing environment. This architecture is crucial for optimizing the performance of applications in both Hadoop and Spark ecosystems.
Zookeeper: Zookeeper is a centralized service used for coordinating distributed applications, ensuring high availability and reliability of data across various nodes in a distributed system. It helps manage configuration information, naming, synchronization, and group services, making it essential for frameworks that operate in a distributed computing environment, such as Hadoop and Spark.