This article was posted on Data Flair. Below is a quick overview of the original article.
1.Objective
This tutorial provides introduction to Apache Spark, what are its ecosystem components, Spark abstraction – RDD, transformation and action. The objective of this introductory guide is to provide detailed overview of spark, its history, architecture, deployment mode and RDD.
2.History:
Apache Spark was introduced in 2009 in the UC Berkeley R&D Lab, later it become AMPLab. It was open sourced in 2010 under BSD license. In 2013 spark was donated to Apache Software Foundation where it became top-level project in 2014. Apache Spark became most popular project at apache in 2015.
3.Introduction
Apache Spark is a general-purpose & lightning fast cluster computing system. It provides high-level API like Java, Scala, Python and R. Apache Spark is a tool for Running Spark Applications. Spark is 100 times faster than Hadoop and 10 times faster than accessing data from disk. Spark is written in Scala but provides rich APIs in Scala, Java, Python and R. I can be integrated with Hadoop and can process existing HDFS data.
4.Need for Spark
5.Components:
- Spark core
- Spark SQL
- Spark streaming
- MLlib
- GraphX
6.Resilient Distributed Dataset – RDD
7.Spark Shell:
To read the full article or get Spark training, click here.
Top DSC Resources
- Article: What is Data Science? 24 Fundamental Articles Answering This Question
- Article: Hitchhiker’s Guide to Data Science, Machine Learning, R, Python
- Tutorial: Data Science Cheat Sheet
- Tutorial: How to Become a Data Scientist – On Your Own
- Categories: Data Science – Machine Learning – AI – IoT – Deep Learning
- Tools: Hadoop – DataViZ – Python – R – SQL – Excel
- Techniques: Clustering – Regression – SVM – Neural Nets – Ensembles – Decision Trees
- Links: Cheat Sheets – Books – Events – Webinars – Tutorials – Training – News – Jobs
- Links: Announcements – Salary Surveys – Data Sets – Certification – RSS Feeds – About Us
- Newsletter: Sign-up – Past Editions – Members-Only Section – Content Search – For Bloggers
- DSC on: Ning – Twitter – LinkedIn – Facebook – GooglePlus
Follow us on Twitter: @DataScienceCtrl | @AnalyticBridge