In this blog we share the titles in our series on learning Apache Spark, the basics of Hadoop, which is one of the big data tools, and the motivations for Apache Spark, which is not a replacement for Apache Hadoop but its friend in the big data world.
Blog 1 – Introduction to Big Data
Blog 2 – Hadoop, Spark’s Motivations
Blog 3 – Apache Spark’s History and Unified Platform for Big Data
Blog 4 – Apache Spark’s First Step – AWS, Apache Spark
Blog 5 – Apache Spark Languages with basic Hands-on
Blog 6 – The RDD, RDDs Input, Hands-on
Blog 7 – Transformation, map, mapPartitions
Blog 8 – RDD Combiner
Blog 9 – Actions, Persistence Actions, Hands-on
Blog 10 – Implicit Conversions, Hands-on
Blog 11 – Key Value Methods
Blog 12 – Caching Data, Hands-on
Blog 13 – Accumulator
Apache Hadoop is an open source big data management platform, and the technology most associated with big data analytics applications. The distributed processing framework was created in 2006, primarily at Yahoo and based partly on ideas outlined by Google in a pair of technical papers; soon, other Internet companies such as Facebook, LinkedIn and Twitter adopted the technology and began contributing to its development. In the past few years, Hadoop has evolved into a complex ecosystem of infrastructure components and related tools, which are packaged together by various vendors in commercial Hadoop distributions.
One of the best tutorials on Hadoop comes from the Yahoo team.
Below are the main motivations for Apache Spark:
- Readability
- Expressiveness
- Speed
- Testability
- Interactivity
- Fault Tolerance
- Unified Big Data Platform
In Blog 3, we will share a detailed study of Apache Spark’s History and Unified Platform for Big Data.