Apache Spark is an open-source cluster computing framework originally developed
in the AMPLab at UC Berkeley. In contrast to Hadoop's two-stage
disk-based MapReduce paradigm, Spark's in-memory primitives provide
performance up to 100 times faster for certain applications. By allowing
user programs to load data into a cluster's memory and query it repeatedly,
Spark is well suited to machine learning algorithms. Spark can interface
with a wide variety of file and storage systems, including the Hadoop
Distributed File System (HDFS), Cassandra, OpenStack Swift, and Amazon
S3.
Spark is one of the most actively developed open-source projects. It had over 465
contributors in 2014, making it the most active project in the Apache Software
Foundation and among Big Data open-source projects.
Project Components
The Spark project consists of multiple components.
Spark Core is the foundation of the overall project. It provides distributed task
dispatching, scheduling, and basic I/O functionality. The fundamental
programming abstraction is the Resilient Distributed Dataset (RDD), a logical
collection of data partitioned across machines. RDDs can be created by
referencing datasets in external storage systems, or by applying coarse-grained
transformations (e.g. map, filter, reduce, join) to existing RDDs.
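To make the idea of a partitioned collection and coarse-grained transformations more concrete, here is a minimal single-process sketch in plain Python. It is not Spark's API: the ToyRDD class and its methods are hypothetical names invented for this illustration, and the "partitions" are ordinary local tuples rather than data held on separate machines.

```python
from functools import reduce


class ToyRDD:
    """Hypothetical stand-in for an RDD: an immutable sequence of partitions."""

    def __init__(self, partitions):
        # Each partition is stored as a tuple; on a real cluster each one
        # would live on a different machine.
        self.partitions = tuple(tuple(p) for p in partitions)

    @classmethod
    def from_list(cls, data, num_partitions):
        # Split the data round-robin into num_partitions chunks.
        return cls([data[i::num_partitions] for i in range(num_partitions)])

    def map(self, fn):
        # Coarse-grained: one function is applied to every element of every
        # partition, and the result is a new dataset.
        return ToyRDD([[fn(x) for x in p] for p in self.partitions])

    def filter(self, pred):
        # Keep only the elements that satisfy pred, partition by partition.
        return ToyRDD([[x for x in p if pred(x)] for p in self.partitions])

    def reduce(self, fn):
        # Reduce each non-empty partition locally, then combine the partials.
        partials = [reduce(fn, p) for p in self.partitions if p]
        return reduce(fn, partials)


numbers = ToyRDD.from_list(list(range(1, 11)), num_partitions=3)
squares_of_evens = numbers.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
total = squares_of_evens.reduce(lambda a, b: a + b)
print(total)  # 220
```

Each transformation returns a new ToyRDD and leaves its input untouched, which mirrors how a new RDD is derived from an existing one. Because the reduction is performed per partition first, the combining function should be associative and commutative, as addition is here.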
The RDD abstraction is exposed through a language-integrated API
in Java, Python, and Scala that resembles the API of local, in-process
collections. This reduces programming complexity because applications
manipulate RDDs in much the same way they manipulate local collections of
data.
Spark SQL
Spark SQL is a component on top of Spark Core that introduces a new data abstraction
called SchemaRDD, which provides support for structured and semi-structured
data. Spark SQL provides a domain-specific language to manipulate SchemaRDDs in
Scala, Java, or Python. It also provides SQL language support, with
command-line interfaces and an ODBC/JDBC server.
Spark Streaming
Spark Streaming leverages Spark Core's fast scheduling capability to
perform streaming analytics. It ingests data in mini-batches and performs
RDD transformations on those mini-batches of data. This design enables the same
application code written for batch analytics to be used in streaming
analytics, on a single engine.
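The mini-batch idea can be illustrated with a small sketch in plain Python. This is not the Spark Streaming API: word_count and mini_batches are hypothetical helpers written for this example, and the batches are cut by record count for simplicity, which is an assumption of the sketch rather than a description of how Spark forms its batches.

```python
from itertools import islice


def word_count(records):
    """Batch logic: count words in a finite collection of text lines."""
    counts = {}
    for line in records:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


def mini_batches(stream, batch_size):
    """Cut a (possibly unbounded) iterator into lists of at most batch_size records."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


lines = ["spark streaming", "spark core", "spark sql", "streaming analytics", "spark"]

# Batch analytics: run the function once over the whole dataset.
batch_result = word_count(lines)

# Streaming analytics: run the very same function on each mini-batch as it arrives.
stream_results = [word_count(batch) for batch in mini_batches(iter(lines), batch_size=2)]

print(batch_result)
print(stream_results)
```

The point of the sketch is that word_count is written once and reused unchanged in both settings; only the code that feeds it data differs.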
Scala is an object-functional programming language for
general software applications. Scala has full support for functional
programming and a very strong static type system. This allows
programs written in Scala to be very concise, and thus often shorter than
equivalent programs in other general-purpose programming languages. Many of
Scala's design decisions were inspired by criticism of the shortcomings of Java.
Scala source code is intended to be compiled to Java bytecode, so that the
resulting executable code runs on a Java virtual machine. Java libraries
may be used directly in Scala code, and vice versa. Like Java, Scala
is object-oriented, and uses a curly-brace syntax reminiscent of
the C programming language. Unlike Java, Scala has many features
of functional programming languages like Scheme, Standard
ML, and Haskell, including currying, type
inference, immutability, lazy evaluation, and pattern matching.
It also has an advanced type system supporting algebraic data
types, covariance and contravariance, higher-order types,
and anonymous types. Other features of Scala not present in Java
include operator overloading, optional parameters, named
parameters, raw strings, and the absence of checked exceptions.
The name Scala is a portmanteau of "scalable" and
"language", signifying that it is designed to grow with the demands
of its users.