Apache Spark is an engine for big data processing. A cluster manager is an external service responsible for acquiring resources on the Spark cluster. A cluster consists of one master and N workers. The cluster manager keeps track of the resources (nodes) available in the cluster, schedules work, and divides the resources of the host machines that form the cluster; its prime job is to allocate resources across applications.

Spark's cluster management is pluggable: SparkContext can connect to several types of cluster managers. Once the connection is established, Spark acquires executors on the cluster's nodes, which run computations and store data for your application. Next, SparkContext sends your application code (defined by the JAR or Python files passed to it) to the executors. Finally, SparkContext sends tasks to the executors to run.
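This pluggable design is visible on the command line: the same application can be submitted to any of the supported cluster managers just by changing the `--master` URL passed to `spark-submit`. A minimal sketch (the host names, ports, and `app.py` are placeholders, not from this post):

```shell
# Standalone mode: connect to a Spark standalone master (placeholder host/port)
spark-submit --master spark://master-host:7077 app.py

# Apache Mesos: point at the Mesos master (placeholder host/port)
spark-submit --master mesos://mesos-host:5050 app.py

# YARN: the master is resolved from the Hadoop configuration
# (HADOOP_CONF_DIR / YARN_CONF_DIR), so no host is given
spark-submit --master yarn app.py

# Local mode (no cluster manager at all) -- handy for testing
spark-submit --master "local[*]" app.py
```

In every case the application code is unchanged; only the resource-acquisition backend differs.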
Apache Spark supports three types of cluster manager:
- Standalone: A basic manager to set up a cluster.
- Apache Mesos: A general-purpose cluster manager that can also run Hadoop MapReduce and other applications.
- YARN: Responsible for resource management in Hadoop.
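Each of these managers corresponds to a different master-URL scheme that SparkContext recognizes when it connects. As an illustrative sketch (this helper function is hypothetical, not part of Spark's API, but the URL schemes are the ones Spark accepts), the mapping looks like:

```python
def cluster_manager_for(master_url: str) -> str:
    """Map a Spark master URL to the cluster manager it selects.

    Hypothetical helper for illustration only -- not a Spark API.
    The schemes themselves (spark://, mesos://, yarn, local) are
    the real master-URL forms Spark understands.
    """
    if master_url.startswith("spark://"):
        return "Standalone"
    if master_url.startswith("mesos://"):
        return "Apache Mesos"
    if master_url.startswith("yarn"):
        return "YARN"
    if master_url.startswith("local"):
        return "Local (no cluster manager)"
    raise ValueError(f"Unknown master URL: {master_url}")


print(cluster_manager_for("spark://master-host:7077"))  # Standalone
print(cluster_manager_for("mesos://mesos-host:5050"))   # Apache Mesos
print(cluster_manager_for("yarn"))                      # YARN
```

In a real application you never write such a function yourself; SparkContext does this dispatch internally based on the master URL you provide.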
Companies such as Twitter, Xogito, and Airbnb use Apache Mesos, as it can run on Linux or Mac OS X.