Apache Spark is an engine for big data processing. A cluster manager is an external service responsible for acquiring resources on the Spark cluster. A cluster consists of one master and N workers. The cluster manager keeps track of the resources (nodes) available in the cluster, schedules work, and divides the resources of the host machines that form the cluster; its prime job is to allocate resources across applications.

Spark's cluster management is pluggable: SparkContext can connect to several types of cluster managers. Once the connection is established, Spark acquires executors on the cluster's nodes, which run computations and store data for your application. Next, SparkContext sends your application code (defined by the JAR or Python files passed to it) to the executors. Finally, SparkContext sends tasks to the executors to run.
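This pluggable design is visible on the command line: the same application can be submitted to any of the supported cluster managers just by changing the `--master` URL passed to `spark-submit`. A minimal sketch (the host names, ports, and `app.py` are placeholders, not from this post):

```shell
# Standalone mode: connect to a Spark standalone master (placeholder host/port)
spark-submit --master spark://master-host:7077 app.py

# Apache Mesos: point at the Mesos master (placeholder host/port)
spark-submit --master mesos://mesos-host:5050 app.py

# YARN: the master is resolved from the Hadoop configuration
# (HADOOP_CONF_DIR / YARN_CONF_DIR), so no host is given
spark-submit --master yarn app.py

# Local mode (no cluster manager at all) -- handy for testing
spark-submit --master "local[*]" app.py
```

In every case the application code is unchanged; only the resource-acquisition backend differs.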
Apache Spark supports three types of cluster manager:
- Standalone: A basic manager to set up a cluster.
- Apache Mesos: A general-purpose cluster manager that can also run Hadoop MapReduce and other applications.
- YARN: Responsible for resource management in Hadoop.
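Each of these managers corresponds to a different master-URL scheme that SparkContext recognizes when it connects. As an illustrative sketch (this helper function is hypothetical, not part of Spark's API, but the URL schemes are the ones Spark accepts), the mapping looks like:

```python
def cluster_manager_for(master_url: str) -> str:
    """Map a Spark master URL to the cluster manager it selects.

    Hypothetical helper for illustration only -- not a Spark API.
    The schemes themselves (spark://, mesos://, yarn, local) are
    the real master-URL forms Spark understands.
    """
    if master_url.startswith("spark://"):
        return "Standalone"
    if master_url.startswith("mesos://"):
        return "Apache Mesos"
    if master_url.startswith("yarn"):
        return "YARN"
    if master_url.startswith("local"):
        return "Local (no cluster manager)"
    raise ValueError(f"Unknown master URL: {master_url}")


print(cluster_manager_for("spark://master-host:7077"))  # Standalone
print(cluster_manager_for("mesos://mesos-host:5050"))   # Apache Mesos
print(cluster_manager_for("yarn"))                      # YARN
```

In a real application you never write such a function yourself; SparkContext does this dispatch internally based on the master URL you provide.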
Companies such as Twitter, Xogito, and Airbnb use Apache Mesos, as it can run on Linux or Mac OS X.