- Client mode
- Cluster mode
- Local mode
![](https://i1.wp.com/blog.knoldus.com/wp-content/uploads/2019/12/Image-5.png?resize=476%2C273&ssl=1)
![](https://i0.wp.com/blog.knoldus.com/wp-content/uploads/2019/12/Image-6.png?resize=488%2C284&ssl=1)
The drawback of Spark client mode with YARN is that the client machine must stay available for as long as the job runs: you cannot submit a job, shut down your laptop, and leave the office before it finishes. If the network connection between the driver and the Spark infrastructure breaks, the job fails and no output is produced. When the client machine sits inside the same infrastructure as the cluster, the chance of such a disconnection, and therefore of job failure, is reduced.
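For reference, a client-mode submission might look like the following sketch; the class name and JAR path are illustrative placeholders, not from any particular project:

```shell
# Client mode: the driver runs on the machine where spark-submit is invoked,
# so that machine must stay up and connected until the job finishes.
# com.example.MyApp and the JAR path are placeholders.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --class com.example.MyApp \
  /path/to/my-app.jar
```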
Cluster Mode: The only difference in this mode is that the driver runs inside the cluster, not on the local machine. This is the most common setup: the user sends a JAR file or a script to the cluster manager, which instantiates a driver and executors on different nodes of the cluster.
- When working in cluster mode, all JARs related to the execution of your application need to be available to all the workers. This means you can either place them manually in a shared location or in a folder on each of the workers.
- The cluster manager is responsible for all processes related to the Spark application. It allocates resources and releases them as soon as the application finishes.
- The driver runs on one of the cluster's worker nodes, as a dedicated, standalone process inside that worker.
- Each application gets its own executor processes, which stay up for the duration of the whole application and run tasks in multiple threads. This has the benefit of isolating applications from each other, on both the scheduling side (each driver schedules its own tasks) and the executor side (tasks from different applications run in different JVMs).
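One way to make application JARs visible to every worker, sketched here with placeholder paths, is to stage the dependencies on shared storage such as HDFS and reference them at submission time:

```shell
# Cluster mode: dependencies must be reachable from every worker node.
# Staging them on HDFS (all paths below are illustrative) makes them
# globally visible without copying files to each worker by hand.
hdfs dfs -put my-dep.jar /libs/my-dep.jar
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \
  --jars hdfs:///libs/my-dep.jar \
  hdfs:///apps/my-app.jar
```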
Local Mode: The driver and executors run on the machine where the user is logged in. It is recommended only for testing an application in a local environment or for executing unit tests.
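A minimal local-mode run can be sketched as follows; the class name and JAR path are placeholders:

```shell
# Local mode: driver and executors run in a single JVM on this machine.
# local[*] uses as many worker threads as there are logical cores.
spark-submit \
  --master "local[*]" \
  --class com.example.MyApp \
  target/my-app.jar
```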
Configurations to run a Spark Job on a YARN cluster
- master – Determines where the job runs; for YARN, set it to 'yarn'.
- deploy-mode – We selected 'cluster' to run the above SparkPi example within the cluster. To run the driver outside of the cluster, select the 'client' option instead.
- driver-memory – The amount of memory available for the driver process. In YARN cluster mode, the driver runs inside the Application Master.
- executor-memory – The amount of memory allocated to each executor process.
- executor-cores – The number of cores allocated to each executor process.
- queue – The YARN queue on which this job will run. If you have not defined queues for your cluster, use the 'default' queue.
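Putting the flags above together, a YARN cluster-mode submission of the SparkPi example might look like the following; the examples JAR location and resource sizes vary by installation:

```shell
# YARN cluster mode with explicit resource settings.
# $SPARK_HOME and the exact examples JAR filename depend on your installation;
# the memory/core values here are illustrative, not recommendations.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 2g \
  --executor-memory 4g \
  --executor-cores 2 \
  --queue default \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_*.jar \
  100
```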