In Apache Spark, are Worker Nodes and Executors the same or different?


In the context of Apache Spark, worker nodes and executors are not the same; they are different components of a Spark cluster.

  1. Worker Nodes:
  • Worker nodes are also known as slave nodes in a Spark cluster.
  • These nodes are responsible for running the tasks assigned by the Spark driver program.
  • Worker nodes manage the resources (CPU, memory, etc.) and execute the tasks on behalf of the driver.
  • They can be part of a cluster managed by a cluster manager such as Apache Mesos, Hadoop YARN, or Spark’s built-in standalone cluster manager.
  2. Executors:
  • Executors are processes that run on worker nodes; each worker node can host multiple executors.
  • Executors are responsible for executing the tasks of a Spark application and for caching data in memory. Each executor runs in its own JVM.
  • Executors are created and managed by the Spark cluster manager (e.g., YARN, Mesos, or Spark’s standalone cluster manager) and are allocated resources from the worker nodes.
  • Multiple executors can run on a single worker node, and they are used to parallelize the processing of tasks within a Spark application.
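The split between worker nodes and executors shows up directly when resources are requested at submission time. Below is an illustrative sketch, not a prescription: the deploy mode, executor counts, core and memory values, and the application file name are all placeholder assumptions (and `--num-executors` applies when running on YARN with dynamic allocation disabled):

```shell
# Hypothetical submission: ask the cluster manager for 6 executors,
# each with 4 cores and 8 GiB of memory. With, say, 3 worker nodes,
# the manager might place 2 executors on each worker node.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 6 \
  --executor-cores 4 \
  --executor-memory 8g \
  my_app.py
```

Note that the executors, not the worker nodes, are what the application is granted: the worker nodes are the machines, and the executors are the JVM processes carved out of their resources.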

Working Process:

  1. Let’s say a user submits a job using “spark-submit”.
  2. “spark-submit” will in turn launch the Driver, which executes the main() method of our code.
  3. The Driver contacts the cluster manager and requests resources to launch the Executors.
  4. The cluster manager launches the Executors on behalf of the Driver.
  5. Once the Executors are launched, they establish a direct connection with the Driver.
  6. The Driver creates the Logical Plan and converts it into a Physical Plan.
  7. From the Physical Plan and the Lineage, the Driver determines the total number of Tasks.
  8. Spark then allocates the Tasks to the Executors.
  9. Task runs on Executor and each Task upon completion returns the result to the Driver.
  10. Finally, when all Tasks are completed, the main() method running in the Driver exits, i.e., main() invokes sparkContext.stop().
  11. Spark then releases all the resources back to the Cluster Manager.
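The driver/executor hand-off above can be sketched with a plain-Python analogy, with no Spark involved: a “driver” splits a job into tasks, a pool of “executor” workers runs them in parallel, and results flow back to the driver. All names here are illustrative stand-ins, not Spark APIs (real executors are separate JVM processes on worker nodes, not threads in one process):

```python
from concurrent.futures import ThreadPoolExecutor

def run_task(partition):
    # Stand-in for a Spark Task: process one partition of the data.
    return sum(partition)

def driver_main():
    data = list(range(100))
    # The "driver" splits the job into Tasks, one per partition (steps 6-8).
    partitions = [data[i:i + 25] for i in range(0, len(data), 25)]
    # The "cluster manager" launches executors (steps 3-4);
    # here a thread pool plays that role.
    with ThreadPoolExecutor(max_workers=4) as executors:
        # Each Task runs on an executor and returns its result
        # to the driver (step 9).
        results = list(executors.map(run_task, partitions))
    # The driver combines the results and exits (steps 10-11).
    return sum(results)
```

Calling `driver_main()` returns the combined result, mirroring how the Driver collects Task results before `sparkContext.stop()` releases the cluster’s resources.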
