Spark UI

Apache Spark can be set up in several ways: in cluster mode under a resource manager such as YARN or Mesos, or in standalone mode. It provides a suite of web user interfaces (UIs) for monitoring the status and resource consumption of the Spark cluster.

Corridor uses Spark to run jobs through the platform and through integrated notebooks. It provides a direct link from a Corridor job in the platform to the underlying Spark UI.

By configuring the options outlined below, the user can view Spark execution information for currently running jobs through the Spark Web UI, and for previously run jobs through the Spark History Server UI.

Note

The Spark History Server must be set up as a prerequisite for viewing the Spark execution of previously run jobs. (Refer: Setting up Spark History Server)

Configurations

The user needs to add the appropriate configurations to the api_config.py file where the API process is running.

SPARK_RM_UI_ENABLED

This setting controls whether the Spark RM UI link is shown in the Job Details. It has two possible values:

True: Uses the values of the settings SPARK_UI_URL and SPARK_HISTORY_SERVER_URL and shows the RM UI link in the Job Details.
False: Ignores the values of SPARK_UI_URL and SPARK_HISTORY_SERVER_URL in the configs and does not show the RM UI link in the Job Details.

Example:

SPARK_RM_UI_ENABLED = True

SPARK_UI_URL

This setting points to the URL of the Spark UI for currently running Spark jobs.

Requires: SPARK_RM_UI_ENABLED = True

Possible values:

SPARK_UI_URL = <web_url> (<web_url> should be of the form http(s)://<host>:<port>/../jobs/)
SPARK_UI_URL = 'from-spark' (supported only for standalone Spark installations)

Example:

SPARK_UI_URL = 'http(s)://<cluster_url>:<port>/cluster/app/'
# OR
SPARK_UI_URL = 'from-spark'

Note

'from-spark' should only be set for standalone Spark installations. Correct behavior is not guaranteed when Spark is set up in cluster mode (YARN, Mesos, etc.).
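
As a rough illustration of the two accepted forms of SPARK_UI_URL, the sketch below checks whether a value is either the literal 'from-spark' or an http(s) URL that includes a host. The helper name is_valid_spark_ui_url and the example host are hypothetical and not part of Corridor. This is only a sanity check one might run before editing api_config.py, not how the platform itself validates the setting.

```python
from urllib.parse import urlparse


def is_valid_spark_ui_url(value):
    """Return True if value looks like an accepted SPARK_UI_URL.

    Hypothetical helper: accepts the literal 'from-spark' or any
    http(s) URL that has a host part. It does not check the path.
    """
    if value == 'from-spark':
        return True
    parsed = urlparse(value)
    return parsed.scheme in ('http', 'https') and bool(parsed.netloc)


# Example values (the host below is made up for illustration).
print(is_valid_spark_ui_url('from-spark'))
print(is_valid_spark_ui_url('https://spark-rm.example.com:8088/cluster/app/'))
print(is_valid_spark_ui_url('spark-rm.example.com:8088'))
```

The last value is rejected because it has no http or https scheme.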

SPARK_HISTORY_SERVER_URL

This setting points to the URL of the Spark UI for previously run Spark jobs.

Requires: SPARK_RM_UI_ENABLED = True and SPARK_UI_URL = <web_url>

Possible values: SPARK_HISTORY_SERVER_URL = <web_url> (<web_url> should be of the form http(s)://<host>:<port>/../history/)

Note

This setting can only be configured if Spark is run in cluster mode (YARN, Mesos, etc.).

Example:

SPARK_HISTORY_SERVER_URL = 'http(s)://<history_server_url>:<history_server_port>/history/'
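
The three settings depend on one another, so it can help to see them together. The sketch below shows what a cluster-mode configuration might look like as a plain dictionary, along with a small hypothetical helper, check_spark_ui_settings, that flags the one dependency stated above: SPARK_HISTORY_SERVER_URL requires SPARK_RM_UI_ENABLED = True and a web URL (not 'from-spark') for SPARK_UI_URL. The host names and ports are placeholders chosen for illustration, and the helper is not part of Corridor.

```python
def check_spark_ui_settings(settings):
    """Return a list of problems found in the Spark UI settings.

    Hypothetical helper that encodes only the documented rules:
    - when SPARK_RM_UI_ENABLED is False, both URLs are ignored;
    - SPARK_HISTORY_SERVER_URL requires SPARK_UI_URL to be a web URL.
    """
    problems = []
    if not settings.get('SPARK_RM_UI_ENABLED', False):
        # Both URL settings are ignored, so there is nothing to check.
        return problems

    ui_url = settings.get('SPARK_UI_URL')
    history_url = settings.get('SPARK_HISTORY_SERVER_URL')
    if history_url and (not ui_url or ui_url == 'from-spark'):
        problems.append(
            'SPARK_HISTORY_SERVER_URL requires SPARK_UI_URL = <web_url>'
        )
    return problems


# Placeholder cluster-mode values; replace hosts and ports with real ones.
cluster_settings = {
    'SPARK_RM_UI_ENABLED': True,
    'SPARK_UI_URL': 'http://spark-rm.example.com:8088/cluster/app/',
    'SPARK_HISTORY_SERVER_URL': 'http://spark-history.example.com:18080/history/',
}

for problem in check_spark_ui_settings(cluster_settings):
    print(problem)
```

With the placeholder values above, the loop prints nothing because the history server URL is paired with a web URL for SPARK_UI_URL.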