Spark UI
Apache Spark can be set up in several ways: in cluster mode (for example, on YARN or Mesos) or in standalone mode.
Apache Spark provides a suite of web user interfaces (UIs) that the user can use to monitor the status and resource
consumption of the Spark cluster.
Corridor uses Spark to run jobs, both through the platform and through integrated notebooks. Corridor provides a direct link from a Corridor job in the platform to the underlying Spark UI.
Once the options outlined below are configured, the user can view Spark execution information for currently running jobs in the Spark Web UI and for previously run jobs in the Spark History Server UI.
Note
Spark History Server has to be set up as a prerequisite for viewing the Spark execution of previously run jobs. (Refer: Setting up Spark History Server)
Configurations
The user needs to add the appropriate settings to the api_config.py file on the machine where the API process is running.
SPARK_RM_UI_ENABLED
This setting controls whether the Spark RM UI link is shown in the Job Details. It has two possible values:
- True: Uses the values of the SPARK_UI_URL and SPARK_HISTORY_SERVER_URL settings and shows the RM UI link in the Job Details.
- False: Ignores the values of SPARK_UI_URL and SPARK_HISTORY_SERVER_URL in the configs and does not show the RM UI link in the Job Details.
Example:
SPARK_RM_UI_ENABLED = True
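The gating behavior described above can be pictured with a small sketch. It only illustrates the documented rule and is not Corridor's actual implementation: the function name `effective_spark_urls` and the dictionary-based config are assumptions made for this example, as is treating a missing flag as False.

```python
def effective_spark_urls(config):
    """Return the URL settings that are honored for the given config.

    Hypothetical helper for illustration only; not part of Corridor.
    """
    # Assumption: a missing flag is treated the same as False.
    if not config.get("SPARK_RM_UI_ENABLED", False):
        # Flag is off: both URL settings are ignored, so no link is shown.
        return {}
    # Flag is on: pass through whichever URL settings are present.
    keys = ("SPARK_UI_URL", "SPARK_HISTORY_SERVER_URL")
    return {key: config[key] for key in keys if key in config}
```

With the flag set to False the helper returns an empty dictionary even if the URL settings are filled in, which mirrors the "ignore the values" behavior of the False option.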
SPARK_UI_URL
This setting points to the URL of the Spark UI for currently running Spark jobs
Requires: SPARK_RM_UI_ENABLED = True
Possible values:
- SPARK_UI_URL = <web_url> (<web_url> should be of the type http(s)://<host>:<port>/../jobs/)
- SPARK_UI_URL = 'from-spark' (supported only for standalone Spark installations)
Example:
SPARK_UI_URL = 'http(s)://<cluster_url>:<port>/cluster/app/'
# OR
SPARK_UI_URL = 'from-spark'
Note
'from-spark' should only be set for standalone Spark installations. Correct behavior is not guaranteed when Spark is set up in cluster mode (YARN, Mesos, etc.).
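Before restarting the API process, it can help to sanity-check the value of SPARK_UI_URL. The following is a minimal sketch, assuming the only acceptable values are the literal 'from-spark' or an http(s) URL with a host. The helper `is_valid_spark_ui_url` is hypothetical and not provided by Corridor, and the host names in the usage note are placeholders.

```python
from urllib.parse import urlparse


def is_valid_spark_ui_url(value):
    """Check that a SPARK_UI_URL value has one of the two documented forms.

    Hypothetical helper for illustration only; not part of Corridor.
    """
    # The special value for standalone Spark installations.
    if value == "from-spark":
        return True
    # Otherwise expect a web URL with an http or https scheme and a host.
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

For example, `is_valid_spark_ui_url('http://spark-master.example.com:4040/jobs/')` passes, while a value with no scheme, such as `'spark-master.example.com/jobs/'`, is rejected. The check covers the form of the value only; it does not verify that the URL is reachable.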
SPARK_HISTORY_SERVER_URL
This setting points to the URL of the Spark UI for previously run Spark jobs
Requires: SPARK_RM_UI_ENABLED = True and SPARK_UI_URL = <web_url>
Possible values:
SPARK_HISTORY_SERVER_URL = <web_url> (<web_url> should be of the type http(s)://<host>:<port>/../history/)
Note
This setting can only be configured if Spark is run in cluster mode (yarn, mesos, etc.)
Example:
SPARK_HISTORY_SERVER_URL = 'http(s)://<history_server_url>:<history_server_port>/history/'
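The "Requires" rule for SPARK_HISTORY_SERVER_URL can be expressed as a small consistency check. This is a sketch under stated assumptions rather than Corridor's own validation: the function name `history_server_config_errors`, the dictionary-based config, and the error messages are all hypothetical.

```python
def history_server_config_errors(config):
    """List the unmet requirements of SPARK_HISTORY_SERVER_URL, if it is set.

    Hypothetical helper for illustration only; not part of Corridor.
    """
    errors = []
    if "SPARK_HISTORY_SERVER_URL" not in config:
        # Nothing to check when the history server URL is not configured.
        return errors
    if config.get("SPARK_RM_UI_ENABLED") is not True:
        errors.append("SPARK_RM_UI_ENABLED must be True")
    ui_url = config.get("SPARK_UI_URL")
    # The history server URL requires SPARK_UI_URL to be a web URL,
    # so the standalone-only value 'from-spark' does not qualify.
    if ui_url is None or ui_url == "from-spark":
        errors.append("SPARK_UI_URL must be a web URL, not 'from-spark'")
    return errors
```

An empty list means the three settings are consistent with each other. A config that sets SPARK_HISTORY_SERVER_URL together with SPARK_UI_URL = 'from-spark' yields one error, since 'from-spark' is meant for standalone installations while the history server URL applies to cluster mode.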