
System: Minimum Requirements

This section describes the minimum requirements for a Corridor installation.

Broadly, the components involved are:

  • Web Application Server
  • API Server
  • API - Celery worker
  • Spark - Celery worker
  • Jupyter Notebook
  • File Management
  • Metadata Database (SQL RDBMS)
  • Messaging Queue (Redis)

For very simple installations, all of these components can be installed on the same machine. However, we recommend keeping them on separate machines so that each one can be scaled independently.
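When components run on separate machines, a small preflight script can confirm that each host meets the minimums listed for its component. The following is a minimal sketch that uses only the Python standard library. The function name `check_host` is hypothetical and not part of Corridor. The script treats "GB" as GiB. It skips the RAM check where `os.sysconf` is unavailable, for example on Windows.

```python
import os
import shutil
import sys

GIB = 1024 ** 3  # this sketch treats the documented "GB" as GiB


def total_ram_bytes():
    """Return physical RAM in bytes, or None if os.sysconf is unavailable (e.g. Windows)."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None


def check_host(min_ram_gb, min_cpus, min_disk_gb, path="/", min_python=(3, 9)):
    """Hypothetical preflight check: return a list of failures (empty list = host passes)."""
    failures = []
    if sys.version_info[:2] < min_python:
        failures.append(
            f"Python {sys.version.split()[0]} is older than {min_python[0]}.{min_python[1]}"
        )
    cpus = os.cpu_count() or 0
    if cpus < min_cpus:
        failures.append(f"{cpus} CPUs < {min_cpus}")
    ram = total_ram_bytes()
    if ram is not None and ram < min_ram_gb * GIB:
        failures.append(f"{ram / GIB:.1f} GB RAM < {min_ram_gb} GB")
    free = shutil.disk_usage(path).free
    if free < min_disk_gb * GIB:
        failures.append(f"{free / GIB:.1f} GB free on {path} < {min_disk_gb} GB")
    return failures


if __name__ == "__main__":
    # Example: the Web Application Server minimums (2 GB RAM, 2 CPU, 10 GB disk).
    problems = check_host(min_ram_gb=2, min_cpus=2, min_disk_gb=10)
    print("OK" if not problems else "\n".join(problems))
```

You can reuse the same check for each component by passing in the RAM, CPU, and storage values listed in that component's Requirements.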

Web Application Server

This serves the web user interface and the web APIs that the interface consumes. It is a Flask application that serves all resources required by the Corridor Platform web interface.

Requirements

  • RAM: 2 GB
  • Processor: 2 CPU
  • Installation storage space: 10 GB
  • Python 3.9+ / pip 10.x+
  • WSGI HTTP Server (Example: Gunicorn)
  • Web Server (Example: Nginx)
  • Process Management (Example: Supervisor)

Warning

Supported Python Version - Python 3.9+
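Gunicorn reads its configuration from a plain Python file, so the WSGI server for this component can be sized from the host's CPU count. The following is a minimal `gunicorn.conf.py` sketch. The loopback address, port 8000, and timeout are assumptions, not Corridor defaults. The worker formula is the (2 x cores) + 1 starting point that the Gunicorn documentation suggests.

```python
# gunicorn.conf.py: minimal sketch for the Web Application Server.
# Start with: gunicorn -c gunicorn.conf.py <module>:<app>  (module/app are placeholders)
import multiprocessing

# Listen on loopback only; Nginx accepts client traffic and proxies to this port.
# The address and port are assumptions for illustration.
bind = "127.0.0.1:8000"

# Gunicorn's suggested starting point: (2 x CPU cores) + 1 worker processes.
# On the 2-CPU minimum host this yields 5 workers.
workers = multiprocessing.cpu_count() * 2 + 1

# Seconds a worker may stay silent before Gunicorn kills and restarts it.
timeout = 120

# "-" writes access logs to stdout and error logs to stderr,
# so the process manager (e.g. Supervisor) can capture both.
accesslog = "-"
errorlog = "-"
loglevel = "info"
```

With this configuration, Nginx would forward requests with a `proxy_pass http://127.0.0.1:8000;` directive. Supervisor would then keep the `gunicorn` process running.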

API Server

This serves as an internal API service layer that orchestrates the platform's business logic.

Warning

This needs to be able to write to the same File Management storage as the other Corridor components.
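Before starting the service, you can verify write access to the shared storage from the account that will run it. The following is a minimal sketch. The function name `can_write_shared_storage` is hypothetical, and the mount path you pass in depends on your installation.

```python
import os
import tempfile


def can_write_shared_storage(path):
    """Return True if this process can create, write, sync, and delete a file under `path`."""
    try:
        # NamedTemporaryFile picks a unique name, so checks running on several
        # hosts at once do not collide; the file is deleted when the block exits.
        with tempfile.NamedTemporaryFile(dir=path, prefix=".write-check-") as handle:
            handle.write(b"ok")
            handle.flush()
            os.fsync(handle.fileno())  # force the write through to the storage
        return True
    except OSError:
        # Covers missing directories, permission errors, read-only mounts, etc.
        return False
```

This only confirms that the path is writable. To confirm that two hosts see the *same* storage, write a marker file on one host and read it back on the other.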

Requirements

  • RAM: 2 GB
  • Processor: 2 CPU
  • Installation storage space: 10 GB
  • Python 3.9+ / pip 10.x+
  • Java 8
  • WSGI HTTP Server (Example: Gunicorn)
  • Process Management (Example: Supervisor)

Warning

Supported Python Version - Python 3.9+

API - Celery worker

A worker that runs time-consuming API tasks asynchronously. We recommend running at least 2 workers.

Warning

This needs to be able to write to the same File Management storage as the other Corridor components.

Requirements

  • RAM: 4 GB
  • Processor: 2 CPU
  • Installation storage space: 10 GB
  • Python 3.9+ / pip 10.x+
  • Java 8
  • Process Management (Example: Supervisor)

Warning

Supported Python Version - Python 3.9+
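Celery can load its settings from a plain Python module through `app.config_from_object("celeryconfig")`. The following is a minimal sketch of such a module for an API Celery worker. It assumes Redis as the broker, as described under Messaging Queue. The host name `redis.internal` and the database numbers are placeholders, and the settings shown are illustrative choices rather than Corridor's shipped configuration.

```python
# celeryconfig.py: minimal sketch of settings for an API Celery worker.
# Host name and Redis database numbers are placeholders.

# Redis acts as the message broker; results are stored in a separate Redis db.
broker_url = "redis://redis.internal:6379/0"
result_backend = "redis://redis.internal:6379/1"

# One child process per CPU on the 2-CPU minimum host.
worker_concurrency = 2

# For long-running tasks: acknowledge only after a task finishes, so a task
# from a crashed worker is redelivered, and fetch one task at a time so a
# busy process does not hold back queued work.
task_acks_late = True
worker_prefetch_multiplier = 1

# With late acks on Redis, an unacknowledged task is redelivered after the
# visibility timeout, so it must exceed the longest expected task runtime.
broker_transport_options = {"visibility_timeout": 43200}  # 12 hours

task_serializer = "json"
result_serializer = "json"
accept_content = ["json"]
```

To run two workers, you could start two processes under Supervisor, each with a distinct node name, for example `celery -A <app> worker --hostname=api1@%h` and `--hostname=api2@%h`. Here `<app>` is a placeholder for the application module.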

Spark - Celery worker

A worker that runs Spark jobs triggered by the API asynchronously. We recommend running at least 2 workers.

Note

This needs to be installed on a machine that is configured as a Spark Gateway (i.e., a master node or an edge node of the cluster), not on the Spark worker nodes of the cluster itself. The Celery worker process should be able to import the pyspark module.

Requirements

  • RAM: 16 GB
  • Processor: 8 CPU
  • HDFS storage space: 500 GB (depends on the data being processed; also account for the HDFS space needed for shuffles)
  • Spark Workers: 4+ (add more as user workload grows)

    • RAM: 32 GB
    • Processor: 16 CPU
  • Python 3.9+ / pip 10.x+
  • Java 8
  • Spark 3.4+
  • Process Management (Example: Supervisor)
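To confirm that a host can act as a Spark Gateway for this worker, run a quick check in the same environment (user, virtualenv, and Supervisor environment variables) as the Celery process. The following is a minimal sketch. The function name `spark_gateway_report` is hypothetical. It only checks that `java` and `spark-submit` are on `PATH`. It does not verify the Java version.

```python
import importlib.util
import os
import shutil


def spark_gateway_report(min_spark=(3, 4)):
    """Hypothetical check: summarize whether this host looks usable as a Spark gateway."""
    report = {
        "java_on_path": shutil.which("java") is not None,  # presence only, not version
        "spark_submit_on_path": shutil.which("spark-submit") is not None,
        "spark_home": os.environ.get("SPARK_HOME"),
        "pyspark_importable": importlib.util.find_spec("pyspark") is not None,
        "spark_version_ok": False,
    }
    if report["pyspark_importable"]:
        # Import only after confirming the package is installed.
        import pyspark

        # e.g. "3.5.1" -> (3, 5); compare against the Spark 3.4+ requirement.
        major, minor = (int(part) for part in pyspark.__version__.split(".")[:2])
        report["spark_version_ok"] = (major, minor) >= min_spark
    return report
```

If `pyspark_importable` is False under the worker's environment, the worker will not be able to submit Spark jobs, even if `pyspark` imports successfully from an interactive shell.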

Jupyter Notebook

A notebook environment for free-form analysis. Jupyter Notebook is provided out of the box, and other notebook environments can also be integrated.

Note

This needs to be installed on a machine that is configured as a Spark Gateway (i.e., a master node or an edge node of the cluster), not on the Spark worker nodes of the cluster itself.

Requirements

  • RAM: 4 GB for base services, plus more depending on user workload
  • Processor: 4 CPU, plus more depending on user workload
  • Installation storage space: 10 GB
  • Python 3.9+
  • Spark 3.4+
  • Process Management (Example: Supervisor)

Warning

Supported Python Version - Python 3.9+

File Management

A file system used to store and retrieve files.

Requirements

  • File storage space: 10 GB
  • File System

    • Local File System
    • SSH-based File System

Metadata Database

A SQL relational database that stores the Corridor Platform's metadata.

Requirements

  • RAM: 2 GB
  • Processor: 2 CPU
  • Database storage space: 1 GB
  • SQL Database

    • Oracle 19+
    • Postgres 11.7+
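Python services usually reach a database like this through a connection URL. The following is a minimal sketch that assumes a SQLAlchemy-style URL. That is an assumption: Corridor's actual configuration keys and driver choice may differ. The helper `database_url` is hypothetical. It URL-escapes the credentials, because characters such as `@` or `:` in a password would otherwise break the URL.

```python
from urllib.parse import quote_plus, urlencode


def database_url(dialect, user, password, host, port, database="", query=None):
    """Hypothetical helper: build a SQLAlchemy-style URL with escaped credentials."""
    # quote_plus escapes '@', ':', '/', etc. so they are not parsed as URL delimiters.
    url = f"{dialect}://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{database}"
    if query:
        # e.g. Oracle service names are passed as ?service_name=...
        url += "?" + urlencode(query)
    return url


# PostgreSQL with a password containing URL delimiters (values are placeholders).
pg = database_url("postgresql+psycopg2", "corridor", "p@ss:word", "db.internal", 5432, "corridor")

# Oracle addressed by service name (values are placeholders).
ora = database_url("oracle+oracledb", "corridor", "secret", "ora.internal", 1521,
                   query={"service_name": "ORCLPDB1"})
```

The resulting `pg` value is `postgresql+psycopg2://corridor:p%40ss%3Aword@db.internal:5432/corridor`.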

Messaging Queue

A low-latency message queue used to send and receive information about asynchronous tasks.

Important

Only standalone and Sentinel Redis deployments are supported.

Requirements

  • RAM: 4 GB
  • Processor: 2 CPU
  • DB Snapshots storage space: 10 GB
  • Redis 4+
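Because only standalone and Sentinel deployments are supported, the broker settings differ by mode. The following is a minimal sketch that assumes Celery's Redis transport, where Sentinel is addressed through `;`-separated `sentinel://` URLs plus a `master_name` transport option. The helper `redis_broker`, the host names, and the master name are hypothetical. Any other mode is rejected.

```python
def redis_broker(mode, hosts, db=0, master_name=None):
    """Hypothetical helper: return (broker_url, transport_options) for a supported Redis mode."""
    if mode == "standalone":
        host, port = hosts[0]
        return f"redis://{host}:{port}/{db}", {}
    if mode == "sentinel":
        if not master_name:
            raise ValueError("Sentinel mode needs the name of the monitored master")
        # One sentinel:// URL per Sentinel node, joined with ';'.
        url = ";".join(f"sentinel://{host}:{port}" for host, port in hosts)
        return url, {"master_name": master_name}
    # Anything else (e.g. Redis Cluster) is outside the supported deployments.
    raise ValueError(f"unsupported Redis mode {mode!r}: use 'standalone' or 'sentinel'")


standalone = redis_broker("standalone", [("redis.internal", 6379)])
sentinel = redis_broker("sentinel", [("s1.internal", 26379), ("s2.internal", 26379)],
                        master_name="mymaster")
```

In this sketch, `standalone` evaluates to `("redis://redis.internal:6379/0", {})`. In a Celery configuration, the second element would go into `broker_transport_options`.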