Org Resource Manager
Introduction
The Metrics Report System provides a comprehensive view of compute resource usage across clusters and subjects. It fetches real-time data from:
- OrgResourceQuota Database – Subject allocations
- Global Cluster Metrics DB – Aggregated cluster metrics
- Subject Metrics DB – Subject-level resource usage
- Subjects DB – Subject metadata
Metrics are accessible through both a REST API and a WebSocket interface, with internal coordination between multiple microservices and databases.
Architecture
The Org Resource Manager and Autoscaler architecture is organized into three major subsystems, each responsible for a distinct aspect of compute resource governance: quota lifecycle management, real-time metric aggregation, and scaling decision execution. These components operate in a loosely coupled, event-driven architecture using shared databases, streaming interfaces, and message queues for coordination.
The architecture supports both push-based real-time updates and pull-based configuration workflows, enabling it to scale effectively across diverse organizational deployments.
1. Resource Management API Layer
This layer provides the foundational API interface for managing subject-level resource quotas and exposing resource metadata.
Components
- Subject Quota DB – A persistent data store that tracks resource allocation per subject. Each record is mapped using a unique quota_id and includes metadata such as replica_count and allocation policies.
- CRUD Controller – An internal controller module that processes create, update, and delete operations and interfaces with the DB query system. It is accessed via RESTful endpoints exposed to external clients.
- DB Query System – Provides indexed access to stored quota data for read-only operations or quota report generation.
- Live WebSocket Publisher – Pushes real-time snapshots of current quota states to subscribers, ensuring clients receive up-to-date allocation metrics without polling.
Interaction
Clients interact with this layer primarily via the /quota REST endpoints for static quota management and via the WebSocket feed for continuous updates.
2. Metrics Aggregation and Reporting Layer
This layer is responsible for collecting, normalizing, and reporting real-time usage data for both individual subjects and entire clusters.
Components
- Hardware Metrics Webhook – Receives usage reports from external hardware or monitoring agents. These include real-time CPU, memory, and GPU utilization across nodes or clusters.
- Usage Stats Aggregator – Central service that merges subject-level usage with cluster-wide metrics and periodically refreshes usage records in the Subject Quota DB if inconsistencies are detected.
- Metrics WebSocket Client – Subscribes to a backend metrics stream and forwards usage snapshots to the autoscaler and other internal modules.
- Query REST API Layer – Allows external clients to query current cluster and subject metrics. For example, /metrics/report returns a merged view of all relevant usage statistics.
- DB Update Module – Synchronizes any modified resource metrics or updated replica counts to persistent storage to maintain consistency between reported and allocated states.
Interaction
This layer maintains a continuous monitoring loop. Metrics arrive either through the hardware webhook or over the WebSocket stream, and the aggregator reconciles this data with DB records for visibility and actionability.
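As a rough illustration of the reconciliation step, the sketch below compares the latest merged usage against what the Subject Quota DB holds and writes back only when the two diverge; the quota_db read/write helpers are hypothetical stand-ins for the DB Update Module.

def reconcile_usage(cluster_metrics, subject_usage, quota_db):
    # Merge subject-level usage with cluster-wide metrics and refresh stale DB records.
    for subject_id, usage in subject_usage.items():
        merged = {**usage, "cluster": cluster_metrics}   # merged view for this subject
        stored = quota_db.read_usage(subject_id)         # hypothetical read helper
        if stored != merged:                             # inconsistency detected
            quota_db.write_usage(subject_id, merged)     # hypothetical write helper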
3. Autoscaling and Allocation Layer
The autoscaler layer evaluates system state using DSL-based workflows and dispatches resource allocation commands when scaling conditions are met.
Components
- Timer Trigger Module – Initiates periodic autoscaler evaluations by emitting events at a configured interval.
- Subject Intervention System – The core decision-making system, composed of:
  - Autoscaling Checker DSL Executor: evaluates whether current conditions meet autoscaling thresholds.
  - Autoscaling Decision Maker DSL Executor: determines the appropriate replica count for each subject based on defined policies.
- Resource Allocator API Client – Issues formal allocation requests to the compute backend or resource manager.
- Resource Allocation Response Listener – Subscribes to a NATS message bus and listens for allocation confirmations. Each response is correlated with a pending allocation request.
- Messaging Backbone Client – Publishes allocation outcomes to the appropriate clients or downstream systems, completing the feedback loop.
- Manual Resource Allocation Interface – Allows an admin to manually trigger resource updates using a dedicated WebSocket interface. The system ensures the response is routed and delivered through the same communication channel.
Interaction
The autoscaler layer supports both automated periodic scaling and ad-hoc interventions. In both cases, usage metrics feed into a DSL-based evaluator, and any change in allocation is issued through the allocator API and confirmed via NATS.
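The DSL workflows themselves are not reproduced here, but the incr and threshold parameters exposed via /autoscale/config suggest a policy of roughly the following shape. This is only a sketch of the decision output ({subject_id: replica_count}); the cpu_limit field and the current_replicas mapping are hypothetical.

def decide_replicas(subject_metrics, current_replicas, threshold=0.7, incr=2):
    # Scale a subject up by `incr` replicas whenever its utilization crosses `threshold`.
    decisions = {}
    for subject_id, usage in subject_metrics.items():
        utilization = usage["cpu_usage"] / usage.get("cpu_limit", 1)  # cpu_limit is hypothetical
        if utilization > threshold:
            decisions[subject_id] = current_replicas.get(subject_id, 1) + incr
    return decisions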
Supporting Infrastructure
- Init Module – On system startup, this module loads configuration from Redis (as primary) and a local fallback cache. It also manages runtime settings used by the autoscaler (see the sketch after this list).
- Redis Cache – Provides backup persistence for system configuration and transient state, such as active WebSocket sessions and unresolved allocation requests.
- Configuration APIs – A set of endpoints (/autoscale/config, /autoscale/interval) to manage scaling thresholds, DSL execution parameters, and scheduler behavior.
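A minimal sketch of that startup behaviour, assuming the third-party redis package and a hypothetical config key and fallback file name:

import json
import redis  # third-party: pip install redis

def load_config(redis_url="redis://localhost:6379/0", fallback_path="config.json"):
    # Prefer the Redis copy of the configuration; fall back to the local cache file.
    try:
        client = redis.Redis.from_url(redis_url)
        raw = client.get("autoscaler:config")   # hypothetical config key
        if raw is not None:
            return json.loads(raw)
    except redis.RedisError:
        pass                                    # Redis unavailable; use the fallback below
    with open(fallback_path) as f:
        return json.load(f)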
Communication Channels
| Interface Type | Purpose |
|---|---|
| REST APIs | Expose static configuration and quota management interfaces |
| WebSocket | Stream live metrics and support interactive admin commands |
| Redis | Serve as a cache layer for config, backups, and connection state |
| NATS Messaging | Asynchronous pipeline for allocation request/response coordination |
Execution Summary
- Quotas are created or modified via the Resource Management API.
- Metrics are streamed continuously through the Aggregation Layer.
- Periodic events trigger autoscaling evaluation in the Decision Layer.
- If required, allocations are dispatched, confirmed, and persisted.
- Both clients and administrators are notified via WebSocket or messaging backbones.
This decoupled, modular design ensures the system remains resilient, scalable, and extensible, capable of supporting both automated workflows and human-in-the-loop operations.
Schema
OrgResourceQuota
from dataclasses import dataclass

@dataclass
class OrgResourceQuota:
    subject_id: str = ''
    quota_id: str = ''
    allocation_info: str = ''
    replica_count: str = ''
| Field | Type | Description |
|---|---|---|
| subject_id | str | Unique identifier of the subject |
| quota_id | str | Unique identifier for the quota entry |
| allocation_info | str | Metadata or policy explaining the allocation |
| replica_count | str | Number of replicas assigned to this subject |
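For reference, an OrgResourceQuota instance (as defined above) serializes directly into the request body used by POST /quota below:

from dataclasses import asdict
import json

quota = OrgResourceQuota(
    subject_id="subject-001",
    quota_id="quota-001",
    allocation_info="gpu-allocation",
    replica_count="3",
)
payload = json.dumps(asdict(quota))   # matches the POST /quota request body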
REST API Documentation
GET /metrics/report
Description: Returns the full consolidated metrics report, including cluster-wide metrics along with metadata and usage metrics for every known subject.
Response Structure:
{
"success": true,
"data": {
"cluster_metrics": { ... },
"subjects": {
"data": { "subject_id": { ... } },
"metrics": { "subject_id": { ... } }
}
}
}
cURL Example:
curl -X GET http://localhost:8000/metrics/report
OrgResourceQuota APIs
POST /quota
Description: Creates a new quota entry.
Request Body:
{
"subject_id": "subject-001",
"quota_id": "quota-001",
"allocation_info": "gpu-allocation",
"replica_count": "3"
}
cURL Example:
curl -X POST http://localhost:8000/quota \
-H "Content-Type: application/json" \
-d @quota.json
GET /quota/<quota_id>
Description: Fetches a specific quota entry.
cURL Example:
curl http://localhost:8000/quota/quota-001
PUT /quota/<quota_id>
Description: Updates an existing quota.
Request Body:
{
"replica_count": "5"
}
cURL Example:
curl -X PUT http://localhost:8000/quota/quota-001 \
-H "Content-Type: application/json" \
-d '{"replica_count": "5"}'
DELETE /quota/<quota_id>
Description: Deletes a specific quota entry.
cURL Example:
curl -X DELETE http://localhost:8000/quota/quota-001
POST /quotas
Description: Queries multiple quota entries.
Request Body:
{
"subject_id": "subject-001"
}
cURL Example:
curl -X POST http://localhost:8000/quotas \
-H "Content-Type: application/json" \
-d '{"subject_id": "subject-001"}'
Cluster Metrics API
GET /cluster/<cluster_id>
Description: Returns cluster-wide metrics such as total CPU, memory, and GPU utilization.
cURL Example:
curl http://localhost:8888/cluster/cluster-001
Response:
{
"success": true,
"data": {
"total_cpu": 128,
"used_cpu": 67.5,
"total_memory": 102400,
"used_memory": 67800
}
}
Subject Metrics API
GET /subjects
Description: Returns real-time usage metrics for all known subjects.
cURL Example:
curl http://localhost:8891/subjects
Response:
{
"success": true,
"data": [
{
"subject_id": "subject-001",
"cpu_usage": 8.4,
"memory_usage": 5120
}
]
}
Subjects DB API
GET /subjects?subject_id=<id1>&subject_id=<id2>
Description: Returns metadata for one or more subject IDs.
cURL Example:
curl -G http://localhost:8892/subjects \
--data-urlencode "subject_id=subject-001" \
--data-urlencode "subject_id=subject-002"
Response:
{
"success": true,
"data": [
{
"subject_id": "subject-001",
"name": "Trainer-A",
"type": "gpu-node"
}
]
}
WebSocket Subscription
Endpoint
ws://<host>:8765/?interval=<seconds>
Description
Connects to the metrics reporting service and pushes a complete report every interval seconds. The message payload is identical to the response of the /metrics/report REST API.
Query Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| interval | int | 30 | Time in seconds between consecutive push updates |
Example Usage (with websocat)
websocat "ws://localhost:8765/?interval=60"
Example Usage (JavaScript)
const ws = new WebSocket("ws://localhost:8765/?interval=15");
ws.onmessage = function(event) {
const report = JSON.parse(event.data);
console.log("Received metrics report:", report);
};
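Example Usage (Python)
An equivalent subscription, assuming the third-party websockets package is installed:

import asyncio
import json
import websockets  # third-party: pip install websockets

async def watch_metrics():
    # Request a full report every 15 seconds, mirroring the JavaScript example above.
    async with websockets.connect("ws://localhost:8765/?interval=15") as ws:
        async for message in ws:
            report = json.loads(message)
            print("Received metrics report:", report)

asyncio.run(watch_metrics())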
OrgAutoscaler Documentation
Introduction
OrgAutoscaler is a dynamic autoscaling engine designed to:
- Periodically execute a DSL-based policy workflow to evaluate scaling needs.
- Trigger subject-level resource allocation using DSL workflow results.
- Support ad-hoc scaling through a WebSocket interface.
- Listen for allocation confirmation messages via NATS and route them back to requestors.
It integrates with:
- Flask for configuration and control
- WebSockets for live autoscaling requests
- NATS for async response delivery
- A central DSL executor for autoscaling logic
2. Autoscaler Flow Explanation
Periodic Execution Flow
1. A background Timer sends an "activate" message every N seconds.
2. A background listener thread picks up the event and:
   - Calls get_metrics_report() from QuotaClient
   - Feeds the metrics into the DSL executor
   - Expects a result in the form: { subject_id: replica_count }
3. Each result is sent to SubjectResourceAllocator, which pushes the allocation request.
4. Allocation responses are published to a NATS topic: <ORG_ID>__alloc_response
5. The AllocationResponseListener listens on this topic and pushes responses into a shared response queue.
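A condensed sketch of this loop in Python; get_metrics_report() comes from QuotaClient as described above, while the execute and allocate method names are hypothetical stand-ins for the DSL executor and SubjectResourceAllocator.

import threading
import time

def start_periodic_autoscaler(quota_client, dsl_executor, allocator, interval_seconds):
    # Background loop standing in for the Timer's "activate" events.
    def loop():
        while True:
            report = quota_client.get_metrics_report()          # step 2: fetch current metrics
            decisions = dsl_executor.execute(report)             # {subject_id: replica_count}
            for subject_id, replica_count in decisions.items():
                allocator.allocate(subject_id, replica_count)     # step 3: push allocation request
            time.sleep(interval_seconds)

    threading.Thread(target=loop, daemon=True).start()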
Ad-hoc Execution Flow (WebSocket)
1. A WebSocket client connects and sends:

   {
     "subject_id": "subj-123",
     "replica_count": 4
   }

2. The autoscaler:
   - Attaches a unique UUID
   - Stores the WebSocket connection in a dictionary: connections[uuid]
   - Sends the allocation request
3. When a NATS response with the same UUID arrives:
   - The router thread finds the matching socket
   - Sends the response back
   - Closes the WebSocket connection
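The UUID correlation can be pictured with the sketch below; connections and the shared response queue mirror the description above, while the WebSocket object's send/close methods are hypothetical.

import json
import queue
import threading

connections = {}                 # uuid -> pending WebSocket connection
response_queue = queue.Queue()   # filled by the AllocationResponseListener from NATS

def response_router():
    # Match each allocation response to the socket that issued the request.
    while True:
        response = response_queue.get()                  # e.g. {"uuid": "...", "response": {...}}
        ws = connections.pop(response["uuid"], None)
        if ws is not None:
            ws.send(json.dumps({"success": True, "response": response}))  # hypothetical send()
            ws.close()                                                    # one request per connection

threading.Thread(target=response_router, daemon=True).start()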
Autoscaler APIs
POST /autoscale/config
Set DSL execution parameters dynamically.
Request
{
"incr": 2,
"threshold": 0.7
}
Response
{
"success": true,
"message": "Configuration updated"
}
GET /autoscale/config
Fetch current DSL parameters.
Response
{
"success": true,
"data": {
"incr": 2,
"threshold": 0.7
}
}
POST /autoscale/interval
Set the autoscaler execution interval (in seconds).
Request
{
"interval": 60
}
Response
{
"success": true,
"message": "Interval updated"
}
GET /autoscale/interval
Get the current autoscaler interval.
Response
{
"success": true,
"data": {
"interval": 60
}
}
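These endpoints can also be exercised from Python with the requests library; the base URL below is a placeholder, since the autoscaler's Flask port is not specified in this document.

import requests

BASE = "http://localhost:5000"   # placeholder address for the autoscaler's Flask app

# Update the DSL execution parameters
print(requests.post(f"{BASE}/autoscale/config", json={"incr": 2, "threshold": 0.7}).json())

# Read back the current execution interval
print(requests.get(f"{BASE}/autoscale/interval").json())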
WebSocket API – Ad-hoc Autoscaling
Connect
ws://<host>:6789
Send
{
"subject_id": "subject-xyz",
"replica_count": 3
}
Receive (on success)
{
"success": true,
"response": {
"uuid": "123e4567...",
"response": {
"status": "allocated",
"replicas": 3
}
}
}
Receive (on timeout)
{
"success": false,
"error": "Timeout waiting for response"
}
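Example Client (Python)
The same exchange, assuming the service is reachable on localhost and the third-party websockets package is installed:

import asyncio
import json
import websockets  # third-party: pip install websockets

async def request_scaling():
    # Send one ad-hoc scaling request and wait for the routed NATS response.
    async with websockets.connect("ws://localhost:6789") as ws:
        await ws.send(json.dumps({"subject_id": "subject-xyz", "replica_count": 3}))
        print(json.loads(await ws.recv()))

asyncio.run(request_scaling())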