Skip to content

AI Ratelimit Policies Overview

AIRateLimitPolicy Custom Resource (CR) defines the specific rate-limiting rules for AI requests, including limits based on token usage and request count. Below is a breakdown of the key fields:

organization

Defines the organization or tenant for which the rate limit applies.

tokenCount

Specifies limits based on token usage within a defined time unit.

  • unit: Time unit for the rate limit.
  • requestTokenCount: Limit on the number of tokens for processing requests within the time unit.
  • responseTokenCount: Limit on the number of tokens for responses within the time unit.
  • totalTokenCount: Combined total limit on tokens for both requests and responses within the time unit.

requestCount

Defines rate limits based on the number of requests.

  • requestsPerUnit: Limit on the number of requests that can be processed within the time unit.
  • unit: Time unit for the request limit.

targetRef

Specifies the backend to which the rate limit applies.

Example AIRatelimitPolicy CR:

yaml apiVersion: dp.wso2.com/v1alpha3 kind: AIRateLimitPolicy metadata: name: llm-backend-rl namespace: apk spec: override: organization: default tokenCount: unit: Minute requestTokenCount: 5000 responseTokenCount: 10000 totalTokenCount: 15000 requestCount: requestsPerUnit: 6000 unit: Minute targetRef: kind: Backend name: backend-33eb53282e93f5fd3f26935311af727d58bd42c3-api group: gateway.networking.k8s.io