AI Rate Limit Policy

AIRateLimitPolicy

AIRateLimitPolicy is the Schema for the airatelimitpolicies API

Field Description
metadata
Kubernetes meta/v1.ObjectMeta
Refer to the Kubernetes API documentation for the fields of metadata.
spec
AIRateLimitPolicySpec


override
AIRateLimit
default
AIRateLimit
targetRef
sigs.k8s.io/gateway-api/apis/v1alpha2.PolicyTargetReference
status
AIRateLimitPolicyStatus

AIRateLimitPolicySpec

(Appears on: AIRateLimitPolicy)

AIRateLimitPolicySpec defines the desired state of AIRateLimitPolicy

Field Description
override
AIRateLimit
default
AIRateLimit
targetRef
sigs.k8s.io/gateway-api/apis/v1alpha2.PolicyTargetReference

AIRateLimit

(Appears on: AIRateLimitPolicySpec)

AIRateLimit defines the AI rate limit configuration

Field Description
organization
string
tokenCount
TokenCount
requestCount
RequestCount

TokenCount

(Appears on: AIRateLimit)

TokenCount defines the token-based rate limit configuration

Field Description
unit
string

Unit is the unit of time over which the token limits apply (for example, Minute)

requestTokenCount
uint32

RequestTokenCount specifies the maximum number of tokens allowed in AI requests within a given unit of time. This value limits the token count sent by the client to the AI service over the defined period.

responseTokenCount
uint32

ResponseTokenCount specifies the maximum number of tokens allowed in AI responses within a given unit of time. This value limits the token count received by the client from the AI service over the defined period.

totalTokenCount
uint32

TotalTokenCount represents the maximum allowable total token count for both AI requests and responses within a specified unit of time. This value sets the limit for the number of tokens exchanged between the client and AI service during the defined period.
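
As a rough illustration of how these fields fit together, the fragment below shows a tokenCount block using the values from the sample CR at the end of this page. Whether all three limits must be set together is not stated in this reference, so treat the combination as an assumption.

```yaml
# Fragment of spec.override (or spec.default); not a complete resource.
tokenCount:
  unit: Minute               # time window the limits apply to
  requestTokenCount: 5000    # max tokens sent in AI requests per unit
  responseTokenCount: 10000  # max tokens received in AI responses per unit
  totalTokenCount: 15000     # max request + response tokens per unit
```

In this fragment totalTokenCount equals the sum of the two directional limits (5000 + 10000 = 15000). The reference does not say that this relationship is required.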

RequestCount

(Appears on: AIRateLimit)

RequestCount defines the request-based rate limit configuration

Field Description
unit
string

Unit is the unit of time that requestsPerUnit is measured against (for example, Minute)

requestsPerUnit
uint32

RequestsPerUnit specifies the maximum number of requests allowed within a given unit of time.
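
For comparison with the token-based limits, a requestCount block might look like the fragment below. The values are taken from the sample CR at the end of this page. Minute is the only unit value shown in this reference, so other values are not assumed here.

```yaml
# Fragment of spec.override (or spec.default); not a complete resource.
requestCount:
  unit: Minute           # time window the limit applies to
  requestsPerUnit: 6000  # max number of requests allowed per unit
```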

AIRateLimitPolicyStatus

(Appears on: AIRateLimitPolicy)

AIRateLimitPolicyStatus defines the observed state of AIRateLimitPolicy


Generated with gen-crd-api-reference-docs.

AIRateLimitPolicy Sample

The following is a sample CR for creating an AIRateLimitPolicy.

apiVersion: dp.wso2.com/v1alpha3
kind: AIRateLimitPolicy
metadata:
  name: llm-backend-rl
  namespace: apk-integration-test
spec:
  override:
    organization: default
    tokenCount:
      unit: Minute
      requestTokenCount: 5000
      responseTokenCount: 10000
      totalTokenCount: 15000
    requestCount:
      requestsPerUnit: 6000
      unit: Minute
  targetRef:
    kind: Backend
    name: llm-backend
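
The spec also accepts a default block with the same AIRateLimit shape as override. The variant below is a minimal sketch of that form. The resource name llm-backend-rl-default is hypothetical, and the precedence between default and override is not described in this reference, so verify the behavior against the controller version you run before relying on it.

```yaml
apiVersion: dp.wso2.com/v1alpha3
kind: AIRateLimitPolicy
metadata:
  name: llm-backend-rl-default    # hypothetical name, for illustration only
  namespace: apk-integration-test
spec:
  default:                        # same AIRateLimit fields as override
    organization: default
    tokenCount:
      unit: Minute
      requestTokenCount: 5000
      responseTokenCount: 10000
      totalTokenCount: 15000
    requestCount:
      requestsPerUnit: 6000
      unit: Minute
  targetRef:
    kind: Backend
    name: llm-backend
```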