AI Rate Limit Policy
AIRateLimitPolicy
AIRateLimitPolicy is the Schema for the airatelimitpolicies API.
Field | Type | Description |
---|---|---|
metadata | Kubernetes meta/v1.ObjectMeta | Refer to the Kubernetes API documentation for the fields of the metadata field. |
spec | AIRateLimitPolicySpec | |
status | AIRateLimitPolicyStatus | |
AIRateLimitPolicySpec
(Appears on: AIRateLimitPolicy)
AIRateLimitPolicySpec defines the desired state of AIRateLimitPolicy.
Field | Type | Description |
---|---|---|
override | AIRateLimit | |
default | AIRateLimit | |
targetRef | sigs.k8s.io/gateway-api/apis/v1alpha2.PolicyTargetReference | |
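As a rough illustration of how the three spec fields fit together, the fragment below places the limits under default instead of override. It is a sketch only: the resource name and the limit values are hypothetical, and this reference does not state how default and override interact, so no precedence rule should be read into it.

```yaml
# Sketch only: the name and all numbers are hypothetical.
apiVersion: dp.wso2.com/v1alpha3
kind: AIRateLimitPolicy
metadata:
  name: example-ai-ratelimit     # hypothetical resource name
spec:
  default:                       # same AIRateLimit shape as override
    organization: default
    tokenCount:
      unit: Minute
      requestTokenCount: 1000
      responseTokenCount: 2000
      totalTokenCount: 3000
    requestCount:
      unit: Minute
      requestsPerUnit: 100
  targetRef:                     # PolicyTargetReference; mirrors the sample CR
    kind: Backend
    name: llm-backend
```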
AIRateLimit
(Appears on: AIRateLimitPolicySpec)
AIRateLimit defines the AI rate limit configuration.
Field | Type | Description |
---|---|---|
organization | string | |
tokenCount | TokenCount | |
requestCount | RequestCount | |
TokenCount
(Appears on: AIRateLimit)
TokenCount defines the token-based rate limit configuration.
Field | Type | Description |
---|---|---|
unit | string | Unit is the unit of time to which the token counts in this block apply. |
requestTokenCount | uint32 | RequestTokenCount specifies the maximum number of tokens allowed in AI requests within a given unit of time. This value limits the token count sent by the client to the AI service over the defined period. |
responseTokenCount | uint32 | ResponseTokenCount specifies the maximum number of tokens allowed in AI responses within a given unit of time. This value limits the token count received by the client from the AI service over the defined period. |
totalTokenCount | uint32 | TotalTokenCount specifies the maximum total number of tokens allowed across both AI requests and responses within a given unit of time. This value limits the number of tokens exchanged between the client and the AI service over the defined period. |
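To make the three token limits concrete, the fragment below is a minimal sketch of a tokenCount block as it would sit under override or default. The numbers are hypothetical. The total is set to the sum of the request and response limits (2000 + 8000 = 10000) purely as a choice for this example; this reference does not state that the total must equal the sum.

```yaml
# Hypothetical values; this block sits under spec.override or spec.default.
tokenCount:
  unit: Minute                 # time window shared by all three limits
  requestTokenCount: 2000      # tokens sent to the AI service per window
  responseTokenCount: 8000     # tokens received from the AI service per window
  totalTokenCount: 10000       # request and response tokens combined per window
```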
RequestCount
(Appears on: AIRateLimit)
RequestCount defines the request-count-based rate limit configuration.
Field | Type | Description |
---|---|---|
unit | string | Unit is the unit of time to which requestsPerUnit applies. |
requestsPerUnit | uint32 | RequestsPerUnit specifies the maximum number of requests allowed within a given unit of time. |
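As a small worked example of how unit and requestsPerUnit combine, the sketch below allows 60 requests per minute, which averages out to one request per second. The value is hypothetical, and how requests are distributed within the window is not described in this reference.

```yaml
# Hypothetical value; this block sits under spec.override or spec.default.
requestCount:
  unit: Minute           # time window
  requestsPerUnit: 60    # at most 60 requests in each one-minute window
```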
AIRateLimitPolicyStatus
(Appears on: AIRateLimitPolicy)
AIRateLimitPolicyStatus defines the observed state of AIRateLimitPolicy.
Generated with gen-crd-api-reference-docs.
AIRateLimitPolicy Sample
The following is a sample CR for creating an AIRateLimitPolicy.
apiVersion: dp.wso2.com/v1alpha3
kind: AIRateLimitPolicy
metadata:
  name: llm-backend-rl
  namespace: apk-integration-test
spec:
  override:
    organization: default
    tokenCount:
      unit: Minute
      requestTokenCount: 5000
      responseTokenCount: 10000
      totalTokenCount: 15000
    requestCount:
      requestsPerUnit: 6000
      unit: Minute
  targetRef:
    kind: Backend
    name: llm-backend