@aws-cdk_aws-sagemaker-alpha.InvocationsScalingProps

interface InvocationsScalingProps 🔹

Language              Type name
.NET                  Amazon.CDK.AWS.Sagemaker.Alpha.InvocationsScalingProps
Go                    github.com/aws/aws-cdk-go/awscdksagemakeralpha/v2#InvocationsScalingProps
Java                  software.amazon.awscdk.services.sagemaker.alpha.InvocationsScalingProps
Python                aws_cdk.aws_sagemaker_alpha.InvocationsScalingProps
TypeScript (source)   @aws-cdk/aws-sagemaker-alpha » InvocationsScalingProps

Properties for enabling SageMaker Endpoint utilization tracking.

Example

import * as sagemaker from '@aws-cdk/aws-sagemaker-alpha';

declare const model: sagemaker.Model;

const variantName = 'my-variant';
const endpointConfig = new sagemaker.EndpointConfig(this, 'EndpointConfig', {
  instanceProductionVariants: [
    {
      model: model,
      variantName: variantName,
    },
  ]
});

const endpoint = new sagemaker.Endpoint(this, 'Endpoint', { endpointConfig });
const productionVariant = endpoint.findInstanceProductionVariant(variantName);
const instanceCount = productionVariant.autoScaleInstanceCount({
  maxCapacity: 3
});
instanceCount.scaleOnInvocations('LimitRPS', {
  maxRequestsPerSecond: 30,
});

Properties

Name                     Type       Description
maxRequestsPerSecond🔹   number     Max RPS per instance used for calculating the target SageMaker variant invocation per instance.
disableScaleIn?🔹        boolean    Indicates whether scale in by the target tracking policy is disabled.
policyName?🔹            string     A name for the scaling policy.
safetyFactor?🔹          number     Safety factor for calculating the target SageMaker variant invocation per instance.
scaleInCooldown?🔹       Duration   Period after a scale-in activity completes before another scale-in activity can start.
scaleOutCooldown?🔹      Duration   Period after a scale-out activity completes before another scale-out activity can start.

maxRequestsPerSecond🔹

Type: number

Maximum requests per second (RPS) per instance, used to calculate the target value for SageMaker variant invocations per instance.

More documentation available here: https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-scaling-loadtest.html


disableScaleIn?🔹

Type: boolean (optional, default: false)

Indicates whether scale in by the target tracking policy is disabled.

If the value is true, scale in is disabled and the target tracking policy won't remove capacity from the scalable resource. Otherwise, scale in is enabled and the target tracking policy can remove capacity from the scalable resource.


policyName?🔹

Type: string (optional, default: Automatically generated name.)

A name for the scaling policy.


safetyFactor?🔹

Type: number (optional, default: 0.5)

Safety factor used to calculate the target value for SageMaker variant invocations per instance.

More documentation available here: https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-scaling-loadtest.html
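
The load-testing guide linked above gives the target value for the SageMakerVariantInvocationsPerInstance metric as (MAX_RPS * SAFETY_FACTOR) * 60, because the metric is counted per minute. The following is a minimal sketch of that arithmetic, assuming the construct applies the same formula. The helper name `targetInvocationsPerInstance` and its input checks are hypothetical and not part of the library.

```typescript
// Hypothetical helper, not part of @aws-cdk/aws-sagemaker-alpha.
// Mirrors the formula from the SageMaker load-testing guide:
//   SageMakerVariantInvocationsPerInstance = (MAX_RPS * SAFETY_FACTOR) * 60
function targetInvocationsPerInstance(
  maxRequestsPerSecond: number,
  safetyFactor: number = 0.5, // same default as documented for safetyFactor
): number {
  // Assumption: a non-positive RPS is treated as invalid input.
  if (!(maxRequestsPerSecond > 0)) {
    throw new RangeError('maxRequestsPerSecond must be greater than 0');
  }
  // Assumption: the safety factor is a fraction in the range (0, 1].
  if (!(safetyFactor > 0 && safetyFactor <= 1)) {
    throw new RangeError('safetyFactor must be in the range (0, 1]');
  }
  // Convert the per-second limit into a per-minute target.
  return maxRequestsPerSecond * safetyFactor * 60;
}

// With the values from the example at the top of this page:
console.log(targetInvocationsPerInstance(30)); // prints 900
```

Under this reading, a lower safety factor makes the policy add instances earlier, leaving more headroom below the load-tested limit.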


scaleInCooldown?🔹

Type: Duration (optional, default: Duration.seconds(300) for the following scalable targets: ECS services, Spot Fleet requests, EMR clusters, AppStream 2.0 fleets, Aurora DB clusters, Amazon SageMaker endpoint variants, and custom resources. For all other scalable targets, the default is Duration.seconds(0); these are DynamoDB tables, DynamoDB global secondary indexes, Amazon Comprehend document classification endpoints, and Lambda provisioned concurrency.)

Period after a scale-in activity completes before another scale-in activity can start.


scaleOutCooldown?🔹

Type: Duration (optional, default: Duration.seconds(300) for the following scalable targets: ECS services, Spot Fleet requests, EMR clusters, AppStream 2.0 fleets, Aurora DB clusters, Amazon SageMaker endpoint variants, and custom resources. For all other scalable targets, the default is Duration.seconds(0); these are DynamoDB tables, DynamoDB global secondary indexes, Amazon Comprehend document classification endpoints, and Lambda provisioned concurrency.)

Period after a scale-out activity completes before another scale-out activity can start.
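
The optional properties can be combined in a single scaleOnInvocations call. The configuration fragment below is a sketch only: it assumes an existing endpoint with a production variant named 'my-variant', and the policy name and every numeric value are illustrative placeholders, not recommendations.

```typescript
import { Duration } from 'aws-cdk-lib';
import * as sagemaker from '@aws-cdk/aws-sagemaker-alpha';

// Assumption: an endpoint with a production variant named 'my-variant' is defined elsewhere.
declare const endpoint: sagemaker.Endpoint;

const instanceCount = endpoint
  .findInstanceProductionVariant('my-variant')
  .autoScaleInstanceCount({ maxCapacity: 3 });

instanceCount.scaleOnInvocations('LimitRPS', {
  maxRequestsPerSecond: 30,              // required
  safetyFactor: 0.8,                     // illustrative; the default is 0.5
  policyName: 'limit-rps-policy',        // hypothetical name; generated automatically if omitted
  disableScaleIn: false,                 // the default: the policy may remove capacity
  scaleInCooldown: Duration.minutes(10), // illustrative; wait longer before removing more capacity
  scaleOutCooldown: Duration.minutes(2), // illustrative; allow adding capacity again sooner
});
```

Here the scale-in cooldown is set longer than the scale-out cooldown so that capacity is added quickly under load and removed more cautiously. Whether that trade-off suits a given endpoint depends on its traffic pattern.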