Securely and Compliantly Building Large-Model AI Training Infrastructure and AI Application Services on Amazon Web Services
CSDN 2024-09-06 09:01:01
Project overview:
Xiao Li Ge will continue to introduce one cutting-edge AI solution built on the Amazon Web Services (AWS) cloud platform each day, helping you quickly get to know AWS AI best practices on this leading international cloud platform and apply them in your daily work.
This installment covers how to use AWS Service Catalog to create and manage application products that include large AI models; how to use permission management to restrict the cloud resources each employee can access based on their role; how to create the SageMaker managed machine-learning service and train and deploy large models on it; and how to load model files and model container images privately and securely through VPC endpoints. The whole design uses a cloud-native serverless architecture, providing a scalable and secure AI solution. The solution architecture diagram is shown below:
Background knowledge for this solution
What is Amazon SageMaker?
Amazon SageMaker is AWS's one-stop machine-learning service, designed to help developers and data scientists easily build, train, and deploy machine-learning models. SageMaker provides tools for the full workflow, from data preparation and model training through model deployment, letting users run machine-learning projects efficiently in the cloud.
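To make the "build, train, deploy" workflow concrete, here is a minimal, hedged sketch of how a request for the SageMaker `create_training_job` API might be assembled with boto3. The role ARN, bucket, image URI, and job name are all placeholders, not values from this walkthrough, and the actual API call is left commented out.

```python
# Minimal sketch of a SageMaker training-job request.
# All names below (role ARN, bucket, image URI) are hypothetical placeholders.

ROLE_ARN = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder
BUCKET = "s3://my-training-bucket"  # placeholder

def build_training_job_request(job_name):
    """Build the keyword arguments for sagemaker.create_training_job().

    Kept as a pure function so the request can be inspected (or unit-tested)
    before anything touches AWS."""
    return {
        "TrainingJobName": job_name,
        "RoleArn": ROLE_ARN,
        "AlgorithmSpecification": {
            # Built-in algorithm image URIs are region-specific; placeholder here.
            "TrainingImage": "<region-specific-image-uri>",
            "TrainingInputMode": "File",
        },
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": BUCKET + "/train/",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": BUCKET + "/output/"},
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": 1,
            "VolumeSizeInGB": 20,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }

request = build_training_job_request("demo-training-job")
# import boto3
# boto3.client("sagemaker").create_training_job(**request)  # uncomment to submit
```

Building the request as plain data and submitting it separately also makes it easy to log or review the configuration for compliance before the job runs.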
What is AWS Service Catalog?
AWS Service Catalog is a service that helps enterprises create, manage, and distribute collections of approved cloud services. With Service Catalog, an organization can centrally manage approved resources and configurations, ensuring development teams follow organizational best practices and compliance requirements when using cloud services. Users pick the services they need from a predefined product catalog, which simplifies resource deployment and reduces the risk of misconfiguration.
Security and compliance benefits of building AI services with SageMaker
Meeting enterprise compliance requirements:
When building AI services with SageMaker, you can use Service Catalog to predefine and manage configuration templates that meet corporate compliance standards, ensuring every AI model and resource deployment follows the organization's security policies and industry regulations such as GDPR or HIPAA.
Data security:
SageMaker offers end-to-end encryption options, covering data at rest and in transit, keeping sensitive data secure throughout the AI model lifecycle. You can also use VPC endpoints to access data in S3 and pull AI model container images stored in Amazon ECR privately and securely.
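The CloudFormation template later in this walkthrough enforces exactly this pattern: the bucket policy only grants access to requests arriving through the S3 VPC endpoint. A small sketch of such a policy, built as a Python dict (the bucket name and endpoint ID are placeholders):

```python
import json

def s3_vpce_only_policy(bucket, vpce_id):
    """Bucket policy allowing access only via a specific S3 VPC endpoint.

    `bucket` and `vpce_id` are placeholders; substitute your own values."""
    arn = "arn:aws:s3:::" + bucket
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowAccessFromVPCEndpoint",
            "Effect": "Allow",
            "Principal": "*",
            "Action": ["s3:Get*", "s3:Put*", "s3:List*", "s3:DeleteObject"],
            "Resource": [arn, arn + "/*"],
            # Requests that do not arrive through this endpoint get no grant
            # from this statement, so they are implicitly denied.
            "Condition": {"StringEquals": {"aws:sourceVpce": vpce_id}},
        }],
    }

policy = s3_vpce_only_policy("model-artifacts-bucket", "vpce-0123456789abcdef0")
print(json.dumps(policy, indent=2))
# To apply (requires boto3 and credentials):
# boto3.client("s3").put_bucket_policy(Bucket="model-artifacts-bucket",
#                                      Policy=json.dumps(policy))
```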
Access control and monitoring:
Through integration with AWS Identity and Access Management (IAM), you can control at a fine-grained level who may access and operate on resources in SageMaker. Combined with monitoring tools such as CloudTrail and CloudWatch, enterprises can track and audit every operation in real time, ensuring transparency and security.
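Fine-grained control is usually expressed with IAM condition keys; the CloudFormation template below scopes SageMaker actions with `aws:PrincipalTag/ProjectID`. A hedged sketch of one such statement (the account ID, region, and project ID are hypothetical):

```python
def sagemaker_project_scoped_statement(account_id, region, project_id):
    """One IAM policy statement allowing SageMaker describe calls only to
    principals tagged with the matching ProjectID.

    All identifiers here are placeholders for illustration."""
    return {
        "Effect": "Allow",
        "Action": [
            "sagemaker:DescribeTrainingJob",
            "sagemaker:DescribeEndpoint",
            "sagemaker:DescribeEndpointConfig",
        ],
        "Resource": "arn:aws:sagemaker:%s:%s:*" % (region, account_id),
        # The call is only allowed when the caller's ProjectID tag matches.
        "Condition": {"StringEquals": {"aws:PrincipalTag/ProjectID": project_id}},
    }

stmt = sagemaker_project_scoped_statement("123456789012", "us-east-1", "QuickStart007")
print(stmt["Resource"])
```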
What this walkthrough covers
1. Access model files in S3 privately through a VPC endpoint.
2. Create an AWS Service Catalog portfolio to centrally create and manage users' cloud service products.
3. As a Service Catalog end user, create a SageMaker machine-learning notebook instance.
Step-by-step build:
1. Log in to the AWS console, open the serverless compute service Lambda, and create a Lambda function named "SageMakerBuild". Copy in the following code, which creates the SageMaker Jupyter Notebook used to train large AI models.
```python
import json
import time

import boto3
# `requests` is not bundled with the Lambda runtime; package it with the
# deployment artifact or attach it via a Lambda layer.
import requests


def CFTFailedResponse(event, status, message):
    """Report failure back to CloudFormation via the pre-signed ResponseURL."""
    print("Inside CFTFailedResponse")
    responseBody = {
        'Status': status,
        'Reason': message,
        'PhysicalResourceId': event['ServiceToken'],
        'StackId': event['StackId'],
        'RequestId': event['RequestId'],
        'LogicalResourceId': event['LogicalResourceId']
    }
    headers = {
        'content-type': '',
        'content-length': str(len(json.dumps(responseBody)))
    }
    print('Response = ' + json.dumps(responseBody))
    try:
        req = requests.put(event['ResponseURL'], data=json.dumps(responseBody), headers=headers)
        print("delete_respond_cloudformation res " + str(req))
    except Exception as e:
        print("Failed to send cf response {}".format(e))


def CFTSuccessResponse(event, status, data=None):
    """Report success back to CloudFormation via the pre-signed ResponseURL."""
    responseBody = {
        'Status': status,
        'Reason': 'See the details in CloudWatch Log Stream',
        'PhysicalResourceId': event['ServiceToken'],
        'StackId': event['StackId'],
        'RequestId': event['RequestId'],
        'LogicalResourceId': event['LogicalResourceId'],
        'Data': data
    }
    headers = {
        'content-type': '',
        'content-length': str(len(json.dumps(responseBody)))
    }
    print('Response = ' + json.dumps(responseBody))
    try:
        req = requests.put(event['ResponseURL'], data=json.dumps(responseBody), headers=headers)
    except Exception as e:
        print("Failed to send cf response {}".format(e))


def lambda_handler(event, context):
    ReqStatus = "SUCCESS"
    print("Event:")
    print(event)
    client = boto3.client('sagemaker')
    ec2client = boto3.client('ec2')
    data = {}
    if event['RequestType'] == 'Create':
        try:
            # Values passed in from the CloudFormation template
            project_name = event['ResourceProperties']['ProjectName']
            kmsKeyId = event['ResourceProperties']['KmsKeyId']
            Tags = event['ResourceProperties']['Tags']
            env_name = event['ResourceProperties']['ENVName']
            subnet_name = event['ResourceProperties']['Subnet']
            security_group_name = event['ResourceProperties']['SecurityGroupName']
            input_dict = {}
            input_dict['NotebookInstanceName'] = event['ResourceProperties']['NotebookInstanceName']
            input_dict['InstanceType'] = event['ResourceProperties']['NotebookInstanceType']
            input_dict['Tags'] = event['ResourceProperties']['Tags']
            input_dict['DirectInternetAccess'] = event['ResourceProperties']['DirectInternetAccess']
            input_dict['RootAccess'] = event['ResourceProperties']['RootAccess']
            input_dict['VolumeSizeInGB'] = int(event['ResourceProperties']['VolumeSizeInGB'])
            input_dict['RoleArn'] = event['ResourceProperties']['RoleArn']
            input_dict['LifecycleConfigName'] = event['ResourceProperties']['LifecycleConfigName']
        except Exception as e:
            print(e)
            ReqStatus = "FAILED"
            message = "Parameter Error: " + str(e)
            CFTFailedResponse(event, "FAILED", message)
        if ReqStatus == "FAILED":
            return None
        print("Validating Environment name: " + env_name)
        print("Subnet Id Fetching.....")
        try:
            # Resolve the SageMaker subnet by its Name tag
            subnetName = env_name + "-ResourceSubnet"
            print(subnetName)
            response = ec2client.describe_subnets(
                Filters=[
                    {
                        'Name': 'tag:Name',
                        'Values': [subnet_name]
                    },
                ]
            )
            subnetId = response['Subnets'][0]['SubnetId']
            input_dict['SubnetId'] = subnetId
            print("Desc subnet done!!")
        except Exception as e:
            print(e)
            ReqStatus = "FAILED"
            message = "Project Name is invalid - Subnet Error: " + str(e)
            CFTFailedResponse(event, "FAILED", message)
        if ReqStatus == "FAILED":
            return None
        # Resolve the SageMaker security group by its Name tag
        print("Security GroupId Fetching.....")
        try:
            sgName = env_name + "-ResourceSG"
            response = ec2client.describe_security_groups(
                Filters=[
                    {
                        'Name': 'tag:Name',
                        'Values': [security_group_name]
                    },
                ]
            )
            sgId = response['SecurityGroups'][0]['GroupId']
            input_dict['SecurityGroupIds'] = [sgId]
            print("Desc sg done!!")
        except Exception as e:
            print(e)
            ReqStatus = "FAILED"
            message = "Security Group ID Error: " + str(e)
            CFTFailedResponse(event, "FAILED", message)
        if ReqStatus == "FAILED":
            return None
        try:
            if kmsKeyId:
                input_dict['KmsKeyId'] = kmsKeyId
            else:
                print("No KMS key supplied; using default encryption")
            print(input_dict)
            instance = client.create_notebook_instance(**input_dict)
            print('SageMaker CLI response')
            print(str(instance))
            responseData = {'NotebookInstanceArn': instance['NotebookInstanceArn']}
            NotebookStatus = 'Pending'
            response = client.describe_notebook_instance(
                NotebookInstanceName=event['ResourceProperties']['NotebookInstanceName']
            )
            NotebookStatus = response['NotebookInstanceStatus']
            print("NotebookStatus:" + NotebookStatus)
            if NotebookStatus == 'Failed':
                message = NotebookStatus + ": " + response['FailureReason'] + " :Notebook is not coming InService"
                CFTFailedResponse(event, "FAILED", message)
            else:
                # Poll until the notebook leaves the Pending state
                while NotebookStatus == 'Pending':
                    time.sleep(200)
                    response = client.describe_notebook_instance(
                        NotebookInstanceName=event['ResourceProperties']['NotebookInstanceName']
                    )
                    NotebookStatus = response['NotebookInstanceStatus']
                    print("NotebookStatus in loop:" + NotebookStatus)
                if NotebookStatus == 'InService':
                    data['Message'] = "SageMaker Notebook name - " + event['ResourceProperties']['NotebookInstanceName'] + " created successfully"
                    print("message InService :", data['Message'])
                    CFTSuccessResponse(event, "SUCCESS", data)
                else:
                    message = NotebookStatus + ": " + response['FailureReason'] + " :Notebook is not coming InService"
                    print("message :", message)
                    CFTFailedResponse(event, "FAILED", message)
        except Exception as e:
            print(e)
            ReqStatus = "FAILED"
            CFTFailedResponse(event, "FAILED", str(e))
    if event['RequestType'] == 'Delete':
        NotebookStatus = None
        lifecycle_config = event['ResourceProperties']['LifecycleConfigName']
        NotebookName = event['ResourceProperties']['NotebookInstanceName']
        try:
            response = client.describe_notebook_instance(
                NotebookInstanceName=NotebookName
            )
            NotebookStatus = response['NotebookInstanceStatus']
            print("Notebook Status - " + NotebookStatus)
        except Exception as e:
            print(e)
            NotebookStatus = "Invalid"
        while NotebookStatus == 'Pending':
            time.sleep(30)
            response = client.describe_notebook_instance(
                NotebookInstanceName=NotebookName
            )
            NotebookStatus = response['NotebookInstanceStatus']
            print("NotebookStatus:" + NotebookStatus)
        if NotebookStatus != 'Failed' and NotebookStatus != 'Invalid':
            print("Delete request for Notebook name: " + NotebookName)
            print("Stopping the Notebook.....")
            if NotebookStatus != 'Stopped':
                try:
                    response = client.stop_notebook_instance(
                        NotebookInstanceName=NotebookName
                    )
                    NotebookStatus = 'Stopping'
                    print("Notebook Status - " + NotebookStatus)
                    while NotebookStatus == 'Stopping':
                        time.sleep(30)
                        response = client.describe_notebook_instance(
                            NotebookInstanceName=NotebookName
                        )
                        NotebookStatus = response['NotebookInstanceStatus']
                        print("NotebookStatus:" + NotebookStatus)
                except Exception as e:
                    print(e)
                    NotebookStatus = "Invalid"
                    CFTFailedResponse(event, "FAILED", str(e))
            else:
                NotebookStatus = 'Stopped'
                print("NotebookStatus:" + NotebookStatus)
            if NotebookStatus != 'Invalid':
                print("Deleting The Notebook......")
                time.sleep(5)
                try:
                    response = client.delete_notebook_instance(
                        NotebookInstanceName=NotebookName
                    )
                    print("Notebook Deleted")
                    data["Message"] = "Notebook Deleted"
                    CFTSuccessResponse(event, "SUCCESS", data)
                except Exception as e:
                    print(e)
                    CFTFailedResponse(event, "FAILED", str(e))
        else:
            print("Notebook Invalid status")
            data["Message"] = "Notebook is not available"
            CFTSuccessResponse(event, "SUCCESS", data)
    if event['RequestType'] == 'Update':
        print("Update operation for SageMaker Notebook is not recommended")
        data["Message"] = "Update operation for SageMaker Notebook is not recommended"
        CFTSuccessResponse(event, "SUCCESS", data)
```
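The handler above is driven by CloudFormation custom-resource events. For local reasoning or unit tests, a minimal `Create` event carrying the `ResourceProperties` keys the handler reads might look like the sketch below; every value is hypothetical, and invoking the handler for real would call AWS, so that line stays commented out.

```python
def sample_create_event():
    """Minimal CloudFormation custom-resource 'Create' event matching the
    ResourceProperties the handler reads. All values are hypothetical."""
    return {
        "RequestType": "Create",
        "ServiceToken": "arn:aws:lambda:us-east-1:123456789012:function:SageMakerBuild",
        "ResponseURL": "https://cloudformation-response.example/presigned",
        "StackId": "arn:aws:cloudformation:us-east-1:123456789012:stack/demo/guid",
        "RequestId": "req-1",
        "LogicalResourceId": "SageMakerCustomResource",
        "ResourceProperties": {
            "ProjectName": "DataScientistProduct",
            "KmsKeyId": "alias/SageMaker-CMK-DS",
            "Tags": [{"Key": "ProjectID", "Value": "QuickStart007"}],
            "ENVName": "ProSM",
            "Subnet": "ProSM-ResourceSubnet",
            "SecurityGroupName": "ProSM-ResourceSG",
            "NotebookInstanceName": "demo-notebook",
            "NotebookInstanceType": "ml.t3.medium",
            "DirectInternetAccess": "Disabled",
            "RootAccess": "Enabled",
            "VolumeSizeInGB": "20",
            "RoleArn": "arn:aws:iam::123456789012:role/SageMakerExecutionRole",
            "LifecycleConfigName": "Provisioned-LC",
        },
    }

event = sample_create_event()
# lambda_handler(event, None)  # would call AWS; shown here only for the shape
print(event["RequestType"])
```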
2. Next, create a YAML template with the following content and upload it to an S3 bucket. CloudFormation will use it to create the SageMaker Jupyter Notebook as infrastructure as code (IaC).
AWSTemplateFormatVersion: 2010-09-09
Description: Template to create a SageMaker notebook
Metadata:
'AWS::CloudFormation::Interface':
ParameterGroups:
- Label:
default: Environment detail
Parameters:
- ENVName
- Label:
default: SageMaker Notebook configuration
Parameters:
- NotebookInstanceName
- NotebookInstanceType
- DirectInternetAccess
- RootAccess
- VolumeSizeInGB
- Label:
default: Load S3 Bucket to SageMaker
Parameters:
- S3CodePusher
- CodeBucketName
- Label:
default: Project detail
Parameters:
- ProjectName
- ProjectID
ParameterLabels:
DirectInternetAccess:
default: Default Internet Access
NotebookInstanceName:
default: Notebook Instance Name
NotebookInstanceType:
default: Notebook Instance Type
ENVName:
default: Environment Name
ProjectName:
default: Project Suffix
RootAccess:
default: Root access
VolumeSizeInGB:
default: Volume size for the SageMaker Notebook
ProjectID:
default: SageMaker ProjectID
CodeBucketName:
default: Code Bucket Name
S3CodePusher:
default: Copy code from S3 to SageMaker
Parameters:
SubnetName:
Default: ProSM-ResourceSubnet
Description: Subnet Random String
Type: String
SecurityGroupName:
Default: ProSM-ResourceSG
Description: Security Group Name
Type: String
SageMakerBuildFunctionARN:
Description: Service Token Value passed from Lambda Stack
Type: String
NotebookInstanceName:
AllowedPattern: '[A-Za-z0-9-]{1,63}'
ConstraintDescription: >-
Maximum of 63 alphanumeric characters. Can include hyphens (-), but not
spaces. Must be unique within your account in an AWS Region.
Description: SageMaker Notebook instance name
MaxLength: '63'
MinLength: '1'
Type: String
NotebookInstanceType:
ConstraintDescription: Must select a valid notebook instance type.
Default: ml.t3.medium
Description: Select Instance type for the SageMaker Notebook
Type: String
ENVName:
Description: SageMaker infrastructure naming convention
Type: String
ProjectName:
Description: >-
The suffix appended to all resources in the stack. This will allow
multiple copies of the same stack to be created in the same account.
Type: String
RootAccess:
Description: Root access for the SageMaker Notebook user
AllowedValues:
- Enabled
- Disabled
Default: Enabled
Type: String
VolumeSizeInGB:
Description: >-
The size, in GB, of the ML storage volume to attach to the notebook
instance. The default value is 5 GB.
Type: Number
Default: '20'
DirectInternetAccess:
Description: >-
If you set this to Disabled this notebook instance will be able to access
resources only in your VPC. As per the Project requirement, we have
Disabled it.
Type: String
Default: Disabled
AllowedValues:
- Disabled
ConstraintDescription: Must select a valid notebook instance type.
ProjectID:
Type: String
Description: Enter a valid ProjectID.
Default: QuickStart007
S3CodePusher:
Description: Do you want to load the code from S3 to SageMaker Notebook
Default: 'NO'
AllowedValues:
- 'YES'
- 'NO'
Type: String
CodeBucketName:
Description: S3 Bucket name from which you want to copy the code to SageMaker.
Default: lab-materials-bucket-1234
Type: String
Conditions:
BucketCondition: !Equals
- 'YES'
- !Ref S3CodePusher
Resources:
SagemakerKMSKey:
Type: 'AWS::KMS::Key'
Properties:
EnableKeyRotation: true
Tags:
- Key: ProjectID
Value: !Ref ProjectID
- Key: ProjectName
Value: !Ref ProjectName
KeyPolicy:
Version: '2012-10-17'
Statement:
- Effect: Allow
Principal:
AWS: !Sub 'arn:aws:iam::${AWS::AccountId}:root'
Action:
- 'kms:Encrypt'
- 'kms:PutKeyPolicy'
- 'kms:CreateKey'
- 'kms:GetKeyRotationStatus'
- 'kms:DeleteImportedKeyMaterial'
- 'kms:GetKeyPolicy'
- 'kms:UpdateCustomKeyStore'
- 'kms:GenerateRandom'
- 'kms:UpdateAlias'
- 'kms:ImportKeyMaterial'
- 'kms:ListRetirableGrants'
- 'kms:CreateGrant'
- 'kms:DeleteAlias'
- 'kms:RetireGrant'
- 'kms:ScheduleKeyDeletion'
- 'kms:DisableKeyRotation'
- 'kms:TagResource'
- 'kms:CreateAlias'
- 'kms:EnableKeyRotation'
- 'kms:DisableKey'
- 'kms:ListResourceTags'
- 'kms:Verify'
- 'kms:DeleteCustomKeyStore'
- 'kms:Sign'
- 'kms:ListKeys'
- 'kms:ListGrants'
- 'kms:ListAliases'
- 'kms:ReEncryptTo'
- 'kms:UntagResource'
- 'kms:GetParametersForImport'
- 'kms:ListKeyPolicies'
- 'kms:GenerateDataKeyPair'
- 'kms:GenerateDataKeyPairWithoutPlaintext'
- 'kms:GetPublicKey'
- 'kms:Decrypt'
- 'kms:ReEncryptFrom'
- 'kms:DisconnectCustomKeyStore'
- 'kms:DescribeKey'
- 'kms:GenerateDataKeyWithoutPlaintext'
- 'kms:DescribeCustomKeyStores'
- 'kms:CreateCustomKeyStore'
- 'kms:EnableKey'
- 'kms:RevokeGrant'
- 'kms:UpdateKeyDescription'
- 'kms:ConnectCustomKeyStore'
- 'kms:CancelKeyDeletion'
- 'kms:GenerateDataKey'
Resource:
- !Join
- ''
- - 'arn:aws:kms:'
- !Ref 'AWS::Region'
- ':'
- !Ref 'AWS::AccountId'
- ':key/*'
- Sid: Allow access for Key Administrators
Effect: Allow
Principal:
AWS:
- !GetAtt SageMakerExecutionRole.Arn
Action:
- 'kms:CreateAlias'
- 'kms:CreateKey'
- 'kms:CreateGrant'
- 'kms:CreateCustomKeyStore'
- 'kms:DescribeKey'
- 'kms:DescribeCustomKeyStores'
- 'kms:EnableKey'
- 'kms:EnableKeyRotation'
- 'kms:ListKeys'
- 'kms:ListAliases'
- 'kms:ListKeyPolicies'
- 'kms:ListGrants'
- 'kms:ListRetirableGrants'
- 'kms:ListResourceTags'
- 'kms:PutKeyPolicy'
- 'kms:UpdateAlias'
- 'kms:UpdateKeyDescription'
- 'kms:UpdateCustomKeyStore'
- 'kms:RevokeGrant'
- 'kms:DisableKey'
- 'kms:DisableKeyRotation'
- 'kms:GetPublicKey'
- 'kms:GetKeyRotationStatus'
- 'kms:GetKeyPolicy'
- 'kms:GetParametersForImport'
- 'kms:DeleteCustomKeyStore'
- 'kms:DeleteImportedKeyMaterial'
- 'kms:DeleteAlias'
- 'kms:TagResource'
- 'kms:UntagResource'
- 'kms:ScheduleKeyDeletion'
- 'kms:CancelKeyDeletion'
Resource:
- !Join
- ''
- - 'arn:aws:kms:'
- !Ref 'AWS::Region'
- ':'
- !Ref 'AWS::AccountId'
- ':key/*'
- Sid: Allow use of the key
Effect: Allow
Principal:
AWS:
- !GetAtt SageMakerExecutionRole.Arn
Action:
- kms:Encrypt
- kms:Decrypt
- kms:ReEncryptTo
- kms:ReEncryptFrom
- kms:GenerateDataKeyPair
- kms:GenerateDataKeyPairWithoutPlaintext
- kms:GenerateDataKeyWithoutPlaintext
- kms:GenerateDataKey
- kms:DescribeKey
Resource:
- !Join
- ''
- - 'arn:aws:kms:'
- !Ref 'AWS::Region'
- ':'
- !Ref 'AWS::AccountId'
- ':key/*'
- Sid: Allow attachment of persistent resources
Effect: Allow
Principal:
AWS:
- !GetAtt SageMakerExecutionRole.Arn
Action:
- kms:CreateGrant
- kms:ListGrants
- kms:RevokeGrant
Resource:
- !Join
- ''
- - 'arn:aws:kms:'
- !Ref 'AWS::Region'
- ':'
- !Ref 'AWS::AccountId'
- ':key/*'
Condition:
Bool:
kms:GrantIsForAWSResource: 'true'
KeyAlias:
Type: AWS::KMS::Alias
Properties:
AliasName: 'alias/SageMaker-CMK-DS'
TargetKeyId:
Ref: SagemakerKMSKey
SageMakerExecutionRole:
Type: 'AWS::IAM::Role'
Properties:
Tags:
- Key: ProjectID
Value: !Ref ProjectID
- Key: ProjectName
Value: !Ref ProjectName
AssumeRolePolicyDocument:
Statement:
- Effect: Allow
Principal:
Service:
- sagemaker.amazonaws.com
Action:
- 'sts:AssumeRole'
Path: /
Policies:
- PolicyName: !Join
- ''
- - !Ref ProjectName
- SageMakerExecutionPolicy
PolicyDocument:
Version: 2012-10-17
Statement:
- Effect: Allow
Action:
- 'iam:ListRoles'
Resource:
- !Join
- ''
- - 'arn:aws:iam::'
- !Ref 'AWS::AccountId'
- ':role/*'
- Sid: CloudArnResource
Effect: Allow
Action:
- 'application-autoscaling:DeleteScalingPolicy'
- 'application-autoscaling:DeleteScheduledAction'
- 'application-autoscaling:DeregisterScalableTarget'
- 'application-autoscaling:DescribeScalableTargets'
- 'application-autoscaling:DescribeScalingActivities'
- 'application-autoscaling:DescribeScalingPolicies'
- 'application-autoscaling:DescribeScheduledActions'
- 'application-autoscaling:PutScalingPolicy'
- 'application-autoscaling:PutScheduledAction'
- 'application-autoscaling:RegisterScalableTarget'
Resource:
- !Join
- ''
- - 'arn:aws:autoscaling:'
- !Ref 'AWS::Region'
- ':'
- !Ref 'AWS::AccountId'
- ':*'
- Sid: ElasticArnResource
Effect: Allow
Action:
- 'elastic-inference:Connect'
Resource:
- !Join
- ''
- - 'arn:aws:elastic-inference:'
- !Ref 'AWS::Region'
- ':'
- !Ref 'AWS::AccountId'
- ':elastic-inference-accelerator/*'
- Sid: SNSArnResource
Effect: Allow
Action:
- 'sns:ListTopics'
Resource:
- !Join
- ''
- - 'arn:aws:sns:'
- !Ref 'AWS::Region'
- ':'
- !Ref 'AWS::AccountId'
- ':*'
- Sid: logsArnResource
Effect: Allow
Action:
- 'cloudwatch:DeleteAlarms'
- 'cloudwatch:DescribeAlarms'
- 'cloudwatch:GetMetricData'
- 'cloudwatch:GetMetricStatistics'
- 'cloudwatch:ListMetrics'
- 'cloudwatch:PutMetricAlarm'
- 'cloudwatch:PutMetricData'
- 'logs:CreateLogGroup'
- 'logs:CreateLogStream'
- 'logs:DescribeLogStreams'
- 'logs:GetLogEvents'
- 'logs:PutLogEvents'
Resource:
- !Join
- ''
- - 'arn:aws:logs:'
- !Ref 'AWS::Region'
- ':'
- !Ref 'AWS::AccountId'
- ':log-group:/aws/lambda/*'
- Sid: KmsArnResource
Effect: Allow
Action:
- 'kms:DescribeKey'
- 'kms:ListAliases'
Resource:
- !Join
- ''
- - 'arn:aws:kms:'
- !Ref 'AWS::Region'
- ':'
- !Ref 'AWS::AccountId'
- ':key/*'
- Sid: ECRArnResource
Effect: Allow
Action:
- 'ecr:BatchCheckLayerAvailability'
- 'ecr:BatchGetImage'
- 'ecr:CreateRepository'
- 'ecr:GetAuthorizationToken'
- 'ecr:GetDownloadUrlForLayer'
- 'ecr:DescribeRepositories'
- 'ecr:DescribeImageScanFindings'
- 'ecr:DescribeRegistry'
- 'ecr:DescribeImages'
Resource:
- !Join
- ''
- - 'arn:aws:ecr:'
- !Ref 'AWS::Region'
- ':'
- !Ref 'AWS::AccountId'
- ':repository/*'
- Sid: EC2ArnResource
Effect: Allow
Action:
- 'ec2:CreateNetworkInterface'
- 'ec2:CreateNetworkInterfacePermission'
- 'ec2:DeleteNetworkInterface'
- 'ec2:DeleteNetworkInterfacePermission'
- 'ec2:DescribeDhcpOptions'
- 'ec2:DescribeNetworkInterfaces'
- 'ec2:DescribeRouteTables'
- 'ec2:DescribeSecurityGroups'
- 'ec2:DescribeSubnets'
- 'ec2:DescribeVpcEndpoints'
- 'ec2:DescribeVpcs'
Resource:
- !Join
- ''
- - 'arn:aws:ec2:'
- !Ref 'AWS::Region'
- ':'
- !Ref 'AWS::AccountId'
- ':instance/*'
- Sid: S3ArnResource
Effect: Allow
Action:
- 's3:CreateBucket'
- 's3:GetBucketLocation'
- 's3:ListBucket'
Resource:
- !Join
- ''
- - 'arn:aws:s3::'
- ':*sagemaker*'
- Sid: LambdaInvokePermission
Effect: Allow
Action:
- 'lambda:ListFunctions'
Resource:
- !Join
- ''
- - 'arn:aws:lambda:'
- !Ref 'AWS::Region'
- ':'
- !Ref 'AWS::AccountId'
- ':function'
- ':*'
- Effect: Allow
Action: 'sagemaker:InvokeEndpoint'
Resource:
- !Join
- ''
- - 'arn:aws:sagemaker:'
- !Ref 'AWS::Region'
- ':'
- !Ref 'AWS::AccountId'
- ':notebook-instance-lifecycle-config/*'
Condition:
StringEquals:
'aws:PrincipalTag/ProjectID': !Ref ProjectID
- Effect: Allow
Action:
- 'sagemaker:CreateTrainingJob'
- 'sagemaker:CreateEndpoint'
- 'sagemaker:CreateModel'
- 'sagemaker:CreateEndpointConfig'
- 'sagemaker:CreateHyperParameterTuningJob'
- 'sagemaker:CreateTransformJob'
Resource:
- !Join
- ''
- - 'arn:aws:sagemaker:'
- !Ref 'AWS::Region'
- ':'
- !Ref 'AWS::AccountId'
- ':notebook-instance-lifecycle-config/*'
Condition:
StringEquals:
'aws:PrincipalTag/ProjectID': !Ref ProjectID
'ForAllValues:StringEquals':
'aws:TagKeys':
- Username
- Effect: Allow
Action:
- 'sagemaker:DescribeTrainingJob'
- 'sagemaker:DescribeEndpoint'
- 'sagemaker:DescribeEndpointConfig'
Resource:
- !Join
- ''
- - 'arn:aws:sagemaker:'
- !Ref 'AWS::Region'
- ':'
- !Ref 'AWS::AccountId'
- ':notebook-instance-lifecycle-config/*'
Condition:
StringEquals:
'aws:PrincipalTag/ProjectID': !Ref ProjectID
- Effect: Allow
Action:
- 'sagemaker:DeleteTags'
- 'sagemaker:ListTags'
- 'sagemaker:DescribeNotebookInstance'
- 'sagemaker:ListNotebookInstanceLifecycleConfigs'
- 'sagemaker:DescribeModel'
- 'sagemaker:ListTrainingJobs'
- 'sagemaker:DescribeHyperParameterTuningJob'
- 'sagemaker:UpdateEndpointWeightsAndCapacities'
- 'sagemaker:ListHyperParameterTuningJobs'
- 'sagemaker:ListEndpointConfigs'
- 'sagemaker:DescribeNotebookInstanceLifecycleConfig'
- 'sagemaker:ListTrainingJobsForHyperParameterTuningJob'
- 'sagemaker:StopHyperParameterTuningJob'
- 'sagemaker:DescribeEndpointConfig'
- 'sagemaker:ListModels'
- 'sagemaker:AddTags'
- 'sagemaker:ListNotebookInstances'
- 'sagemaker:StopTrainingJob'
- 'sagemaker:ListEndpoints'
- 'sagemaker:DeleteEndpoint'
Resource:
- !Join
- ''
- - 'arn:aws:sagemaker:'
- !Ref 'AWS::Region'
- ':'
- !Ref 'AWS::AccountId'
- ':notebook-instance-lifecycle-config/*'
Condition:
StringEquals:
'aws:PrincipalTag/ProjectID': !Ref ProjectID
- Effect: Allow
Action:
- 'ecr:SetRepositoryPolicy'
- 'ecr:CompleteLayerUpload'
- 'ecr:BatchDeleteImage'
- 'ecr:UploadLayerPart'
- 'ecr:DeleteRepositoryPolicy'
- 'ecr:InitiateLayerUpload'
- 'ecr:DeleteRepository'
- 'ecr:PutImage'
Resource:
- !Join
- ''
- - 'arn:aws:ecr:'
- !Ref 'AWS::Region'
- ':'
- !Ref 'AWS::AccountId'
- ':repository/*sagemaker*'
- Effect: Allow
Action:
- 's3:GetObject'
- 's3:ListBucket'
- 's3:PutObject'
- 's3:DeleteObject'
Resource:
- !Join
- ''
- - 'arn:aws:s3:::'
- !Ref SagemakerS3Bucket
- !Join
- ''
- - 'arn:aws:s3:::'
- !Ref SagemakerS3Bucket
- /*
Condition:
StringEquals:
'aws:PrincipalTag/ProjectID': !Ref ProjectID
- Effect: Allow
Action: 'iam:PassRole'
Resource:
- !Join
- ''
- - 'arn:aws:iam::'
- !Ref 'AWS::AccountId'
- ':role/*'
Condition:
StringEquals:
'iam:PassedToService': sagemaker.amazonaws.com
CodeBucketPolicy:
Type: 'AWS::IAM::Policy'
Condition: BucketCondition
Properties:
PolicyName: !Join
- ''
- - !Ref ProjectName
- CodeBucketPolicy
PolicyDocument:
Version: 2012-10-17
Statement:
- Effect: Allow
Action:
- 's3:GetObject'
Resource:
- !Join
- ''
- - 'arn:aws:s3:::'
- !Ref CodeBucketName
- !Join
- ''
- - 'arn:aws:s3:::'
- !Ref CodeBucketName
- '/*'
Roles:
- !Ref SageMakerExecutionRole
SagemakerS3Bucket:
Type: 'AWS::S3::Bucket'
Properties:
BucketEncryption:
ServerSideEncryptionConfiguration:
- ServerSideEncryptionByDefault:
SSEAlgorithm: AES256
Tags:
- Key: ProjectID
Value: !Ref ProjectID
- Key: ProjectName
Value: !Ref ProjectName
S3Policy:
Type: 'AWS::S3::BucketPolicy'
Properties:
Bucket: !Ref SagemakerS3Bucket
PolicyDocument:
Version: 2012-10-17
Statement:
- Sid: AllowAccessFromVPCEndpoint
Effect: Allow
Principal: "*"
Action:
- 's3:Get*'
- 's3:Put*'
- 's3:List*'
- 's3:DeleteObject'
Resource:
- !Join
- ''
- - 'arn:aws:s3:::'
- !Ref SagemakerS3Bucket
- !Join
- ''
- - 'arn:aws:s3:::'
- !Ref SagemakerS3Bucket
- '/*'
Condition:
StringEquals:
"aws:sourceVpce": "<PASTE S3 VPC ENDPOINT ID>"
EFSLifecycleConfig:
Type: 'AWS::SageMaker::NotebookInstanceLifecycleConfig'
Properties:
NotebookInstanceLifecycleConfigName: 'Provisioned-LC'
OnCreate:
- Content: !Base64
'Fn::Join':
- ''
- - |
#!/bin/bash
- |
aws configure set sts_regional_endpoints regional
- yes | cp -rf ~/.aws/config /home/ec2-user/.aws/config
OnStart:
- Content: !Base64
'Fn::Join':
- ''
- - |
#!/bin/bash
- |
aws configure set sts_regional_endpoints regional
- yes | cp -rf ~/.aws/config /home/ec2-user/.aws/config
EFSLifecycleConfigForS3:
Type: 'AWS::SageMaker::NotebookInstanceLifecycleConfig'
Properties:
NotebookInstanceLifecycleConfigName: 'Provisioned-LC-S3'
OnCreate:
- Content: !Base64
'Fn::Join':
- ''
- - |
#!/bin/bash
- |
# Copy Content
- !Sub >
aws s3 cp s3://${CodeBucketName} /home/ec2-user/SageMaker/ --recursive
- |
# Set sts endpoint
- >
aws configure set sts_regional_endpoints regional
- yes | cp -rf ~/.aws/config /home/ec2-user/.aws/config
OnStart:
- Content: !Base64
'Fn::Join':
- ''
- - |
#!/bin/bash
- |
aws configure set sts_regional_endpoints regional
- yes | cp -rf ~/.aws/config /home/ec2-user/.aws/config
SageMakerCustomResource:
Type: 'Custom::SageMakerCustomResource'
DependsOn: S3Policy
Properties:
ServiceToken: !Ref SageMakerBuildFunctionARN
NotebookInstanceName: !Ref NotebookInstanceName
NotebookInstanceType: !Ref NotebookInstanceType
KmsKeyId: !Ref SagemakerKMSKey
ENVName: !Join
- ''
- - !Ref ENVName
- !Sub Subnet1Id
Subnet: !Ref SubnetName
SecurityGroupName: !Ref SecurityGroupName
ProjectName: !Ref ProjectName
RootAccess: !Ref RootAccess
VolumeSizeInGB: !Ref VolumeSizeInGB
LifecycleConfigName: !If [BucketCondition, !GetAtt EFSLifecycleConfigForS3.NotebookInstanceLifecycleConfigName, !GetAtt EFSLifecycleConfig.NotebookInstanceLifecycleConfigName]
DirectInternetAccess: !Ref DirectInternetAccess
RoleArn: !GetAtt
- SageMakerExecutionRole
- Arn
Tags:
- Key: ProjectID
Value: !Ref ProjectID
- Key: ProjectName
Value: !Ref ProjectName
Outputs:
Message:
Description: Execution Status
Value: !GetAtt
- SageMakerCustomResource
- Message
SagemakerKMSKey:
Description: KMS Key for encrypting Sagemaker resource
Value: !Ref KeyAlias
ExecutionRoleArn:
Description: ARN of the Sagemaker Execution Role
Value: !Ref SageMakerExecutionRole
S3BucketName:
Description: S3 bucket for SageMaker Notebook operation
Value: !Ref SagemakerS3Bucket
NotebookInstanceName:
Description: Name of the Sagemaker Notebook instance created
Value: !Ref NotebookInstanceName
ProjectName:
Description: Project ID used for SageMaker deployment
Value: !Ref ProjectName
ProjectID:
Description: Project ID used for SageMaker deployment
Value: !Ref ProjectID
3. Next, open the VPC console, go to the Endpoints page, and click Create endpoint to create a VPC endpoint that lets SageMaker access the large-model files in the S3 bucket privately and securely.
4. Name the endpoint "s3-endpoint", choose AWS services as the service category, and select S3 as the service to access.
5. Select the VPC the endpoint lives in, configure the route tables, and click Create.
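Steps 3 through 5 can also be done programmatically. A hedged boto3 sketch of the gateway-endpoint request follows; the region, VPC ID, and route-table ID are placeholders, and the actual call is left commented out.

```python
def s3_gateway_endpoint_request(region, vpc_id, route_table_ids):
    """Parameters for ec2.create_vpc_endpoint(). A Gateway endpoint keeps
    S3 traffic on the AWS network. IDs below are placeholders."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": "com.amazonaws.%s.s3" % region,
        "RouteTableIds": route_table_ids,
        "TagSpecifications": [{
            "ResourceType": "vpc-endpoint",
            "Tags": [{"Key": "Name", "Value": "s3-endpoint"}],
        }],
    }

req = s3_gateway_endpoint_request("us-east-1", "vpc-0abc", ["rtb-0def"])
# import boto3
# boto3.client("ec2").create_vpc_endpoint(**req)  # uncomment to actually create
print(req["ServiceName"])
```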
6. Next, open the AWS Service Catalog console, go to the Portfolios page, and click Create to create a new portfolio for centrally managing an entire offering made up of different cloud resources.
7. Name the portfolio "SageMakerPortfolio" and set the owner to CQ.
8. Next, add cloud resources to the portfolio by clicking "Create product".
9. We create the product from a CloudFormation IaC template. Name the product "SageMakerProduct" and set the owner to CQ.
10. Attach the CloudFormation template to the product by URL: paste the URL of the template we uploaded to S3 in step 2, set the version to 1, and click Create to create the product.
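The same portfolio and product can be created with the Service Catalog API. A sketch of the two request bodies, mirroring steps 6 through 10; the template URL (including the object key) is a hypothetical example, and the calls are commented out.

```python
def portfolio_and_product_requests(template_url):
    """Parameters for servicecatalog.create_portfolio() and create_product(),
    mirroring the console steps above. `template_url` is the S3 location of
    the template from step 2."""
    portfolio = {
        "DisplayName": "SageMakerPortfolio",
        "ProviderName": "CQ",
    }
    product = {
        "Name": "SageMakerProduct",
        "Owner": "CQ",
        "ProductType": "CLOUD_FORMATION_TEMPLATE",
        "ProvisioningArtifactParameters": {
            "Name": "1",  # version label from step 10
            "Type": "CLOUD_FORMATION_TEMPLATE",
            "Info": {"LoadTemplateFromURL": template_url},
        },
    }
    return portfolio, product

# The object key "sagemaker-notebook.yaml" is a hypothetical example.
portfolio, product = portfolio_and_product_requests(
    "https://lab-materials-bucket-1234.s3.amazonaws.com/sagemaker-notebook.yaml")
# sc = boto3.client("servicecatalog")
# sc.create_portfolio(**portfolio)
# sc.create_product(**product)
print(product["ProductType"])
```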
11. Next, open the Constraints page and click Create to add a constraint, which uses permission management to restrict what operations the Service Catalog product can perform on cloud resources.
12. Select the product we just created, "SageMakerProduct", and choose the launch constraint type.
13. Attach the IAM role that carries the permission rules governing the product, then click Create.
14. Next, click Access to control which users can access the product's cloud resources.
15. We add the role "SCEndUserRole", which Service Catalog assumes on behalf of end users to provision the product's cloud resources.
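Steps 11 through 15 correspond to two Service Catalog API calls: a launch constraint and a principal association. A hedged sketch of both request bodies; the portfolio ID, product ID, and launch role ARN are placeholders ("SCLaunchRole" is a hypothetical name), while "SCEndUserRole" comes from step 15.

```python
import json

def constraint_and_access_requests(portfolio_id, product_id,
                                   launch_role_arn, end_user_role_arn):
    """Parameters for servicecatalog.create_constraint() (LAUNCH type) and
    associate_principal_with_portfolio(). IDs and ARNs are placeholders."""
    constraint = {
        "PortfolioId": portfolio_id,
        "ProductId": product_id,
        "Type": "LAUNCH",
        # Service Catalog assumes this role when provisioning the product,
        # so end users never need the underlying permissions directly.
        "Parameters": json.dumps({"RoleArn": launch_role_arn}),
    }
    access = {
        "PortfolioId": portfolio_id,
        "PrincipalARN": end_user_role_arn,
        "PrincipalType": "IAM",
    }
    return constraint, access

constraint, access = constraint_and_access_requests(
    "port-abc123", "prod-def456",
    "arn:aws:iam::123456789012:role/SCLaunchRole",
    "arn:aws:iam::123456789012:role/SCEndUserRole")
# sc = boto3.client("servicecatalog")
# sc.create_constraint(**constraint)
# sc.associate_principal_with_portfolio(**access)
print(constraint["Type"], access["PrincipalType"])
```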
16. Now we use the Service Catalog product to provision a series of cloud resources. Select the product we just created and click Launch.
17. Give the provisioned product a name, "DataScientistProduct", and select version 1, created in the previous steps.
18. Configure the parameters for the SageMaker resources the product will create: the environment name and the instance name.
19. Enter the ARN of the Lambda function we created at the very beginning, then click Launch to start provisioning.
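The launch in steps 16 through 19 maps onto a single `provision_product` call. A sketch of the request, using the names from this walkthrough; the product ID, provisioning-artifact ID, and notebook name are hypothetical placeholders, and the call is commented out.

```python
def provision_product_request(product_id, artifact_id, lambda_arn):
    """Parameters for servicecatalog.provision_product(), mirroring the
    launch steps above. IDs and the Lambda ARN are placeholders."""
    return {
        "ProductId": product_id,
        "ProvisioningArtifactId": artifact_id,  # version "1" from step 10
        "ProvisionedProductName": "DataScientistProduct",
        "ProvisioningParameters": [
            {"Key": "ENVName", "Value": "ProSM"},
            {"Key": "NotebookInstanceName", "Value": "demo-notebook"},
            # The custom resource in the template needs the Lambda's ARN.
            {"Key": "SageMakerBuildFunctionARN", "Value": lambda_arn},
        ],
    }

req = provision_product_request(
    "prod-def456", "pa-ghi789",
    "arn:aws:lambda:us-east-1:123456789012:function:SageMakerBuild")
# boto3.client("servicecatalog").provision_product(**req)  # uncomment to launch
print(req["ProvisionedProductName"])
```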
20. Finally, return to the SageMaker console. You can see that a new Jupyter Notebook instance has been created successfully through the Service Catalog product. With this instance, we can develop our AI application services.
That concludes all the steps for securely and compliantly training large AI models and developing AI applications on Amazon Web Services. I hope you will join me again for more cutting-edge generative AI development solutions.