Securely and Compliantly Building AI Large-Model Training Infrastructure and Developing AI Application Services on Amazon Web Services (AWS)

CSDN | 2024-09-06

Project overview:

小李哥 continues to introduce one cutting-edge AI solution built on the Amazon Web Services (AWS) cloud platform every day, helping readers quickly learn the AI best practices of AWS, the most widely used international cloud platform, and apply them in their own work.

This post shows how to use AWS Service Catalog to create and manage application products that include large AI models, how to use permission management to limit which cloud resources each employee can access based on their role, how to create a SageMaker managed machine-learning environment and train and deploy large models on it, and how to use VPC endpoints to load model files and model container images privately and securely. The design is fully serverless and cloud native, providing a scalable and secure AI solution. The solution architecture diagram is shown below:

Background knowledge for this solution

What is Amazon SageMaker?

Amazon SageMaker is AWS's end-to-end machine learning service, designed to help developers and data scientists build, train, and deploy machine learning models easily. SageMaker covers the full workflow from data preparation through model training to model deployment, letting users run machine learning projects efficiently in the cloud.

What is AWS Service Catalog?

AWS Service Catalog helps organizations create, manage, and distribute collections of approved cloud services. With Service Catalog, an enterprise can centrally manage approved resources and configurations, ensuring development teams follow organizational best practices and compliance requirements when using cloud services. Users launch what they need from a predefined product catalog, which simplifies resource deployment and reduces the risk of misconfiguration.

Security and compliance benefits of building AI services with SageMaker

Meeting enterprise compliance requirements

When building AI services with SageMaker, you can predefine and manage configuration templates that meet corporate compliance standards through Service Catalog, ensuring every AI model and resource deployment follows the organization's security policies and industry regulations such as GDPR or HIPAA.

Data security

SageMaker provides end-to-end encryption options, covering data at rest and in transit, to keep sensitive data secure across the entire AI model lifecycle. VPC endpoints can additionally be used to privately and securely access data in S3 and to pull AI model container images stored in ECR.

Access control and monitoring

Through integration with AWS Identity and Access Management (IAM), you can control at a fine-grained level who may access and operate SageMaker resources. Combined with monitoring tools such as CloudTrail and CloudWatch, an enterprise can track and audit every operation in near real time, ensuring transparency and security (see the sketch below).
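As a quick illustration of the auditing point, here is a minimal boto3 sketch (not part of the original walkthrough) that lists recent CloudTrail management events emitted by SageMaker; the region and result count are arbitrary assumptions.

import boto3

# Look up recent management events recorded by CloudTrail for SageMaker API calls
cloudtrail = boto3.client("cloudtrail", region_name="us-east-1")  # region is an assumption

events = cloudtrail.lookup_events(
    LookupAttributes=[
        {"AttributeKey": "EventSource", "AttributeValue": "sagemaker.amazonaws.com"}
    ],
    MaxResults=10,
)
for e in events["Events"]:
    print(e["EventTime"], e["EventName"], e.get("Username", "-"))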

What this solution covers

1. Use a VPC endpoint to privately access the model files stored in S3.

2. Create an AWS Service Catalog portfolio to centrally create and manage users' cloud service products.

3. As a Service Catalog end user, launch a SageMaker machine-learning training compute (notebook) instance.

Step-by-step walkthrough:

1. Sign in to the AWS console, open the serverless compute service Lambda, and create a function named "SageMakerBuild" with the code below. It acts as a CloudFormation custom resource handler that creates the SageMaker Jupyter notebook used to train the AI large model. (Note: the requests library is not bundled with the standard Lambda Python runtime, so package it in the deployment archive or attach it via a Lambda layer.)

import json
import boto3
import requests  # not included in the Lambda Python runtime; ship it with the function or a layer
import botocore
import time
import base64

## Request Status ##
global ReqStatus


def CFTFailedResponse(event, status, message):
    """Send a FAILED response back to the CloudFormation pre-signed ResponseURL."""
    print("Inside CFTFailedResponse")
    responseBody = {
        'Status': status,
        'Reason': message,
        'PhysicalResourceId': event['ServiceToken'],
        'StackId': event['StackId'],
        'RequestId': event['RequestId'],
        'LogicalResourceId': event['LogicalResourceId']
    }
    headers = {
        'content-type': '',
        'content-length': str(len(json.dumps(responseBody)))
    }
    print('Response = ' + json.dumps(responseBody))
    try:
        req = requests.put(event['ResponseURL'], data=json.dumps(responseBody), headers=headers)
        print("delete_respond_cloudformation res " + str(req))
    except Exception as e:
        print("Failed to send cf response {}".format(e))


def CFTSuccessResponse(event, status, data=None):
    """Send a SUCCESS response back to the CloudFormation pre-signed ResponseURL."""
    responseBody = {
        'Status': status,
        'Reason': 'See the details in CloudWatch Log Stream',
        'PhysicalResourceId': event['ServiceToken'],
        'StackId': event['StackId'],
        'RequestId': event['RequestId'],
        'LogicalResourceId': event['LogicalResourceId'],
        'Data': data
    }
    headers = {
        'content-type': '',
        'content-length': str(len(json.dumps(responseBody)))
    }
    print('Response = ' + json.dumps(responseBody))
    try:
        req = requests.put(event['ResponseURL'], data=json.dumps(responseBody), headers=headers)
    except Exception as e:
        print("Failed to send cf response {}".format(e))


def lambda_handler(event, context):
    ReqStatus = "SUCCESS"
    print("Event:")
    print(event)
    client = boto3.client('sagemaker')
    ec2client = boto3.client('ec2')
    data = {}

    if event['RequestType'] == 'Create':
        # Read the notebook configuration passed in by the CloudFormation custom resource
        try:
            project_name = event['ResourceProperties']['ProjectName']
            kmsKeyId = event['ResourceProperties']['KmsKeyId']
            Tags = event['ResourceProperties']['Tags']
            env_name = event['ResourceProperties']['ENVName']
            subnet_name = event['ResourceProperties']['Subnet']
            security_group_name = event['ResourceProperties']['SecurityGroupName']
            input_dict = {}
            input_dict['NotebookInstanceName'] = event['ResourceProperties']['NotebookInstanceName']
            input_dict['InstanceType'] = event['ResourceProperties']['NotebookInstanceType']
            input_dict['Tags'] = event['ResourceProperties']['Tags']
            input_dict['DirectInternetAccess'] = event['ResourceProperties']['DirectInternetAccess']
            input_dict['RootAccess'] = event['ResourceProperties']['RootAccess']
            input_dict['VolumeSizeInGB'] = int(event['ResourceProperties']['VolumeSizeInGB'])
            input_dict['RoleArn'] = event['ResourceProperties']['RoleArn']
            input_dict['LifecycleConfigName'] = event['ResourceProperties']['LifecycleConfigName']
        except Exception as e:
            print(e)
            ReqStatus = "FAILED"
            message = "Parameter Error: " + str(e)
            CFTFailedResponse(event, "FAILED", message)
        if ReqStatus == "FAILED":
            return None

        print("Validating Environment name: " + env_name)
        print("Subnet Id Fetching.....")
        try:
            ## Sagemaker Subnet ##
            subnetName = env_name + "-ResourceSubnet"
            print(subnetName)
            response = ec2client.describe_subnets(
                Filters=[
                    {
                        'Name': 'tag:Name',
                        'Values': [subnet_name]
                    },
                ]
            )
            subnetId = response['Subnets'][0]['SubnetId']
            input_dict['SubnetId'] = subnetId
            print("Desc subnet done!!")
        except Exception as e:
            print(e)
            ReqStatus = "FAILED"
            message = " Project Name is invalid - Subnet Error: " + str(e)
            CFTFailedResponse(event, "FAILED", message)
        if ReqStatus == "FAILED":
            return None

        ## Sagemaker Security group ##
        print("Security GroupId Fetching.....")
        try:
            sgName = env_name + "-ResourceSG"
            response = ec2client.describe_security_groups(
                Filters=[
                    {
                        'Name': 'tag:Name',
                        'Values': [security_group_name]
                    },
                ]
            )
            sgId = response['SecurityGroups'][0]['GroupId']
            input_dict['SecurityGroupIds'] = [sgId]
            print("Desc sg done!!")
        except Exception as e:
            print(e)
            ReqStatus = "FAILED"
            message = "Security Group ID Error: " + str(e)
            CFTFailedResponse(event, "FAILED", message)
        if ReqStatus == "FAILED":
            return None

        try:
            if kmsKeyId:
                input_dict['KmsKeyId'] = kmsKeyId
            else:
                print("in else")
            print(input_dict)
            # Create the notebook instance, then poll until it leaves the Pending state
            instance = client.create_notebook_instance(**input_dict)
            print('Sagemaker CLI response')
            print(str(instance))
            responseData = {'NotebookInstanceArn': instance['NotebookInstanceArn']}
            NotebookStatus = 'Pending'
            response = client.describe_notebook_instance(
                NotebookInstanceName=event['ResourceProperties']['NotebookInstanceName']
            )
            NotebookStatus = response['NotebookInstanceStatus']
            print("NotebookStatus:" + NotebookStatus)
            ## Notebook Failure ##
            if NotebookStatus == 'Failed':
                message = NotebookStatus + ": " + response['FailureReason'] + " :Notebook is not coming InService"
                CFTFailedResponse(event, "FAILED", message)
            else:
                while NotebookStatus == 'Pending':
                    time.sleep(200)
                    response = client.describe_notebook_instance(
                        NotebookInstanceName=event['ResourceProperties']['NotebookInstanceName']
                    )
                    NotebookStatus = response['NotebookInstanceStatus']
                    print("NotebookStatus in loop:" + NotebookStatus)
                ## Notebook Success ##
                if NotebookStatus == 'InService':
                    data['Message'] = "SageMaker Notebook name - " + event['ResourceProperties']['NotebookInstanceName'] + " created successfully"
                    print("message InService :", data['Message'])
                    CFTSuccessResponse(event, "SUCCESS", data)
                else:
                    message = NotebookStatus + ": " + response['FailureReason'] + " :Notebook is not coming InService"
                    print("message :", message)
                    CFTFailedResponse(event, "FAILED", message)
        except Exception as e:
            print(e)
            ReqStatus = "FAILED"
            CFTFailedResponse(event, "FAILED", str(e))

    if event['RequestType'] == 'Delete':
        NotebookStatus = None
        lifecycle_config = event['ResourceProperties']['LifecycleConfigName']
        NotebookName = event['ResourceProperties']['NotebookInstanceName']
        try:
            response = client.describe_notebook_instance(NotebookInstanceName=NotebookName)
            NotebookStatus = response['NotebookInstanceStatus']
            print("Notebook Status - " + NotebookStatus)
        except Exception as e:
            print(e)
            NotebookStatus = "Invalid"
            # CFTFailedResponse(event, "FAILED", str(e))
        while NotebookStatus == 'Pending':
            time.sleep(30)
            response = client.describe_notebook_instance(NotebookInstanceName=NotebookName)
            NotebookStatus = response['NotebookInstanceStatus']
            print("NotebookStatus:" + NotebookStatus)
        if NotebookStatus != 'Failed' and NotebookStatus != 'Invalid':
            print("Delete request for Notebook name: " + NotebookName)
            print("Stopping the Notebook.....")
            if NotebookStatus != 'Stopped':
                try:
                    response = client.stop_notebook_instance(NotebookInstanceName=NotebookName)
                    NotebookStatus = 'Stopping'
                    print("Notebook Status - " + NotebookStatus)
                    while NotebookStatus == 'Stopping':
                        time.sleep(30)
                        response = client.describe_notebook_instance(NotebookInstanceName=NotebookName)
                        NotebookStatus = response['NotebookInstanceStatus']
                        print("NotebookStatus:" + NotebookStatus)
                except Exception as e:
                    print(e)
                    NotebookStatus = "Invalid"
                    CFTFailedResponse(event, "FAILED", str(e))
            else:
                NotebookStatus = 'Stopped'
                print("NotebookStatus:" + NotebookStatus)
            if NotebookStatus != 'Invalid':
                print("Deleting The Notebook......")
                time.sleep(5)
                try:
                    response = client.delete_notebook_instance(NotebookInstanceName=NotebookName)
                    print("Notebook Deleted")
                    data["Message"] = "Notebook Deleted"
                    CFTSuccessResponse(event, "SUCCESS", data)
                except Exception as e:
                    print(e)
                    CFTFailedResponse(event, "FAILED", str(e))
        else:
            print("Notebook Invalid status")
            data["Message"] = "Notebook is not available"
            CFTSuccessResponse(event, "SUCCESS", data)

    if event['RequestType'] == 'Update':
        print("Update operation for Sagemaker Notebook is not recommended")
        data["Message"] = "Update operation for Sagemaker Notebook is not recommended"
        CFTSuccessResponse(event, "SUCCESS", data)

2. Next, create a YAML template with the following content and upload it to an S3 bucket. CloudFormation will use it to create the SageMaker Jupyter notebook as infrastructure as code (IaC). A scripted version of the upload is sketched after the template.

AWSTemplateFormatVersion: 2010-09-09
Description: Template to create a SageMaker notebook
Metadata:
  'AWS::CloudFormation::Interface':
    ParameterGroups:
      - Label:
          default: Environment detail
        Parameters:
          - ENVName
      - Label:
          default: SageMaker Notebook configuration
        Parameters:
          - NotebookInstanceName
          - NotebookInstanceType
          - DirectInternetAccess
          - RootAccess
          - VolumeSizeInGB
      - Label:
          default: Load S3 Bucket to SageMaker
        Parameters:
          - S3CodePusher
          - CodeBucketName
      - Label:
          default: Project detail
        Parameters:
          - ProjectName
          - ProjectID
    ParameterLabels:
      DirectInternetAccess:
        default: Default Internet Access
      NotebookInstanceName:
        default: Notebook Instance Name
      NotebookInstanceType:
        default: Notebook Instance Type
      ENVName:
        default: Environment Name
      ProjectName:
        default: Project Suffix
      RootAccess:
        default: Root access
      VolumeSizeInGB:
        default: Volume size for the SageMaker Notebook
      ProjectID:
        default: SageMaker ProjectID
      CodeBucketName:
        default: Code Bucket Name
      S3CodePusher:
        default: Copy code from S3 to SageMaker
Parameters:
  SubnetName:
    Default: ProSM-ResourceSubnet
    Description: Subnet Random String
    Type: String
  SecurityGroupName:
    Default: ProSM-ResourceSG
    Description: Security Group Name
    Type: String
  SageMakerBuildFunctionARN:
    Description: Service Token Value passed from Lambda Stack
    Type: String
  NotebookInstanceName:
    AllowedPattern: '[A-Za-z0-9-]{1,63}'
    ConstraintDescription: >-
      Maximum of 63 alphanumeric characters. Can include hyphens (-), but not
      spaces. Must be unique within your account in an AWS Region.
    Description: SageMaker Notebook instance name
    MaxLength: '63'
    MinLength: '1'
    Type: String
  NotebookInstanceType:
    ConstraintDescription: Must select a valid notebook instance type.
    Default: ml.t3.medium
    Description: Select Instance type for the SageMaker Notebook
    Type: String
  ENVName:
    Description: SageMaker infrastructure naming convention
    Type: String
  ProjectName:
    Description: >-
      The suffix appended to all resources in the stack. This will allow
      multiple copies of the same stack to be created in the same account.
    Type: String
  RootAccess:
    Description: Root access for the SageMaker Notebook user
    AllowedValues:
      - Enabled
      - Disabled
    Default: Enabled
    Type: String
  VolumeSizeInGB:
    Description: >-
      The size, in GB, of the ML storage volume to attach to the notebook
      instance. The default value is 5 GB.
    Type: Number
    Default: '20'
  DirectInternetAccess:
    Description: >-
      If you set this to Disabled this notebook instance will be able to access
      resources only in your VPC. As per the Project requirement, we have
      Disabled it.
    Type: String
    Default: Disabled
    AllowedValues:
      - Disabled
    ConstraintDescription: Must select a valid notebook instance type.
  ProjectID:
    Type: String
    Description: Enter a valid ProjectID.
    Default: QuickStart007
  S3CodePusher:
    Description: Do you want to load the code from S3 to SageMaker Notebook
    Default: 'NO'
    AllowedValues:
      - 'YES'
      - 'NO'
    Type: String
  CodeBucketName:
    Description: S3 Bucket name from which you want to copy the code to SageMaker.
    Default: lab-materials-bucket-1234
    Type: String
Conditions:
  BucketCondition: !Equals
    - 'YES'
    - !Ref S3CodePusher
Resources:
  SagemakerKMSKey:
    Type: 'AWS::KMS::Key'
    Properties:
      EnableKeyRotation: true
      Tags:
        - Key: ProjectID
          Value: !Ref ProjectID
        - Key: ProjectName
          Value: !Ref ProjectName
      KeyPolicy:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              AWS: !Sub 'arn:aws:iam::${AWS::AccountId}:root'
            Action:
              - 'kms:Encrypt'
              - 'kms:PutKeyPolicy'
              - 'kms:CreateKey'
              - 'kms:GetKeyRotationStatus'
              - 'kms:DeleteImportedKeyMaterial'
              - 'kms:GetKeyPolicy'
              - 'kms:UpdateCustomKeyStore'
              - 'kms:GenerateRandom'
              - 'kms:UpdateAlias'
              - 'kms:ImportKeyMaterial'
              - 'kms:ListRetirableGrants'
              - 'kms:CreateGrant'
              - 'kms:DeleteAlias'
              - 'kms:RetireGrant'
              - 'kms:ScheduleKeyDeletion'
              - 'kms:DisableKeyRotation'
              - 'kms:TagResource'
              - 'kms:CreateAlias'
              - 'kms:EnableKeyRotation'
              - 'kms:DisableKey'
              - 'kms:ListResourceTags'
              - 'kms:Verify'
              - 'kms:DeleteCustomKeyStore'
              - 'kms:Sign'
              - 'kms:ListKeys'
              - 'kms:ListGrants'
              - 'kms:ListAliases'
              - 'kms:ReEncryptTo'
              - 'kms:UntagResource'
              - 'kms:GetParametersForImport'
              - 'kms:ListKeyPolicies'
              - 'kms:GenerateDataKeyPair'
              - 'kms:GenerateDataKeyPairWithoutPlaintext'
              - 'kms:GetPublicKey'
              - 'kms:Decrypt'
              - 'kms:ReEncryptFrom'
              - 'kms:DisconnectCustomKeyStore'
              - 'kms:DescribeKey'
              - 'kms:GenerateDataKeyWithoutPlaintext'
              - 'kms:DescribeCustomKeyStores'
              - 'kms:CreateCustomKeyStore'
              - 'kms:EnableKey'
              - 'kms:RevokeGrant'
              - 'kms:UpdateKeyDescription'
              - 'kms:ConnectCustomKeyStore'
              - 'kms:CancelKeyDeletion'
              - 'kms:GenerateDataKey'
            Resource:
              - !Join
                - ''
                - - 'arn:aws:kms:'
                  - !Ref 'AWS::Region'
                  - ':'
                  - !Ref 'AWS::AccountId'
                  - ':key/*'
          - Sid: Allow access for Key Administrators
            Effect: Allow
            Principal:
              AWS:
                - !GetAtt SageMakerExecutionRole.Arn
            Action:
              - 'kms:CreateAlias'
              - 'kms:CreateKey'
              - 'kms:CreateGrant'
              - 'kms:CreateCustomKeyStore'
              - 'kms:DescribeKey'
              - 'kms:DescribeCustomKeyStores'
              - 'kms:EnableKey'
              - 'kms:EnableKeyRotation'
              - 'kms:ListKeys'
              - 'kms:ListAliases'
              - 'kms:ListKeyPolicies'
              - 'kms:ListGrants'
              - 'kms:ListRetirableGrants'
              - 'kms:ListResourceTags'
              - 'kms:PutKeyPolicy'
              - 'kms:UpdateAlias'
              - 'kms:UpdateKeyDescription'
              - 'kms:UpdateCustomKeyStore'
              - 'kms:RevokeGrant'
              - 'kms:DisableKey'
              - 'kms:DisableKeyRotation'
              - 'kms:GetPublicKey'
              - 'kms:GetKeyRotationStatus'
              - 'kms:GetKeyPolicy'
              - 'kms:GetParametersForImport'
              - 'kms:DeleteCustomKeyStore'
              - 'kms:DeleteImportedKeyMaterial'
              - 'kms:DeleteAlias'
              - 'kms:TagResource'
              - 'kms:UntagResource'
              - 'kms:ScheduleKeyDeletion'
              - 'kms:CancelKeyDeletion'
            Resource:
              - !Join
                - ''
                - - 'arn:aws:kms:'
                  - !Ref 'AWS::Region'
                  - ':'
                  - !Ref 'AWS::AccountId'
                  - ':key/*'
          - Sid: Allow use of the key
            Effect: Allow
            Principal:
              AWS:
                - !GetAtt SageMakerExecutionRole.Arn
            Action:
              - kms:Encrypt
              - kms:Decrypt
              - kms:ReEncryptTo
              - kms:ReEncryptFrom
              - kms:GenerateDataKeyPair
              - kms:GenerateDataKeyPairWithoutPlaintext
              - kms:GenerateDataKeyWithoutPlaintext
              - kms:GenerateDataKey
              - kms:DescribeKey
            Resource:
              - !Join
                - ''
                - - 'arn:aws:kms:'
                  - !Ref 'AWS::Region'
                  - ':'
                  - !Ref 'AWS::AccountId'
                  - ':key/*'
          - Sid: Allow attachment of persistent resources
            Effect: Allow
            Principal:
              AWS:
                - !GetAtt SageMakerExecutionRole.Arn
            Action:
              - kms:CreateGrant
              - kms:ListGrants
              - kms:RevokeGrant
            Resource:
              - !Join
                - ''
                - - 'arn:aws:kms:'
                  - !Ref 'AWS::Region'
                  - ':'
                  - !Ref 'AWS::AccountId'
                  - ':key/*'
            Condition:
              Bool:
                kms:GrantIsForAWSResource: 'true'
  KeyAlias:
    Type: AWS::KMS::Alias
    Properties:
      AliasName: 'alias/SageMaker-CMK-DS'
      TargetKeyId:
        Ref: SagemakerKMSKey
  SageMakerExecutionRole:
    Type: 'AWS::IAM::Role'
    Properties:
      Tags:
        - Key: ProjectID
          Value: !Ref ProjectID
        - Key: ProjectName
          Value: !Ref ProjectName
      AssumeRolePolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - sagemaker.amazonaws.com
            Action:
              - 'sts:AssumeRole'
      Path: /
      Policies:
        - PolicyName: !Join
            - ''
            - - !Ref ProjectName
              - SageMakerExecutionPolicy
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - 'iam:ListRoles'
                Resource:
                  - !Join
                    - ''
                    - - 'arn:aws:iam::'
                      - !Ref 'AWS::AccountId'
                      - ':role/*'
              - Sid: CloudArnResource
                Effect: Allow
                Action:
                  - 'application-autoscaling:DeleteScalingPolicy'
                  - 'application-autoscaling:DeleteScheduledAction'
                  - 'application-autoscaling:DeregisterScalableTarget'
                  - 'application-autoscaling:DescribeScalableTargets'
                  - 'application-autoscaling:DescribeScalingActivities'
                  - 'application-autoscaling:DescribeScalingPolicies'
                  - 'application-autoscaling:DescribeScheduledActions'
                  - 'application-autoscaling:PutScalingPolicy'
                  - 'application-autoscaling:PutScheduledAction'
                  - 'application-autoscaling:RegisterScalableTarget'
                Resource:
                  - !Join
                    - ''
                    - - 'arn:aws:autoscaling:'
                      - !Ref 'AWS::Region'
                      - ':'
                      - !Ref 'AWS::AccountId'
                      - ':*'
              - Sid: ElasticArnResource
                Effect: Allow
                Action:
                  - 'elastic-inference:Connect'
                Resource:
                  - !Join
                    - ''
                    - - 'arn:aws:elastic-inference:'
                      - !Ref 'AWS::Region'
                      - ':'
                      - !Ref 'AWS::AccountId'
                      - ':elastic-inference-accelerator/*'
              - Sid: SNSArnResource
                Effect: Allow
                Action:
                  - 'sns:ListTopics'
                Resource:
                  - !Join
                    - ''
                    - - 'arn:aws:sns:'
                      - !Ref 'AWS::Region'
                      - ':'
                      - !Ref 'AWS::AccountId'
                      - ':*'
              - Sid: logsArnResource
                Effect: Allow
                Action:
                  - 'cloudwatch:DeleteAlarms'
                  - 'cloudwatch:DescribeAlarms'
                  - 'cloudwatch:GetMetricData'
                  - 'cloudwatch:GetMetricStatistics'
                  - 'cloudwatch:ListMetrics'
                  - 'cloudwatch:PutMetricAlarm'
                  - 'cloudwatch:PutMetricData'
                  - 'logs:CreateLogGroup'
                  - 'logs:CreateLogStream'
                  - 'logs:DescribeLogStreams'
                  - 'logs:GetLogEvents'
                  - 'logs:PutLogEvents'
                Resource:
                  - !Join
                    - ''
                    - - 'arn:aws:logs:'
                      - !Ref 'AWS::Region'
                      - ':'
                      - !Ref 'AWS::AccountId'
                      - ':log-group:/aws/lambda/*'
              - Sid: KmsArnResource
                Effect: Allow
                Action:
                  - 'kms:DescribeKey'
                  - 'kms:ListAliases'
                Resource:
                  - !Join
                    - ''
                    - - 'arn:aws:kms:'
                      - !Ref 'AWS::Region'
                      - ':'
                      - !Ref 'AWS::AccountId'
                      - ':key/*'
              - Sid: ECRArnResource
                Effect: Allow
                Action:
                  - 'ecr:BatchCheckLayerAvailability'
                  - 'ecr:BatchGetImage'
                  - 'ecr:CreateRepository'
                  - 'ecr:GetAuthorizationToken'
                  - 'ecr:GetDownloadUrlForLayer'
                  - 'ecr:DescribeRepositories'
                  - 'ecr:DescribeImageScanFindings'
                  - 'ecr:DescribeRegistry'
                  - 'ecr:DescribeImages'
                Resource:
                  - !Join
                    - ''
                    - - 'arn:aws:ecr:'
                      - !Ref 'AWS::Region'
                      - ':'
                      - !Ref 'AWS::AccountId'
                      - ':repository/*'
              - Sid: EC2ArnResource
                Effect: Allow
                Action:
                  - 'ec2:CreateNetworkInterface'
                  - 'ec2:CreateNetworkInterfacePermission'
                  - 'ec2:DeleteNetworkInterface'
                  - 'ec2:DeleteNetworkInterfacePermission'
                  - 'ec2:DescribeDhcpOptions'
                  - 'ec2:DescribeNetworkInterfaces'
                  - 'ec2:DescribeRouteTables'
                  - 'ec2:DescribeSecurityGroups'
                  - 'ec2:DescribeSubnets'
                  - 'ec2:DescribeVpcEndpoints'
                  - 'ec2:DescribeVpcs'
                Resource:
                  - !Join
                    - ''
                    - - 'arn:aws:ec2:'
                      - !Ref 'AWS::Region'
                      - ':'
                      - !Ref 'AWS::AccountId'
                      - ':instance/*'
              - Sid: S3ArnResource
                Effect: Allow
                Action:
                  - 's3:CreateBucket'
                  - 's3:GetBucketLocation'
                  - 's3:ListBucket'
                Resource:
                  - !Join
                    - ''
                    - - 'arn:aws:s3::'
                      - ':*sagemaker*'
              - Sid: LambdaInvokePermission
                Effect: Allow
                Action:
                  - 'lambda:ListFunctions'
                Resource:
                  - !Join
                    - ''
                    - - 'arn:aws:lambda:'
                      - !Ref 'AWS::Region'
                      - ':'
                      - !Ref 'AWS::AccountId'
                      - ':function'
                      - ':*'
              - Effect: Allow
                Action: 'sagemaker:InvokeEndpoint'
                Resource:
                  - !Join
                    - ''
                    - - 'arn:aws:sagemaker:'
                      - !Ref 'AWS::Region'
                      - ':'
                      - !Ref 'AWS::AccountId'
                      - ':notebook-instance-lifecycle-config/*'
                Condition:
                  StringEquals:
                    'aws:PrincipalTag/ProjectID': !Ref ProjectID
              - Effect: Allow
                Action:
                  - 'sagemaker:CreateTrainingJob'
                  - 'sagemaker:CreateEndpoint'
                  - 'sagemaker:CreateModel'
                  - 'sagemaker:CreateEndpointConfig'
                  - 'sagemaker:CreateHyperParameterTuningJob'
                  - 'sagemaker:CreateTransformJob'
                Resource:
                  - !Join
                    - ''
                    - - 'arn:aws:sagemaker:'
                      - !Ref 'AWS::Region'
                      - ':'
                      - !Ref 'AWS::AccountId'
                      - ':notebook-instance-lifecycle-config/*'
                Condition:
                  StringEquals:
                    'aws:PrincipalTag/ProjectID': !Ref ProjectID
                  'ForAllValues:StringEquals':
                    'aws:TagKeys':
                      - Username
              - Effect: Allow
                Action:
                  - 'sagemaker:DescribeTrainingJob'
                  - 'sagemaker:DescribeEndpoint'
                  - 'sagemaker:DescribeEndpointConfig'
                Resource:
                  - !Join
                    - ''
                    - - 'arn:aws:sagemaker:'
                      - !Ref 'AWS::Region'
                      - ':'
                      - !Ref 'AWS::AccountId'
                      - ':notebook-instance-lifecycle-config/*'
                Condition:
                  StringEquals:
                    'aws:PrincipalTag/ProjectID': !Ref ProjectID
              - Effect: Allow
                Action:
                  - 'sagemaker:DeleteTags'
                  - 'sagemaker:ListTags'
                  - 'sagemaker:DescribeNotebookInstance'
                  - 'sagemaker:ListNotebookInstanceLifecycleConfigs'
                  - 'sagemaker:DescribeModel'
                  - 'sagemaker:ListTrainingJobs'
                  - 'sagemaker:DescribeHyperParameterTuningJob'
                  - 'sagemaker:UpdateEndpointWeightsAndCapacities'
                  - 'sagemaker:ListHyperParameterTuningJobs'
                  - 'sagemaker:ListEndpointConfigs'
                  - 'sagemaker:DescribeNotebookInstanceLifecycleConfig'
                  - 'sagemaker:ListTrainingJobsForHyperParameterTuningJob'
                  - 'sagemaker:StopHyperParameterTuningJob'
                  - 'sagemaker:DescribeEndpointConfig'
                  - 'sagemaker:ListModels'
                  - 'sagemaker:AddTags'
                  - 'sagemaker:ListNotebookInstances'
                  - 'sagemaker:StopTrainingJob'
                  - 'sagemaker:ListEndpoints'
                  - 'sagemaker:DeleteEndpoint'
                Resource:
                  - !Join
                    - ''
                    - - 'arn:aws:sagemaker:'
                      - !Ref 'AWS::Region'
                      - ':'
                      - !Ref 'AWS::AccountId'
                      - ':notebook-instance-lifecycle-config/*'
                Condition:
                  StringEquals:
                    'aws:PrincipalTag/ProjectID': !Ref ProjectID
              - Effect: Allow
                Action:
                  - 'ecr:SetRepositoryPolicy'
                  - 'ecr:CompleteLayerUpload'
                  - 'ecr:BatchDeleteImage'
                  - 'ecr:UploadLayerPart'
                  - 'ecr:DeleteRepositoryPolicy'
                  - 'ecr:InitiateLayerUpload'
                  - 'ecr:DeleteRepository'
                  - 'ecr:PutImage'
                Resource:
                  - !Join
                    - ''
                    - - 'arn:aws:ecr:'
                      - !Ref 'AWS::Region'
                      - ':'
                      - !Ref 'AWS::AccountId'
                      - ':repository/*sagemaker*'
              - Effect: Allow
                Action:
                  - 's3:GetObject'
                  - 's3:ListBucket'
                  - 's3:PutObject'
                  - 's3:DeleteObject'
                Resource:
                  - !Join
                    - ''
                    - - 'arn:aws:s3:::'
                      - !Ref SagemakerS3Bucket
                  - !Join
                    - ''
                    - - 'arn:aws:s3:::'
                      - !Ref SagemakerS3Bucket
                      - /*
                Condition:
                  StringEquals:
                    'aws:PrincipalTag/ProjectID': !Ref ProjectID
              - Effect: Allow
                Action: 'iam:PassRole'
                Resource:
                  - !Join
                    - ''
                    - - 'arn:aws:iam::'
                      - !Ref 'AWS::AccountId'
                      - ':role/*'
                Condition:
                  StringEquals:
                    'iam:PassedToService': sagemaker.amazonaws.com
  CodeBucketPolicy:
    Type: 'AWS::IAM::Policy'
    Condition: BucketCondition
    Properties:
      PolicyName: !Join
        - ''
        - - !Ref ProjectName
          - CodeBucketPolicy
      PolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Action:
              - 's3:GetObject'
            Resource:
              - !Join
                - ''
                - - 'arn:aws:s3:::'
                  - !Ref CodeBucketName
              - !Join
                - ''
                - - 'arn:aws:s3:::'
                  - !Ref CodeBucketName
                  - '/*'
      Roles:
        - !Ref SageMakerExecutionRole
  SagemakerS3Bucket:
    Type: 'AWS::S3::Bucket'
    Properties:
      BucketEncryption:
        ServerSideEncryptionConfiguration:
          - ServerSideEncryptionByDefault:
              SSEAlgorithm: AES256
      Tags:
        - Key: ProjectID
          Value: !Ref ProjectID
        - Key: ProjectName
          Value: !Ref ProjectName
  S3Policy:
    Type: 'AWS::S3::BucketPolicy'
    Properties:
      Bucket: !Ref SagemakerS3Bucket
      PolicyDocument:
        Version: 2012-10-17
        Statement:
          - Sid: AllowAccessFromVPCEndpoint
            Effect: Allow
            Principal: "*"
            Action:
              - 's3:Get*'
              - 's3:Put*'
              - 's3:List*'
              - 's3:DeleteObject'
            Resource:
              - !Join
                - ''
                - - 'arn:aws:s3:::'
                  - !Ref SagemakerS3Bucket
              - !Join
                - ''
                - - 'arn:aws:s3:::'
                  - !Ref SagemakerS3Bucket
                  - '/*'
            Condition:
              StringEquals:
                "aws:sourceVpce": "<PASTE S3 VPC ENDPOINT ID>"
  EFSLifecycleConfig:
    Type: 'AWS::SageMaker::NotebookInstanceLifecycleConfig'
    Properties:
      NotebookInstanceLifecycleConfigName: 'Provisioned-LC'
      OnCreate:
        - Content: !Base64
            'Fn::Join':
              - ''
              - - |
                  #!/bin/bash
                - |
                  aws configure set sts_regional_endpoints regional
                - yes | cp -rf ~/.aws/config /home/ec2-user/.aws/config
      OnStart:
        - Content: !Base64
            'Fn::Join':
              - ''
              - - |
                  #!/bin/bash
                - |
                  aws configure set sts_regional_endpoints regional
                - yes | cp -rf ~/.aws/config /home/ec2-user/.aws/config
  EFSLifecycleConfigForS3:
    Type: 'AWS::SageMaker::NotebookInstanceLifecycleConfig'
    Properties:
      NotebookInstanceLifecycleConfigName: 'Provisioned-LC-S3'
      OnCreate:
        - Content: !Base64
            'Fn::Join':
              - ''
              - - |
                  #!/bin/bash
                - |
                  # Copy Content
                - !Sub >
                  aws s3 cp s3://${CodeBucketName} /home/ec2-user/SageMaker/ --recursive
                - |
                  # Set sts endpoint
                - >
                  aws configure set sts_regional_endpoints regional
                - yes | cp -rf ~/.aws/config /home/ec2-user/.aws/config
      OnStart:
        - Content: !Base64
            'Fn::Join':
              - ''
              - - |
                  #!/bin/bash
                - |
                  aws configure set sts_regional_endpoints regional
                - yes | cp -rf ~/.aws/config /home/ec2-user/.aws/config
  SageMakerCustomResource:
    Type: 'Custom::SageMakerCustomResource'
    DependsOn: S3Policy
    Properties:
      ServiceToken: !Ref SageMakerBuildFunctionARN
      NotebookInstanceName: !Ref NotebookInstanceName
      NotebookInstanceType: !Ref NotebookInstanceType
      KmsKeyId: !Ref SagemakerKMSKey
      ENVName: !Join
        - ''
        - - !Ref ENVName
          - !Sub Subnet1Id
      Subnet: !Ref SubnetName
      SecurityGroupName: !Ref SecurityGroupName
      ProjectName: !Ref ProjectName
      RootAccess: !Ref RootAccess
      VolumeSizeInGB: !Ref VolumeSizeInGB
      LifecycleConfigName: !If [BucketCondition, !GetAtt EFSLifecycleConfigForS3.NotebookInstanceLifecycleConfigName, !GetAtt EFSLifecycleConfig.NotebookInstanceLifecycleConfigName]
      DirectInternetAccess: !Ref DirectInternetAccess
      RoleArn: !GetAtt
        - SageMakerExecutionRole
        - Arn
      Tags:
        - Key: ProjectID
          Value: !Ref ProjectID
        - Key: ProjectName
          Value: !Ref ProjectName
Outputs:
  Message:
    Description: Execution Status
    Value: !GetAtt
      - SageMakerCustomResource
      - Message
  SagemakerKMSKey:
    Description: KMS Key for encrypting Sagemaker resource
    Value: !Ref KeyAlias
  ExecutionRoleArn:
    Description: ARN of the Sagemaker Execution Role
    Value: !Ref SageMakerExecutionRole
  S3BucketName:
    Description: S3 bucket for SageMaker Notebook operation
    Value: !Ref SagemakerS3Bucket
  NotebookInstanceName:
    Description: Name of the Sagemaker Notebook instance created
    Value: !Ref NotebookInstanceName
  ProjectName:
    Description: Project ID used for SageMaker deployment
    Value: !Ref ProjectName
  ProjectID:
    Description: Project ID used for SageMaker deployment
    Value: !Ref ProjectID
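If you prefer to script the upload in step 2, here is a minimal boto3 sketch; the local file name and object key are placeholder assumptions, and the bucket name reuses the template's default CodeBucketName.

import boto3

template_file = "sagemaker-notebook.yaml"      # local copy of the template above (assumed name)
bucket = "lab-materials-bucket-1234"           # replace with your own bucket
key = "templates/sagemaker-notebook.yaml"

s3 = boto3.client("s3")
s3.upload_file(template_file, bucket, key)

# Optional sanity check: ask CloudFormation to validate the uploaded template
cfn = boto3.client("cloudformation")
template_url = f"https://{bucket}.s3.amazonaws.com/{key}"
print(cfn.validate_template(TemplateURL=template_url)["Parameters"])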

3. Next, open the VPC console, go to Endpoints, and click Create endpoint to create a VPC endpoint that lets SageMaker access the large-model files in the S3 bucket privately and securely.

4. Name the endpoint "s3-endpoint", choose AWS services as the service category, and select S3 as the target service.

5. Choose the VPC for the endpoint, configure the route tables, and click Create. A scripted equivalent is sketched below.
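Steps 3-5 can also be done programmatically. This boto3 sketch creates a gateway endpoint for S3; the region, VPC ID, and route table ID are placeholder assumptions.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")        # region is an assumption

response = ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",                            # gateway endpoint for S3
    VpcId="vpc-0123456789abcdef0",                        # placeholder VPC ID
    ServiceName="com.amazonaws.us-east-1.s3",
    RouteTableIds=["rtb-0123456789abcdef0"],              # placeholder route table ID
    TagSpecifications=[
        {"ResourceType": "vpc-endpoint",
         "Tags": [{"Key": "Name", "Value": "s3-endpoint"}]}
    ],
)
endpoint_id = response["VpcEndpoint"]["VpcEndpointId"]
print("Created endpoint:", endpoint_id)                   # paste this ID into the S3 bucket policy in the template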

6. Next, open the AWS Service Catalog console, go to Portfolios, and click Create portfolio to create a new portfolio that centrally manages a service made up of different cloud resources.

7. Name the portfolio "SageMakerPortfolio" and set the owner to CQ.

8. Next, add cloud resources to the portfolio by clicking "Create product".

9. Choose to define the product with a CloudFormation IaC template, name the product "SageMakerProduct", and set the owner to CQ.

10. Attach the CloudFormation template to the product by pasting the S3 URL of the template uploaded in step 2, set the version to 1, and click Create to create the product. The same flow can also be scripted, as sketched below.
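For reference, steps 6-10 map to the following Service Catalog API calls; the template URL is a placeholder, while the names and owner come from the walkthrough.

import boto3

sc = boto3.client("servicecatalog")

portfolio = sc.create_portfolio(
    DisplayName="SageMakerPortfolio",
    ProviderName="CQ",
    Description="Portfolio for SageMaker AI products",
)
portfolio_id = portfolio["PortfolioDetail"]["Id"]

product = sc.create_product(
    Name="SageMakerProduct",
    Owner="CQ",
    ProductType="CLOUD_FORMATION_TEMPLATE",
    ProvisioningArtifactParameters={
        "Name": "1",                                      # version label used later when launching
        "Type": "CLOUD_FORMATION_TEMPLATE",
        "Info": {
            # placeholder URL of the template uploaded in step 2
            "LoadTemplateFromURL": "https://lab-materials-bucket-1234.s3.amazonaws.com/templates/sagemaker-notebook.yaml"
        },
    },
)
product_id = product["ProductViewDetail"]["ProductViewSummary"]["ProductId"]

# Make the product available inside the portfolio
sc.associate_product_with_portfolio(ProductId=product_id, PortfolioId=portfolio_id)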

11. Next, go to the Constraints page and click Create to add a constraint, which uses permission management to restrict how the Service Catalog product may operate on cloud resources.

12. Select the product we just created, "SageMakerProduct", and choose Launch as the constraint type.

13. Attach an IAM role to the constraint; this role defines the permissions used when the product is launched. Then click Create.

14. Next, click Access to grant access, restricting which users may use the product's cloud resources.

15. We add the role "SCEndUserRole", which is used on behalf of end users to launch cloud resources from the product. The same grants can be scripted as shown below.
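Steps 11-15 correspond to a launch constraint plus a principal association. In this sketch the account ID, the launch role name, and the portfolio/product IDs are placeholders; SCEndUserRole is the role named in the walkthrough.

import json
import boto3

sc = boto3.client("servicecatalog")
account_id = "123456789012"                               # placeholder account ID
portfolio_id = "port-xxxxxxxxxxxx"                        # from create_portfolio in the previous sketch
product_id = "prod-xxxxxxxxxxxx"                          # from create_product in the previous sketch

# Launch constraint: Service Catalog assumes this role when provisioning the product
sc.create_constraint(
    PortfolioId=portfolio_id,
    ProductId=product_id,
    Type="LAUNCH",
    Parameters=json.dumps(
        {"RoleArn": f"arn:aws:iam::{account_id}:role/SCLaunchRole"}   # placeholder launch role name
    ),
)

# Grant end users access to the portfolio
sc.associate_principal_with_portfolio(
    PortfolioId=portfolio_id,
    PrincipalARN=f"arn:aws:iam::{account_id}:role/SCEndUserRole",
    PrincipalType="IAM",
)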

16. Now we use the Service Catalog product to launch the set of cloud resources. Select the product we just created and click Launch.

17. Name the provisioned product "DataScientistProduct" and select version 1, which we created in the previous step.

18. Fill in the parameters for the SageMaker notebook that the product will create, such as the environment name and instance name.

19. Add the ARN of the Lambda function created at the very beginning, then click Launch to start provisioning. A programmatic equivalent is sketched below.
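Steps 16-19 are the console equivalent of provision_product. The parameter keys below come from the CloudFormation template in step 2, while the concrete values (environment name, notebook name, Lambda ARN) are illustrative placeholders.

import boto3

sc = boto3.client("servicecatalog")

sc.provision_product(
    ProductName="SageMakerProduct",
    ProvisioningArtifactName="1",                         # the version created in step 10
    ProvisionedProductName="DataScientistProduct",
    ProvisioningParameters=[
        {"Key": "ENVName", "Value": "ProSM"},                             # illustrative environment name
        {"Key": "NotebookInstanceName", "Value": "DataScientistNotebook"},# illustrative notebook name
        {"Key": "SageMakerBuildFunctionARN",
         "Value": "arn:aws:lambda:us-east-1:123456789012:function:SageMakerBuild"},  # placeholder ARN of the step-1 function
    ],
)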

20. Finally, go back to the SageMaker console: the new Jupyter notebook instance provisioned through the Service Catalog product is there. With this instance we can start developing our AI application service. The status check below shows how to confirm it without the console.
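To verify the result of step 20 programmatically, a short SageMaker API check looks like this; the notebook name must match the one used when launching the product.

import boto3

sm = boto3.client("sagemaker")
notebook_name = "DataScientistNotebook"                   # the name used when launching the product

desc = sm.describe_notebook_instance(NotebookInstanceName=notebook_name)
print("Status:", desc["NotebookInstanceStatus"])

# Once the notebook is InService, generate a temporary sign-in URL for Jupyter
if desc["NotebookInstanceStatus"] == "InService":
    url = sm.create_presigned_notebook_instance_url(
        NotebookInstanceName=notebook_name,
        SessionExpirationDurationInSeconds=1800,
    )
    print("Open:", url["AuthorizedUrl"])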

Those are all the steps for securely and compliantly training AI large models and developing AI applications on AWS. I look forward to sharing more cutting-edge generative AI development solutions with you in the future.


