This project contains a model that predicts multi-label divisions for a given text.
## Introduction
The structure of the project is based on https://github.com/Kenza-AI/sagify.
This library is a CLI used to train and deploy ML/DL models on AWS SageMaker.
The goal is to have a local configuration that ensures a successful training (and deployment) of models with AWS SageMaker.
Basically, if it works locally, it will work in AWS SageMaker.
## Requirements
* Python 3.8 with the python3.8 venv module -> https://www.linuxcapable.com/install-python-3-8-on-ubuntu-linux/
* Docker
* awscli (optional): allows interacting with AWS SageMaker from a laptop. In a CI/CD pipeline like in TED AI, the GitLab runner in the environment will take care of that.
## Project structure
This project uses Sagify https://github.com/Kenza-AI/sagify as a base for the structure of the project.
Get familiar with the documentation of this tool, as this project doesn't differ much from it.
Some changes were made to be able to create a CI pipeline in a SRV4DEV environment.
These changes can be found in the following sections.
### metrics.json
This file defines the regular expressions used to extract the metrics from the logs.
These metrics are then available in the GitLab CI pipeline artifacts.
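To illustrate the mechanism, the sketch below shows how a regex from such a file could be applied to a training log line. The metric name, regex, file paths and log format are all hypothetical, not the project's real configuration:

```shell
# Illustrative only: metric name, regex and log format are hypothetical.
cat > /tmp/metrics.json <<'EOF'
{
  "metrics": [
    { "name": "validation:f1", "regex": "val_f1=([0-9.]+)" }
  ]
}
EOF

# A sample training log line:
echo "epoch=3 loss=0.213 val_f1=0.87" > /tmp/train.log

# Read the regex from metrics.json and extract the metric value:
regex=$(python3 -c 'import json; print(json.load(open("/tmp/metrics.json"))["metrics"][0]["regex"])')
grep -oE "$regex" /tmp/train.log | sed -E "s/$regex/\1/"
```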
### .sagify.json
This file defines the properties of the project, used by Sagify and the CI to configure and build the project and to name the resulting artifacts.
Most of these properties are used to build the Docker image: src/sagify_base/build.sh & src/sagify_base/Dockerfile
* image_name: the name of the Docker image
* aws_profile: the AWS profile to use when building locally
* aws_region: the AWS region of the project; defines where the ECR can be found
* python_version: the Python version used in the Dockerfile
* requirements_dir: the path to the requirements.txt file
* sagify_module_dir: the path to the root directory of the sagify code
The following properties are used by the CI:
* experiment_id: the experiment ID where the training data is located in s3://d-ew1-ted-ai-ml-data/experiments/
* docker_image_base_url: the docker image URL of the SageMaker container -> the one that will be used by the CI and SageMaker for the real training
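For illustration, such a file could look like the sketch below; every value is a placeholder, not the project's real configuration. The last command shows how a script (e.g. in the CI) might read one property:

```shell
# All values below are placeholders for illustration only.
cat > /tmp/sagify-example.json <<'EOF'
{
  "image_name": "division-classifier",
  "aws_profile": "default",
  "aws_region": "eu-west-1",
  "python_version": "3.8",
  "requirements_dir": "requirements.txt",
  "sagify_module_dir": "src",
  "experiment_id": "example-experiment",
  "docker_image_base_url": "<account>.dkr.ecr.eu-west-1.amazonaws.com/<base-image>"
}
EOF

# Read a single property, as a CI script might do:
python3 -c 'import json; print(json.load(open("/tmp/sagify-example.json"))["image_name"])'
```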
### Dockerfile(s)
There are 2 Docker files in this project:
* src/sagify_base/Dockerfile: Dockerfile used to build and train in the local environment -> ensure correct setup of the project
* src/sagify_base/Dockerfile-sagemaker: Dockerfile used to build and train in SageMaker. This Docker image will be pushed to Amazon ECR and used for training/inference.
### call_endpoint.sh
The script located at src/sagify_base/local_test/call_endpoint.sh expects the local inference container to be running
and tests the invocation.
A successful result indicates that the model loads, the data is preprocessed and the model returns a prediction.
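As a hypothetical sketch of the request such a script sends: SageMaker containers expose a POST /invocations route, so with the local inference container running (see `sagify local deploy` below) the invocation would look roughly like this. The payload shown is made up; the real input format depends on prediction.py:

```shell
curl -s -X POST http://localhost:8080/invocations \
  -H "Content-Type: application/json" \
  -d '{"text": "example tender description"}'
```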
### Code
Please refer to Sagify documentation for more information.
* Put your training code in the function train() in src/sagify_base/training/training.py
* Put the loading of the model and inference code in src/sagify_base/prediction/prediction.py
## CI/CD pipelines
The CI is composed of 2 jobs when started on a feature branch and 5 on a tagged commit.
The final result of the CI pipeline of a tagged commit is:
* a Docker image for inference in Amazon ECR sagemaker-classifiers repository
* the model artifact in s3://d-ew1-ted-ai-ml-models/models/$image_name/$git_tag/model.tar.gz
* the experiment data of the training in s3://d-ew1-ted-ai-ml-data/data-artifacts/$image_name/$git_tag
These artifacts must then be deployed using Terraform to create a SageMaker endpoint.
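As a rough sketch of what that Terraform deployment might look like (resource names, variables and the instance type below are hypothetical, not the project's real configuration):

```hcl
# Hypothetical sketch; all names, variables and values are placeholders.
resource "aws_sagemaker_model" "classifier" {
  name               = "classifier-${var.git_tag}"
  execution_role_arn = var.sagemaker_role_arn

  primary_container {
    # Inference image from the sagemaker-classifiers ECR repository
    image          = "${var.ecr_registry}/sagemaker-classifiers:${var.git_tag}"
    # Model artifact produced by the tagged CI pipeline
    model_data_url = "s3://d-ew1-ted-ai-ml-models/models/${var.image_name}/${var.git_tag}/model.tar.gz"
  }
}

resource "aws_sagemaker_endpoint_configuration" "classifier" {
  name = "classifier-${var.git_tag}"

  production_variants {
    variant_name           = "AllTraffic"
    model_name             = aws_sagemaker_model.classifier.name
    initial_instance_count = 1
    instance_type          = "ml.t2.medium"
  }
}

resource "aws_sagemaker_endpoint" "classifier" {
  name                 = "classifier"
  endpoint_config_name = aws_sagemaker_endpoint_configuration.classifier.name
}
```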
#### build
Executes the script ci/build_push_image.sh to build the Docker image for SageMaker training and push it to the Amazon ECR repository ci-temporary-images.
To facilitate the cleanup, 2 distinct ECR repositories were created:
* ci-temporary-images: contains temporary Docker images for CI needs (feature branches) - automatically removed after some time
* sagemaker-classifiers: contains the real images that will be used for inference - not removed
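The automatic removal from ci-temporary-images is typically implemented with an ECR lifecycle policy; a minimal sketch is shown below. The 14-day retention is illustrative, not the project's actual setting:

```json
{
  "rules": [
    {
      "rulePriority": 1,
      "description": "Expire temporary CI images after 14 days",
      "selection": {
        "tagStatus": "any",
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": 14
      },
      "action": { "type": "expire" }
    }
  ]
}
```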
#### train
Executes ci/train.sh; this uses the Docker image built in the previous job for training in SageMaker.
This job generates artifacts in GitLab CI with metrics and other information.
Once the result is satisfactory, create a merge request.
### Only tagged pipeline
The following steps are only executed on tagged commits.
#### save-model
Executes ci/save_model.sh, which copies the model artifact from the temporary bucket to s3://d-ew1-ted-ai-ml-models/models/$image_name/$git_tag/model.tar.gz
These models must not be deleted: they are retrieved by the SageMaker endpoint and combined with the inference Docker image when an endpoint is deployed.
#### save-experiment-data
Executes ci/save_experiment.sh, which copies the experiment data from "s3://d-ew1-ted-ai-ml-data/experiments/$experiment_id" to "s3://d-ew1-ted-ai-ml-data/data-artifacts/$image_name/$git_tag"
This keeps a history of the data that was used to build and train the classifier.
The configuration and path are extracted from .sagify.json via the *experiment_id* property.
#### push-image-to-ecr
Executes ci/build_push_image.sh and pushes the Docker image to sagemaker-classifiers for inference.
## Local environment setup
Create a virtual environment and install the dependencies:
```shell
$ python3 -m venv venv
$ source venv/bin/activate
$ pip3 install -r requirements.txt
```
## Test local environment setup
The goal is to be able to build a Docker image, simulate a SageMaker training and create a local inference endpoint.
This maximizes the chances of a successful SageMaker configuration when building the model.
### Build
The build command `sagify build` creates a Docker image that can be used by SageMaker for training and inference.
Expected result:

### Local train
The command `sagify build` must complete successfully before starting a local training.
```shell
$ sagify local train
```
Expected result:

### Local deploy
After a successful training, the inference container can be started with:
```shell
$ sagify local deploy
```
Expected result:

To check the logs, find the container name and tail the logs:
```shell
# find the container name of your image
$ docker ps
# tail the logs using the container name found in the NAMES column (eg: random_name)
$ docker logs -f random_name
```
### All in one
During the development of the project, all the commands can be chained:
```shell
sagify build && sagify local train && sagify local deploy
```