Commit 63f1747b authored by Gilles Habran

Merge branch 'AITED-118_ci-pipeline' into 'main'

AITED-118: add ci pipeline

See merge request !1
parents 25aadc12 921052af
Showing with 519 additions and 58 deletions

.gitignore
# Jupyter notebooks
.ipynb_checkpoints/
.git/
*.pyc
*.whl
build/
project.egg-info/
__pycache__/
venv/
# models
model/*
failure/*
failure
model.joblib
# vscode stuff
.vscode/*
!.vscode/settings.json
!.vscode/tasks.json
!.vscode/launch.json
!.vscode/extensions.json
*.code-workspace
# pycharm stuff
.idea
# Local History for Visual Studio Code
.history/

.gitlab-ci.yml
image: ${CI_DEPENDENCY_PROXY_GROUP_IMAGE_PREFIX}/python:3.8

stages:
  - build
  - train
  - push

build-image:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:v1.9.0-debug
    entrypoint: [""]
  script: sh -x ci/build_push_image.sh "528719223857.dkr.ecr.eu-west-1.amazonaws.com/sagemaker-classifiers-ci:$(/busybox/sh ci/get_config_value.sh image_name)-$CI_PIPELINE_ID"

train:
  stage: train
  script: sh -x ci/train.sh
  artifacts:
    paths:
      - job_result.json
    expire_in: 30 days

save-experiment-data:
  stage: push
  script: sh -x ci/save_experiment.sh "$CI_COMMIT_REF_NAME"
  only:
    - tags

push-image-to-ecr:
  stage: push
  image:
    name: gcr.io/kaniko-project/executor:v1.9.0-debug
    entrypoint: [""]
  script: sh -x ci/build_push_image.sh "528719223857.dkr.ecr.eu-west-1.amazonaws.com/sagemaker-classifiers:$(/busybox/sh ci/get_config_value.sh image_name)-$CI_COMMIT_REF_NAME"
  only:
    - tags

save-model:
  stage: push
  script: sh -x ci/save_model.sh "$CI_COMMIT_REF_NAME"
  only:
    - tags

.sagify.json
{
  "image_name": "multi-label-division-classifier",
  "aws_profile": "default",
  "aws_region": "eu-west-1",
  "python_version": "3.8",
  "requirements_dir": "requirements.txt",
  "sagify_module_dir": "src",
  "experiment_id": "cpv_v0.0.1",
  "docker_image_base_url": "python/3.8"
}

README.md
# multi-label-division-classifier
This project contains the model to predict multi-label divisions for a given text.
## Introduction
The structure of the project is based on https://github.com/Kenza-AI/sagify.
This library is a CLI used to train and deploy ML/DL models on AWS SageMaker.
## Getting started
The goal is to have a local configuration that ensures a successful training (and deployment) of models with AWS SageMaker.
Basically, if it works locally, it will work in AWS SageMaker.

## Requirements
* Python 3.8 and the python3.8 venv module -> https://www.linuxcapable.com/install-python-3-8-on-ubuntu-linux/
* Docker
* awscli (optional): allows interacting with AWS SageMaker from a laptop. In a CI/CD pipeline like in TED AI, the GitLab runner in the environment takes care of that.
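
A quick sanity check of these prerequisites (a sketch; awscli is only needed when working outside the CI):

```shell
# each command should print a version, otherwise install the missing tool
python3.8 --version
docker --version
aws --version
```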

## Project structure
This project uses Sagify (https://github.com/Kenza-AI/sagify) as a base for the structure of the project.
Get familiar with the documentation of this tool, as this project won't differ much from it.

Some changes were made to be able to create a CI pipeline in a SRV4DEV environment.
These changes can be found in the following sections.

### metrics.json
This file defines the regular expressions used to extract the metrics from the logs.
These metrics are then available in the GitLab CI pipeline artifacts.
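
For illustration, this is how the first capture group of the f1 regex can be checked locally (the log line is a made-up example; in SageMaker the extraction is done by the MetricDefinitions of the training job):

```shell
# hypothetical log line, printed by the training code with one metric per statement
log_line="f1=0.8123;"
# SageMaker keeps the first capture group of "f1=(.*);"; sed -E emulates that here
echo "$log_line" | sed -E 's/^f1=(.*);$/\1/'   # prints 0.8123
```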
### .sagify.json
This file defines the properties of the project, which are used by Sagify and the CI to configure the build and name the results.
Most of these properties are used to build the Docker image: src/sagify_base/build.sh & src/sagify_base/Dockerfile
* image_name: the name of the Docker image
* aws_profile: the AWS profile to use when building locally
* aws_region: the AWS region of the project; defines where the ECR can be found
* python_version: the Python version of the Dockerfile
* requirements_dir: the path to the requirements.txt file
* sagify_module_dir: the path to the root directory of the sagify code

Used by the CI:
* experiment_id: the experiment ID where the training data is located in s3://d-ew1-ted-ai-ml-data/experiments/
* docker_image_base_url: the Docker image URL of the SageMaker container -> the one that will be used by the CI and SageMaker for the real training
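
The CI reads these properties with ci/get_config_value.sh (part of this commit), for example:

```shell
# prints "multi-label-division-classifier" with the .sagify.json of this repository
bash ci/get_config_value.sh image_name
```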
### Dockerfile(s)
There are 2 Dockerfiles in this project:
* src/sagify_base/Dockerfile: used to build and train in the local environment -> ensures a correct setup of the project
* src/sagify_base/Dockerfile-sagemaker: used to build and train in SageMaker. This Docker image will be pushed to Amazon ECR and used for training/inference.
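
For reference, sagify build is expected to drive the local build through src/sagify_base/build.sh; a manual invocation sketch using the values from .sagify.json (the tag latest is arbitrary here):

```shell
# args: module_path target_dir_name dockerfile_path requirements_file_path tag image python_version
bash src/sagify_base/build.sh src src src/sagify_base/Dockerfile requirements.txt \
     latest multi-label-division-classifier 3.8
```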
### call_endpoint.sh
The script located at src/sagify_base/local_test/call_endpoint.sh expects the local inference container to be running and tests the invocation.
A successful result indicates a successful loading of the model, preprocessing of the data and prediction from the model.
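
As a sketch of what call_endpoint.sh does, the local container can also be invoked manually, assuming the SageMaker-standard port 8080 and routes; the payload below is hypothetical, the real schema is defined by src/sagify_base/prediction/prediction.py:

```shell
# liveness check of the local inference container
curl -s http://localhost:8080/ping
# hypothetical invocation; adapt the payload to what prediction.py expects
curl -s -X POST http://localhost:8080/invocations \
     -H "Content-Type: application/json" \
     -d '{"text": "supply of laboratory equipment"}'
```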
### Code
Please refer to the Sagify documentation for more information.
* Put your training code in the function train() in src/sagify_base/training/training.py
* Put the loading of the model and inference code in src/sagify_base/prediction/prediction.py
## CI/CD pipelines
The CI is composed of 2 steps when started on a feature branch and 5 on a tagged version.
The final result of the CI pipeline of a tagged commit is:
* a Docker image for inference in the Amazon ECR sagemaker-classifiers repository
* the model artifact in s3://d-ew1-ted-ai-ml-models/models/$image_name/$git_tag/model.tar.gz
* the experiment data of the training in s3://d-ew1-ted-ai-ml-data/data-artifacts/$image_name/$git_tag

These artifacts must then be deployed using Terraform to create a SageMaker endpoint.

### All pipelines
#### build-image
Executes the script ci/build_push_image.sh to build the Docker image for SageMaker training and push it to the Amazon ECR repository sagemaker-classifiers-ci.

To facilitate the cleanup, 2 distinct ECR repositories were created:
* sagemaker-classifiers-ci: contains temporary Docker images for CI needs (feature branches) - automatically removed after some time
* sagemaker-classifiers: contains the real images that will be used for inference - not removed

#### train
Executes ci/train.sh; this uses the Docker image previously built for training in SageMaker.
This job generates artifacts in GitLab CI with metrics and other information.
Once the result is satisfying, create a merge request.
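
The artifacts (job_result.json, metrics) can also be downloaded outside the UI through the GitLab jobs-artifacts API; a sketch with a hypothetical project ID and access token:

```shell
# downloads the artifacts of the latest successful "train" job on main
curl --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
     --output artifacts.zip \
     "https://code.europa.eu/api/v4/projects/<project-id>/jobs/artifacts/main/download?job=train"
```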

### Only tagged pipeline
The following steps are only executed on tagged commits.
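
For example, a tagged pipeline can be triggered by pushing a tag (the version below is hypothetical):

```shell
git tag v0.0.2
git push origin v0.0.2
```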

#### save-model
Executes ci/save_model.sh and copies the model artifact from the temporary bucket to s3://d-ew1-ted-ai-ml-models/models/$image_name/$git_tag/model.tar.gz
These models must not be deleted; they will be retrieved by the SageMaker endpoint and combined with the Docker image for inference when deploying an endpoint.

#### save-experiment-data
Executes ci/save_experiment.sh and copies the experiment data from "s3://d-ew1-ted-ai-ml-data/experiments/$experiment_id" to "s3://d-ew1-ted-ai-ml-data/data-artifacts/$image_name/$git_tag"
This ensures that a history of the data used for building and training the classifier is kept.
The configuration and path are extracted from .sagify.json via the property *experiment_id*.

#### push-image-to-ecr
Executes ci/build_push_image.sh and pushes the Docker image to the sagemaker-classifiers repository.
These images are not automatically deleted.
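
After a tagged pipeline has finished, the pushed artifacts can be verified with awscli (a sketch, assuming the hypothetical tag v0.0.2 and read access to the account):

```shell
aws s3 ls "s3://d-ew1-ted-ai-ml-models/models/multi-label-division-classifier/v0.0.2/"
aws s3 ls "s3://d-ew1-ted-ai-ml-data/data-artifacts/multi-label-division-classifier/v0.0.2/" --recursive
aws ecr list-images --repository-name sagemaker-classifiers --region eu-west-1
```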

## Setup local environment
```shell
$ python3 -m venv venv
$ source venv/bin/activate
$ pip3 install -r requirements.txt
```
## Test local environment setup
The goal is to be able to build a Docker image, simulate a SageMaker training and create a local inference endpoint.
This maximizes the chances of a successful SageMaker configuration for the building of a model.

### Build
The build command creates a Docker image that can be used by SageMaker for training & inference.
```shell
$ sagify build
```
Expected result:
![Sagify build - result](docs/sagify_build.png)

### Local train
The command sagify build must be executed successfully before starting a local training.
```shell
$ sagify local train
```
Expected result:
![Sagify local train - result](docs/sagify_local_train.png)
### Local deploy
After a successful training, the inference container can be started with:
```shell
$ sagify local deploy
```
Expected result:
![Sagify local deploy - result](docs/sagify_local_deploy.png)
To check the logs, find the container name and tail the logs:
```shell
# find container name of your image
$ docker ps
# tail the logs using the container name found in the NAMES column (e.g. random_name)
$ docker logs -f random_name
```
### All in one
During the development of the project, all the commands can be chained:
```shell
sagify build && sagify local train && sagify local deploy
```

ci/build_push_image.sh
#!/usr/bin/env bash
image_url=$1
requirements_dir=$(/busybox/sh ci/get_config_value.sh requirements_dir)
sagify_module_dir=$(/busybox/sh ci/get_config_value.sh sagify_module_dir)

# Let kaniko authenticate against Amazon ECR.
echo "{\"credsStore\":\"ecr-login\"}" > /kaniko/.docker/config.json

/kaniko/executor \
  --single-snapshot \
  --context "${CI_PROJECT_DIR}" \
  --dockerfile "$sagify_module_dir/sagify_base/Dockerfile-sagemaker" \
  --destination "$image_url" \
  --build-arg module_path="$sagify_module_dir" \
  --build-arg target_dir_name="$sagify_module_dir" \
  --build-arg requirements_file_path="$requirements_dir" \
  --build-arg docker_image_base_url="${CI_DEPENDENCY_PROXY_GROUP_IMAGE_PREFIX}/python:3.8"

ci/get_config_value.sh
#!/usr/bin/env bash
set -euo pipefail
field_name=$1
# Extract the value of a top-level string field from .sagify.json without requiring jq.
grep -Eo "\"$field_name\" *: *\"(.*)\"" .sagify.json | grep -Eo ':.*".*"' | cut -d'"' -f 2

ci/save_experiment.sh
#!/usr/bin/env bash
python3 -m pip install awscli

git_tag=$1
if [ -z "$git_tag" ]; then
  echo "Error when trying to retrieve the git tag to save the experiment. Cannot continue."
  exit 1
fi

image_name=$(bash ci/get_config_value.sh image_name)
if [ -z "$image_name" ]; then
  echo "The config 'image_name' in .sagify.json must be set. Cannot continue."
  exit 1
fi

experiment_id=$(bash ci/get_config_value.sh experiment_id)
if [ -z "$experiment_id" ]; then
  echo "The config 'experiment_id' in .sagify.json must be set. Cannot continue."
  exit 1
fi

s3_training_data_prefix="s3://d-ew1-ted-ai-ml-data/experiments/$experiment_id"
s3_data_artifacts_prefix="s3://d-ew1-ted-ai-ml-data/data-artifacts/$image_name/$git_tag"
echo "Copying data from '$s3_training_data_prefix/*' to '$s3_data_artifacts_prefix/'"
aws s3 sync "$s3_training_data_prefix" "$s3_data_artifacts_prefix"

ci/save_model.sh
#!/usr/bin/env bash
python3 -m pip install awscli
apt update && apt install jq --no-install-recommends -y

git_tag=$1
if [ -z "$git_tag" ]; then
  echo "Error when trying to retrieve the git tag to save the model. Cannot continue."
  exit 1
fi

image_name=$(bash ci/get_config_value.sh image_name)
if [ -z "$image_name" ]; then
  echo "The config 'image_name' in .sagify.json must be set. Cannot continue."
  exit 1
fi

# The training job description written by ci/train.sh contains the S3 path of the model artifact.
s3_model_source_path=$(jq .ModelArtifacts.S3ModelArtifacts job_result.json)
if [ -z "$s3_model_source_path" ]; then
  echo "The value of 's3_model_source_path' could not be extracted from the job_result.json file. Cannot continue."
  exit 1
fi
# jq prints the value with surrounding quotes; strip them.
s3_model_source_path_cleaned=$(echo "$s3_model_source_path" | tr -d '"')

s3_model_destination_path="s3://d-ew1-ted-ai-ml-models/models/$image_name/$git_tag/model.tar.gz"
echo "Copying data from '$s3_model_source_path_cleaned' to '$s3_model_destination_path'"
aws s3 cp "$s3_model_source_path_cleaned" "$s3_model_destination_path"

ci/train.sh
#!/usr/bin/env bash
python3 -m pip install awscli

image_name=$(bash ci/get_config_value.sh image_name)
if [ -z "$image_name" ]; then
  echo "The config 'image_name' in .sagify.json must be set. Cannot continue."
  exit 1
fi

experiment_id=$(bash ci/get_config_value.sh experiment_id)
if [ -z "$experiment_id" ]; then
  echo "The config 'experiment_id' in .sagify.json must be set. Cannot continue."
  exit 1
fi

s3_training_data_prefix="s3://d-ew1-ted-ai-ml-data/experiments/$experiment_id"
datetime=$(date '+%Y-%m-%d-%H-%M-%S')
image_url="528719223857.dkr.ecr.eu-west-1.amazonaws.com/sagemaker-classifiers-ci:$image_name-$CI_PIPELINE_ID"
s3_output_prefix=s3://d-ew1-ted-ai-ml-models/models-ci
metrics="$(cat metrics.json)"
# aws s3 cp "$s3_training_data_prefix/hyperparameters.json" /tmp/hyperparameters.json
# For now the hyperparameters are read from the local test configuration (the S3 copy above is kept for reference).
hyperparams="$(cat src/sagify_base/local_test/test_dir/input/config/hyperparameters.json)"

# Start the SageMaker training job with the image built by the build-image stage.
job_arn=$(
  aws sagemaker create-training-job \
    --training-job-name "$image_name-$datetime" \
    --hyper-parameters "$hyperparams" \
    --algorithm-specification '{"TrainingImage":"'"$image_url"'","TrainingInputMode":"File","MetricDefinitions":'"$metrics"'}' \
    --role-arn "arn:aws:iam::528719223857:role/sagemaker_notebooks" \
    --input-data-config '{"ChannelName":"training","DataSource":{"S3DataSource":{"S3DataType":"S3Prefix","S3Uri":"'"$s3_training_data_prefix"'","S3DataDistributionType":"FullyReplicated"}}}' \
    --output-data-config "S3OutputPath=$s3_output_prefix" \
    --resource-config "InstanceType=ml.m5.large,InstanceCount=1,VolumeSizeInGB=30" \
    --stopping-condition "MaxRuntimeInSeconds=86400" \
    --query TrainingJobArn \
    --region eu-west-1 \
    --output text
)
job_name=$(echo "$job_arn" | cut -d/ -f 2)
# Block until the job finishes, then store its description for the later stages.
aws sagemaker wait training-job-completed-or-stopped --region eu-west-1 --training-job-name "$job_name"
aws sagemaker describe-training-job --region eu-west-1 --training-job-name "$job_name" > job_result.json

docs/sagify_build.png (9.41 KiB)
docs/sagify_local_deploy.png (5.76 KiB)
docs/sagify_local_train.png (7.23 KiB)

metrics.json
[
  {
    "Name": "f1",
    "Regex": "f1=(.*);"
  },
  {
    "Name": "roc_auc",
    "Regex": "roc_auc=(.*);"
  },
  {
    "Name": "accuracy",
    "Regex": "accuracy=(.*);"
  },
  {
    "Name": "coverage_err",
    "Regex": "coverage_err=(.*);"
  },
  {
    "Name": "label_ranking_average_precision",
    "Regex": "label_ranking_average_precision=(.*);"
  }
]

requirements.txt
attrs==22.2.0
blis==0.7.9
boto3==1.26.108
botocore==1.29.108
catalogue==2.0.8
certifi==2022.12.7
charset-normalizer==3.1.0
click==8.0.4
confection==0.0.4
cymem==2.0.7
dill==0.3.6
docker==5.0.3
future==0.18.3
google-pasta==0.2.0
idna==3.4
importlib-metadata==6.1.0
Jinja2==3.1.2
jmespath==1.0.1
joblib==1.2.0
langcodes==3.3.0
MarkupSafe==2.1.2
multiprocess==0.70.14
murmurhash==1.0.9
numpy==1.24.2
packaging==23.0
pandas==2.0.0
pathos==0.3.0
pathy==0.10.1
pox==0.3.2
ppft==1.7.6.6
preshed==3.0.8
protobuf==3.20.3
protobuf3-to-dict==0.1.5
pydantic==1.10.7
python-dateutil==2.8.2
pytz==2023.3
requests==2.28.2
s3transfer==0.6.0
sagemaker==2.72.3
sagify==0.23.0
scikit-learn==1.2.2
scipy==1.10.1
six==1.16.0
smart-open==6.3.0
smdebug-rulesconfig==1.0.1
spacy==3.5.1
spacy-legacy==3.0.12
spacy-loggers==1.0.4
srsly==2.4.6
thinc==8.1.9
threadpoolctl==3.1.0
tqdm==4.65.0
typer==0.7.0
typing_extensions==4.5.0
tzdata==2023.3
Unidecode==1.3.6
urllib3==1.26.15
wasabi==1.1.1
websocket-client==1.5.1
zipp==3.15.0

src/sagify_base/Dockerfile
ARG python_version
FROM python:$python_version-slim-buster

LABEL maintainer="Kenza AI <support@kenza.ai>"

RUN apt-get -y update && apt-get install -y --no-install-recommends \
    make \
    nginx \
    ca-certificates \
    g++ \
    git \
    && rm -rf /var/lib/apt/lists/*

# PYTHONUNBUFFERED keeps Python from buffering the standard
# output stream, which means that logs can be delivered to the user quickly.
# PYTHONDONTWRITEBYTECODE keeps Python from writing the .pyc files which are unnecessary in this case.
ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
ENV PATH="/opt/program:${PATH}"

ARG requirements_file_path
ARG module_path
ARG target_dir_name

COPY ${requirements_file_path} /opt/program/sagify-requirements.txt
WORKDIR /opt/program/${target_dir_name}

# Here we get all python packages.
RUN pip install flask gevent gunicorn future
RUN pip install -r ../sagify-requirements.txt && rm -rf /root/.cache
RUN python3 -m spacy download en_core_web_sm
RUN apt-get -y purge --auto-remove git

COPY ${module_path} /opt/program/${target_dir_name}

ENTRYPOINT ["sagify_base/executor.sh"]

src/sagify_base/Dockerfile-sagemaker
ARG docker_image_base_url
FROM $docker_image_base_url

LABEL maintainer="Kenza AI <support@kenza.ai>"

RUN apt-get -y update && apt-get install -y --no-install-recommends \
    make \
    nginx \
    ca-certificates \
    g++ \
    git \
    && rm -rf /var/lib/apt/lists/*

# PYTHONUNBUFFERED keeps Python from buffering the standard
# output stream, which means that logs can be delivered to the user quickly.
# PYTHONDONTWRITEBYTECODE keeps Python from writing the .pyc files which are unnecessary in this case.
ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
ENV PATH="/opt/program:${PATH}"

ARG requirements_file_path
ARG module_path
ARG target_dir_name

COPY ${requirements_file_path} /opt/program/sagify-requirements.txt
WORKDIR /opt/program/${target_dir_name}

# Here we get all python packages.
RUN pip install flask gevent gunicorn future
RUN pip install -r ../sagify-requirements.txt && rm -rf /root/.cache
RUN python3 -m spacy download en_core_web_sm
RUN apt-get -y purge --auto-remove git

COPY ${module_path} /opt/program/${target_dir_name}

ENTRYPOINT ["sagify_base/executor.sh"]

src/sagify_base/build.sh
#!/usr/bin/env bash
# Build the docker image
module_path=$1
target_dir_name=$2
dockerfile_path=$3
requirements_file_path=$4
tag=$5
image=$6
python_version=$7

docker build \
  -t "${image}:${tag}" \
  -f "${dockerfile_path}" . \
  --build-arg module_path="${module_path}" \
  --build-arg target_dir_name="${target_dir_name}" \
  --build-arg requirements_file_path="${requirements_file_path}" \
  --build-arg python_version="${python_version}"

src/sagify_base/executor.sh
#!/usr/bin/env bash
# Dispatch SageMaker's container invocation: "train" runs the training module,
# anything else starts the inference server.
if [ "$1" = "train" ]; then
  python ./sagify_base/training/train
else
  python ./sagify_base/prediction/serve
fi