    # multi-label-division-classifier
    
    
This project contains the model that predicts multi-label divisions for a given text.
    
    ## Introduction
    The structure of the project is based on https://github.com/Kenza-AI/sagify. 
    This library is a CLI used to train and deploy ML/DL models on AWS SageMaker.
    
    The goal is to have a local configuration that ensures a successful training (and deployment) of models with AWS SageMaker.
    Basically, if it works locally, it will work in AWS SageMaker.
    
    ## Requirements
    
* Python 3.8 and the python3.8 venv module -> https://www.linuxcapable.com/install-python-3-8-on-ubuntu-linux/
* Docker
* awscli (optional): allows interacting with AWS SageMaker directly from a laptop (see the sketch below this list). In a CI/CD pipeline such as the TED AI one, the GitLab runner in the environment takes care of this.
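
If awscli is installed locally, a minimal configuration sketch could look like the following. The profile name is only a placeholder and should match the aws_profile property described in the .sagify.json section.

```shell
# Hypothetical profile name; use the profile referenced by aws_profile in .sagify.json.
aws configure --profile my-ted-ai-profile
# Sanity check that the credentials are picked up correctly.
aws sts get-caller-identity --profile my-ted-ai-profile
```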
    
    ## Project structure
This project uses Sagify (https://github.com/Kenza-AI/sagify) as the base for its structure.
    
Get familiar with the documentation of that tool, as this project does not differ much from it.
    
Some changes were made to be able to create a CI pipeline in an SRV4DEV environment.
These changes are described in the following sections.
    
    ### metrics.json
    This file defines the regular expressions used to extract the metrics from the logs.
    
These metrics are then available in the GitLab CI pipeline artifacts.
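
As a purely hypothetical illustration (the real metric names and expressions live in metrics.json and are project-specific), such a regular expression can be tried against a log line with GNU grep:

```shell
# Hypothetical log line and pattern; the actual ones are defined in metrics.json.
echo "epoch 3 - validation_f1: 0.8421" | grep -oP 'validation_f1: \K[0-9.]+'
# prints: 0.8421
```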
    
    ### .sagify.json
This file defines the project properties used by Sagify and the CI to configure the project, build the Docker images and name the resulting artifacts.
    
Most of these properties are used to build the Docker image (see src/sagify_base/build.sh and src/sagify_base/Dockerfile):
* image_name: the name of the Docker image
* aws_profile: the AWS profile to use when building locally
* aws_region: the AWS region of the project; defines where the ECR can be found
* python_version: the Python version used in the Dockerfile
* requirements_dir: the path to the requirements.txt file
* sagify_module_dir: the path to the root directory of the sagify code

Used by the CI (a small sketch of reading these values follows the list):
* experiment_id: the experiment ID where the training data is located in s3://d-ew1-ted-ai-ml-data/experiments/
* docker_image_base_url: the Docker image URL of the SageMaker container -> the one that will be used by the CI and SageMaker for the real training
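
As a rough illustration (the exact mechanism used by the CI scripts is not shown here), these properties can be read from .sagify.json with jq, assuming jq is available:

```shell
# Read selected properties from .sagify.json; key names are taken from the list above.
image_name=$(jq -r '.image_name' .sagify.json)
aws_region=$(jq -r '.aws_region' .sagify.json)
experiment_id=$(jq -r '.experiment_id' .sagify.json)
echo "image=${image_name} region=${aws_region} experiment=${experiment_id}"
```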
    
    ### Dockerfile(s)
There are two Dockerfiles in this project:

* src/sagify_base/Dockerfile: Dockerfile used to build and train in the local environment -> ensures the project is set up correctly
* src/sagify_base/Dockerfile-sagemaker: Dockerfile used to build and train in SageMaker. This Docker image is pushed to Amazon ECR and used for training/inference.
    
    ### call_endpoint.sh
The script located at src/sagify_base/local_test/call_endpoint.sh expects the local inference container to be running
and tests the invocation.

A successful result indicates that the model loads correctly, the data is preprocessed and the model returns a prediction.
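
The endpoint can also be called manually. The sketch below assumes the standard SageMaker serving contract exposed by sagify local deploy (port 8080 with /ping and /invocations) and a JSON payload; the actual payload format depends on src/sagify_base/prediction/prediction.py.

```shell
# Health check of the locally deployed inference container.
curl -s http://localhost:8080/ping
# Example invocation; the payload shape is an assumption and must match prediction.py.
curl -s -X POST http://localhost:8080/invocations \
     -H "Content-Type: application/json" \
     -d '{"text": "example document to classify"}'
```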
    
    ### Code
Please refer to the Sagify documentation for more information.
    
    * Put your training code in the function train() in src/sagify_base/training/training.py
    * Put the loading of the model and inference code in src/sagify_base/prediction/prediction.py
    
    ## CI/CD pipelines
The CI pipeline consists of 2 steps when started from a feature branch and 5 steps for a tagged version.
    
The final result of the CI pipeline of a tagged commit is (a quick verification sketch follows this list):
* a Docker image for inference in the Amazon ECR sagemaker-classifiers repository
* the model artifact in s3://d-ew1-ted-ai-ml-models/models/$image_name/$git_tag/model.tar.gz
* the experiment data of the training in s3://d-ew1-ted-ai-ml-data/data-artifacts/$image_name/$git_tag
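
Assuming awscli access to the account, the S3 artifacts of a tagged pipeline can be checked with standard listing commands, for example:

```shell
# Substitute the real values of image_name (from .sagify.json) and the Git tag.
aws s3 ls "s3://d-ew1-ted-ai-ml-models/models/${image_name}/${git_tag}/"
aws s3 ls --recursive "s3://d-ew1-ted-ai-ml-data/data-artifacts/${image_name}/${git_tag}/"
```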
    
These artifacts must then be deployed using Terraform to create a SageMaker endpoint.
    
    ### All pipelines
    #### build-image
    
Executes the script ci/build_push_image.sh to build the Docker image for SageMaker training and push it to the Amazon ECR repository ci-temporary-images.
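
The exact contents of ci/build_push_image.sh are not reproduced here; as a rough, hedged sketch, pushing an image to ECR typically amounts to a login, a tag and a push along these lines (account ID, region, image name and tag below are placeholders, not values from this project):

```shell
# Placeholder values; the real ones come from the CI environment and .sagify.json.
ACCOUNT_ID=123456789012
REGION=eu-west-1
REPO=ci-temporary-images
IMAGE_NAME=my-classifier-image
TAG=my-feature-branch

aws ecr get-login-password --region "$REGION" \
  | docker login --username AWS --password-stdin "${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com"
docker tag "${IMAGE_NAME}:latest" "${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/${REPO}:${TAG}"
docker push "${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/${REPO}:${TAG}"
```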
    
    To facilitate the cleanup, 2 distinct ECR repositories were created:
    
    * ci-temporary-images: contains temporary Docker images for CI needs (feature branches) - automatically removed after some time
    
    * sagemaker-classifiers: contains the real images that will be used for inference - not removed
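
With awscli access, the contents of both repositories can be inspected, for example:

```shell
# List the images currently present in each repository.
aws ecr list-images --repository-name ci-temporary-images
aws ecr list-images --repository-name sagemaker-classifiers
```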
    
    #### train
    Executes ci/train.sh; this uses the Docker image previously built for training in SageMaker.
    
This job generates GitLab CI artifacts containing the metrics and other information.
    
Once the results are satisfactory, create a merge request.
    
    ### Only tagged pipeline
The following steps are only executed on tagged commits.
    
    #### save-model
Executes ci/save_model.sh and copies the model artifact from the temporary bucket to s3://d-ew1-ted-ai-ml-models/models/$image_name/$git_tag/model.tar.gz
    
These models must not be deleted; they are retrieved by the SageMaker endpoint and combined with the Docker image for inference when an endpoint is deployed.
    
    #### save-experiment-data
Executes ci/save_experiment.sh and copies the experiment data from "s3://d-ew1-ted-ai-ml-data/experiments/$experiment_id" to "s3://d-ew1-ted-ai-ml-data/data-artifacts/$image_name/$git_tag"
    
This ensures that a history is kept of the data used to build and train the classifier.
    
The configuration and path are extracted from .sagify.json using the *experiment_id* property.
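
A hedged sketch of what such a copy amounts to (the real logic lives in ci/save_experiment.sh and may differ) is:

```shell
# Values below are placeholders resolved by the CI from .sagify.json and the Git tag.
experiment_id=$(jq -r '.experiment_id' .sagify.json)
image_name=$(jq -r '.image_name' .sagify.json)
git_tag="v1.0.0"   # example tag, purely illustrative

aws s3 sync "s3://d-ew1-ted-ai-ml-data/experiments/${experiment_id}" \
            "s3://d-ew1-ted-ai-ml-data/data-artifacts/${image_name}/${git_tag}"
```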
    
    #### push-image-to-ecr
Executes ci/build_push_image.sh and pushes the Docker image to the sagemaker-classifiers repository (the images used for inference, see above).
    
    These images are not automatically deleted.
    
    ## Setup local environment
    
    ```shell
    $ python3 -m venv venv
    $ source venv/bin/activate
    $ pip3 install -r requirements.txt
    ```
    
    ## Test local environment setup
The goal is to be able to build a Docker image, simulate a SageMaker training and create a local inference endpoint.
If this works locally, the chances of a successful SageMaker configuration when building the model for real are maximized.
    
    ### Build
The build command creates a Docker image that can be used by SageMaker for training and inference.
    
    ```shell
    $ sagify build
    ```
    
Expected result:
    ![Sagify build - result](docs/sagify_build.png)
    
    ### Local train
    The command sagify build must be executed successfully before starting a local training.
    
    ```shell
    $ sagify local train
    ```
Expected result:
    ![Sagify local train - result](docs/sagify_local_train.png)
    
    ### Local deploy
    After a successful training, the inference container can be started with:
    
    ```shell
    $ sagify local deploy
    ```
    
    Expected result:
    ![Sagify local deploy - result](docs/sagify_local_deploy.png)
    
To check the logs, find the container name and tail the logs:
```shell
# find the container name of your image
$ docker ps
# tail the logs using the container name found in the NAMES column (eg: random_name)
$ docker logs -f random_name
```
    
    ### All in one
During the development of the project, all the commands can be chained:
    
    ```shell
    sagify build && sagify local train && sagify local deploy
    ```
    