Commit 63f1747b authored by Gilles Habran

Merge branch 'AITED-118_ci-pipeline' into 'main'

AITED-118: add ci pipeline

See merge request !1
parents 25aadc12 921052af
Showing with 519 additions and 58 deletions

.gitignore
# Jupyter notebooks
.ipynb_checkpoints/
.git/
*.pyc
*.whl
build/
project.egg-info/
__pycache__/
venv/
# models
model/*
failure/*
failure
model.joblib
# vscode stuff
.vscode/*
!.vscode/settings.json
!.vscode/tasks.json
!.vscode/launch.json
!.vscode/extensions.json
*.code-workspace
# pycharm stuff
.idea
# Local History for Visual Studio Code
.history/

.gitlab-ci.yml
image: ${CI_DEPENDENCY_PROXY_GROUP_IMAGE_PREFIX}/python:3.8

stages:
  - build
  - train
  - push

build-image:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:v1.9.0-debug
    entrypoint: [""]
  script: sh -x ci/build_push_image.sh "528719223857.dkr.ecr.eu-west-1.amazonaws.com/sagemaker-classifiers-ci:$(/busybox/sh ci/get_config_value.sh image_name)-$CI_PIPELINE_ID"

train:
  stage: train
  script: sh -x ci/train.sh
  artifacts:
    paths:
      - job_result.json
    expire_in: 30 days

save-experiment-data:
  stage: push
  script: sh -x ci/save_experiment.sh "$CI_COMMIT_REF_NAME"
  only:
    - tags

push-image-to-ecr:
  stage: push
  image:
    name: gcr.io/kaniko-project/executor:v1.9.0-debug
    entrypoint: [""]
  script: sh -x ci/build_push_image.sh "528719223857.dkr.ecr.eu-west-1.amazonaws.com/sagemaker-classifiers:$(/busybox/sh ci/get_config_value.sh image_name)-$CI_COMMIT_REF_NAME"
  only:
    - tags

save-model:
  stage: push
  script: sh -x ci/save_model.sh "$CI_COMMIT_REF_NAME"
  only:
    - tags

.sagify.json
{
  "image_name": "multi-label-division-classifier",
  "aws_profile": "default",
  "aws_region": "eu-west-1",
  "python_version": "3.8",
  "requirements_dir": "requirements.txt",
  "sagify_module_dir": "src",
  "experiment_id": "cpv_v0.0.1",
  "docker_image_base_url": "python/3.8"
}

README.md
# multi-label-division-classifier
This project contains the model to predict multi-label divisions for a given text.
## Introduction
The structure of the project is based on https://github.com/Kenza-AI/sagify.
This library is a CLI used to train and deploy ML/DL models on AWS SageMaker.
## Getting started
The goal is to have a local configuration that ensures a successful training (and deployment) of models with AWS SageMaker.
Basically, if it works locally, it will work in AWS SageMaker.

## Requirements
* Python 3.8 and the python3.8 venv module -> https://www.linuxcapable.com/install-python-3-8-on-ubuntu-linux/
* Docker
* awscli (optional): allows interacting with AWS SageMaker from a laptop. In a CI/CD pipeline like in TED AI, the GitLab runner in the environment takes care of that.
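
A quick sanity check of these prerequisites (a sketch; awscli is only needed when working outside the CI):

```shell
# each command should print a version, otherwise install the missing tool
python3.8 --version
docker --version
aws --version
```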

## Project structure
This project uses Sagify (https://github.com/Kenza-AI/sagify) as a base for the structure of the project.
Get familiar with the documentation of this tool, as this project won't differ much from it.

Some changes were made to be able to create a CI pipeline in a SRV4DEV environment.
These changes can be found in the following sections.

### metrics.json
This file defines the regular expressions used to extract the metrics from the logs.
These metrics are then available in the GitLab CI pipeline artifacts.
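
For illustration, this is how the first capture group of the f1 regex can be checked locally (the log line is a made-up example; in SageMaker the extraction is done by the MetricDefinitions of the training job):

```shell
# hypothetical log line, printed by the training code with one metric per statement
log_line="f1=0.8123;"
# SageMaker keeps the first capture group of "f1=(.*);"; sed -E emulates that here
echo "$log_line" | sed -E 's/^f1=(.*);$/\1/'   # prints 0.8123
```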
### .sagify.json
This file defines the properties of the project, which are used by Sagify and the CI to configure the build and name the results.
Most of these properties are used to build the Docker image: src/sagify_base/build.sh & src/sagify_base/Dockerfile
* image_name: the name of the Docker image
* aws_profile: the AWS profile to use when building locally
* aws_region: the AWS region of the project; defines where the ECR can be found
* python_version: the Python version of the Dockerfile
* requirements_dir: the path to the requirements.txt file
* sagify_module_dir: the path to the root directory of the sagify code

Used by the CI:
* experiment_id: the experiment ID where the training data is located in s3://d-ew1-ted-ai-ml-data/experiments/
* docker_image_base_url: the Docker image URL of the SageMaker container -> the one that will be used by the CI and SageMaker for the real training
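
The CI reads these properties with ci/get_config_value.sh (part of this commit), for example:

```shell
# prints "multi-label-division-classifier" with the .sagify.json of this repository
bash ci/get_config_value.sh image_name
```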
### Dockerfile(s)
There are 2 Dockerfiles in this project:
* src/sagify_base/Dockerfile: used to build and train in the local environment -> ensures a correct setup of the project
* src/sagify_base/Dockerfile-sagemaker: used to build and train in SageMaker. This Docker image will be pushed to Amazon ECR and used for training/inference.
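
For reference, sagify build is expected to drive the local build through src/sagify_base/build.sh; a manual invocation sketch using the values from .sagify.json (the tag latest is arbitrary here):

```shell
# args: module_path target_dir_name dockerfile_path requirements_file_path tag image python_version
bash src/sagify_base/build.sh src src src/sagify_base/Dockerfile requirements.txt \
     latest multi-label-division-classifier 3.8
```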
### call_endpoint.sh
The script located at src/sagify_base/local_test/call_endpoint.sh expects the local inference container to be running and tests the invocation.
A successful result indicates a successful loading of the model, preprocessing of the data and prediction from the model.
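
As a sketch of what call_endpoint.sh does, the local container can also be invoked manually, assuming the SageMaker-standard port 8080 and routes; the payload below is hypothetical, the real schema is defined by src/sagify_base/prediction/prediction.py:

```shell
# liveness check of the local inference container
curl -s http://localhost:8080/ping
# hypothetical invocation; adapt the payload to what prediction.py expects
curl -s -X POST http://localhost:8080/invocations \
     -H "Content-Type: application/json" \
     -d '{"text": "supply of laboratory equipment"}'
```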
### Code
Please refer to the Sagify documentation for more information.
* Put your training code in the function train() in src/sagify_base/training/training.py
* Put the loading of the model and inference code in src/sagify_base/prediction/prediction.py
## CI/CD pipelines
The CI is composed of 2 steps when started on a feature branch and 5 on a tagged version.
The final result of the CI pipeline of a tagged commit is:
* a Docker image for inference in the Amazon ECR sagemaker-classifiers repository
* the model artifact in s3://d-ew1-ted-ai-ml-models/models/$image_name/$git_tag/model.tar.gz
* the experiment data of the training in s3://d-ew1-ted-ai-ml-data/data-artifacts/$image_name/$git_tag

These artifacts must then be deployed using Terraform to create a SageMaker endpoint.

### All pipelines
#### build-image
Executes the script ci/build_push_image.sh to build the Docker image for SageMaker training and push it to the Amazon ECR repository sagemaker-classifiers-ci.

To facilitate the cleanup, 2 distinct ECR repositories were created:
* sagemaker-classifiers-ci: contains temporary Docker images for CI needs (feature branches) - automatically removed after some time
* sagemaker-classifiers: contains the real images that will be used for inference - not removed

#### train
Executes ci/train.sh; this uses the Docker image previously built for training in SageMaker.
This job generates artifacts in GitLab CI with metrics and other information.
Once the result is satisfying, create a merge request.
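
The artifacts (job_result.json, metrics) can also be downloaded outside the UI through the GitLab jobs-artifacts API; a sketch with a hypothetical project ID and access token:

```shell
# downloads the artifacts of the latest successful "train" job on main
curl --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
     --output artifacts.zip \
     "https://code.europa.eu/api/v4/projects/<project-id>/jobs/artifacts/main/download?job=train"
```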

### Only tagged pipeline
The following steps are only executed on tagged commits.
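
For example, a tagged pipeline can be triggered by pushing a tag (the version below is hypothetical):

```shell
git tag v0.0.2
git push origin v0.0.2
```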

#### save-model
Executes ci/save_model.sh and copies the model artifact from the temporary bucket to s3://d-ew1-ted-ai-ml-models/models/$image_name/$git_tag/model.tar.gz
These models must not be deleted; they will be retrieved by the SageMaker endpoint and combined with the Docker image for inference when deploying an endpoint.

#### save-experiment-data
Executes ci/save_experiment.sh and copies the experiment data from "s3://d-ew1-ted-ai-ml-data/experiments/$experiment_id" to "s3://d-ew1-ted-ai-ml-data/data-artifacts/$image_name/$git_tag"
This ensures that a history of the data used for building and training the classifier is kept.
The configuration and path are extracted from .sagify.json via the property *experiment_id*.

#### push-image-to-ecr
Executes ci/build_push_image.sh and pushes the Docker image to the sagemaker-classifiers repository.
These images are not automatically deleted.
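
After a tagged pipeline has finished, the pushed artifacts can be verified with awscli (a sketch, assuming the hypothetical tag v0.0.2 and read access to the account):

```shell
aws s3 ls "s3://d-ew1-ted-ai-ml-models/models/multi-label-division-classifier/v0.0.2/"
aws s3 ls "s3://d-ew1-ted-ai-ml-data/data-artifacts/multi-label-division-classifier/v0.0.2/" --recursive
aws ecr list-images --repository-name sagemaker-classifiers --region eu-west-1
```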

## Setup local environment
```shell
$ python3 -m venv venv
$ source venv/bin/activate
$ pip3 install -r requirements.txt
```
## Test local environment setup
The goal is to be able to build a Docker image, simulate a SageMaker training and create a local inference endpoint.
This maximizes the chances of a successful SageMaker configuration for the building of a model.

### Build
The build command creates a Docker image that can be used by SageMaker for training & inference.
```shell
$ sagify build
```
Expected result:
![Sagify build - result](docs/sagify_build.png)

### Local train
The command sagify build must be executed successfully before starting a local training.
```shell
$ sagify local train
```
Expected result:
![Sagify local train - result](docs/sagify_local_train.png)
### Local deploy
After a successful training, the inference container can be started with:
```shell
$ sagify local deploy
```
Expected result:
![Sagify local deploy - result](docs/sagify_local_deploy.png)
To check the logs, find the container name and tail the logs:
```shell
# find container name of your image
$ docker ps
# tail the logs using the container name found in the NAMES column (e.g. random_name)
$ docker logs -f random_name
```
### All in one
During the development of the project, all the commands can be chained:
```shell
sagify build && sagify local train && sagify local deploy
```

ci/build_push_image.sh
#!/usr/bin/env bash
image_url=$1
requirements_dir=$(/busybox/sh ci/get_config_value.sh requirements_dir)
sagify_module_dir=$(/busybox/sh ci/get_config_value.sh sagify_module_dir)

# Let kaniko authenticate against Amazon ECR.
echo "{\"credsStore\":\"ecr-login\"}" > /kaniko/.docker/config.json

/kaniko/executor \
  --single-snapshot \
  --context "${CI_PROJECT_DIR}" \
  --dockerfile "$sagify_module_dir/sagify_base/Dockerfile-sagemaker" \
  --destination "$image_url" \
  --build-arg module_path="$sagify_module_dir" \
  --build-arg target_dir_name="$sagify_module_dir" \
  --build-arg requirements_file_path="$requirements_dir" \
  --build-arg docker_image_base_url="${CI_DEPENDENCY_PROXY_GROUP_IMAGE_PREFIX}/python:3.8"

ci/get_config_value.sh
#!/usr/bin/env bash
set -euo pipefail
field_name=$1
# Extract the value of a top-level string field from .sagify.json without requiring jq.
grep -Eo "\"$field_name\" *: *\"(.*)\"" .sagify.json | grep -Eo ':.*".*"' | cut -d'"' -f 2

ci/save_experiment.sh
#!/usr/bin/env bash
python3 -m pip install awscli

git_tag=$1
if [ -z "$git_tag" ]; then
  echo "Error when trying to retrieve the git tag to save the experiment. Cannot continue."
  exit 1
fi

image_name=$(bash ci/get_config_value.sh image_name)
if [ -z "$image_name" ]; then
  echo "The config 'image_name' in .sagify.json must be set. Cannot continue."
  exit 1
fi

experiment_id=$(bash ci/get_config_value.sh experiment_id)
if [ -z "$experiment_id" ]; then
  echo "The config 'experiment_id' in .sagify.json must be set. Cannot continue."
  exit 1
fi

s3_training_data_prefix="s3://d-ew1-ted-ai-ml-data/experiments/$experiment_id"
s3_data_artifacts_prefix="s3://d-ew1-ted-ai-ml-data/data-artifacts/$image_name/$git_tag"
echo "Copying data from '$s3_training_data_prefix/*' to '$s3_data_artifacts_prefix/'"
aws s3 sync "$s3_training_data_prefix" "$s3_data_artifacts_prefix"

ci/save_model.sh
#!/usr/bin/env bash
python3 -m pip install awscli
apt update && apt install jq --no-install-recommends -y

git_tag=$1
if [ -z "$git_tag" ]; then
  echo "Error when trying to retrieve the git tag to save the model. Cannot continue."
  exit 1
fi

image_name=$(bash ci/get_config_value.sh image_name)
if [ -z "$image_name" ]; then
  echo "The config 'image_name' in .sagify.json must be set. Cannot continue."
  exit 1
fi

# The training job description written by ci/train.sh contains the S3 path of the model artifact.
s3_model_source_path=$(jq .ModelArtifacts.S3ModelArtifacts job_result.json)
if [ -z "$s3_model_source_path" ]; then
  echo "The value of 's3_model_source_path' could not be extracted from the job_result.json file. Cannot continue."
  exit 1
fi
# jq prints the value with surrounding quotes; strip them.
s3_model_source_path_cleaned=$(echo "$s3_model_source_path" | tr -d '"')

s3_model_destination_path="s3://d-ew1-ted-ai-ml-models/models/$image_name/$git_tag/model.tar.gz"
echo "Copying data from '$s3_model_source_path_cleaned' to '$s3_model_destination_path'"
aws s3 cp "$s3_model_source_path_cleaned" "$s3_model_destination_path"

ci/train.sh
#!/usr/bin/env bash
python3 -m pip install awscli

image_name=$(bash ci/get_config_value.sh image_name)
if [ -z "$image_name" ]; then
  echo "The config 'image_name' in .sagify.json must be set. Cannot continue."
  exit 1
fi

experiment_id=$(bash ci/get_config_value.sh experiment_id)
if [ -z "$experiment_id" ]; then
  echo "The config 'experiment_id' in .sagify.json must be set. Cannot continue."
  exit 1
fi

s3_training_data_prefix="s3://d-ew1-ted-ai-ml-data/experiments/$experiment_id"
datetime=$(date '+%Y-%m-%d-%H-%M-%S')
image_url="528719223857.dkr.ecr.eu-west-1.amazonaws.com/sagemaker-classifiers-ci:$image_name-$CI_PIPELINE_ID"
s3_output_prefix=s3://d-ew1-ted-ai-ml-models/models-ci
metrics="$(cat metrics.json)"
# aws s3 cp "$s3_training_data_prefix/hyperparameters.json" /tmp/hyperparameters.json
# For now the hyperparameters are read from the local test configuration (the S3 copy above is kept for reference).
hyperparams="$(cat src/sagify_base/local_test/test_dir/input/config/hyperparameters.json)"

# Start the SageMaker training job with the image built by the build-image stage.
job_arn=$(
  aws sagemaker create-training-job \
    --training-job-name "$image_name-$datetime" \
    --hyper-parameters "$hyperparams" \
    --algorithm-specification '{"TrainingImage":"'"$image_url"'","TrainingInputMode":"File","MetricDefinitions":'"$metrics"'}' \
    --role-arn "arn:aws:iam::528719223857:role/sagemaker_notebooks" \
    --input-data-config '{"ChannelName":"training","DataSource":{"S3DataSource":{"S3DataType":"S3Prefix","S3Uri":"'"$s3_training_data_prefix"'","S3DataDistributionType":"FullyReplicated"}}}' \
    --output-data-config "S3OutputPath=$s3_output_prefix" \
    --resource-config "InstanceType=ml.m5.large,InstanceCount=1,VolumeSizeInGB=30" \
    --stopping-condition "MaxRuntimeInSeconds=86400" \
    --query TrainingJobArn \
    --region eu-west-1 \
    --output text
)
job_name=$(echo "$job_arn" | cut -d/ -f 2)
# Block until the job finishes, then store its description for the later stages.
aws sagemaker wait training-job-completed-or-stopped --region eu-west-1 --training-job-name "$job_name"
aws sagemaker describe-training-job --region eu-west-1 --training-job-name "$job_name" > job_result.json

docs/sagify_build.png (9.41 KiB)
docs/sagify_local_deploy.png (5.76 KiB)
docs/sagify_local_train.png (7.23 KiB)

metrics.json
[
  {
    "Name": "f1",
    "Regex": "f1=(.*);"
  },
  {
    "Name": "roc_auc",
    "Regex": "roc_auc=(.*);"
  },
  {
    "Name": "accuracy",
    "Regex": "accuracy=(.*);"
  },
  {
    "Name": "coverage_err",
    "Regex": "coverage_err=(.*);"
  },
  {
    "Name": "label_ranking_average_precision",
    "Regex": "label_ranking_average_precision=(.*);"
  }
]

requirements.txt
attrs==22.2.0
blis==0.7.9
boto3==1.26.108
botocore==1.29.108
catalogue==2.0.8
certifi==2022.12.7
charset-normalizer==3.1.0
click==8.0.4
confection==0.0.4
cymem==2.0.7
dill==0.3.6
docker==5.0.3
future==0.18.3
google-pasta==0.2.0
idna==3.4
importlib-metadata==6.1.0
Jinja2==3.1.2
jmespath==1.0.1
joblib==1.2.0
langcodes==3.3.0
MarkupSafe==2.1.2
multiprocess==0.70.14
murmurhash==1.0.9
numpy==1.24.2
packaging==23.0
pandas==2.0.0
pathos==0.3.0
pathy==0.10.1
pox==0.3.2
ppft==1.7.6.6
preshed==3.0.8
protobuf==3.20.3
protobuf3-to-dict==0.1.5
pydantic==1.10.7
python-dateutil==2.8.2
pytz==2023.3
requests==2.28.2
s3transfer==0.6.0
sagemaker==2.72.3
sagify==0.23.0
scikit-learn==1.2.2
scipy==1.10.1
six==1.16.0
smart-open==6.3.0
smdebug-rulesconfig==1.0.1
spacy==3.5.1
spacy-legacy==3.0.12
spacy-loggers==1.0.4
srsly==2.4.6
thinc==8.1.9
threadpoolctl==3.1.0
tqdm==4.65.0
typer==0.7.0
typing_extensions==4.5.0
tzdata==2023.3
Unidecode==1.3.6
urllib3==1.26.15
wasabi==1.1.1
websocket-client==1.5.1
zipp==3.15.0

src/sagify_base/Dockerfile
ARG python_version
FROM python:$python_version-slim-buster

LABEL maintainer="Kenza AI <support@kenza.ai>"

RUN apt-get -y update && apt-get install -y --no-install-recommends \
    make \
    nginx \
    ca-certificates \
    g++ \
    git \
    && rm -rf /var/lib/apt/lists/*

# PYTHONUNBUFFERED keeps Python from buffering the standard
# output stream, which means that logs can be delivered to the user quickly.
# PYTHONDONTWRITEBYTECODE keeps Python from writing the .pyc files which are unnecessary in this case.
ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
ENV PATH="/opt/program:${PATH}"

ARG requirements_file_path
ARG module_path
ARG target_dir_name

COPY ${requirements_file_path} /opt/program/sagify-requirements.txt
WORKDIR /opt/program/${target_dir_name}

# Here we get all python packages.
RUN pip install flask gevent gunicorn future
RUN pip install -r ../sagify-requirements.txt && rm -rf /root/.cache
RUN python3 -m spacy download en_core_web_sm
RUN apt-get -y purge --auto-remove git

COPY ${module_path} /opt/program/${target_dir_name}

ENTRYPOINT ["sagify_base/executor.sh"]

src/sagify_base/Dockerfile-sagemaker
ARG docker_image_base_url
FROM $docker_image_base_url

LABEL maintainer="Kenza AI <support@kenza.ai>"

RUN apt-get -y update && apt-get install -y --no-install-recommends \
    make \
    nginx \
    ca-certificates \
    g++ \
    git \
    && rm -rf /var/lib/apt/lists/*

# PYTHONUNBUFFERED keeps Python from buffering the standard
# output stream, which means that logs can be delivered to the user quickly.
# PYTHONDONTWRITEBYTECODE keeps Python from writing the .pyc files which are unnecessary in this case.
ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
ENV PATH="/opt/program:${PATH}"

ARG requirements_file_path
ARG module_path
ARG target_dir_name

COPY ${requirements_file_path} /opt/program/sagify-requirements.txt
WORKDIR /opt/program/${target_dir_name}

# Here we get all python packages.
RUN pip install flask gevent gunicorn future
RUN pip install -r ../sagify-requirements.txt && rm -rf /root/.cache
RUN python3 -m spacy download en_core_web_sm
RUN apt-get -y purge --auto-remove git

COPY ${module_path} /opt/program/${target_dir_name}

ENTRYPOINT ["sagify_base/executor.sh"]

src/sagify_base/build.sh
#!/usr/bin/env bash
# Build the docker image
module_path=$1
target_dir_name=$2
dockerfile_path=$3
requirements_file_path=$4
tag=$5
image=$6
python_version=$7

docker build \
  -t "${image}:${tag}" \
  -f "${dockerfile_path}" . \
  --build-arg module_path="${module_path}" \
  --build-arg target_dir_name="${target_dir_name}" \
  --build-arg requirements_file_path="${requirements_file_path}" \
  --build-arg python_version="${python_version}"

src/sagify_base/executor.sh
#!/usr/bin/env bash
# Dispatch SageMaker's container invocation: "train" runs the training module,
# anything else starts the inference server.
if [ "$1" = "train" ]; then
  python ./sagify_base/training/train
else
  python ./sagify_base/prediction/serve
fi