# Unified Data Structure DB scripts

Contains:
* a script to import file data into the DB
* DB queries, as Python/R and JSON files or embedded in Python or R scripts
* sample code showing how to use them

## Install

* Read and apply [first-time git setup instructions](./GIT_SETUP.md).
* Clone the git repo:

  ```bash
  git clone https://jeodpp.jrc.ec.europa.eu/apps/gitlab/use_cases/legent/uds-scripts.git
  ```
* Create a new `secrets.py` file and paste into it the DB credentials provided by your DB admin
* Run this bash command to append the scripts path to `$PATH` and `$PYTHONPATH`:

  ```bash
  source ./install.sh
  ```
* Create and activate a new virtual environment using the environment.yml file:

  ```bash
  conda env create -f environment.yml
  conda activate uds-scripts
  ```
  For more info on managing Conda environments click [here](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html).
  For the JEODPP PyCharm/Conda documentation click [here](https://jeodpp.jrc.ec.europa.eu/apps/gitlab/for-everyone/documentation/-/wikis/howto/jeodesk/Set-up-pycharm-with-conda-virtual-environment).
* Set up the Conda virtual environment as your PyCharm interpreter by following these steps:
  File -> Settings -> Project -> Python Interpreter -> press the gear symbol and select "Add" -> Conda Environment -> click "Existing environment" -> select uds-scripts
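
Once `source ./install.sh` has been run and the environment is active, one way to confirm that Python can find the repository's modules is `importlib.util.find_spec`, which locates a module without executing it. The sketch below is an assumed check, not part of this repo: `module_origin` is a hypothetical helper, and the module names `db`, `utils` and `config` are taken from the sample code. Note that `secrets` is also the name of a Python standard-library module, so it is worth confirming which file it resolves to.

```python
import importlib.util


def module_origin(name):
    """Return the file Python would import `name` from, or None if not found."""
    # find_spec locates a top-level module without executing it
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # `secrets` is also a standard-library module: check which file wins
    for name in ("secrets", "db", "utils", "config"):
        print(f"{name:8} -> {module_origin(name)}")
```

Saved as, e.g., `check_install.py` (a hypothetical file name) and run with `python check_install.py`, a `None` means the module is not on the path.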

## Usage

* `cd` into the directory where you are going to work,
  inside your home directory, e.g. `/home/{user}/{foo}`
  (this may be a new one)
* Process DB data:
  * run scripts and queries to get the data from the DB (and possibly create files)
    * your bash history is your friend
  * modify script files
  * update the DB (only for users with DB admin credentials)
* Copy or move specific files so that the NextCloud agent picks them up,
  from:
  
  ```
  /home/{user}/{foo}           --> /home/{user}/Documents/{foo}
                               --> /eos/jeodpp/home/users/{user}/{bar}
                               --> /eos/jeodpp/data/projects/LEGENT/transfer/{baz}
  ```
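
The copy step can also be scripted from Python. A minimal sketch using only the standard library; `stage_for_nextcloud` is a hypothetical helper (not part of this repo), and the destination is assumed to be one of the NextCloud-synced folders listed in the mapping further down.

```python
import shutil
from pathlib import Path


def stage_for_nextcloud(files, dest_dir, move=False):
    """Copy (or move, if move=True) files into dest_dir; return the new paths."""
    dest = Path(dest_dir).expanduser()
    dest.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    staged = []
    for src in map(Path, files):
        target = dest / src.name
        if move:
            shutil.move(str(src), str(target))
        else:
            shutil.copy2(src, target)  # copy2 also preserves file timestamps
        staged.append(target)
    return staged
```

For example, `stage_for_nextcloud(["test.parquet"], "/eos/jeodpp/data/projects/LEGENT/transfer/{baz}")`, with `{baz}` replaced by your own folder. Copying is the default so that your working files stay in place.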

### Sample code

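```python
from db import eea_2020_flattened  # from this repo's db module
from utils import save_to_parquet, save_to_excel
from config import Config
import pandas as pd

# Load all records returned by the query into a DataFrame
df = pd.DataFrame.from_records(eea_2020_flattened.find())

...  # process your data

# Save the result under the public save path, as Parquet and as Excel
save_to_parquet(df, Config.PUBLIC_SAVE_PATH, 'test.parquet')
save_to_excel(df, Config.PUBLIC_SAVE_PATH, 'test.xlsx')
```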
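
If `eea_2020_flattened` is a MongoDB collection (an assumption based on the `.find()` call), each record carries an `_id` field, typically a BSON `ObjectId`, which Parquet writers may fail to serialize. A minimal sketch of dropping it before saving; `records_to_frame` is a hypothetical helper. With pymongo you could instead exclude the field at query time with a projection: `eea_2020_flattened.find({}, {"_id": 0})`.

```python
import pandas as pd


def records_to_frame(records, drop_id=True):
    """Build a DataFrame from dict-like records, optionally dropping `_id`."""
    df = pd.DataFrame.from_records(records)
    if drop_id:
        # errors="ignore": do nothing if the records have no `_id` field
        df = df.drop(columns="_id", errors="ignore")
    return df
```

Usage: `df = records_to_frame(eea_2020_flattened.find())`.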

> **TIP:** Don't store your pandas DataFrames in CSVs: they are big, slow and lose precision.
> 
> Use Excel files when sharing data: they are also big, but they keep precision and are not very slow. Preferably, though, store them in Parquet.
> 
> It is probably not a good idea to use our BDAP home directories for experimenting with big data; see:
> https://jeodpp.jrc.ec.europa.eu/apps/gitlab/for-everyone/documentation/-/wikis/howto/data/Access-and-transfer-files#storage-places-not-to-be-used-for-data-storage
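
A related, easy-to-verify limitation: CSV stores every value as text, so column types such as dates and categories are not restored on a round trip. A minimal illustration using only pandas (no Parquet engine needed); the column names and values are made up for the example.

```python
import io

import pandas as pd

# A small frame with typed columns: datetimes and a categorical
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-06-30"]),
    "country": pd.Categorical(["AT", "BE"]),
})

# Write to an in-memory CSV and read it back
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
back = pd.read_csv(buf)

print(df.dtypes)    # datetime64 and category
print(back.dtypes)  # both columns come back as plain text
```

Parquet, by contrast, stores the column types alongside the data.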

### NextCloud mapping of folders

```
BDAP                                           PC-folder of NextCloud
=========================================      =========================
/home/{user}/Documents                    <--> ~/BDAPCloud/data
/eos/jeodpp/home/users/{user}             <--> ~/BDAPCloud/eos_user_home
/eos/jeodpp/data/projects/LEGENT/transfer <--> ~/BDAPCloud/eos_LEGENT
```