Website Evidence Collector

The Website Evidence Collector (WEC) automates the collection of evidence on how websites store and transfer personal data. It is based on the browser Chromium/Chrome and on puppeteer, its JavaScript library for automation.

⚡⚡ Quick Start

First, make sure Node.js (version 20 or later) and npm [1] are installed. Check by running node -v, or install Node.js by following the guide on the Node.js website. Linux users can also use their package manager (e.g., apt install nodejs); check Repology for the version packaged by your distribution.
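
The version check can also be scripted. The following is a minimal sketch for a POSIX shell; the helper name node_major is made up for this example and is not part of WEC or Node.js.

```shell
# Hypothetical helper (not part of WEC): extract the major version
# from a string in the format printed by `node -v`, e.g. "v20.11.1".
node_major() {
  version="${1#v}"        # drop the leading "v"
  echo "${version%%.*}"   # keep everything before the first dot
}

# Only query Node.js if it is installed at all.
if command -v node >/dev/null 2>&1; then
  if [ "$(node_major "$(node -v)")" -ge 20 ]; then
    echo "Node.js $(node -v) is recent enough"
  else
    echo "Node.js $(node -v) is too old, version 20 or later is required"
  fi
else
  echo "Node.js is not installed"
fi
```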

Second, install the latest version of the Website Evidence Collector.

$ npm install --global https://code.europa.eu/EDPS/website-evidence-collector/-/releases/permalink/latest/downloads/website-evidence-collector.tgz

Third, run a collection.

$ website-evidence-collector https://example.com

To uninstall the tool later, run:

$ npm uninstall --global website-evidence-collector

Troubleshooting: Permissions

If you encounter permission denied errors during installation, set a per-user npm prefix with the following commands and then run the installation command again:

mkdir "${HOME}/.npm-packages"
npm config set prefix "${HOME}/.npm-packages"

Run Website Evidence Collector

The WEC can be run in two ways: with the collect command on the command line, which saves its output in a folder, or with the serve command, which starts a web server that you access from your browser. The serve command is recommended for quick and simple scans.

Notice on the Processing of Personal Data: This tool carries out automated processing of data of websites for the purpose of identifying their processing of personal data. If you run the tool to visit web pages containing personal data, this tool will download, display, and store these personal data in the form of text files and screenshots, and you will therefore process personal data.

Hint: If you run into command not found errors, add the bin folder inside .npm-packages to your PATH by running the following commands:

NPM_PACKAGES="${HOME}/.npm-packages"
export PATH="$PATH:$NPM_PACKAGES/bin"

You can check your PATH with this command: echo $PATH.
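
The export only lasts for the current shell session. To keep it, the same lines can go into a shell startup file such as ~/.profile or ~/.bashrc (which file applies depends on your shell). The sketch below adds the folder only if it is missing, so running it repeatedly does not grow PATH; path_contains is a hypothetical helper, not something provided by npm or WEC.

```shell
# Hypothetical helper (not part of WEC or npm): succeed if directory $1
# is already one of the colon-separated entries of the PATH-like string $2.
path_contains() {
  case ":$2:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

NPM_PACKAGES="${HOME}/.npm-packages"

# Append the npm bin folder only once.
if ! path_contains "$NPM_PACKAGES/bin" "$PATH"; then
  export PATH="$PATH:$NPM_PACKAGES/bin"
fi
```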

serve

The serve command starts a local web server to display the collected evidence. By default, the website is available at http://localhost:8080/.

$ website-evidence-collector serve

You can customize the server port and browser options:

  • Use -p to specify a different port.
  • Use --browser-options to pass additional options to the internal Chromium browser.

Example with custom port and browser options:

$ website-evidence-collector serve -p 8081 --browser-options='--disable-webgl' --browser-options='--disable-gpu'

collect

The collect command is the default: WEC uses it when no other command is given. It runs a collection from the terminal and, by default, saves the result in the folder output.

Basic Usage

$ website-evidence-collector https://example.com

Options

1. Simple output on the terminal only:
$ website-evidence-collector --no-output --yaml https://example.com 2> /dev/null

This displays the output on the terminal and redirects logging to /dev/null.

2. Ignore certificate errors during collection:
$ website-evidence-collector -y -q https://untrusted-root.badssl.com -- --ignore-certificate-errors

This ignores certificate errors when collecting data from the specified URL.

All command line arguments after -- are passed on to Chromium at launch. If you start WEC through npm, this is the second --, because npm consumes the first one itself.

Reference: https://peter.sh/experiments/chromium-command-line-switches/#ignore-certificate-errors

3. Integrate with testssl.sh:

Note: testssl.sh v3.0 or higher must already be installed. The most recent version tested with WEC is v3.0.6.

With the option --testssl, the website evidence collector calls testssl.sh to gather information about the HTTPS/SSL connection.

a. Basic usage:

$ website-evidence-collector --testssl https://example.com

b. Specify testssl.sh executable location:

$ website-evidence-collector -q --testssl-executable ../testssl.sh-3.0.6/testssl.sh https://example.com

c. Use a pre-existing testssl.sh JSON output file:

$ website-evidence-collector --testssl-file example-testssl.json https://example.com
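
The terminal-only output from option 1 lends itself to scanning several websites in one go. The following is a minimal sketch, assuming that the YAML result is printed to standard output (as the redirection of logging in option 1 suggests); url_to_filename and collect_all are hypothetical helpers for this example and are not part of WEC.

```shell
# Hypothetical helper (not part of WEC): turn a URL into a file name
# by dropping the scheme and replacing unsafe characters with "_".
url_to_filename() {
  printf '%s\n' "$1" | sed -e 's|^[a-zA-Z]*://||' -e 's|[^A-Za-z0-9.-]|_|g'
}

# Hypothetical helper: collect each URL given as an argument and write
# the YAML printed on the terminal to <name>.yaml in the current folder.
collect_all() {
  for url in "$@"; do
    out="$(url_to_filename "$url").yaml"
    # Same flags as in option 1; logging is discarded.
    website-evidence-collector --no-output --yaml "$url" 2> /dev/null > "$out"
    echo "saved $out"
  done
}
```

It could be called as collect_all https://example.com https://example.org/privacy, which would write example.com.yaml and example.org_privacy.yaml.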

🐋 Using Docker or Podman

Docker/Podman containers are available under https://code.europa.eu/EDPS/website-evidence-collector/container_registry.

  • To run the WEC server, forward the port:

    $ docker run -p 8080:8080 code.europa.eu:4567/edps/website-evidence-collector:latest
  • To collect evidence and save output, map a volume:

    $ docker run -v /path/on/your/system:/output:z --userns=keep-id code.europa.eu:4567/edps/website-evidence-collector:latest collect https://example.com
  • Or build your own image using the Containerfile (the trailing dot sets the current folder as build context):

    $ docker build -t website-evidence-collector -f Containerfile .
  • The version of testssl.sh used in the container can be set through the environment variable TESTSSL_VERSION.
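
A small wrapper can assemble the collection command for either container engine. The sketch below rests on an assumption that is not stated in this document: that --userns=keep-id is a Podman option that Docker does not accept, so it is added for Podman only. wec_collect_cmd is a made-up helper, and it prints the command instead of running it so that you can review it first.

```shell
IMAGE="code.europa.eu:4567/edps/website-evidence-collector:latest"

# Hypothetical helper (not part of WEC): print the command line for a
# collection run. $1 = engine (docker or podman), $2 = host output
# folder, $3 = URL to collect.
wec_collect_cmd() {
  userns=""
  # Assumption: --userns=keep-id is only understood by Podman.
  if [ "$1" = "podman" ]; then
    userns=" --userns=keep-id"
  fi
  echo "$1 run -v $2:/output:z$userns $IMAGE collect $3"
}

# Print the command for whichever engine is installed, preferring Podman.
for engine in podman docker; do
  if command -v "$engine" >/dev/null 2>&1; then
    wec_collect_cmd "$engine" "$PWD/output" https://example.com
    break
  fi
done
```

Because the command is printed as a plain string, output folders with spaces in their path would need extra quoting before the result is pasted into a shell.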

Frequently Asked Questions

Frequently asked questions and their answers are collected in FAQ.md.

Setup of the Development Environment

  1. Install Node.js and npm as described in the first step of the Quick Start.
  2. Clone the repository using Git:
    $ git clone https://code.europa.eu/EDPS/website-evidence-collector.git
  3. Open the terminal and navigate to the folder website-evidence-collector.
  4. Install the dependencies and compile TypeScript:
    $ npm install
    $ npm run install-frontend-dependencies
    $ npm run build
  5. Consider using npm link to make the command website-evidence-collector available outside the project folder.

Third-Party Software

The following software extends WEC to cover further use cases. It is developed independently of the WEC and is not tested or approved by the WEC developers.

License

This work, excluding filter lists, is distributed under the European Union Public Licence (the ‘EUPL’). Please find the terms in the file LICENSE.txt.

Filter lists in the assets/ directory are authored by the EasyList authors (https://easylist.to/). For your convenience, they are distributed together with this work under their respective licenses, as indicated in their file headers.

  1. npm stands for Node.js package manager.