The following are updated steps for setting up the latest production-ready versions of MyTardis (v4.3.0) and MyData (MyData_v0.3.2-98-g0149bd8) on a virtual machine running Ubuntu 18.04. This post updates MyTardis and You and also covers setting up PostgreSQL as the backend database.

While my example implementation is empty of data, it is up and running here.

MyTardis Framework

The following resources were referenced for this blog:

Several other resources are also hyperlinked throughout the remainder of this blog.

Set up your Domain Name

First, set up your @ and www custom resource records under the DNS for your domain name. Part 6 of my blog post on Cloud Platforms, CentOS 7, and Ghost provides some detail on how to accomplish this. Google Domains has since updated their layout so the screen shots will differ from reality, but the essence is still the same.

Create a Sudo User & Enable Firewall

## while logged in as root
adduser tardis
usermod -aG sudo tardis

## now give tardis root privileges (if you need to)
visudo
## this opens nano, a text editor
## under 'root ALL=(ALL:ALL) ALL', add:
tardis ALL=(ALL:ALL) ALL

## now ctrl-x and select y to save changes

Enable your firewall

ufw app list
## Allow SSH connections:
ufw allow OpenSSH
## enable the firewall by typing:
ufw enable
## check the status of allowed ports:
ufw status

Pull MyTardis & MyData & install dependencies

su - tardis
## su - starts a login shell in /home/tardis, so move to / afterwards
cd /
## using sudo for the first time will prompt for the user password
sudo ls
[enter password]

## MyTardis
sudo git clone -b master https://github.com/mytardis/mytardis.git
cd mytardis
# install MyTardis requirements
sudo bash install-ubuntu-py3-requirements.sh

## mydata app
cd tardis/apps/
sudo git clone https://github.com/mytardis/mytardis-app-mydata mydata

## move back up to the mytardis directory
cd ../..

## python, postgresql, nginx, curl
sudo apt install python3-pip python3-dev libpq-dev postgresql postgresql-contrib nginx curl -y

## install certbot 
sudo add-apt-repository ppa:certbot/certbot
[enter]
sudo apt install python-certbot-nginx -y

Set up PostgreSQL

## login to postgres with the postgres user
sudo -u postgres psql
## enter the following psql commands to create the db
CREATE DATABASE tardis_db;

## add a password
CREATE USER tardis WITH PASSWORD 'yourpassword';

## set encoding, transaction isolation, and time zone defaults for tardis
ALTER ROLE tardis SET client_encoding TO 'utf8';
ALTER ROLE tardis SET default_transaction_isolation TO 'read committed';
ALTER ROLE tardis SET timezone TO 'UTC';

## give privileges to tardis to work with the db
GRANT ALL PRIVILEGES ON DATABASE tardis_db TO tardis;
ALTER USER tardis CREATEDB;

## quit out of psql
\q

Install virtualenvwrapper

A virtual environment is a workspace that has everything you need pre-installed inside of it. Think about how nice it would be to quickly toggle between multiple projects with different dependencies and settings. That's the utility of virtualenv for Python. When you are not logged in as root, run pip with sudo -H to install virtualenvwrapper and virtualenv.

## virtualenvwrapper pip3 installation
sudo -H pip3 install --upgrade pip
sudo -H pip3 install virtualenvwrapper
sudo -H pip3 install virtualenv

Add a few lines to the shell startup file ~/.bashrc

The lines that are added into your ~/.bashrc shell startup file essentially tell your computer where your virtualenvs are stored and what version of Python you want your virtualenvs to use.

As an alternative, you could create custom ~/.profile shell startup files that use different versions of Python or different virtualenv storage locations. You can think of your ~/.profile as the custom settings you would have to load prior to initializing a virtualenv. For now, we're just going to use ~/.bashrc.

sudo vim ~/.bashrc

Add lines to ~/.bashrc

export WORKON_HOME=$HOME/.virtualenvs
export PROJECT_HOME=$HOME/Devel
export VIRTUALENVWRAPPER_PYTHON=/usr/bin/python3
source /usr/local/bin/virtualenvwrapper.sh
# Write and quit VIM
:wq

Run the shell startup file

source ~/.bashrc

Activate your project's virtual environment

mkvirtualenv mytardis
To work with any virtualenv after creation, run the command: workon [YourEnvName] e.g., workon mytardis
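
If you are ever unsure whether the virtualenv is actually active before running pip, you can ask the interpreter itself. This is only a small sketch; the helper name in_virtualenv is made up for this post and is not part of virtualenvwrapper.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer virtualenv
    # releases make sys.prefix differ from sys.base_prefix instead.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

After workon mytardis, the prefix should point somewhere under ~/.virtualenvs/mytardis.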

Pip install the requirements for MyTardis

pip install -U pip
pip install -U -r requirements.txt
pip install -U -r requirements-postgres.txt

Pip install the requirements for MyData

## cd into the mydata app
cd tardis/apps/mydata
pip install -U -r requirements.txt

Install JavaScript dependencies for production and for testing

## run from the MyTardis root directory, where package.json lives
cd /mytardis
npm install && npm test

If you see JavaScript-related errors after deploying MyTardis, re-run npm install && npm test after setting up nginx and letsencrypt.

Create MyData receiving and complete default directories

su tardis
## -p also creates the parent mydata_storage directory
sudo mkdir -p /mytardis/mydata_storage/receiving
sudo mkdir -p /mytardis/mydata_storage/complete

Create a new file, tardis/settings.py

MyTardis ships with default settings, but any custom settings you place in tardis/settings.py override them. For example, if you want to use PostgreSQL instead of sqlite3, you can specify that in this file. I recommend leaving DEBUG = True initially: if there is an internal issue with the database or server settings, you will be able to see the error message. Once your copy of MyTardis is up and running with no issues, change it to DEBUG = False, so that errors lead to a generic error page that reveals no internal information. See the Django Settings Documentation for additional Django settings, or the Extended configuration settings in the MyTardis documentation.

sudo vim tardis/settings.py

Fill the file with the following settings:

from .default_settings import *
from pathlib import Path
import os

# Build paths inside the project like this: BASE_DIR / 'subdir'.
BASE_DIR = Path(__file__).resolve().parent.parent

# Add site specific changes here
ALLOWED_HOSTS = ['your_domain.com', 'www.your_domain.com', 'localhost']
#ALLOWED_HOSTS = ['server_ip_address', 'localhost']

# additional applications
INSTALLED_APPS += ('tardis.apps.mydata',)

# Turn on django debug mode.
DEBUG = True

# Swapped to PostgreSQL
# The database needs to be named something other than "tardis" to avoid
# a conflict with a directory of the same name.
DATABASES['default']['ENGINE'] = 'django.db.backends.postgresql_psycopg2'
DATABASES['default']['NAME'] = 'tardis_db'
DATABASES['default']['USER'] = 'tardis'
DATABASES['default']['PASSWORD'] = 'yourpassword'
DATABASES['default']['HOST'] = 'localhost'

# It is recommended to pass protocol information to Gunicorn. Many web frameworks use this
# information to generate URLs. Without this information, the application may mistakenly
# generate ‘http’ URLs in ‘https’ responses, leading to mixed content warnings or broken applications.
SECURE_PROXY_SSL_HEADER = ('HTTP_X_FORWARDED_PROTO', 'https')

# Tell Django where collectstatic should place the static files so that Nginx can
# serve them. The following lines put them in a directory called static in the
# base project directory:
STATIC_URL = '/static/'
STATIC_ROOT = os.path.join(BASE_DIR, 'static/')

## MyData settings
DEFAULT_RECEIVING_DIR = os.path.join(BASE_DIR,'mydata_storage/receiving/')
DEFAULT_STORAGE_BASE_DIR = os.path.join(BASE_DIR, 'mydata_storage/complete/')
# Write and quit VIM
:wq
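
If you would rather not keep the database password in the settings file, one option is to read it from the environment. The sketch below is only an illustration: the MYTARDIS_DB_* variable names and the database_overrides helper are hypothetical, not something MyTardis defines.

```python
import os


def database_overrides(environ=os.environ):
    """Build the PostgreSQL overrides used in this post from environment variables.

    The MYTARDIS_DB_* names are hypothetical; pick any names you like.
    """
    return {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": environ.get("MYTARDIS_DB_NAME", "tardis_db"),
        "USER": environ.get("MYTARDIS_DB_USER", "tardis"),
        # No default here: a missing password raises KeyError instead of
        # silently falling back to a guessable value.
        "PASSWORD": environ["MYTARDIS_DB_PASSWORD"],
        "HOST": environ.get("MYTARDIS_DB_HOST", "localhost"),
    }


if __name__ == "__main__":
    example_env = {"MYTARDIS_DB_PASSWORD": "yourpassword"}
    print(database_overrides(example_env))
```

In tardis/settings.py this could be applied with DATABASES['default'].update(database_overrides()). Any service that starts MyTardis would then also need the variable, for example through an Environment= line in its systemd unit.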

Default permissions

To prevent any potential permission issues, set the ownership of the mytardis workspace to the tardis user and group (tardis:tardis).

cd /

sudo chown -R tardis:tardis mytardis/

If permission issues persist, recheck permissions with ls -l. At most, you could run sudo chmod -R 755 mytardis/, which recursively gives the owner read, write, and execute permission and everyone else read and execute permission on every file and directory.
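
To see what an octal mode such as 755 means in the notation that ls -l prints, Python's stat module can render it. This is a small illustrative sketch; describe_mode is a helper name made up for this post.

```python
import stat


def describe_mode(mode: int, is_dir: bool = False) -> str:
    """Render an octal permission mode the way `ls -l` prints it."""
    file_type = stat.S_IFDIR if is_dir else stat.S_IFREG
    return stat.filemode(file_type | mode)


if __name__ == "__main__":
    print(describe_mode(0o755, is_dir=True))  # drwxr-xr-x
    print(describe_mode(0o755))               # -rwxr-xr-x
    print(describe_mode(0o644))               # -rw-r--r--
```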

Next, you will need to create a SECRET_KEY.

The MyTardis documentation states that this key "is important for security reasons". The virtualenv must be active for this command to work (workon mytardis), and it must be run from the /mytardis directory without sudo, because sudo would bypass the virtualenv's Python. It is a short script that appends a random 50-character key to the bottom of your settings.py file:

cd /mytardis
python -c "import os; from random import choice; key_line = '%sSECRET_KEY=\"%s\"  # generated from build.sh\n' % ('from .default_settings import * \n\n' if not os.path.isfile('tardis/settings.py') else '', ''.join([choice('abcdefghijklmnopqrstuvwxyz0123456789@#%^&*(-_=+)') for i in range(50)])); f=open('tardis/settings.py', 'a+'); f.write(key_line); f.close()"
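
The one-liner above draws characters with random.choice, which is not designed for security-sensitive values. As an alternative sketch (my own variation, not the command from the MyTardis documentation), the standard library's secrets module can generate the key from the same character set. It only prints the line; you would paste it into tardis/settings.py yourself.

```python
import secrets

# Same character set as the one-liner above.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789@#%^&*(-_=+)"


def generate_secret_key(length: int = 50) -> str:
    """Return a random key drawn from ALPHABET using a cryptographically secure source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    # Print a line that can be pasted at the bottom of tardis/settings.py.
    print('SECRET_KEY = "%s"' % generate_secret_key())
```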

Initialization

Create and configure the MyTardis database tables:

python manage.py migrate
python manage.py createcachetable default_cache
python manage.py createcachetable celery_lock_cache

Migrate MyData database tables:

python manage.py makemigrations mydata
python manage.py migrate mydata

Load MyData's default experiment schema:

python manage.py loaddata tardis/apps/mydata/fixtures/default_experiment_schema.json

Next, create a superuser:

python manage.py createsuperuser

Test the deployment of MyTardis

The following commands open port 8000 and start Django's development server so that you can reach your MyTardis deployment from a web browser at http://server_ip_address:8000.

## Create an exception for port 8000 by typing:
sudo ufw allow 8000
python manage.py runserver 0.0.0.0:8000

Gunicorn

If you were able to access the MyTardis deployment, ctrl+c out and test gunicorn's ability to serve up MyTardis:

cd /mytardis
gunicorn --bind 0.0.0.0:8000 wsgi

Check that the website is still reachable while Gunicorn is serving it; if it is, press CTRL-C to stop Gunicorn.

deactivate

Creating systemd Socket and Service Files for Gunicorn

We have tested that Gunicorn can interact with our Django application, but we should implement a more robust way of starting and stopping the application server. To accomplish this, we’ll make systemd service and socket files. The Gunicorn socket will be created at boot and will listen for connections. When a connection occurs, systemd will automatically start the Gunicorn process to handle the connection.

Create a gunicorn.socket file

sudo vim /etc/systemd/system/gunicorn.socket

Fill the file with the following settings:

[Unit]
Description=gunicorn socket

[Socket]
ListenStream=/run/gunicorn.sock

[Install]
WantedBy=sockets.target
# Write and quit VIM
:wq

Create a gunicorn.service file

sudo vim /etc/systemd/system/gunicorn.service

Fill the file with the following settings:

[Unit]
Description=gunicorn daemon
Requires=gunicorn.socket
After=network.target

[Service]
User=tardis
Group=tardis
WorkingDirectory=/mytardis
ExecStart=/home/tardis/.virtualenvs/mytardis/bin/gunicorn \
          --access-logfile - \
          --workers 3 \
          --bind unix:/run/gunicorn.sock \
          wsgi:application

[Install]
WantedBy=multi-user.target
# Write and quit VIM
:wq

Enable and test the gunicorn.socket

# We can now start and enable the Gunicorn socket. 
sudo systemctl start gunicorn.socket
sudo systemctl enable gunicorn.socket

# Checking for the Gunicorn Socket File
sudo systemctl status gunicorn.socket
# Testing Socket Activation
sudo systemctl status gunicorn

# == html if working
curl --unix-socket /run/gunicorn.sock localhost

sudo systemctl status gunicorn

# If the output from curl or the output of systemctl status indicates that a problem occurred, check the logs for additional details:
sudo journalctl -u gunicorn
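
The curl --unix-socket check can also be written in Python, which is handy if you want to script it. This is a minimal sketch using only the standard library; the function names build_request and probe_unix_socket are made up for this post, and it assumes the socket path used above.

```python
import socket


def build_request(host: str = "localhost", path: str = "/") -> bytes:
    """Build a minimal HTTP/1.0 GET request."""
    return ("GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (path, host)).encode("ascii")


def probe_unix_socket(socket_path: str = "/run/gunicorn.sock", timeout: float = 5.0) -> str:
    """Send one GET request over a Unix socket and return the HTTP status line."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.settimeout(timeout)
        client.connect(socket_path)
        client.sendall(build_request())
        response = b""
        # Read only until the first line (the status line) is complete.
        while b"\r\n" not in response:
            chunk = client.recv(4096)
            if not chunk:
                break
            response += chunk
    return response.split(b"\r\n", 1)[0].decode("latin-1")


if __name__ == "__main__":
    try:
        print(probe_unix_socket())
    except OSError as error:
        print("could not reach the socket:", error)
```

Any status line at all means systemd started Gunicorn on demand; a connection error points back to the journalctl output.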

Create a celeryworker.service file

Celery workers run asynchronous MyTardis tasks, which are essentially tasks that can run in the background and be prioritized based on website use.

sudo vim /etc/systemd/system/celeryworker.service
[Unit]
Description=celeryworker daemon
After=network.target

[Service]
User=tardis
Group=tardis
WorkingDirectory=/mytardis
Environment=DJANGO_SETTINGS_MODULE=tardis.settings
ExecStart=/home/tardis/.virtualenvs/mytardis/bin/celery worker \
  -A tardis.celery.tardis_app \
  -c 2 -Q celery,default -n "allqueues.%%h"

[Install]
WantedBy=multi-user.target
# Write and quit VIM
:wq
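## reload systemd so it rereads the service definitions, then start the celery worker and restart gunicorn:
sudo systemctl daemon-reload
sudo systemctl enable celeryworker
sudo systemctl start celeryworker
sudo systemctl restart gunicorn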

Configure Nginx with Gunicorn and LetsEncrypt

Create a file to configure Nginx for your domain (HTTP initially)

sudo vim /etc/nginx/sites-available/your_domain.com

Fill the file with the following settings:

upstream mytardis {
    server unix:/run/gunicorn.sock;
    server 0.0.0.0:8000 backup;
}
server {
    listen 80;
    #server_name 127.0.0.1;
    server_name your_domain.com www.your_domain.com;
    return 301 https://$server_name$request_uri;
}
server {
    listen 443 default_server ssl;
    #server_name 127.0.0.1;
    server_name your_domain.com www.your_domain.com;

    # HSTS (ngx_http_headers_module is required) (15768000 seconds = 6 months)
    add_header Strict-Transport-Security max-age=15768000;

    client_max_body_size 4G;
    keepalive_timeout 5;

    gzip off;  # security reasons
    gzip_proxied any;
    # MyTardis generates uncompressed archives, so compress them in transit
    gzip_types application/x-javascript text/css;
    gzip_min_length 1024;
    gzip_vary on;

    location / {
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header Host $http_host;
        proxy_redirect off;
        proxy_pass http://mytardis;
        client_max_body_size 4G;
        client_body_buffer_size 8192k;
        proxy_connect_timeout 2000;
        proxy_send_timeout 2000;
        proxy_read_timeout 2000;
    }

    location /static/ {
        expires 7d;
        alias /mytardis/static/;
    }
}
# Write and quit VIM
:wq
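
The X-Forwarded-Proto header set in this file pairs with SECURE_PROXY_SSL_HEADER in tardis/settings.py. Nginx talks to Gunicorn over plain HTTP, so the header is the only way the application learns that the visitor used HTTPS. The sketch below is a simplified illustration of that decision, not Django's actual source; request_scheme is a made-up name.

```python
def request_scheme(meta: dict, secure_proxy_ssl_header=None, wire_scheme: str = "http") -> str:
    """Simplified illustration of how the scheme is chosen behind a proxy.

    meta mimics Django's request.META; wire_scheme is the scheme of the
    connection between Nginx and Gunicorn (plain HTTP over the Unix socket).
    """
    if secure_proxy_ssl_header is not None:
        header_name, secure_value = secure_proxy_ssl_header
        header_value = meta.get(header_name)
        if header_value is not None:
            return "https" if header_value == secure_value else "http"
    return wire_scheme


if __name__ == "__main__":
    setting = ("HTTP_X_FORWARDED_PROTO", "https")
    forwarded = {"HTTP_X_FORWARDED_PROTO": "https"}
    print(request_scheme(forwarded, setting))  # https
    print(request_scheme(forwarded))           # http (setting not configured)
```

Without the setting, the application would believe every request arrived over HTTP and could generate http links on an https page.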

Create a symbolic link (ln -s) from the file you just created into the directory of sites enabled by nginx:

sudo ln -s /etc/nginx/sites-available/your_domain.com /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl restart nginx

sudo ufw delete allow 8000
sudo ufw allow 'Nginx Full'

Setting up HTTPS with Certbot and Nginx

sudo nginx -t
sudo systemctl reload nginx

sudo ufw status

The ufw status printout should look like the following:

OpenSSH                    ALLOW       Anywhere
Nginx Full                 ALLOW       Anywhere
OpenSSH (v6)               ALLOW       Anywhere (v6)
Nginx Full (v6)            ALLOW       Anywhere (v6)

sudo certbot --nginx -d your_domain.com -d www.your_domain.com
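[enter email]
[A] ## agree to the terms of service
[n] ## decline sharing your email address
[2] ## redirect all HTTP traffic to HTTPS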

Verifying Certbot Auto-Renewal

sudo certbot renew --dry-run

Collect static files to settings.STATIC_ROOT

workon mytardis
python manage.py collectstatic
deactivate

For Django to serve complete pages in production, its "static" files need to be collected into a single directory. Static files are the CSS, JavaScript, and images that the pages reference; collectstatic copies them into STATIC_ROOT. If this directory is not specified in the Nginx configuration file, then your pages will load with no CSS while DEBUG = True. If DEBUG = False, you will just be redirected to a generic error page.
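
It is worth checking that the directories computed in tardis/settings.py line up with the alias in the Nginx configuration and with the MyData storage directories. The sketch below redoes the path arithmetic with pure paths, so it does not touch the filesystem; it assumes the repository was cloned to /mytardis as in this post.

```python
import posixpath
from pathlib import PurePosixPath

# Assumption: the repository lives at /mytardis, as in this post.
SETTINGS_FILE = PurePosixPath("/mytardis/tardis/settings.py")

# Same arithmetic as BASE_DIR in settings.py: two levels up from the file.
BASE_DIR = SETTINGS_FILE.parent.parent

STATIC_ROOT = posixpath.join(BASE_DIR, "static/")
DEFAULT_RECEIVING_DIR = posixpath.join(BASE_DIR, "mydata_storage/receiving/")
DEFAULT_STORAGE_BASE_DIR = posixpath.join(BASE_DIR, "mydata_storage/complete/")

if __name__ == "__main__":
    print(BASE_DIR)                  # /mytardis
    print(STATIC_ROOT)               # /mytardis/static/
    print(DEFAULT_RECEIVING_DIR)     # /mytardis/mydata_storage/receiving/
    print(DEFAULT_STORAGE_BASE_DIR)  # /mytardis/mydata_storage/complete/
```

STATIC_ROOT matches the alias /mytardis/static/ in the location /static/ block, which is why Nginx can serve the collected files directly.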

sudo nginx -t
sudo systemctl reload nginx
sudo systemctl restart nginx

All done! You now have an instance of MyTardis running from your domain. To see what I ran based on this tutorial, please navigate to: spatialmsk.dev

Setting up MyData

MyData provides a tutorial on setting up a demo of the MyData app. The general steps were as follows:

  • Verify the MyData app is accessible via admin; there will be a new section called "MYDATA" with UploaderRegistrationRequests, UploaderSettings, and Uploaders as subheadings
  • Verify that the default experiment schema loaded with loaddata is present by selecting the Schema subheading under "TARDIS_PORTAL"
  • Create a testfacility user account and give it the mydata-default-permissions (automatically created when mydata is migrated and installed)
  • Select Api keys under the "TASTYPIE" heading, "Add an Api key", search for the testfacility user, then click save and an API Key will automatically generate
  • Create 'Test Facility' under the facilities subheading and give it the mydata-default-permissions manager group
  • Create 'Test Microscope' under the instruments subheading and associate it with 'Test Facility'
  • Create "testuser1" and "testuser2" user accounts for testing
  • Download the MyData test data

Within the MyData Desktop application, the following settings were used:

Instrument Name: Test Microscope
Facility Name: Test Facility
Contact Name: Some Name Here
Contact Email: Some Email Here
Data Directory: the\directory\of\the\test\data\MyTardisDemoUsers
MyTardis URL: https://spatialmsk.dev
MyTardis Username: testfacility
MyTardis API Key: automatically-generated-api-key-goes-here

A 'Test Run' can be done to see what will happen, or the data could just be sent by clicking on the play button 'Scan and Upload'. MyData will only upload data that has not yet been uploaded. This can also be set within the settings. Various filters and set times for uploading can also be set.
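
Before pointing MyData at the server, it can help to confirm that the API key works on its own. The sketch below builds a request using TastyPie's ApiKey authorization header. It assumes that the MyTardis REST API is served under /api/v1/ and that a facility resource exists there; treat both as assumptions to check against your deployment. build_api_request is a name made up for this post.

```python
import urllib.request


def build_api_request(base_url: str, username: str, api_key: str, resource: str = "facility"):
    """Build (but do not send) an authenticated GET request for one API resource."""
    # Assumed URL layout: <base_url>/api/v1/<resource>/?format=json
    url = "%s/api/v1/%s/?format=json" % (base_url.rstrip("/"), resource)
    request = urllib.request.Request(url)
    # TastyPie's ApiKey scheme: "ApiKey <username>:<api_key>"
    request.add_header("Authorization", "ApiKey %s:%s" % (username, api_key))
    return request


if __name__ == "__main__":
    request = build_api_request(
        "https://your_domain.com",
        "testfacility",
        "automatically-generated-api-key-goes-here",
    )
    print(request.full_url)
    print(request.get_header("Authorization"))
    # To actually send it: urllib.request.urlopen(request).read()
```

A 401 response would suggest that the key or username is wrong, which is easier to diagnose here than from inside the desktop application.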

Additional steps coming soon ...

MyData test upload to MyTardis web application

MyTardis web application post upload

Setting up MyTardis Schemas

Setting up filters and mytardis_ngs_ingestor

USE_FILTERS indicates whether metadata filters should be used.

POST_SAVE_FILTERS contains a list of post-save filters that are executed when a new data file is created.

sudo vim mytardis/tardis/settings.py
USE_FILTERS = True
POST_SAVE_FILTERS = [("tardis.tardis_portal.filters.exif.EXIFFilter", ["EXIF", "http://exif.schema"]),]
# Write and quit VIM
:wq
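
Each POST_SAVE_FILTERS entry pairs a dotted class path with a list of arguments for that class. The sketch below shows how such an entry can be turned into an object in general. It is not MyTardis's own loader, the helper names are made up, and it uses a standard-library class as a stand-in so that it runs without MyTardis installed.

```python
from importlib import import_module


def load_by_dotted_path(dotted_path: str):
    """Import 'package.module.ClassName' and return the class object."""
    module_path, _, attribute = dotted_path.rpartition(".")
    return getattr(import_module(module_path), attribute)


def build_filters(post_save_filters):
    """Instantiate each (dotted path, args) entry as callable_class(*args)."""
    built = []
    for entry in post_save_filters:
        dotted_path = entry[0]
        args = entry[1] if len(entry) > 1 else []
        built.append(load_by_dotted_path(dotted_path)(*args))
    return built


if __name__ == "__main__":
    # Stand-in entry with the same shape as the EXIF filter entry above.
    example = [("fractions.Fraction", [1, 2])]
    print(build_filters(example))
```

Read this way, the EXIF entry names the class tardis.tardis_portal.filters.exif.EXIFFilter and passes it "EXIF" and "http://exif.schema" as arguments.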

The following are set under admin:

== default storage box settings ==
max size: 9510137856
Name: default /mytardis/mydata_storage/complete
Description: Default Storage for uploads via SCP
Master box: default
== storage box options ==
location: /mytardis/mydata_storage/complete
== storage box attributes ==
type: default
scp_username: mydata
scp_hostname: spatialmsk.dev

== receiving storage box settings ==
max size: 9510137856
Name: Local box at /mytardis/mydata_storage/receiving
Description: Temporary storage for uploads via SCP
Master box: default
== storage box options ==
location: /mytardis/mydata_storage/receiving
== storage box attributes ==
type: receiving
scp_username: mydata
scp_hostname: spatialmsk.dev