MyTardis and You

It's been a while, and I haven't kept up with converting successful VM setups into blog posts. My most recent trial was putting an iteration of MyTardis up on a virtual machine running Ubuntu 18.04. As I've said in the past, Digital Ocean is wonderful and is what I use. If you want to know more about some options (mainly Amazon Web Services vs. Digital Ocean), see my previous blog entry. What has probably won me over is the thorough variety of tutorials that Digital Ocean provides (which are featured prominently throughout my blog posts). I wouldn't have gotten very far without the help of Digital Ocean, and I hope they continue to carve out a piece of the market so that I can continue to benefit from their amazing tutorials.

I am currently working on an eDNA grant, and the collaborators have been seeking a platform that can manage their data and metadata in a way that is publishable and intuitive. MyTardis is a "data management platform" whose primary market appears to be researchers who need data storage, collaboration, and metadata tracking. It's open source, written in Python, and backed by your choice of database. This tutorial uses the default SQLite, but I will be changing it to PostgreSQL. My formal database education revolved around database design and modeling, featuring Oracle. Oracle is a database management system commonly used by large companies. What Oracle is not, however, is free and open source. PostgreSQL is both, and it also supports spatial queries and spatial data (through PostGIS). It didn't take long for me to move to PostgreSQL for my data management needs.

MyTardis also comes packed with the ability for users to set up Secure File Transfer Protocol (SFTP) access to their data. That means data that is meant to stay private will remain private to all but approved users. For an example implementation, the MyTardis folks also put together an example Sequencing Facility. I'm partway through their development documentation, and so far it is relatively well written and contains most of the information I needed for deployment.

Some of their deployment/installation steps needed additional background, hence this blog post. Where steps were missing (or did not work), I filled in information from other sources (such as Digital Ocean). While my example implementation is empty of data, it is up and running here.

MyTardis Framework

The following resources were referenced for this blog:

Several other resources are also hyperlinked throughout the remainder of this blog.

Installing MyTardis in a Digital Ocean Ubuntu 18.04 Droplet

# Make a local copy of the MyTardis GitHub repository
git clone -b master https://github.com/mytardis/mytardis.git
# Navigate into the MyTardis directory
cd mytardis
# Run the installation shell script from the MyTardis directory
sudo bash install-ubuntu-py3-requirements.sh

Install virtualenvwrapper

A virtual environment is an isolated workspace with its own Python interpreter and its own set of installed packages. Think about how nice it would be to quickly toggle between multiple projects with different dependencies and settings. That's the utility of virtualenv for Python.
sudo pip3 install virtualenvwrapper

Add a few lines to the shell startup file ~/.bashrc

The lines that are added into your ~/.bashrc shell startup file essentially tell your computer where your virtualenvs are stored and what version of Python you want your virtualenvs to use.

As an alternative, you could create custom ~/.profile shell startup files that use different versions of Python or different virtualenvs storage locations. You can think of your ~/.profile as the custom settings you would have to load prior to initializing a virtualenv. For now, we're just going to use ~/.bashrc.
mkdir -p "$HOME/.virtualenvs"
vim ~/.bashrc

Add lines to ~/.bashrc

export WORKON_HOME=$HOME/.virtualenvs
export PROJECT_HOME=$HOME/Devel
export VIRTUALENV_PYTHON=/usr/bin/python3
export VIRTUALENVWRAPPER_PYTHON=/usr/bin/python3
export VIRTUALENVWRAPPER_VIRTUALENV=/usr/local/bin/virtualenv
source /usr/local/bin/virtualenvwrapper.sh
# Write and quit VIM
:wq

Run the shell startup file

source ~/.bashrc

Create and activate your project's virtual environment

mkvirtualenv mytardis
To work with any virtualenv after creation, run the command: workon [YourEnvName] e.g., workon mytardis
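
If you are ever unsure whether the virtualenv is actually active, a quick check from Python can help. The snippet below is a minimal sketch; the helper name in_virtualenv is my own and not part of virtualenvwrapper. It relies on the fact that a virtual environment changes the interpreter's prefix.

```python
import sys


def in_virtualenv():
    """Return True if the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer virtualenv
    # releases make sys.prefix differ from sys.base_prefix instead.
    if getattr(sys, "real_prefix", None) is not None:
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("inside a virtualenv:", in_virtualenv())
print("interpreter prefix:", sys.prefix)
```

After running workon mytardis, the printed prefix should point somewhere under the directory you set as WORKON_HOME.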

Pip install all of the requirements for MyTardis

# Run pip without sudo so the packages install into the active virtualenv
pip install -U pip
pip install -U -r requirements.txt

Install JavaScript dependencies for production and for testing

npm install && npm test

Create a new file, tardis/settings.py

MyTardis has default settings, but these can be overridden by any custom settings you place in this tardis/settings.py file. For example, if you want to use PostgreSQL instead of sqlite3, you can specify that information in this file. Once you have a copy of MyTardis up and running with no issues, you can set DEBUG = False; initially, however, I would recommend leaving it as True. With DEBUG = True, if there is an internal issue with the database or server settings, the developer will be able to see the error message. With DEBUG = False, the developer will simply be sent to a generic error page with no revealing information. See the Django Settings Documentation for some additional Django settings, or the Extended configuration settings from the MyTardis documentation.

sudo vim tardis/settings.py

Fill the file with the following settings:

from .default_settings import *

# Add site specific changes here. The simplest case: 
# just add the domain name(s) and IP addresses of your Django server
# ALLOWED_HOSTS = [ 'example.com', 'YourIPAddress']
# To respond to 'example.com' and any subdomains, start the domain with a dot
# ALLOWED_HOSTS = ['.example.com', 'YourIPAddress']
ALLOWED_HOSTS = ['YourIPAddress', 'YourDomain.com', '.YourDomain.com', 'localhost']

# Turn on django debug mode.
DEBUG = True

# Use the built-in SQLite database for testing.
# The database needs to be named something other than "tardis" to avoid
# a conflict with a directory of the same name.
DATABASES['default']['ENGINE'] = 'django.db.backends.sqlite3'
DATABASES['default']['NAME'] = 'tardis_db'

# It is recommended to pass protocol information to Gunicorn. 
# Many web frameworks use this information to generate URLs. Without 
# this information, the application may mistakenly generate ‘http’ 
# URLs in ‘https’ responses, leading to mixed content warnings or 
# broken applications.
SECURE_PROXY_SSL_HEADER = ('HTTP_X_FORWARDED_PROTO', 'https')

# Next, move down to the bottom of the file and add a setting 
# indicating where the static files should be placed. This is
# necessary so that Nginx can handle requests of these items. 
# The following line tells Django to place them in a directory
# called static in the base project directory:
STATIC_URL = '/static/'
STATIC_ROOT = '/root/mytardis/static/'
# Write and quit VIM
:wq
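
Since the plan is to eventually move from SQLite to PostgreSQL, the database override in tardis/settings.py might look something like the sketch below. This is not taken from the MyTardis documentation: the helper name postgres_settings, the environment variable names (MYTARDIS_DB_NAME and friends), and the default values are all assumptions for illustration. It also assumes PostgreSQL is already installed with a database and user created, and that the psycopg2 driver is installed in the virtualenv.

```python
import os


def postgres_settings(env=None):
    """Build a Django DATABASES entry for PostgreSQL from environment variables.

    All variable names and defaults here are hypothetical placeholders.
    """
    env = os.environ if env is None else env
    return {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": env.get("MYTARDIS_DB_NAME", "tardis_db"),
        "USER": env.get("MYTARDIS_DB_USER", "tardis"),
        "PASSWORD": env.get("MYTARDIS_DB_PASSWORD", ""),
        "HOST": env.get("MYTARDIS_DB_HOST", "localhost"),
        "PORT": env.get("MYTARDIS_DB_PORT", "5432"),
    }


# In tardis/settings.py, after "from .default_settings import *", a line like
# this would take the place of the two SQLite lines:
# DATABASES['default'].update(postgres_settings())
```

Reading the password from the environment keeps it out of the settings file, which is handy if the file ever ends up in version control.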

Next, you will need to create a SECRET_KEY.

The MyTardis documentation states that this key "is important for security reasons". The virtualenv must be active for this command to run (workon mytardis). It is a short script that appends a random 50-character key to the bottom of your settings.py file:

python -c "import os; from random import choice; key_line = '%sSECRET_KEY=\"%s\"  # generated from build.sh\n' % ('from .default_settings import * \n\n' if not os.path.isfile('tardis/settings.py') else '', ''.join([choice('abcdefghijklmnopqrstuvwxyz0123456789@#%^&*(-_=+)') for i in range(50)])); f=open('tardis/settings.py', 'a+'); f.write(key_line); f.close()"
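
The one-liner above is dense, so here is a more readable sketch of the same idea. It is my own variation rather than the MyTardis script: it draws characters with Python's secrets module (intended for security-sensitive randomness) instead of random.choice, and the function name generate_secret_key is hypothetical. It only prints the line; you would paste it into tardis/settings.py yourself.

```python
import secrets

# Same character set as the one-liner from the MyTardis documentation.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789@#%^&*(-_=+)"


def generate_secret_key(length=50):
    """Return a random key of the given length drawn from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


# Print a line that can be pasted at the bottom of tardis/settings.py.
print('SECRET_KEY = "%s"' % generate_secret_key())
```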

Initialization

Create and configure the database:

python manage.py migrate
python manage.py createcachetable default_cache
python manage.py createcachetable celery_lock_cache

Next, create a superuser:

python manage.py createsuperuser

MyTardis can now be executed using:

python manage.py runserver 0.0.0.0:8000

Testing Gunicorn’s Ability to Serve the Project

cd ~/mytardis
gunicorn --bind 0.0.0.0:8000 wsgi

Check that the website is still reachable now that Gunicorn is serving it. If it is, press CTRL-C to stop Gunicorn, then leave the virtualenv:

deactivate

Creating systemd Socket and Service Files for Gunicorn

We have tested that Gunicorn can interact with our Django application, but we should implement a more robust way of starting and stopping the application server. To accomplish this, we’ll make systemd service and socket files. The Gunicorn socket will be created at boot and will listen for connections. When a connection occurs, systemd will automatically start the Gunicorn process to handle the connection.

Create a gunicorn.socket file

sudo vim /etc/systemd/system/gunicorn.socket

Fill the file with the following settings:

[Unit]
Description=gunicorn socket

[Socket]
ListenStream=/run/gunicorn.sock

[Install]
WantedBy=sockets.target
# Write and quit VIM
:wq

Create a gunicorn.service file

sudo vim /etc/systemd/system/gunicorn.service

Fill the file with the following settings:

[Unit]
Description=gunicorn daemon
Requires=gunicorn.socket
After=network.target

[Service]
User=root
Group=www-data
WorkingDirectory=/root/mytardis
ExecStart=/root/.virtualenvs/mytardis/bin/gunicorn \
          --access-logfile - \
          --workers 3 \
          --bind unix:/run/gunicorn.sock \
          wsgi:application

[Install]
WantedBy=multi-user.target
# Write and quit VIM
:wq

Enable and test the gunicorn.socket

sudo systemctl enable gunicorn.socket

sudo systemctl start gunicorn.socket
sudo systemctl status gunicorn.socket
sudo systemctl status gunicorn
curl --unix-socket /run/gunicorn.sock localhost
sudo systemctl status gunicorn

sudo journalctl -u gunicorn

sudo systemctl daemon-reload
sudo systemctl restart gunicorn
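
The curl check against the socket can also be scripted, which is useful if you want to fold it into a health check later. The sketch below sends a bare HTTP request over a Unix socket and returns the status line. The function name http_status_over_unix_socket is my own, and it assumes the socket path /run/gunicorn.sock from the gunicorn.socket file.

```python
import socket


def http_status_over_unix_socket(sock_path, host="localhost", timeout=5.0):
    """Send a minimal HTTP/1.0 GET / over a Unix socket; return the status line."""
    request = ("GET / HTTP/1.0\r\nHost: %s\r\n\r\n" % host).encode("ascii")
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.settimeout(timeout)
        client.connect(sock_path)
        client.sendall(request)
        response = b""
        # Read until the first line of the response has fully arrived.
        while b"\r\n" not in response:
            data = client.recv(4096)
            if not data:
                break
            response += data
    return response.split(b"\r\n", 1)[0].decode("latin-1")


# Example call (assumes the socket created by gunicorn.socket exists):
# print(http_status_over_unix_socket("/run/gunicorn.sock"))
```

If the socket file is missing or nothing is listening, connect raises an OSError, which is the scripted equivalent of curl failing.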

Installing Nginx

sudo apt install nginx -y

Create a file to configure Nginx for your domain (HTTP initially)

sudo vim /etc/nginx/sites-available/[YourWebsite].com

Fill the file with the following settings:

upstream mytardis {
    server unix:/run/gunicorn.sock;
    server 127.0.0.1:8000 backup;
}
server {
    listen 80;
    server_name [YourWebsite].com www.[YourWebsite].com;

    client_max_body_size 4G;
    keepalive_timeout 5;

    gzip off;  # security reasons
    gzip_proxied any;
    # MyTardis generates uncompressed archives, so compress them in transit
    gzip_types application/x-javascript text/css;
    gzip_min_length 1024;
    gzip_vary on;

    location / {
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header Host $http_host;
        proxy_redirect off;
        proxy_pass http://mytardis;
        client_max_body_size 4G;
        client_body_buffer_size 8192k;
        proxy_connect_timeout 2000;
        proxy_send_timeout 2000;
        proxy_read_timeout 2000;
    }

# The ``location /static/`` is really important, because a copy of your CSS 
# and stuff are saved in a different directory. If they're NOT, then your CSS 
# will not load. So you tell Nginx where to find the static files HERE,
# and you tell Django in the settings.py file.

    location /static/ {
        expires 7d;
        alias /root/mytardis/static/;
    }

    # HSTS (ngx_http_headers_module is required) (15768000 seconds = 6 months)
    add_header Strict-Transport-Security max-age=15768000;
}
# Write and quit VIM
:wq

Create a symbolic link (ln -s) between the file you just created and the directory of sites enabled by Nginx:

sudo ln -s /etc/nginx/sites-available/[YourWebsite].com /etc/nginx/sites-enabled
sudo nginx -t
sudo systemctl restart nginx

sudo ufw delete allow 8000
sudo ufw allow 'Nginx Full'

Setting up HTTPS with Certbot and Nginx

sudo add-apt-repository ppa:certbot/certbot
sudo apt install python-certbot-nginx -y

sudo nginx -t
sudo systemctl reload nginx

sudo ufw status

ufw status printout should look like the following:

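OpenSSH                    ALLOW       Anywhere
Nginx Full                 ALLOW       Anywhere
OpenSSH (v6)               ALLOW       Anywhere (v6)
Nginx Full (v6)            ALLOW       Anywhere (v6)

Request the certificate. When prompted, enter your email and then choose option 2 (redirect all HTTP requests to HTTPS):

sudo certbot --nginx -d [YourWebsite].com -d www.[YourWebsite].com
[enter email]
[2]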

Verifying Certbot Auto-Renewal

sudo certbot renew --dry-run

Collect static files to settings.STATIC_ROOT

workon mytardis
python manage.py collectstatic
deactivate

In order for Django to work behind Nginx, there needs to be a collection of "static" files within a single directory. Static files are essentially copies of the original CSS, JavaScript, and images, which collectstatic gathers into STATIC_ROOT. If this directory is not specified in the Nginx configuration file, then your pages will load with no CSS while DEBUG = True. If DEBUG = False, you will just be redirected to a generic error page.
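
If the CSS still refuses to load, it is worth confirming that collectstatic actually put files where Nginx is looking. The helper below is a minimal sketch of that check; the name count_static_files is my own, and the path in the example comment assumes the STATIC_ROOT used in this post.

```python
from pathlib import Path


def count_static_files(static_root):
    """Count regular files under static_root; return 0 if the directory is missing."""
    root = Path(static_root)
    if not root.is_dir():
        return 0
    return sum(1 for path in root.rglob("*") if path.is_file())


# Example call (assumes STATIC_ROOT = '/root/mytardis/static/'):
# print(count_static_files("/root/mytardis/static/"))
```

A result of zero means either collectstatic has not been run yet or STATIC_ROOT and the Nginx alias point at different directories.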

sudo nginx -t
sudo systemctl reload nginx
sudo systemctl restart nginx

All done! You now have an iteration of MyTardis running from your domain. To see what I ran based on this tutorial, please navigate to: spatialmsk.dev