2.2. Loading taxonomy

Every data file you’ll load (e.g. fasta, gff, blast) must belong to an organism. These instructions load the NCBI taxonomy, which contains most organisms you’ll need.

This step is optional. If you skip it, you should insert the organisms individually using the command insert_organism.

2.2.1. NCBI Taxonomy

Contains the names and phylogenetic lineages of more than 160,000 organisms that have molecular data in the NCBI databases.

URL: https://www.ncbi.nlm.nih.gov/taxonomy
File: ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz

After unpacking the archive you’ll have the required files, names.dmp and nodes.dmp.
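
Fetching and unpacking the archive might look like the sketch below. It assumes curl (built with FTP support) and tar are installed; the directory name taxdump and the helper name fetch_taxdump are hypothetical choices, not part of the tool.

```shell
# URL copied from the "File:" line above.
TAXDUMP_URL="ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz"
# Hypothetical working directory for the unpacked files.
TAXDUMP_DIR="taxdump"

# Download the archive and unpack it into $TAXDUMP_DIR.
fetch_taxdump() {
    mkdir -p "$TAXDUMP_DIR" &&
    curl -L -o "$TAXDUMP_DIR/taxdump.tar.gz" "$TAXDUMP_URL" &&
    tar -xzf "$TAXDUMP_DIR/taxdump.tar.gz" -C "$TAXDUMP_DIR"
}
```

Calling fetch_taxdump should leave names.dmp and nodes.dmp inside the taxdump directory; adjust the --file paths in the commands below if you keep them there.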

python manage.py load_organism --file names.dmp --name DB:NCBI_taxonomy
python manage.py load_phylotree --file nodes.dmp --name 'NCBI taxonomy tree' --organismdb 'DB:NCBI_taxonomy'

Loading these files can be faster if you increase the number of threads (--cpu).
Even so, expect the load to take a long time (hours).
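
As an illustration, the two load commands could be wrapped with an explicit thread count. This is only a sketch: the value 4 and the helper name load_taxonomy are hypothetical, and it assumes both commands accept --cpu followed by an integer, which is what the note above suggests (check each command's --help to confirm).

```shell
# Hypothetical thread count; tune it to the cores available on your machine.
THREADS=4

# Run both load steps with the same number of threads.
# Assumes load_organism and load_phylotree both take --cpu <integer>.
load_taxonomy() {
    python manage.py load_organism --file names.dmp --name DB:NCBI_taxonomy --cpu "$THREADS" &&
    python manage.py load_phylotree --file nodes.dmp --name 'NCBI taxonomy tree' --organismdb 'DB:NCBI_taxonomy' --cpu "$THREADS"
}
```

Call load_taxonomy from wherever you would run the two commands above. The tree is loaded only if the organisms load succeeds.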

2.2.2. Remove taxonomy

If, for any reason, you need to remove a taxonomy, you should use the commands remove_phylotree and remove_organisms. Most data files you’ll load depend on the organism database (e.g. fasta, gff, blast). If you delete an organism database, every data file you loaded that depends on it will be deleted in cascade.

python manage.py remove_phylotree --help
python manage.py remove_organisms --help

These commands require the names of the databases (Db.name).