2.6. Loading GFF files

The first column of a GFF file holds the reference sequence ID, so a reference FASTA file usually has to be loaded before the GFF file. Some GFF files, however, already include the reference sequences as features (such as chromosome or scaffold). In that case, there are two options:

Load the GFF file directly, without a reference FASTA file.
Load the FASTA file first, and then load the GFF file with the --ignore parameter so that the reference features are not loaded again.
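The two options can be sketched as follows. This is only an illustration: the file names are placeholders, and the load_fasta invocation (including its --soterm flag) is an assumption that is not described in this section, so check it against `python manage.py load_fasta --help` before using it.

```shell
# Option 1: the GFF file already describes the reference sequences,
# so it is loaded on its own.
python manage.py load_gff --file organism_genes_sorted.gff3.gz --organism 'Arabidopsis thaliana'

# Option 2: load the reference FASTA first (assumed invocation), then load
# the GFF file while skipping the chromosome and scaffold features.
python manage.py load_fasta --file organism_chrs.fa --soterm chromosome --organism 'Arabidopsis thaliana'
python manage.py load_gff --file organism_genes_sorted.gff3.gz --organism 'Arabidopsis thaliana' --ignore chromosome scaffold
```

Use one option or the other, not both in sequence.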

The GFF file must be indexed using tabix.
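A minimal sketch of preparing such a file, assuming bgzip and tabix (both part of htslib) are installed. tabix expects a position-sorted, bgzip-compressed file. The helper names sort_gff and index_gff are hypothetical, and the file names are placeholders.

```shell
# Print a GFF3 file sorted by sequence ID (column 1) and start position
# (column 4), with the '#' header/comment lines kept at the top.
sort_gff() {
    awk '/^#/' "$1"
    awk '!/^#/ && NF' "$1" | LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k4,4n
}

# Compress the sorted output with bgzip and build the tabix index
# (written next to the output file as <output>.tbi).
index_gff() {
    sort_gff "$1" | bgzip > "$2"
    tabix -p gff "$2"
}

# Example call (placeholder file names):
# index_gff organism_genes.gff3 organism_genes_sorted.gff3.gz
```

This sketch assumes the file has no embedded FASTA section. Such a section would have to be removed first, because the sort would otherwise mix its sequence lines with the feature lines.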

2.6.1. Load GFF

python manage.py load_gff --file organism_genes_sorted.gff3.gz --organism 'Arabidopsis thaliana'

Loading this file can be faster if you increase the number of threads (--cpu).

python manage.py load_gff --help

--file	GFF3 genome file indexed with tabix (see http://www.htslib.org/doc/tabix.html) *
--organism	Species name (e.g. Homo sapiens, Mus musculus) *
--ignore	List of feature types to ignore (e.g. chromosome scaffold)
--doi	DOI of a reference stored using load_publication (e.g. 10.1111/s12122-012-1313-4)
--qtl	Set this flag to handle GFF files from QTLDB
--cpu	Number of threads

* required fields

2.6.2. Remove file

If, for any reason, you need to remove a GFF dataset, use the remove_file command. Deleting a file also deletes, in cascade, every record that depends on it.

python manage.py remove_file --help

This command requires the file name (Dbxrefprop.value).