machado

3.1. Index and search

Haystack

Haystack provides a search layer for the Django framework that works with third-party search engines such as Elasticsearch and Solr. Although any search engine supported by Haystack should work, machado has been tested with Elasticsearch.

3.1.1. Install Elasticsearch

Elasticsearch 7.x is required. Install Java first:

sudo apt install openjdk-11-jdk

Then install Elasticsearch:

wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.17.26-amd64.deb
sudo dpkg -i elasticsearch-7.17.26-amd64.deb
sudo systemctl daemon-reload
sudo systemctl enable elasticsearch.service
sudo systemctl start elasticsearch.service
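To confirm that the service is up and is a 7.x release, you can query the root endpoint (for example with curl http://127.0.0.1:9200/), which returns a JSON document containing version.number. The following is a minimal sketch using only the Python standard library. The function names are illustrative and are not part of machado, and the default URL assumes the local install above.

```python
import json
import urllib.request


def fetch_cluster_info(url: str = "http://127.0.0.1:9200/", timeout: float = 5.0) -> dict:
    """Fetch and decode the JSON returned by the Elasticsearch root endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def is_elasticsearch_7(info: dict) -> bool:
    """Return True if a root-endpoint response reports a 7.x version number."""
    version = info.get("version", {}).get("number", "")
    # "7.17.26" -> "7"; a missing version yields "" and therefore False.
    return version.split(".")[0] == "7"
```

With the server running, is_elasticsearch_7(fetch_cluster_info()) should return True for the 7.17.26 package installed above.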

Install the Python client inside your virtualenv:

pip install 'elasticsearch>=7,<8'

3.1.2. Enable search in machado

Uncomment the Elasticsearch settings in your .env file:

ELASTICSEARCH_URL=http://127.0.0.1:9200/
HAYSTACK_INDEX_NAME=haystack

When ELASTICSEARCH_URL is set, machado automatically adds haystack to INSTALLED_APPS and configures HAYSTACK_CONNECTIONS, so you do not need to edit settings.py manually.
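For illustration only, the behaviour described above could look roughly like the sketch below. It is not machado's actual settings.py: the function name haystack_settings is hypothetical, the fallback index name "haystack" is an assumption, and the ENGINE path assumes django-haystack's Elasticsearch 7 backend (haystack.backends.elasticsearch7_backend.Elasticsearch7SearchEngine).

```python
from typing import Dict, List, Mapping, Tuple

# Assumed engine path for django-haystack's Elasticsearch 7 backend.
ES7_ENGINE = "haystack.backends.elasticsearch7_backend.Elasticsearch7SearchEngine"


def haystack_settings(
    env: Mapping[str, str], installed_apps: List[str]
) -> Tuple[List[str], Dict[str, dict]]:
    """Hypothetical: derive INSTALLED_APPS and HAYSTACK_CONNECTIONS from env vars."""
    apps = list(installed_apps)
    url = env.get("ELASTICSEARCH_URL")
    if not url:
        # Search disabled: app list unchanged, no search connections configured.
        return apps, {}
    if "haystack" not in apps:
        apps.append("haystack")
    connections = {
        "default": {
            "ENGINE": ES7_ENGINE,
            "URL": url,
            # Assumed fallback when HAYSTACK_INDEX_NAME is not set.
            "INDEX_NAME": env.get("HAYSTACK_INDEX_NAME", "haystack"),
        }
    }
    return apps, connections


# In a settings module this might be called as:
# INSTALLED_APPS, HAYSTACK_CONNECTIONS = haystack_settings(os.environ, INSTALLED_APPS)
```

Keeping the derivation in one function means that leaving ELASTICSEARCH_URL commented out disables search without touching any other setting.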

You can also configure which feature types are indexed:

MACHADO_VALID_TYPES=gene,mRNA,polypeptide

If MACHADO_VALID_TYPES is not set, it defaults to gene,mRNA,polypeptide.
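The value is a comma-separated list of feature type names. A minimal sketch of how such a variable can be turned into a list, with the documented default as fallback, is shown below. The helper name valid_types is hypothetical, and the whitespace stripping and the treatment of an empty value as unset are assumptions rather than documented machado behaviour.

```python
from typing import List, Mapping

# Documented default when MACHADO_VALID_TYPES is not set.
DEFAULT_VALID_TYPES = "gene,mRNA,polypeptide"


def valid_types(env: Mapping[str, str]) -> List[str]:
    """Hypothetical helper: split MACHADO_VALID_TYPES into feature type names."""
    raw = env.get("MACHADO_VALID_TYPES") or DEFAULT_VALID_TYPES
    # Strip surrounding spaces and drop empty items (e.g. from a trailing comma).
    # Case is preserved, since type names such as "mRNA" are case-sensitive.
    return [item.strip() for item in raw.split(",") if item.strip()]
```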

3.1.3. Indexing the data

After loading data into the database, build the search index:

python manage.py rebuild_index

Note

Run rebuild_index again whenever additional data is loaded into the database or search-related settings change.

Rebuilding the index can be faster if you increase the number of indexing workers with the -k (--workers) option:

python manage.py rebuild_index -k 4

3.1.4. Increasing the results limit

By default, Elasticsearch returns at most 10,000 results per query (the index.max_result_window setting). This rarely affects the search pages, since results are paginated, but the links that export .tsv or .fasta files may be truncated at this limit. You can raise it with:

curl -XPUT "http://localhost:9200/haystack/_settings" \
     -d '{ "index" : { "max_result_window" : 500000 } }' \
     -H "Content-Type: application/json"
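If you prefer to script this change, the same PUT /haystack/_settings request can be built with the Python standard library. This is a sketch: the helper name is hypothetical, and the index name passed in should match your HAYSTACK_INDEX_NAME.

```python
import json
import urllib.request


def max_result_window_request(base_url: str, index: str, limit: int) -> urllib.request.Request:
    """Build (without sending) a PUT <index>/_settings request raising max_result_window."""
    body = json.dumps({"index": {"max_result_window": limit}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# To apply it against a running server:
# req = max_result_window_request("http://localhost:9200", "haystack", 500000)
# with urllib.request.urlopen(req) as response:
#     print(json.load(response))  # expected to contain "acknowledged": true on success
```

Building the request separately from sending it makes the URL and payload easy to inspect before touching a live index.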

© Copyright 2018, Embrapa.
