
How to Index and Query Data With Haystack and Elasticsearch in Python


Haystack

Haystack is a Python library that provides modular search for Django. It features an API that provides support for different search back ends such as Elasticsearch, Whoosh, Xapian, and Solr.

Elasticsearch

Elasticsearch is a popular search engine built on Apache Lucene and written in Java. It is capable of full-text search.

Google Search relies on the same basic idea: content is indexed ahead of time, which is why you can retrieve information with just a few keywords, as shown below.

[Image: Elastic Search and Google]
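That speed comes from the inverted index that Lucene, and therefore Elasticsearch, builds: a map from each term to the documents that contain it. The toy version below, in plain Python with made-up records, shows why a keyword lookup does not need to scan every document. It is only a sketch; real Lucene indexes add text analysis, relevance scoring, and compact on-disk structures.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Each lookup is a dictionary access; no document text is scanned here.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Made-up sample records for illustration only.
docs = {
    1: "Albert Smith Nairobi",
    2: "Alice Jones Mombasa",
    3: "Albert Brown Mombasa",
}
index = build_inverted_index(docs)
print(sorted(search(index, "albert mombasa")))  # [3]
```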

Install Django Haystack and Elasticsearch

The first step is to get Elasticsearch up and running on your local machine. Elasticsearch requires Java, so make sure Java is installed first.

We are going to follow the instructions from the Elasticsearch site.

Download the Elasticsearch 1.4.5 tar as follows:

curl -L -O https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-1.4.5.tar.gz

Extract it as follows:

tar -xvf elasticsearch-1.4.5.tar.gz

This creates an elasticsearch-1.4.5 directory in your current directory. Change into its bin directory:

cd elasticsearch-1.4.5/bin

Start Elasticsearch as follows.

./elasticsearch

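To confirm that it is running, open http://127.0.0.1:9200/ in your browser. You should see a JSON response similar to the one below. The exact fields and values depend on your installation: this sample came from an Elasticsearch 5.6.3 server, so a 1.4.5 install will report a different version number.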

{
  "name" : "W3nGEDa",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "ygpVDczbR4OI5sx5lzo0-w",
  "version" : {
    "number" : "5.6.3",
    "build_hash" : "1a2f265",
    "build_date" : "2017-10-06T20:33:39.012Z",
    "build_snapshot" : false,
    "lucene_version" : "6.6.1"
  },
  "tagline" : "You Know, for Search"
}
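If you prefer to check from a script, the sketch below fetches the root endpoint with the standard library and pulls out the cluster name and version. It assumes the response contains cluster_name and version.number keys, as in the sample above; check_elasticsearch and parse_cluster_info are hypothetical helper names.

```python
import json
import urllib.request


def parse_cluster_info(body):
    """Extract (cluster_name, version) from the root endpoint's JSON body."""
    info = json.loads(body)
    return info["cluster_name"], info["version"]["number"]


def check_elasticsearch(url="http://127.0.0.1:9200/", timeout=5):
    """Fetch the root endpoint and return (cluster_name, version).

    Raises urllib.error.URLError if nothing is listening at the URL.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_cluster_info(response.read().decode("utf-8"))
```

For a server that returns the sample above, check_elasticsearch() would return ('elasticsearch', '5.6.3').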

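Next, install Django, Haystack, the elasticsearch Python client that Haystack's Elasticsearch back end uses to talk to the server, and Django REST framework, which the API view later in this tutorial relies on. Choose an elasticsearch client release whose major version matches your Elasticsearch server.

pip install django django-haystack elasticsearch djangorestframework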

Let's create our Django project. Our project will be able to index all the customers in a bank, making it easy to search and retrieve data using just a few search terms.

django-admin startproject Bank

This command creates a Bank directory containing manage.py and a Bank package that holds the project's configuration files, such as settings.py and urls.py.

Let's create an app for customers.

cd Bank
python manage.py startapp customers

settings.py Configurations

To use Elasticsearch for indexing our searchable content, we need to tell Haystack which back end to use by adding a connection to the project's settings.py file (Bank/settings.py).

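HAYSTACK_CONNECTIONS is a required setting. The ENGINE below is Haystack's back end for Elasticsearch 1.x, which matches the 1.4.5 release downloaded above; newer Haystack releases ship separate back ends for later Elasticsearch versions (for example, elasticsearch5_backend for 5.x). The setting should look like this: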

HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': 'http://127.0.0.1:9200/',
        'INDEX_NAME': 'haystack',
    },
}

In settings.py, also add rest_framework (used by the API view later on), haystack, and our customers app to the list of installed apps.

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'rest_framework',
    'haystack',
    'customers',
]

Create Models

Let's create a model for Customers. In customers/models.py, add the following code.

from django.db import models

# Choices for customer_status: (stored value, human-readable label).
customer_type = (
    ("Active", "Active"),
    ("Inactive", "Inactive"),
)


class Customer(models.Model):
    id = models.IntegerField(primary_key=True)
    first_name = models.CharField(max_length=50, null=False, blank=True)
    last_name = models.CharField(max_length=50, null=False, blank=True)
    other_names = models.CharField(max_length=50, default=" ")
    email = models.EmailField(max_length=100, null=True, blank=True)
    phone = models.CharField(max_length=30, null=False, blank=True)
    balance = models.IntegerField(default=0)
    customer_status = models.CharField(
        max_length=100, choices=customer_type, default="Active")
    address = models.CharField(max_length=50, null=False, blank=False)

    def save(self, *args, **kwargs):
        # No custom behavior yet; a hook for any pre-save logic added later.
        return super().save(*args, **kwargs)

    def __str__(self):
        # Shown in the admin and the shell when a customer is displayed.
        return "{}:{}".format(self.first_name, self.last_name)

Register the Customer model in customers/admin.py so you can manage it from the admin:

from django.contrib import admin

from .models import Customer

admin.site.register(Customer)

Create Database and Super User

Create and apply the migrations for the customers app, then create an admin account.

python manage.py makemigrations customers
python manage.py migrate
python manage.py createsuperuser

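Start the development server with python manage.py runserver and navigate to http://localhost:8000/admin/. After logging in, you should see the Customer model listed. Go ahead and add some customers in the admin. Because id is a plain IntegerField primary key, you need to enter an ID for each customer yourself.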

Indexing Data

To index our models, we begin by creating a SearchIndex. SearchIndex objects are how Haystack determines what data should be placed in the search index, and they handle the flow of data into it. You should create a separate SearchIndex for each model you want to index.

To build a SearchIndex, we inherit from indexes.SearchIndex and indexes.Indexable, define the fields we want to store our data in, and define a get_model method.

Let's create a CustomerIndex that corresponds to our Customer model. Create a file named search_indexes.py in the customers app directory (Haystack looks for indexes in files with this name), and add the following code.

from haystack import indexes

from .models import Customer


class CustomerIndex(indexes.SearchIndex, indexes.Indexable):
    # Document field; its content is rendered from customer_text.txt.
    text = indexes.EdgeNgramField(document=True, use_template=True)
    first_name = indexes.CharField(model_attr='first_name')
    last_name = indexes.CharField(model_attr='last_name')
    other_names = indexes.CharField(model_attr='other_names')
    # email may be NULL in the model, so fall back to a placeholder.
    email = indexes.CharField(model_attr='email', default=" ")
    phone = indexes.CharField(model_attr='phone', default=" ")
    balance = indexes.IntegerField(model_attr='balance', default=0)
    customer_status = indexes.CharField(model_attr='customer_status')
    address = indexes.CharField(model_attr='address', default=" ")

    def get_model(self):
        return Customer

    def index_queryset(self, using=None):
        # Records that rebuild_index and update_index will index.
        return self.get_model().objects.all()
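To see why some fields declare a default, here is a toy sketch in plain Python of roughly how a field with model_attr gets its value: read the attribute from the object and, if it is None, fall back to the field's default. This approximates Haystack's behavior rather than reproducing its code; FIELDS and prepare_document are hypothetical names, and using None to mean "no default" is a simplification.

```python
from types import SimpleNamespace

# Hypothetical field spec: index field name -> (model attribute, default).
FIELDS = {
    "first_name": ("first_name", None),
    "email": ("email", " "),
    "balance": ("balance", 0),
}


def prepare_document(obj, fields=FIELDS):
    """Build a dict of index values from a model-like object."""
    doc = {}
    for field_name, (attr, default) in fields.items():
        value = getattr(obj, attr, None)
        if value is None:
            if default is None:
                # No value and no fallback: refuse to index the object.
                raise ValueError(f"{attr!r} is None and {field_name!r} has no default")
            value = default
        doc[field_name] = value
    return doc


customer = SimpleNamespace(first_name="Albert", email=None, balance=1500)
print(prepare_document(customer))
# {'first_name': 'Albert', 'email': ' ', 'balance': 1500}
```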

EdgeNgramField splits text into words and indexes the leading fragments (prefixes) of each word. Because every fragment starts at the beginning of a single word, it avoids incorrect matches built from parts of two different words mashed together.

This is what lets us use Haystack's autocomplete feature to conduct queries, which we will do when we start querying our data.
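The sketch below generates edge n-grams in plain Python to show what ends up in such a field. In practice the tokenizing is done by the Elasticsearch analyzer that Haystack configures; the min_gram=2 and max_gram=15 limits here are illustrative assumptions, so check the analyzer settings of your back end.

```python
def edge_ngrams(text, min_gram=2, max_gram=15):
    """Return the edge n-grams (word prefixes) of each whitespace-separated word."""
    grams = []
    for word in text.lower().split():
        # Only prefixes anchored at the start of a word are produced, so a
        # gram can never span the boundary between two words.
        for size in range(min_gram, min(len(word), max_gram) + 1):
            grams.append(word[:size])
    return grams


print(edge_ngrams("Albert Smith"))
# ['al', 'alb', 'albe', 'alber', 'albert', 'sm', 'smi', 'smit', 'smith']
```

Note that "ts", which would appear if "Albert" and "Smith" were mashed into one string, is not produced.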

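document=True marks text as the primary field that searches run against; Haystack expects exactly one such field per SearchIndex, and by convention it is named text. Additionally, use_template=True tells Haystack to build that field's content from a data template instead of a single model attribute.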

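Let's create that template. Haystack looks for it at search/indexes/<app label>/<model name>_text.txt inside a templates directory, so for our app create customers/templates/search/indexes/customers/customer_text.txt and add the following: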

{{ object.first_name }}
{{ object.last_name }}
{{ object.other_names }}

Reindex Data

Now that our data is in the database, it's time to put it in our search index. To do this, run python manage.py rebuild_index. It asks for confirmation before clearing the existing index, then reports how many objects of each model were processed and placed in the index.

Indexing 20 customers

Alternatively, instead of rebuilding the index by hand after every change, you can use RealtimeSignalProcessor, which updates the index automatically whenever a model instance is saved or deleted. To enable it, add the following to settings.py.

HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'
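RealtimeSignalProcessor works by connecting to Django's post_save and post_delete signals, so the index changes as soon as the data does. The toy below reproduces that idea without Django; ToyModelStore and ToyIndex are hypothetical stand-ins for the database and the search index, not Haystack classes.

```python
class ToyIndex:
    """In-memory stand-in for a search index, keyed by object id."""

    def __init__(self):
        self.documents = {}

    def update(self, obj):
        self.documents[obj["id"]] = f'{obj["first_name"]} {obj["last_name"]}'

    def remove(self, obj):
        self.documents.pop(obj["id"], None)


class ToyModelStore:
    """Stand-in for the database that notifies listeners on save and delete."""

    def __init__(self):
        self.rows = {}
        self.post_save = []    # callbacks run after a save, like post_save
        self.post_delete = []  # callbacks run after a delete, like post_delete

    def save(self, obj):
        self.rows[obj["id"]] = obj
        for callback in self.post_save:
            callback(obj)

    def delete(self, obj_id):
        obj = self.rows.pop(obj_id)
        for callback in self.post_delete:
            callback(obj)


index = ToyIndex()
store = ToyModelStore()
# Hooking the index to both events is the idea behind the realtime processor.
store.post_save.append(index.update)
store.post_delete.append(index.remove)

store.save({"id": 1, "first_name": "Albert", "last_name": "Smith"})
store.save({"id": 2, "first_name": "Alice", "last_name": "Jones"})
store.delete(2)
print(index.documents)  # {1: 'Albert Smith'}
```

The trade-off is that every save also waits on a call to the search back end; for write-heavy workloads, running python manage.py update_index periodically may be the better choice.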

Querying Data

We are going to use a search template and the Haystack API to query data.

Search Template

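Add the Haystack URLs to the urlpatterns list in Bank/urls.py, importing url and include from django.conf.urls. (url() was removed in Django 4.0; on newer versions, use path('search/', include('haystack.urls')) with path and include imported from django.urls.)

url(r'^search/', include('haystack.urls')),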

Let's create our search template. Haystack's default search view renders search/search.html, so create customers/templates/search/search.html and add the following code.

{% block head %}
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.12.0/jquery.min.js"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/js/bootstrap.min.js"></script>
{% endblock %}

{% block navbar %}
<nav class="navbar navbar-default">
  <div class="container">
    <div class="navbar-header">
      <button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#myNavbar">
        <span class="icon-bar"></span>
        <span class="icon-bar"></span>
        <span class="icon-bar"></span>
      </button>
      <a class="navbar-brand" href="#">HOME</a>
    </div>
    <div class="collapse navbar-collapse" id="myNavbar">
      <ul class="nav navbar-nav navbar-right">
        <li><input type="submit" class="btn btn-primary" value="Add Customer"></li>
      </ul>
    </div>
  </div>
</nav>
{% endblock %}

{% block content %}
<div class="container-fluid bg-3 text-center">
  <form method="get" action="." class="form" role="form">
    {{ form.non_field_errors }}
    <div class="form-group">
      {{ form.as_p }}
    </div>
    <div class="form-group">
      <input type="submit" class="btn btn-primary" value="Search">
    </div>

    {% if query %}
      <h3>Results</h3>
      <div class="container-fluid bg-4 text-left">
        <div class="row">
          {% for result in page.object_list %}
            <div class="col-sm-4">
              <div class="thumbnail">
                <div class="form-group">
                  <p>First name: {{ result.first_name }}</p>
                </div>
                <div class="form-group">
                  <p>Last name: {{ result.last_name }}</p>
                </div>
                <div class="form-group">
                  <p>Balance: {{ result.balance }}</p>
                </div>
                <div class="form-group">
                  <p>Email: {{ result.email }}</p>
                </div>
                <div class="form-group">
                  <p>Status: {{ result.customer_status }}</p>
                </div>
              </div>
            </div>
          {% empty %}
            <p class="text-center">No results found.</p>
          {% endfor %}
        </div>
      </div>
    {% endif %}
  </form>
</div>
{% endblock %}

page.object_list holds SearchResult objects. Each one exposes the fields stored in the index as attributes (for example, result.first_name), and result.object loads the corresponding Customer instance from the database if you need it.

Your complete project structure should look something like this:

[Image: The project directory structure]

Now run the server, go to http://127.0.0.1:8000/search/, and run a search as shown below.

[Image: Running a search on a local server]

Searching for Albert returns every customer whose indexed names (first, last, or other names) include Albert. If no customer matches, the results are empty. Feel free to experiment with your own data.

Haystack API

Haystack has a SearchQuerySet class that is designed to make it easy and consistent to perform searches and iterate over results. Much of the SearchQuerySet API will feel familiar if you know Django's ORM QuerySet.

In customers/views.py, add the following code:

from haystack.query import SearchQuerySet
from rest_framework.decorators import api_view
from rest_framework.response import Response

from .models import Customer


@api_view(['POST'])
def search_customer(request):
    # Read the search term from the request body; a missing term is empty.
    name = request.data.get('name', '').strip()
    if not name:
        # autocomplete() needs at least one word to build a query.
        return Response([])

    # autocomplete() must target an EdgeNgramField or NgramField, so query
    # the EdgeNgramField document field (text) defined in CustomerIndex.
    customers = SearchQuerySet().models(Customer).autocomplete(text=name)

    searched_data = []
    for result in customers:
        # Stored index fields are available as attributes of each result.
        searched_data.append({
            "first_name": result.first_name,
            "last_name": result.last_name,
            "balance": result.balance,
            "status": result.customer_status,
        })

    return Response(searched_data)
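To call this view from a client, send a POST request whose JSON body contains name; Django REST framework parses JSON bodies into request.data by default. The tutorial does not show which URL search_customer is routed to, so SEARCH_URL below is a hypothetical example (for instance, a path('api/search/', search_customer) entry in Bank/urls.py). The sketch uses only the standard library.

```python
import json
import urllib.request

# Hypothetical endpoint; adjust to wherever search_customer is routed.
SEARCH_URL = "http://127.0.0.1:8000/api/search/"


def build_search_request(name, url=SEARCH_URL):
    """Build a POST request whose JSON body carries the search term."""
    body = json.dumps({"name": name}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search_customers(name, url=SEARCH_URL):
    """Send the request and decode the JSON list returned by the view."""
    with urllib.request.urlopen(build_search_request(name, url)) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the development server running and a matching route in place, search_customers("Al") should return a list of dictionaries with first_name, last_name, balance, and status keys.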

autocomplete is a shortcut method to perform an autocomplete search. It must be run against fields that are either EdgeNgramField or NgramField.

In the query above, autocomplete matches the search term as a prefix: Al retrieves customers whose names begin with Al, such as Albert, rather than every name that merely contains Al. Note that the results will only come from fields that have been defined in the customer_text.txt file.
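Here is a plain-Python sketch of those matching semantics, as I understand Haystack's autocomplete: the term is split into words, each word must match the start of some word in the indexed text, and all words must match. The autocomplete_match function and the sample names are hypothetical; the real matching happens inside Elasticsearch against the edge n-grams.

```python
def autocomplete_match(document, query):
    """True if every query word is a prefix of some word in the document."""
    doc_words = document.lower().split()
    query_words = query.lower().split()
    # An empty query matches nothing; otherwise all words must match (AND).
    return bool(query_words) and all(
        any(word.startswith(q) for word in doc_words) for q in query_words
    )


customers = ["Albert Smith", "Alice Jones", "Brian Albright"]
print([c for c in customers if autocomplete_match(c, "Al")])
# ['Albert Smith', 'Alice Jones', 'Brian Albright']
print([c for c in customers if autocomplete_match(c, "al sm")])
# ['Albert Smith']
```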

[Image: The results of a query]

Haystack also supports the following field lookups, which you can use with SearchQuerySet methods such as filter() and exclude():

  • content
  • contains
  • exact
  • gt
  • gte
  • lt
  • lte
  • in
  • startswith
  • endswith
  • range
  • fuzzy
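For example, SearchQuerySet().models(Customer).filter(balance__gte=1000) asks the back end for customers with a balance of at least 1000. The toy evaluator below, a hypothetical helper rather than Haystack's implementation, spells out what the comparison-style lookups mean on plain Python values. content and fuzzy depend on text analysis, so they are left out, and on analyzed text fields lookups such as contains may behave differently in a real back end.

```python
import operator

# Toy semantics for lookups from the list above, applied to plain values.
LOOKUPS = {
    "exact": operator.eq,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    "in": lambda value, options: value in options,
    "contains": lambda value, part: part in value,
    "startswith": lambda value, prefix: value.startswith(prefix),
    "endswith": lambda value, suffix: value.endswith(suffix),
    # Both bounds are inclusive.
    "range": lambda value, bounds: bounds[0] <= value <= bounds[1],
}


def matches(record, **conditions):
    """Return True if the record satisfies every field__lookup=value condition."""
    for key, expected in conditions.items():
        field, sep, lookup = key.rpartition("__")
        if not sep:
            # Haystack falls back to a content lookup here; the toy requires
            # an explicit lookup instead.
            raise ValueError(f"missing lookup in {key!r}")
        if not LOOKUPS[lookup](record[field], expected):
            return False
    return True


records = [
    {"first_name": "Albert", "balance": 1500, "customer_status": "Active"},
    {"first_name": "Alice", "balance": 300, "customer_status": "Inactive"},
]
print([r["first_name"] for r in records if matches(r, balance__gte=1000)])
# ['Albert']
```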

Conclusion

A huge amount of data is produced at any given moment in social media, health, shopping, and other sectors. Much of this data is unstructured and scattered. Elasticsearch can be used to process and analyze this data into a form that can be understood and consumed.

Elasticsearch has also been used extensively for content search, data analysis, and queries. For more information, visit the Haystack and Elasticsearch sites.
