Elasticsearch Python Client

Nov 15, 2021

Introduction

Elasticsearch is an open-source, distributed search and analytics engine built on Apache Lucene, and is often used as a NoSQL database. It stores data as JSON documents and lets you store, search, and analyze huge volumes of data quickly, in near real time.

Environment Setup

A trial of cloud-hosted Elasticsearch can be used as a test environment. Use the link below to sign up and create an Elasticsearch instance.

...

Free 14 day Trial

cloud.elastic.co

The username and password are shown in a popup when the deployment is created. The link below leads to the deployment page, which shows the cloud ID that needs to be used in the code.

...

Manage Deployments

cloud.elastic.co

Python Client

The client can be installed with pip:

pip install elasticsearch

The following code establishes a connection with the cloud server. We can confirm the connection by retrieving basic information about the cluster.
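A minimal sketch of such a connection, assuming the 7.x Python client (where the basic-auth keyword is `http_auth`). The helper names `connect` and `cluster_info` are hypothetical, and the cloud ID and credentials are placeholders for the values from your own deployment:

```python
def connect(cloud_id, username, password):
    """Return a client for an Elastic Cloud deployment (7.x client API)."""
    # Imported inside the function so this file still loads when the
    # elasticsearch package has not been installed yet.
    from elasticsearch import Elasticsearch

    # cloud_id comes from the deployment page; http_auth carries the
    # username and password shown when the deployment was created.
    return Elasticsearch(cloud_id=cloud_id, http_auth=(username, password))


def cluster_info(es):
    """Ask the server for basic cluster details; raises if unreachable."""
    return es.info()


# Usage (placeholders, replace with your own values):
# es = connect("<cloud-id>", "elastic", "<password>")
# print(cluster_info(es))
```

`info()` returns a dictionary describing the cluster, such as: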

{'cluster_name': '2ad3dad5b81a41f5a5e499d9980b0f8f',
 'cluster_uuid': 'Bs5M_U7yRPWVMp8OJDY7UA',
 'name': 'instance-0000000000',
 'tagline': 'You Know, for Search',
 'version': {'build_date': '2021-11-04T14:04:42.515624022Z',
  'build_flavor': 'default',
  'build_hash': '93d5a7f6192e8a1a12e154a2b81bf6fa7309da0c',
  'build_snapshot': False,
  'build_type': 'docker',
  'lucene_version': '8.9.0',
  'minimum_index_compatibility_version': '6.0.0-beta1',
  'minimum_wire_compatibility_version': '6.8.0',
  'number': '7.15.2'}}

The generic syntax is

es = Elasticsearch(host='localhost', http_auth=('username', 'pass'), scheme="https", port=9200)

Index

Indices are the equivalent of tables: an index is a collection of JSON documents. We can list all the indices in the cluster using

es.indices.get_alias(index="*")

Create an Index

A new index can be created using
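A minimal sketch, assuming `es` is an already connected 7.x client. The wrapper name `create_index` is hypothetical; `test_index` is the index name that appears in the response:

```python
def create_index(es, name):
    """Create an index with default settings and return the server reply."""
    # The server answers with an acknowledgement dictionary; creating an
    # index that already exists raises an error instead.
    return es.indices.create(index=name)


# Usage, with `es` a connected client:
# create_index(es, "test_index")
```

The reply looks like: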

{'acknowledged': True, 'index': 'test_index', 'shards_acknowledged': True}

Deleting an Index

An existing index can be deleted using
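A minimal sketch under the same assumptions (a connected 7.x client named `es`; the wrapper name `delete_index` is hypothetical):

```python
def delete_index(es, name):
    """Delete an index and all of its documents; return the server reply."""
    # Deletion is permanent. Deleting a missing index raises an error.
    return es.indices.delete(index=name)


# Usage, with `es` a connected client:
# delete_index(es, "test_index")
```

The reply looks like: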

{'acknowledged': True}

Check if an index exists
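A minimal sketch, again assuming a connected 7.x client named `es` (the wrapper name `index_exists` is hypothetical). Checking for `test_index` after it has been deleted gives the result below:

```python
def index_exists(es, name):
    """Return True if the index exists on the cluster, otherwise False."""
    return es.indices.exists(index=name)


# Usage, with `es` a connected client:
# index_exists(es, "test_index")
```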

False

Document & Fields

Documents are the equivalent of rows, and fields the equivalent of columns. In Elasticsearch, documents are JSON objects and fields are the key-value pairs in them.

Inserting data

Data can be inserted using the following command. Each document is a row, and the id is the row number. The index will be created automatically if it does not exist.
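A minimal sketch, assuming a connected 7.x client named `es`. The index name `capitals` and the field names `Country` and `Capital` come from the outputs in this article; the helper name `insert_capital` and the sample values are hypothetical:

```python
def insert_capital(es, doc_id, country, capital):
    """Store one country/capital pair under the given id."""
    document = {"Country": country, "Capital": capital}
    # `body` is the keyword used by the 7.x client for the document.
    # Indexing with an id that already exists overwrites that document.
    return es.index(index="capitals", id=doc_id, body=document)


# Usage, with `es` a connected client (sample values are illustrative):
# insert_capital(es, 3, "England", "London")
```

The reply reports whether the document was created or updated: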

{'_id': '3',
 '_index': 'capitals',
 '_primary_term': 1,
 '_seq_no': 11,
 '_shards': {'failed': 0, 'successful': 2, 'total': 2},
 '_type': '_doc',
 '_version': 1,
 'result': 'created'}

Updating Data
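An existing document can be changed with a partial update. This is a minimal sketch, assuming a connected 7.x client named `es`, where the changed fields are wrapped under a `doc` key. The helper name `update_capital` is hypothetical, and the new value is taken from the search output later in this article:

```python
def update_capital(es, doc_id, changes):
    """Merge the given fields into an existing document."""
    # Only the fields listed in `changes` are modified; the rest of the
    # document is left as it was.
    return es.update(index="capitals", id=doc_id, body={"doc": changes})


# Usage, with `es` a connected client:
# update_capital(es, 3, {"Capital": "London1"})
```

The reply reports the new version of the document: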

{'_id': '3',
 '_index': 'capitals',
 '_primary_term': 1,
 '_seq_no': 10,
 '_shards': {'failed': 0, 'successful': 2, 'total': 2},
 '_type': '_doc',
 '_version': 3,
 'result': 'updated'}

Fetching Data

An entry can be fetched using the id as follows
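A minimal sketch, assuming a connected 7.x client named `es` (the helper name `fetch_entry` is hypothetical):

```python
def fetch_entry(es, doc_id, index="capitals"):
    """Return the stored record: metadata plus the data under `_source`."""
    return es.get(index=index, id=doc_id)


# Usage, with `es` a connected client:
# fetch_entry(es, 1)
```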

{'_id': '1',
 '_index': 'capitals',
 '_primary_term': 1,
 '_seq_no': 7,
 '_source': {'Capital': 'Delhi', 'Country': 'India'},
 '_type': '_doc',
 '_version': 6,
 'found': True}

or just the data using,
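A minimal sketch using `get_source`, assuming a connected 7.x client named `es` (the helper name `fetch_data` is hypothetical):

```python
def fetch_data(es, doc_id, index="capitals"):
    """Return only the document body, without the surrounding metadata."""
    return es.get_source(index=index, id=doc_id)


# Usage, with `es` a connected client:
# fetch_data(es, 2)
```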

{'Capital': 'Muscat', 'Country': 'Oman'}

Querying

We can query an index using the search method
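A minimal sketch that returns every document with a `match_all` query, assuming a connected 7.x client named `es`. The helper names `search_all` and `sources` are hypothetical:

```python
def search_all(es, index):
    """Run a match_all query; by default the first 10 hits are returned."""
    body = {"query": {"match_all": {}}}
    return es.search(index=index, body=body)


def sources(response):
    """Pull the stored documents out of a search response."""
    return [hit["_source"] for hit in response["hits"]["hits"]]


# Usage, with `es` a connected client:
# response = search_all(es, "capitals")
# print(sources(response))
```

The full response wraps the matching documents in metadata: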

{'_shards': {'failed': 0, 'skipped': 0, 'successful': 1, 'total': 1},
 'hits': {'hits': [{'_id': '1',
                    '_index': 'capitals',
                    '_score': 1.0,
                    '_source': {'Capital': 'Delhi', 'Country': 'India'},
                    '_type': '_doc'},
                   {'_id': '2',
                    '_index': 'capitals',
                    '_score': 1.0,
                    '_source': {'Capital': 'Muscat', 'Country': 'Oman'},
                    '_type': '_doc'},
                   {'_id': '3',
                    '_index': 'capitals',
                    '_score': 1.0,
                    '_source': {'Capital': 'London1', 'Country': 'England'},
                    '_type': '_doc'},
                   {'_id': '4',
                    '_index': 'capitals',
                    '_score': 1.0,
                    '_source': {'Capital': 'AbuDhabi', 'Country': 'UAE'},
                    '_type': '_doc'}],
          'max_score': 1.0,
          'total': {'relation': 'eq', 'value': 4}},
 'timed_out': False,
 'took': 1}

We can query for a particular field,
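A minimal sketch using a `match` query on one field, assuming a connected 7.x client named `es`. The helper name `search_by_field` is hypothetical, and the field and value in the usage line are one choice that is consistent with the result shown:

```python
def search_by_field(es, index, field, value):
    """Return documents whose `field` matches `value`."""
    body = {"query": {"match": {field: value}}}
    return es.search(index=index, body=body)


# Usage, with `es` a connected client:
# search_by_field(es, "capitals", "Country", "India")
```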

{'_shards': {'failed': 0, 'skipped': 0, 'successful': 1, 'total': 1},
 'hits': {'hits': [{'_id': '1',
                    '_index': 'capitals',
                    '_score': 1.2039728,
                    '_source': {'Capital': 'Delhi', 'Country': 'India'},
                    '_type': '_doc'}],
          'max_score': 1.2039728,
          'total': {'relation': 'eq', 'value': 1}},
 'timed_out': False,
 'took': 2}

match vs match_phrase

match_phrase is a more restrictive query. To demonstrate this, let's add one more entry to the capitals, Muscat-City, and let's query using both match and match_phrase
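A minimal sketch of the comparison, assuming a connected 7.x client named `es` and the default text analysis, which splits "Muscat-City" into the words "muscat" and "city". The helper names and the search text "Muscat City" are assumptions. `match` accepts a document containing any of the words, while `match_phrase` needs all of them, adjacent and in order:

```python
def capital_query(kind, text):
    """Build a search body of the given kind for the Capital field."""
    if kind not in ("match", "match_phrase"):
        raise ValueError(f"unsupported query type: {kind}")
    return {"query": {kind: {"Capital": text}}}


def compare_match_types(es, text="Muscat City"):
    """Run the same text through both query types; return the hit ids."""
    results = {}
    for kind in ("match", "match_phrase"):
        response = es.search(index="capitals", body=capital_query(kind, text))
        results[kind] = [hit["_id"] for hit in response["hits"]["hits"]]
    return results


# Usage, with `es` a connected client:
# es.index(index="capitals", id=5,
#          body={"Country": "Oman2", "Capital": "Muscat-City"})
# compare_match_types(es)
```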

#1 match returns

{'_shards': {'failed': 0, 'skipped': 0, 'successful': 1, 'total': 1},
 'hits': {'hits': [{'_id': '5',
                    '_index': 'capitals',
                    '_score': 1.3808408,
                    '_source': {'Capital': 'Muscat-City', 'Country': 'Oman2'},
                    '_type': '_doc'},
                   {'_id': '2',
                    '_index': 'capitals',
                    '_score': 0.7801935,
                    '_source': {'Capital': 'Muscat', 'Country': 'Oman'},
                    '_type': '_doc'}],
          'max_score': 1.3808408,
          'total': {'relation': 'eq', 'value': 2}},
 'timed_out': False,
 'took': 4}

#2 match_phrase returns

{'_shards': {'failed': 0, 'skipped': 0, 'successful': 1, 'total': 1},
 'hits': {'hits': [{'_id': '5',
                    '_index': 'capitals',
                    '_score': 1.2048804,
                    '_source': {'Capital': 'Muscat-City', 'Country': 'Oman2'},
                    '_type': '_doc'}],
          'max_score': 1.2048804,
          'total': {'relation': 'eq', 'value': 1}},
 'timed_out': False,
 'took': 13}

Combining multiple queries

We can combine multiple queries using a boolean (bool) query. To demonstrate this, let's add a new entry, London-City
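A minimal sketch of a bool query, which groups clauses under keys such as `must` (has to match) and `must_not` (has to be absent). The helper names and the particular clauses are assumptions, chosen as one combination that narrows a search for "City" down to the London-City entry:

```python
def match(field, text):
    """Build a single match clause."""
    return {"match": {field: text}}


def bool_query(must=(), must_not=()):
    """Combine clauses: all `must` clauses match, no `must_not` clause does."""
    return {"query": {"bool": {"must": list(must), "must_not": list(must_not)}}}


# Regular query: every capital containing the word "City".
regular = {"query": match("Capital", "City")}

# Combined query: capitals containing "City", excluding the country Oman2.
combined = bool_query(
    must=[match("Capital", "City")],
    must_not=[match("Country", "Oman2")],
)

# Usage, with `es` a connected 7.x client:
# es.search(index="capitals", body=regular)
# es.search(index="capitals", body=combined)
```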

#1 Regular query returns

{'_shards': {'failed': 0, 'skipped': 0, 'successful': 1, 'total': 1},
 'hits': {'hits': [{'_id': '5',
                    '_index': 'capitals',
                    '_score': 0.7104268,
                    '_source': {'Capital': 'Muscat-City', 'Country': 'Oman2'},
                    '_type': '_doc'},
                   {'_id': '6',
                    '_index': 'capitals',
                    '_score': 0.7104268,
                    '_source': {'Capital': 'London-City', 'Country': 'England2'},
                    '_type': '_doc'}],
          'max_score': 0.7104268,
          'total': {'relation': 'eq', 'value': 2}},
 'timed_out': False,
 'took': 0}

#2 Combined query returns

{'_shards': {'failed': 0, 'skipped': 0, 'successful': 1, 'total': 1},
 'hits': {'hits': [{'_id': '6',
                    '_index': 'capitals',
                    '_score': 0.7104268,
                    '_source': {'Capital': 'London-City', 'Country': 'England2'},
                    '_type': '_doc'}],
          'max_score': 0.7104268,
          'total': {'relation': 'eq', 'value': 1}},
 'timed_out': False,
 'took': 1}

Asynchronous Client

To use the asynchronous client, install the Elasticsearch client with the async extra. This automatically installs the async libraries it depends on, such as aiohttp

pip install elasticsearch[async]

and an asynchronous program would be as follows
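A minimal sketch, assuming the 7.x client, where `AsyncElasticsearch` mirrors the synchronous API but its methods are awaited. The function names are hypothetical, and the cloud ID and credentials are placeholders:

```python
import asyncio


async def fetch_info(es):
    """Await the cluster info, then close the client's connections."""
    try:
        return await es.info()
    finally:
        # Always release the underlying HTTP session.
        await es.close()


async def main(cloud_id, username, password):
    """Create the async client inside the event loop and query it."""
    # Imported here so this file loads without elasticsearch[async].
    from elasticsearch import AsyncElasticsearch

    es = AsyncElasticsearch(cloud_id=cloud_id, http_auth=(username, password))
    return await fetch_info(es)


# Usage (placeholders, replace with your own values):
# print(asyncio.run(main("<cloud-id>", "elastic", "<password>")))
```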