28. July 2020   Alfonso Noriega

Vind: Elastic road

Once upon a time, before society melted down due to a global pandemic, an amazing idea for making developers' lives easier popped up, and so Vind (faɪnd) was born (read more about Vind and those happy times in the official repository and here).

Guided by the concepts of:

  • Simplicity
    “No expert? No problem! Which means: easy things should be easy to do.” – by Thomas Kurz
  • Agnosticism
    “The lib thereby has the goal to be backend agnostic…”  – by Thomas Kurz
  • Reusability

a bunch of people locked themselves in a room (at the time that was completely legitimate), cooked up a set of ideas, and started to walk down the road to Vind, the Java library meant to make our lives easier when implementing an application that uses or provides information discovery.

Due to a set of circumstances, requirements, and skill sets, Apache Solr was the first backend to be implemented. After some hellish version updates, bug fixes, and some other great success stories, the time came to challenge one of the main concepts: backend agnosticism.

As the trend (and the need) is now to move towards SaaS/PaaS, the next step on the Vind way was to implement an alternative to the Solr backend, using a technology that offers competitive and reliable SaaS options and is also widely adopted and well proven in full-text search. Thus the idea of the Elasticsearch backend arose, and a new path opened up for the further development of Vind.

Elastic VS Solr

Although it may look like a big challenge, the fact that Elasticsearch and Solr share a Lucene background should give us the assurance that the same ends can be reached and the same functionalities implemented with both backends.

But let’s not celebrate just yet as their API approaches are completely different:

  • Schema VS mapping: The basic configuration differs not only in structure but also in format: while Solr is based on classic XML files, Elasticsearch relies on JSON configurations.
  • Collection VS index: This change in the naming of basic concepts exposes that the Vind API was originally implemented with Solr in mind.
  • Facets VS aggregations: Aggregations in Elasticsearch are defined in JSON and follow an approach close to the JSON facets added in Solr 5, but completely different from the Vind implementation.
  • Filter VS Query DSL: Elasticsearch implements a JSON-based query DSL which, when you are used to the Solr query syntax, seems to add a lot of verbosity and complexity to queries.
  • Atomic updates VS Script based updates: In Elasticsearch, script-based operations allow you to update documents, taking the place of Solr's atomic updates.

At first glance, these differences paint the picture of a bumpy road towards a successful implementation of the Elasticsearch backend. Nevertheless, the REST API and the Java client provided by Elastic greatly support developers in their tasks and give the feeling of a modern, consistent API when implementing Elasticsearch clients.
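
To make the filter versus query DSL difference more concrete, here is a minimal sketch that renders the same filter in both styles. The class and method names (FilterSyntaxComparison, solrFilter, elasticFilter) are made up for illustration and are not part of Vind; the field name and value are arbitrary, and no escaping of special characters is attempted.

```java
// Illustrative only: these helpers are hypothetical and not part of Vind.
final class FilterSyntaxComparison {

    // Solr style: a compact "field:value" expression passed as a filter query (fq) parameter.
    static String solrFilter(String field, String value) {
        return "fq=" + field + ":" + value;
    }

    // Elasticsearch style: the same filter written as a JSON query DSL document
    // (a term query inside the filter clause of a bool query).
    static String elasticFilter(String field, String value) {
        return "{\"query\":{\"bool\":{\"filter\":[{\"term\":{\""
                + field + "\":\"" + value + "\"}}]}}}";
    }

    public static void main(String[] args) {
        System.out.println(solrFilter("category", "book"));
        System.out.println(elasticFilter("category", "book"));
    }
}
```

Hiding this kind of difference behind a single API is what a backend-agnostic library is expected to do for the application developer.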

What is already in place?

At the current development status of Vind (version 3.0.0a-RC5), the Elasticsearch backend has not yet reached all the functionalities of its Solr counterpart, but it has reached a level of service where it can be used for most information discovery applications.

The currently implemented functionalities are:

  • CRUD operations including partial updates (remove regex operation not supported).
  • Full text search
  • Filters
  • Facets (Stats facets on text based fields not supported)
  • Suggestions
  • Contextualized fields
  • POJO annotations

So any application already using Vind that sticks to the features listed above can easily switch to indexing on an Elasticsearch instance just by changing the project dependencies from the previous Solr (embedded or remote) backend to the Elasticsearch backend and changing its configuration (either by config file, environment variables, or programmatically, as described in the Vind doc).

<dependency>
   <groupId>com.rbmhtechnology.vind</groupId>
   <artifactId>elastic-server</artifactId>
   <version>3.0.0a-RC5</version>
</dependency>

Maven dependency

runtime group: 'com.rbmhtechnology.vind', name: 'elastic-server', version: '3.0.0a-RC5'

Gradle dependency

Changing configuration programmatically:

SearchConfiguration.set(SearchConfiguration.SERVER_PROVIDER, "com.rbmhtechnology.vind.elasticsearch.backend.ElasticServerProvider");
SearchConfiguration.set(SearchConfiguration.SERVER_HOST, "http://localhost:9200");
SearchConfiguration.set(SearchConfiguration.SERVER_COLLECTION, "vind");

Note: a running Elasticsearch server has to be set up separately and be reachable, either as a Docker container or as an existing service.

What is new?

Parallel indexing

When migrating between backends on a live system, it may not be as trivial as just changing the technology: reindexing all the existing documents, especially if there is a massive number of them, may take days.

During this indexing time the application normally shouldn’t be shut down, and new documents may come in which must not be skipped in the new backend. To ease this transition, Vind now includes a MasterSlaveSearchServer implementation which takes two different search servers (master and slave), indexing the incoming documents in both of them while reading only from the master search server.

To avoid degrading the performance of the running application, indexing in the slave is done asynchronously.

Besides preparing the transition, this MasterSlaveSearchServer approach can help with a rollback to the previous server if, during a test period, we keep the new backend as master and the old one as slave.

// Existing remote Solr backend, used here as master.
private SearchServer getSolrServer() {
   SearchConfiguration.set(SearchConfiguration.SERVER_PROVIDER, "com.rbmhtechnology.vind.solr.backend.RemoteSolrServerProvider");
   SearchConfiguration.set(SearchConfiguration.SERVER_HOST, "http://localhost:8983/solr");
   SearchConfiguration.set(SearchConfiguration.SERVER_COLLECTION, "vind");
   SearchConfiguration.set(SearchConfiguration.SERVER_SOLR_CLOUD, false);
   return SearchServer.getInstance();
}

// New Elasticsearch backend, used here as slave; the index is created if missing.
private SearchServer getElasticServer() {
   SearchConfiguration.set(SearchConfiguration.SERVER_PROVIDER, "com.rbmhtechnology.vind.elasticsearch.backend.ElasticServerProvider");
   SearchConfiguration.set(SearchConfiguration.SERVER_HOST, "http://localhost:9200");
   SearchConfiguration.set(SearchConfiguration.SERVER_COLLECTION_AUTOCREATE, true);
   SearchConfiguration.set(SearchConfiguration.SERVER_COLLECTION, "vind");
   return SearchServer.getInstance();
}

// First argument is the master, second the slave.
private SearchServer getParallelIndexingServer() {
   return new MasterSlaveSearchServer(getSolrServer(), getElasticServer());
}

Creation of a MasterSlaveSearchServer with a RemoteSolrServer as master and an ElasticServer as slave.

Automatic index creation

Another new feature of the Elasticsearch backend is the possibility to configure the search server to create the needed index if it does not exist on the Elasticsearch cluster.
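
The idea behind parallel indexing can be sketched independently of Vind. The following is not Vind's implementation, only a minimal illustration of the pattern under simplifying assumptions: DocumentStore, InMemoryStore, and DualWriteStore are hypothetical names, documents are plain strings, and failures of the asynchronous write are not handled.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a search backend; this is not Vind's SearchServer API.
interface DocumentStore {
    void index(String id, String content);
    Optional<String> get(String id);
}

// Trivial in-memory backend, only here to make the sketch runnable.
final class InMemoryStore implements DocumentStore {
    private final Map<String, String> docs = new ConcurrentHashMap<>();

    @Override public void index(String id, String content) { docs.put(id, content); }
    @Override public Optional<String> get(String id) { return Optional.ofNullable(docs.get(id)); }
}

// Writes go to both backends; reads are served by the master only.
final class DualWriteStore implements DocumentStore, AutoCloseable {
    private final DocumentStore master;
    private final DocumentStore slave;
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    DualWriteStore(DocumentStore master, DocumentStore slave) {
        this.master = master;
        this.slave = slave;
    }

    // Indexes synchronously in the master and asynchronously in the slave.
    // The returned future completes once the slave write has finished.
    CompletableFuture<Void> indexAsync(String id, String content) {
        master.index(id, content);
        return CompletableFuture.runAsync(() -> slave.index(id, content), executor);
    }

    @Override public void index(String id, String content) { indexAsync(id, content); }
    @Override public Optional<String> get(String id) { return master.get(id); }
    @Override public void close() { executor.shutdown(); }

    public static void main(String[] args) {
        InMemoryStore oldBackend = new InMemoryStore();
        InMemoryStore newBackend = new InMemoryStore();
        try (DualWriteStore store = new DualWriteStore(oldBackend, newBackend)) {
            store.indexAsync("1", "hello").join(); // wait for the slave write
            System.out.println(store.get("1").isPresent());      // read from master
            System.out.println(newBackend.get("1").isPresent()); // slave was written too
        }
    }
}
```

Because the caller only waits for the master write, a slow slave does not slow down indexing in the running application; the price is that the slave may briefly lag behind the master.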

SearchConfiguration.set(SearchConfiguration.SERVER_COLLECTION_AUTOCREATE, true);

Configuration to create the index on startup if it does not exist.

Next steps

As already mentioned, the Elasticsearch backend does not yet implement all the features of the Solr backend, so the next steps will first focus on reaching the same reliability and capabilities as the Solr implementation (release v3.0.0), including the following:

  • Complex fields
  • Scoped operations
  • Nested documents (Note that this implementation may be removed from the Solr backend or reworked with a completely different approach)

Also on the roadmap, in parallel to the Elasticsearch work, is the improvement of already existing functionalities such as:

  • Suggestion: rankings and response times
  • Search: identify search patterns on input text