In previous posts, all listed below, we’ve discussed general architecture, full text search capabilities and facet aggregations possibilities. However, till now we have not discussed any of the administration and management options and things you can do on a live cluster without any restart. So let’s get into it and see what Apache Solr and ElasticSearch have to offer.
- Solr vs. ElasticSearch: Part 1 – Overview
- Solr vs. ElasticSearch: Part 2 – Data Handling
- Solr vs. ElasticSearch: Part 3 – Searching
- Solr vs. ElasticSearch: Part 4 – Faceting
- Solr vs. ElasticSearch: Part 5 – Management API Capabilities
- Solr vs. ElasticSearch: Part 6 – User & Dev Communities Compared
Input/Output Format
ElasticSearch
As you probably know ElasticSearch offers a single way to talk to it – its HTTP REST API – JSON structured queries and responses. In most cases, especially during query time, it is very handy, because it let’s you perfectly control the structure of your queries and thus control the logic.
Apache Solr
On the other hand we have Apache Solr. If you are familiar with it you know that in order to send a query to Solr one needs to send it using URL request parameters. This makes communication much less structured compared to ElasticSearch JSON format. In response you can get multiple response formats that are supported out of the box, like the default XML, JSON, CSV, PHP serialized, or Ruby.
Statistics API
Most of the time your search cluster will be fine and you won’t have any problems with it. However, there are times where you may need to see what is happening inside Apache Solr or ElasticSearch to diagnose problems, such as performance problems (hello SPM!), stability issues, or anything like that. In such cases, both search engines provide some amount of statistics.
Apache Solr
In Solr we can use JMX or HTTP requests to retrieve information about handler usage, cache statistics or information about most Solr components.
ElasticSearch
ElasticSearch was designed to be able to return various statistics about itself. With the REST API calls we can get information from the simplest ones like cluster health or nodes statistic, to extent information like the detailed ones about indices with merges, refreshes. The same stats are available via JMX, too.
Settings API
ElasticSearch
ElasticSearch allows us to modify most of the configuration values dynamically. For example, you can clear you caches (or just specific type of cache), you can move shards and replicas to specific nodes in your cluster. In addition to that you are also allowed to update mappings (to some extent), define warming queries (since version 0.20), etc. You can even shut down a single node or a whole cluster with the use of a single HTTP call. Of course, this is just an example and doesn’t cover all the possibilities exposed by ElasticSearch.
Apache Solr
In case of Apache Solr we do not (yet) have the possibility of changing configuration values (like warming queries) with API calls.
Index / Collection Administration Capabilities
In addition to the capabilities mentioned above both ElasticSearch and Apache Solr provide APIs that allows us to modify our deployment when it comes to collections and indices.
Apache Solr
Pre 4.0 we were able to manipulate cores inside our Solr instances. We could create new cores, reload them, get their status, rename, swap two of them, and finally remove a core from the instance. With Solr 4.0, a new API was introduced that is built on top of core admin API – the collections API. It allows us to create collections on started SolrCloud cluster, reload them and of course delete them. As the collections API is built on top of the core admin API, if you create a new collection all the needed cores on all instances will be created. Of course, the same goes for reloading and deleting – all the cores will be appropriately informed and processed.
ElasticSearch
In case of ElasticSearch we can create and delete indices by running a simple HTTP command (GET or DELETE method) with the index name we are interested in. In addition to that, with a simple API call we can increase and decrease the number of replicas without the need of shutting down nodes or creating new nodes. With the newer ElasticSearch versions we can even manipulate shard placement with the cluster reroute API. With the use of that API we can move shards between nodes, we can cancel shard allocation process and we can also force shard allocation – everything on a live cluster.
Query Analysis
Apache Solr
If you’ve used Apache Solr you probably come across the debugQuery parameter and the explainOther parameter. Those two allows to see the detailed score calculation for the given query and documents found in the results (the debugQuery parameter) and the specified ones (the explainOther). In addition, we can also see how the analysis process is done with the use of analysis handler or by using the analysis page of the Solr administration panel provided with Solr.
For example this is how debug information returned by Solr can look like:
<?xml version="1.0" encoding="UTF-8"?> <response> . . . <lst name="debug"> <str name="rawquerystring">ten</str> <str name="querystring">ten</str> <str name="parsedquery">(+DisjunctionMaxQuery((prefixTok:ten)~0.01) ())/no_coord</str> <str name="parsedquery_toString">+(prefixTok:ten)~0.01 ()</str> <str name="QParser">DisMaxQParser</str> <null name="altquerystring"/> <null name="boostfuncs"/> <lst name="timing"> <double name="time">2.0</double> <lst name="prepare"> <double name="time">1.0</double> <lst name="org.apache.solr.handler.component.QueryComponent"> <double name="time">1.0</double> </lst> <lst name="org.apache.solr.handler.component.FacetComponent"> <double name="time">0.0</double> </lst> <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"> <double name="time">0.0</double> </lst> <lst name="org.apache.solr.handler.component.HighlightComponent"> <double name="time">0.0</double> </lst> <lst name="org.apache.solr.handler.component.StatsComponent"> <double name="time">0.0</double> </lst> <lst name="org.apache.solr.handler.component.DebugComponent"> <double name="time">0.0</double> </lst> </lst> <lst name="process"> <double name="time">1.0</double> <lst name="org.apache.solr.handler.component.QueryComponent"> <double name="time">0.0</double> </lst> <lst name="org.apache.solr.handler.component.FacetComponent"> <double name="time">0.0</double> </lst> <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"> <double name="time">0.0</double> </lst> <lst name="org.apache.solr.handler.component.HighlightComponent"> <double name="time">0.0</double> </lst> <lst name="org.apache.solr.handler.component.StatsComponent"> <double name="time">0.0</double> </lst> <lst name="org.apache.solr.handler.component.DebugComponent"> <double name="time">1.0</double> </lst> </lst> </lst> <lst name="explain"> <str name="Ten mices"> 1.3527006 = (MATCH) sum of: 1.3527006 = (MATCH) weight(prefixTok:ten in 35158) [DefaultSimilarity], result of: 1.3527006 = fieldWeight in 35158, product of: 1.4142135 = tf(freq=2.0), with freq of: 2.0 = termFreq=2.0 6.1216245 = idf(docFreq=6355, maxDocs=1065313) 0.15625 = fieldNorm(doc=35158) </str> </lst> </lst> </response>
As you can see, we can get information about timings of each of the used components. In addition to that, we see the parsed query and of course the explain information showing us how the document score was calculated.
ElasticSearch
ElasticSearch exposes three separate REST end-points to analyze our queries, documents and explain the documents score. The Analyze API allows us to test our analyzer on a specified text in order to see how it is processed and is similar to the analysis page functionality of Solr. The Explain API provides us with information about the score calculation for a given documents. Finally, the Validate API can validate our query to see is it is proper and how expensive it can be.
For example, this is what Explain API response looks like:
{ "ok" : true, "_index" : "docs", "_type" : "doc", "_id" : "1", "matched" : true, "explanation" : { "value" : 0.15342641, "description" : "fieldWeight(_all:document in 0), product of:", "details" : [ { "value" : 1.0, "description" : "tf(termFreq(_all:document)=1)" }, { "value" : 0.30685282, "description" : "idf(docFreq=1, maxDocs=1)" }, { "value" : 0.5, "description" : "fieldNorm(field=_all, doc=0)" } ] } }
You can see the description about score calculation that is returned from the Explain API.
Before We End
There are a few words more we wanted to write before summarizing this comparison. First of all the above mentioned APIs and possibilities are not all that it is available, especially when it comes to ElasticSearch. For example, with ElasticSearch you can clear caches on the index level, you can check index and types existence, you can retrieve and manage your warming queries, clear the transaction log by running the Flush API, or even close an index or open those that were closed. We wanted to point some differences and similarities between Apache Solr and ElasticSearch, but we didn’t want to make a summary of the documentation. So, if you are interested in some functionality and you don’t know if it exists, just send a mail to Apache Solr or ElasticSearch mailing list or leave a comment here, and we will be glad to help.
Summary
When we first started the Solr vs ElasticSearch series we planned to initially divide the series into five posts, which are now published. However after seeing the popularity of the series and the amount of feedback we’ve received, we decided to extend the series. You can soon expect the next part, which will be dedicated to non-technical, but deeply important and interesting aspects of both search servers. After that, we’ll get back to the technical details with the subsequent post dedicated to score influence capabilities, describing how we can change the default Lucene scoring and influence it from configuration, during indexing time and finally during querying.
If you liked this post, please tweet it!
