One of the questions after my talk at the recent ApacheCon EU was what I thought about the communities of the two search engines I was comparing. Not surprisingly, this is also a question we often address in our consulting engagements. As a part of our Apache Solr vs. ElasticSearch post series we decided to step away from the technical aspects of SolrCloud vs. ElasticSearch and look at the communities gathered around these two projects. If you haven’t read the previous posts about Apache Solr vs. ElasticSearch, here are pointers to all of them:
- Solr vs. ElasticSearch: Part 1 – Overview
- Solr vs. ElasticSearch: Part 2 – Data Handling
- Solr vs. ElasticSearch: Part 3 – Searching
- Solr vs. ElasticSearch: Part 4 – Faceting
- Solr vs. ElasticSearch: Part 5 – Management API Capabilities
- Solr vs. ElasticSearch: Part 6 – User & Dev Communities Compared
User Community
Let’s start by discussing the user activity around both ElasticSearch and Apache Solr.
User Activity
We started working on this post right before the Christmas break of 2012. During that time we decided to see how active the user base was for both ElasticSearch and Apache Solr. To do that we used our handy search-lucene.com service and compared the number of email messages sent to both user lists. So let’s see how they stack up.
Apache Solr
As you can see, Solr user activity varies slightly from month to month, which is perfectly understandable. Each bar on the chart represents two weeks. We can see the number of messages ranges from about 390 to about 770 per two weeks, which gives us between 800 and 1600 mails per month if we do a bit of rounding up. Quite impressive, I must say!
ElasticSearch
Now let’s discuss the ElasticSearch side. First, a few words of explanation. If you look at the above chart you might think that the ElasticSearch mailing list was silent and then users started posting in October 2012. That’s clearly not true; it is just that we didn’t add ElasticSearch to search-lucene.com until recently. However, you can see that the number of messages during the same period of time is quite similar: both Solr and ElasticSearch saw about 670 to 730 messages during a two-week period. This gives us about 2 emails per hour on average.
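For readers who want to check the arithmetic behind these rates, here is a minimal sketch. The message counts come from the charts discussed above; the value 700 is only an assumed midpoint of the 670 to 730 range, and treating a month as two two-week periods is a rough simplification.

```python
# Back-of-the-envelope rates from the mailing list figures quoted in the post.
HOURS_PER_TWO_WEEKS = 14 * 24  # 336 hours


def per_month(messages_per_two_weeks):
    """Approximate a month as two two-week periods."""
    return messages_per_two_weeks * 2


def per_hour(messages_per_two_weeks):
    """Average number of messages per hour over a two-week period."""
    return messages_per_two_weeks / HOURS_PER_TWO_WEEKS


solr_low, solr_high = 390, 770  # Solr range per two weeks, from the post
recent_midpoint = 700           # assumed midpoint of the 670 to 730 range

print(per_month(solr_low), per_month(solr_high))  # 780 1540
print(round(per_hour(recent_midpoint), 2))        # 2.08
```

The monthly values of 780 and 1540 are what get rounded up to the 800 and 1600 quoted earlier.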
Distinct Users
Email volume is one thing, but I was always curious about how many different people write emails on the mailing lists. Having such a number would give us an additional understanding of the structure of the community around a particular search engine, new users, etc. However, we should not look only at this number, but also at things like the most active people on the mailing lists. In both cases we’ve looked at the same period, from 1 to 30 December 2012. We’ve used the data we index for search-lucene.com to calculate these numbers.
Apache Solr
In the case of Apache Solr there were 234 unique users sending mail to the users mailing list. That is almost 8 unique users per day on average, which is nice.
ElasticSearch
In the case of ElasticSearch there were 271 unique users sending mail to the users mailing list. This gives us about 9 unique users per day on average, which is even nicer.
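If you want to run a similar count on your own mailing list archive, the idea can be sketched as below. This is not how search-lucene.com computes its numbers; it is a hypothetical illustration that assumes you already have the raw From headers of the messages, and the sample addresses are made up.

```python
from email.utils import parseaddr


def count_unique_senders(from_headers):
    """Count distinct sender addresses, ignoring case and display names."""
    senders = set()
    for header in from_headers:
        _name, address = parseaddr(header)
        if address:
            senders.add(address.lower())
    return len(senders)


def average_per_day(unique_senders, days=30):
    """Divide the period total by the number of days, as the post does."""
    return unique_senders / days


# Made-up sample headers: the first two belong to the same person.
sample = [
    "Alice Example <alice@example.com>",
    "alice@example.com",
    "Bob <BOB@example.org>",
]

print(count_unique_senders(sample))    # 2
print(round(average_per_day(234), 1))  # 7.8
print(round(average_per_day(271), 1))  # 9.0
```

Normalizing on the address rather than the display name avoids counting the same person twice when their mail client formats the header differently.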
Resources Available
As far as available resources go, both ElasticSearch and Solr have great documentation. On the Solr wiki site (http://wiki.apache.org/solr/) you can find information about most of the components and, of course, the tutorial for beginners. ElasticSearch is very similar, with a tutorial and a very good description of functionality available at http://www.elasticsearch.org/. In addition to that, there are three books published about Apache Solr (in English) and more (e.g. my Apache Solr 4 Cookbook) coming soon. As of now, there are no published books about ElasticSearch, but… stay tuned.
Search Trending
We also decided to use uncle Google to look at trends about Apache Solr and ElasticSearch. Let’s look at the following diagram:
As you can see, until early 2010 there was no interest in ElasticSearch at all, at least from the point of view of users searching for it. Note that we published the interview with Shay Banon over two and a half years ago, back in May 2010, before ElasticSearch registered on Google’s search trends radar! SolrCloud didn’t exist back then, so people slowly started looking for information on SolrCloud later in 2010. The volume of searches mentioning SolrCloud is very small even today, perhaps because people tend to search for Solr and not SolrCloud. And while SolrCloud is still the new kid on the block, searches for Solr dwarf searches for ElasticSearch despite the buzz surrounding ElasticSearch.
Of course, the above doesn’t say anything about the number of users of both search engines, but it definitely shows some information about the interest in these technologies.
Developers and the Code
If you are familiar with ElasticSearch and Solr you probably know that ElasticSearch is much younger than Apache Solr. Apache Solr was created by Yonik Seeley in 2004 and donated to the Apache Software Foundation. On the other hand, the first version of ElasticSearch was released by Shay Banon in 2010. This is important to point out before we talk about differences in contributors and the code itself. Getting to the point, we thought it might be interesting to look at both Apache Solr and ElasticSearch from a bird’s-eye perspective. To do that we’ve used the statistics and charts from ohloh.net. So, let’s see what they look like.
Apache Solr
Code Statistics
If we look at the current statistics, at the beginning of January 2013 Solr had more than 212k lines of code, with almost 7000 commits and 38 contributors. However, keep in mind that contributors are people who committed the code, not necessarily the ones who actually implemented it and provided the patch, so the actual number of contributors is much higher.
Top Contributors
If we look at the top contributors we see Mark Miller on top, followed by Yonik Seeley, with Robert Muir in third place.
Active Contributors
One more interesting thing is the number of contributors that were actively involved during a given period of time. Looking at Apache Solr since 2006, I think we can say that we had stable growth of active contributors from 2006 until June 2012, with a bit of a drop shortly after that. However, I don’t think the number of active contributors will keep dropping; the dip is more likely due to a bit of exhaustion after releasing Apache Lucene and Solr 4.0.
ElasticSearch
Code Statistics
Current code statistics for ElasticSearch show that the code base just hit 240k LOC, with about 4.2k commits and 87 contributors.
Top Contributors
As we’d expect, Shay Banon is the top contributor to ElasticSearch. Martijn van Groningen takes second place on the podium, with Igor Motov in third place.
Active Contributors
And finally, the active contributors. We don’t have the same time frame as for Apache Solr, which is understandable as ElasticSearch is younger, but we can still see what is happening. Since the first quarter of 2011 the number of active contributors has varied from 5 to about 10, with the peak at the same time as in Solr: 12 active contributors in June 2012.
Summary
As everything in this post indicates, both projects’ development and user communities are strong, active, and about equal. 2013 will be an interesting year for both projects.
We are nearing the end of our SolrCloud vs. ElasticSearch series. What else would you like us to cover? Please use the comments to let us know!
