incubator-blur-user mailing list archives

From Aaron McCurry
Subject Re: Contrast of Blur to ElasticSearch, Solr
Date Sun, 08 Dec 2013 14:26:05 GMT
Hi James,

Thanks for your interest and questions, I will attempt to answer your
questions below.

On Sat, Dec 7, 2013 at 8:47 AM, James Kebinger wrote:

> Hi Aaron, I'm wondering if you can talk a little about how you see Blur
> differentiating itself from ElasticSearch and Solr. It seems like both of
> them, in particular Solr after picking up some Blur code, are gaining more
> abilities to interact with Hadoop and HDFS.

Unfortunately, I'm not an expert in Solr or ElasticSearch. I can tell you
about Blur's high-level features in terms of how it interacts with Hadoop
and HDFS:

- Index storage (The obvious one)
- Bulk offline indexing, with incremental updates.
This one gives you the ability to perform indexing on a dedicated MapReduce
cluster and simply move the index updates to the running Blur cluster for
- The WAL (write-ahead log) is written to use HDFS.
- Also, we are currently moving most of the metadata from ZooKeeper storage
to HDFS storage. This makes interacting with the metadata of a table easy
to do from within MapReduce jobs.

> How does a blur install differ from a solr setup reading off hdfs?

Again, I'm not an expert in Solr. Blur's setup runs a cluster of shard
servers that serve shards (indexes) of the table within that shard cluster.
The indexes are stored once in HDFS (not counting the HDFS replication
here) and evenly distributed across whatever shard servers are online.
Blur utilizes a BlockCache (think file system cache) that is an off-heap
based system. The first version of this was originally picked up by
Cloudera, modified (I'm assuming), and committed back into the
Lucene/Solr code base. The second version of this block cache (Blur 0.2.2
stable) is now the default in Blur. It has several advantages over the first
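
To make the block cache idea concrete, here is a minimal Python sketch of
an LRU cache of fixed-size file blocks kept in one preallocated buffer. It
is not Blur's implementation: the class name, the slab layout, the LRU
eviction policy, the file name, and the fake_hdfs_read loader are all
assumptions made for illustration, and the bytearray only stands in for
off-heap memory.

```python
from collections import OrderedDict


class BlockCache:
    """Toy LRU cache of fixed-size file blocks held in one preallocated slab."""

    def __init__(self, block_size, num_slots):
        self.block_size = block_size
        # One buffer allocated up front, standing in for off-heap memory.
        self.slab = bytearray(block_size * num_slots)
        self.free_slots = list(range(num_slots))
        # (file_name, block_index) -> (slot, length); least recently used first.
        self.index = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read_block(self, file_name, block_index, load_block):
        """Return one block, calling load_block only on a cache miss."""
        key = (file_name, block_index)
        if key in self.index:
            self.hits += 1
            self.index.move_to_end(key)  # mark as most recently used
            slot, length = self.index[key]
        else:
            self.misses += 1
            data = load_block(file_name, block_index)
            if len(data) > self.block_size:
                raise ValueError("block larger than cache block size")
            if self.free_slots:
                slot = self.free_slots.pop()
            else:
                # Cache is full: reuse the slot of the least recently used block.
                _, (slot, _) = self.index.popitem(last=False)
            length = len(data)
            start = slot * self.block_size
            self.slab[start:start + length] = data
            self.index[key] = (slot, length)
        start = slot * self.block_size
        return bytes(self.slab[start:start + length])


def fake_hdfs_read(file_name, block_index):
    # Hypothetical loader; a real one would read the block from HDFS.
    return f"{file_name}:{block_index}".encode()


cache = BlockCache(block_size=64, num_slots=2)
cache.read_block("seg_1.tim", 0, fake_hdfs_read)  # miss, loaded from "HDFS"
cache.read_block("seg_1.tim", 0, fake_hdfs_read)  # hit, served from the slab
```

The second read never touches the loader, which is the point of keeping
hot index blocks in memory next to the shard server.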

One interesting feature of Blur is the ability to run a cluster of
controllers (controllers are used to make the shard cluster look like a
single service) in front of multiple shard clusters. This can help to deal
with reindexes of data, meaning that you can reindex all your data into a
new cluster without affecting the performance of the cluster that your
users may be interacting with.
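
As a rough illustration of the layout described above, the following
Python sketch puts one controller in front of two shard clusters and
spreads each table's shards evenly over the servers of its cluster. This
is not Blur's API: the class names, the round-robin assignment, the
table-lookup routing, and the sample data are all assumptions made for
the example.

```python
def assign_shards(shard_names, online_servers):
    """Round-robin shards over servers so per-server counts differ by at most one."""
    layout = {server: [] for server in online_servers}
    for i, shard in enumerate(sorted(shard_names)):
        layout[online_servers[i % len(online_servers)]].append(shard)
    return layout


class ShardCluster:
    """A group of shard servers that together serve the shards of its tables."""

    def __init__(self, name, servers):
        self.name = name
        self.servers = list(servers)
        self.tables = {}  # table name -> {shard name -> list of documents}

    def load_table(self, table, shards):
        self.tables[table] = dict(shards)

    def search(self, table, term):
        shards = self.tables[table]
        hits = []
        # Visit every server's share of the shards and collect matches.
        for shard_names in assign_shards(shards, self.servers).values():
            for shard in shard_names:
                hits.extend(doc for doc in shards[shard] if term in doc)
        return hits


class Controller:
    """Makes several shard clusters look like a single search service."""

    def __init__(self, clusters):
        self.clusters = list(clusters)

    def search(self, table, term):
        # Assumed routing rule: send the query to the cluster hosting the table.
        for cluster in self.clusters:
            if table in cluster.tables:
                return sorted(cluster.search(table, term))
        raise KeyError(table)


live = ShardCluster("live", ["shard-server-1", "shard-server-2"])
live.load_table("events", {
    "shard-0": ["disk error", "login ok"],
    "shard-1": ["network error"],
    "shard-2": ["logout ok"],
})
# A reindex is built in a separate cluster, so it adds no load to "live".
rebuild = ShardCluster("rebuild", ["shard-server-3"])
rebuild.load_table("events_v2", {"shard-0": ["disk error v2"]})

controller = Controller([live, rebuild])
results = controller.search("events", "error")
```

Queries against "events" only touch the servers of the live cluster, while
the rebuilt table sits on its own hardware until it is ready to be used.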

Some of the overall features of Blur are:
- NRT updates of data
- Offline bulk indexing
- Block cache for fast query performance
- Index warmup (pulls parts of the index up into block cache when a segment
is brought online)
- Performance metrics gathering
- Distributed tracing
- Custom index types
- Custom server side logic can be implemented (basic)

I'm sure there are many more.

Hope this helps.


> thanks
> James
