incubator-blur-user mailing list archives

From Otis Gospodnetic <>
Subject Re: Contrast of Blur to ElasticSearch, Solr
Date Sun, 08 Dec 2013 14:57:07 GMT
Thanks, Aaron, for this info. This sounds very similar to both Solr and ES;
from this description I can't really see any significant difference.
Perhaps the main difference is that with Solr/ES, Hadoop/HDFS/MapReduce are
optional and most people do not (need to) use them, while
Hadoop/HDFS/MapReduce are an integral part of Blur's offering and you can't
have Blur without them.

What is distributed tracing?  I can't map that to anything in Solr/ES.


On Sun, Dec 8, 2013 at 9:26 AM, Aaron McCurry <> wrote:

> Hi James,
> Thanks for your interest and questions. I will attempt to answer them
> below.
> On Sat, Dec 7, 2013 at 8:47 AM, James Kebinger <>
> wrote:
> > Hi Aaron, I'm wondering if you can talk a little about how Blur is
> > differentiating itself from ElasticSearch and Solr. It seems like both of
> > them, in particular Solr after picking up some Blur code, are gaining more
> > abilities to interact with Hadoop and HDFS.
> >
> Unfortunately I'm not an expert in Solr or ElasticSearch, but I can tell
> you Blur's high-level features when it comes to how it interacts with
> Hadoop:
> - Index storage (The obvious one)
> - Bulk offline indexing, with incremental updates.
> This one gives you the ability to perform indexing on a dedicated MapReduce
> cluster and simply move the index updates to the running Blur cluster for
> importing.
> - The WAL (write-ahead log) is written to HDFS.
> - We are also currently moving most of the metadata from ZooKeeper storage
> to HDFS storage. This makes it easy to interact with a table's metadata
> from within MapReduce jobs.
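The write-ahead-log bullet can be illustrated with a minimal Python sketch, assuming a local file stands in for HDFS and each mutation is stored as one JSON line. `WriteAheadLog`, `Index`, and the record layout are hypothetical; the thread does not describe Blur's actual WAL format or API. What it shows: a mutation is made durable in the log before it is applied to the in-memory index, so a restarted shard server can rebuild unflushed updates by replaying the log.

```python
import json
import os
import tempfile


class WriteAheadLog:
    """Hypothetical WAL: one JSON line per mutation, synced before returning."""

    def __init__(self, path):
        self.path = path

    def append(self, mutation):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(mutation) + "\n")
            f.flush()
            os.fsync(f.fileno())  # stand-in for an HDFS hflush/hsync

    def replay(self):
        if not os.path.exists(self.path):
            return
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                yield json.loads(line)


class Index:
    """Toy in-memory index: row id -> fields (hypothetical)."""

    def __init__(self, wal):
        self.wal = wal
        self.rows = {}

    def mutate(self, row_id, fields):
        mutation = {"row_id": row_id, "fields": fields}
        self.wal.append(mutation)  # 1. make the change durable
        self._apply(mutation)      # 2. then make it visible

    def _apply(self, mutation):
        self.rows[mutation["row_id"]] = mutation["fields"]

    @classmethod
    def recover(cls, wal):
        index = cls(wal)
        for mutation in wal.replay():
            index._apply(mutation)
        return index


path = os.path.join(tempfile.mkdtemp(), "shard-0.wal")
index = Index(WriteAheadLog(path))
index.mutate("row-1", {"name": "blur"})
index.mutate("row-2", {"name": "lucene"})
# Simulate a shard server restart: rebuild state purely from the log.
recovered = Index.recover(WriteAheadLog(path))
print(recovered.rows == index.rows)  # True
```

Logging before applying is the usual WAL ordering: if the process dies between the two steps, replay restores the change instead of losing it.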
> > How does a Blur install differ from a Solr setup reading off HDFS?
> >
> Again, I'm not an expert in Solr. Blur's setup runs a cluster of shard
> servers that serve the shards (indexes) of the tables within that shard
> cluster. The indexes are stored once in HDFS (not counting HDFS
> replication) and are evenly distributed across whatever shard servers are
> online.
> Blur utilizes a BlockCache (think file system cache) that is off-heap.
> The first version of this was originally picked up by Cloudera, modified
> (I'm assuming), and committed back into the Lucene/Solr code base. The
> second version of this block cache (Blur 0.2.2 stable) is now the default
> in Blur. It has several advantages over the first version:
> One interesting feature of Blur is the ability to run a cluster of
> controllers (controllers make a shard cluster look like a single service)
> in front of multiple shard clusters. This helps with reindexing: you can
> reindex all of your data into a new cluster without affecting the
> performance of the cluster your users are interacting with.
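The controller-in-front-of-several-clusters arrangement can be sketched as a routing table, assuming each cluster's search is a plain callable and that cutover happens by repointing a table at a different cluster. `Controller`, `register`, `route`, and the cluster names are hypothetical; the thread does not describe how Blur actually switches traffic.

```python
class Controller:
    """Hypothetical controller: one entry point in front of several shard clusters."""

    def __init__(self):
        self.clusters = {}  # cluster name -> {table name -> search function}
        self.routes = {}    # table name -> cluster currently serving it

    def register(self, cluster, table, search_fn):
        self.clusters.setdefault(cluster, {})[table] = search_fn

    def route(self, table, cluster):
        if table not in self.clusters.get(cluster, {}):
            raise KeyError(f"{table!r} is not served by {cluster!r}")
        self.routes[table] = cluster

    def query(self, table, q):
        # Callers only name the table; the controller picks the cluster.
        return self.clusters[self.routes[table]][table](q)


ctrl = Controller()
ctrl.register("cluster-a", "docs", lambda q: [f"v1:{q}"])
ctrl.route("docs", "cluster-a")

# Rebuild the index on a second cluster; cluster-a keeps serving users.
ctrl.register("cluster-b", "docs", lambda q: [f"v2:{q}"])
before = ctrl.query("docs", "blur")  # ['v1:blur']

ctrl.route("docs", "cluster-b")      # cut over once the new index is ready
after = ctrl.query("docs", "blur")   # ['v2:blur']
```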
> Some of the overall features of Blur are:
> - NRT updates of data
> - Offline bulk indexing
> - Block cache for fast query performance
> - Index warmup (pulls parts of the index up into block cache when a segment
> is brought online)
> - Performance metrics gathering
> - Distributed tracing
> - Custom index types
> - Custom server-side logic can be implemented (basic)
> I'm sure there are many more.
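The block cache and index warmup features can be sketched together as an LRU cache of fixed-size blocks, assuming an 8 KB block size, a `read_block` callable standing in for HDFS reads, and a warmup that simply preloads the leading blocks of a file (the thread does not say which parts Blur pulls in). `BlockCache`, `BLOCK_SIZE`, `warm`, and the fake `_0.tim` file are hypothetical. Unlike Blur's cache, which the thread describes as off-heap, this sketch keeps blocks in ordinary Python objects.

```python
from collections import OrderedDict

BLOCK_SIZE = 8192  # hypothetical block size in bytes


class BlockCache:
    """LRU cache of fixed-size file blocks keyed by (file name, block index)."""

    def __init__(self, capacity_blocks, read_block):
        self.capacity = capacity_blocks
        self.read_block = read_block  # callable(file, index) -> bytes; stands in for HDFS
        self.blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, file, index):
        key = (file, index)
        if key in self.blocks:
            self.blocks.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.blocks[key]
        self.misses += 1
        data = self.read_block(file, index)
        self.blocks[key] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return data

    def read(self, file, offset, length):
        """Serve an arbitrary byte range, possibly spanning several blocks."""
        out = bytearray()
        while length > 0:
            index, pos = divmod(offset, BLOCK_SIZE)
            chunk = self.get(file, index)[pos:pos + length]
            if not chunk:  # reached end of file
                break
            out += chunk
            offset += len(chunk)
            length -= len(chunk)
        return bytes(out)

    def warm(self, file, num_blocks):
        """Warmup: pull the first blocks of a newly opened segment file into the cache."""
        for index in range(num_blocks):
            self.get(file, index)


files = {"_0.tim": bytes(range(256)) * 100}  # 25,600-byte fake segment file


def read_block(file, index):
    start = index * BLOCK_SIZE
    return files[file][start:start + BLOCK_SIZE]


cache = BlockCache(capacity_blocks=2, read_block=read_block)
cache.warm("_0.tim", 2)               # 2 misses while warming
data = cache.read("_0.tim", 8190, 4)  # spans blocks 0 and 1, both already cached
print(cache.hits, cache.misses)       # 2 2
```

Because warmup already paid the misses, the first real query over those bytes is served entirely from the cache.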
> Hope this helps.
> Aaron
> >
> > thanks
> >
> > James
> >
