incubator-blur-dev mailing list archives

From Aaron McCurry <>
Subject Re: Observations from Cloudera Search
Date Thu, 25 Jul 2013 21:05:24 GMT
On Thu, Jul 25, 2013 at 12:11 AM, rahul challapalli <> wrote:

> Hi,
> I attended a talk from Cloudera about their search solution. One thing
> that stood out was their NRT indexing. Apart from MapReduce-based ad hoc
> batch indexing, they have multiple integration points (Flume, HBase) that
> let them index data as it is written to HDFS. One thing that was not
> clear was how (if at all) they store metadata about indexes (analogous to
> our TableDescriptor).

Blur's NRT capabilities are very similar in performance to those of Cloudera
Search, since the two share a lot of the same code.  I really should put
together some benchmarks to measure Blur's NRT performance in 0.2.

Flume, Pig, and Hive are the top technologies I want to integrate with Blur.
Flume will likely be the only one that uses the Thrift mutate capabilities,
while the other two will likely interact with HDFS directly via
Input/OutputFormats.

I'm not a Solr expert, but I believe that before 4.4 the index description
was defined in XML config files.  I'm not sure how it works in 4.4 and later.

> Also, when I started a conversation afterward, I was told that they had
> collaborated with the Blur team, which I was not aware of.

Patrick's email is a good summary of the interaction.


> - Rahul
