incubator-blur-dev mailing list archives

From Aaron McCurry <amccu...@gmail.com>
Subject Re: Blur capability question
Date Mon, 01 Dec 2014 14:10:09 GMT
On Sun, Nov 30, 2014 at 3:53 PM, Mark Kerzner <mark@elephantscale.com>
wrote:

> Hi,
>
> Latest Lucene 4.0 (and Solr) has the feature of near-real-time search:
> index is updated in memory and is available for searches, but not committed
> to the hard drive, with all the accompanying features.
>
> Blur has the same, I believe, but I am guessing that it has implemented it
> directly, without the latest Lucene in-memory features. Why do I think so?
> Because Blur had this seemingly before Lucene 4.0.
>
> Could you please either give me the answer, or tell me where in the code to
> look?
>

Yes, Blur has an NRT-like capability, though it is not implemented with the
Lucene NRT classes.  Currently there are 3 different ways that Blur accepts
data mutations.

1. Thrift API mutate call.  This call is blocking: it commits and refreshes
the index during the call.  It is also an atomic call.
http://incubator.apache.org/blur/docs/0.2.3/Blur.html#Fn_Blur_mutate
A variant of the call is mutateBatch, which just batches the calls to each
shard server.  However, this is not an atomic call: a mutate failure in one
shard does not fail the entire batch, so the other mutates may still be
applied.
http://incubator.apache.org/blur/docs/0.2.3/Blur.html#Fn_Blur_mutateBatch

2. Thrift API enqueue mutate call.  This call is similar to the Lucene NRT
updates in that it will index for 5 seconds (configurable) and then
commit and refresh.  One difference from the default Lucene implementation
is that Blur will not return results to the user that are not committed to
the index.  The call is implemented by placing an in-memory queue in front
of the indexing process.  Currently the queue is not backed by disk, but
that is something we want to add.
http://incubator.apache.org/blur/docs/0.2.3/Blur.html#Fn_Blur_enqueueMutate

3. The last method is not NRT but is worth mentioning.  MapReduce batch
processing can produce a bulk incremental load for Blur.
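
To make the second method more concrete, here is a minimal sketch of an
in-memory queue placed in front of a periodic commit.  It is not Blur's
actual code: the class name QueuedIndexSketch and every method on it are
hypothetical, plain strings stand in for mutations, and a list stands in
for the committed index.  It only shows the visibility rule described
above, where enqueued data cannot be searched until a commit has run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; these are not Blur classes.
public class QueuedIndexSketch {
  // In-memory queue in front of the indexing process (not backed by disk).
  private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
  // Stand-in for the committed index; only this is visible to searches.
  private final List<String> committed = new CopyOnWriteArrayList<>();
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  // Non-blocking from the caller's point of view: just queue the mutation.
  public void enqueue(String mutation) {
    queue.add(mutation);
  }

  // Drain everything queued so far and make it visible in one step.
  // Returns the number of mutations committed.
  public synchronized int drainAndCommit() {
    List<String> batch = new ArrayList<>();
    queue.drainTo(batch);
    committed.addAll(batch);
    return batch.size();
  }

  // Searches only ever see committed data.
  public List<String> search() {
    return new ArrayList<>(committed);
  }

  // Commit on a fixed period, e.g. start(5000) for a 5 second interval.
  public void start(long periodMillis) {
    scheduler.scheduleAtFixedRate(
        this::drainAndCommit, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
  }

  public void stop() {
    scheduler.shutdownNow();
  }
}
```

In this sketch a caller would enqueue mutations, call start(5000) once, and
see the new data in search() only after the next scheduled commit.  Because
the queue lives only in memory, anything still queued is lost if the process
stops, which is the gap a disk-backed queue would close.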

All of the index changes are performed per shard through a single internal
API.

https://github.com/apache/incubator-blur/blob/master/blur-core/src/main/java/org/apache/blur/manager/writer/IndexAction.java

And this is the writer that handles all mutates:

https://github.com/apache/incubator-blur/blob/master/blur-core/src/main/java/org/apache/blur/manager/writer/BlurIndexSimpleWriter.java

There will also be a 4th method for index mutations soon.  We will be
implementing a write API in our new command platform.  In concept, commands
are similar to stored procedures: they allow developers to embed their own
methods, indexing, and query models into Blur.

Does this answer your question?

Aaron


> Thank you.
>
> Sincerely,
> Mark
>
