incubator-blur-dev mailing list archives

From Mark Kerzner <m...@elephantscale.com>
Subject Re: Blur capability question
Date Tue, 02 Dec 2014 01:23:16 GMT
Aaron,

I am really grateful for such a complete answer. As an aside, is there a
book or a document where this kind of reference is collected? Surely I will
have my own notes.

For my future purpose - to give the user the latest updates that have not
yet made it into the index - it seems that method 2 is the closest. Do I
understand correctly that Blur will keep indexing for 5 seconds
(configurable), while the user, who searches against the index, will not
see the new results? However, there is a queue in front of the index that
one can query separately?

Again, thank you.

Best regards,
Mark

On Mon, Dec 1, 2014 at 8:10 AM, Aaron McCurry <amccurry@gmail.com> wrote:

> On Sun, Nov 30, 2014 at 3:53 PM, Mark Kerzner <mark@elephantscale.com>
> wrote:
>
> > Hi,
> >
> > Latest Lucene 4.0 (and Solr) has the feature of near-real-time search:
> > index is updated in memory and is available for searches, but not
> > committed to the hard drive, with all the accompanying features.
> >
> > Blur has the same, I believe, but I am guessing that it has implemented
> > it directly, without the latest Lucene in-memory features. Why do I
> > think so? Because Blur had this seemingly before Lucene 4.0.
> >
> > Could you please either give me the answer, or tell me where in the code
> > to look?
> >
>
> Yes, Blur has an NRT-like capability, though it is not implemented with the
> Lucene NRT classes.  Currently there are 3 different ways that Blur accepts
> data mutations.
>
> 1. Thrift API mutate call.  This call is blocking and commits and refreshes
> the index during the call.  This is also an atomic call.
> http://incubator.apache.org/blur/docs/0.2.3/Blur.html#Fn_Blur_mutate
> A variant of the call is mutate batch, which just batches the calls to each
> shard server.  However, this is not an atomic call, meaning that a mutate
> failure in one shard will not fail the entire batch.
> http://incubator.apache.org/blur/docs/0.2.3/Blur.html#Fn_Blur_mutateBatch
>
> 2. Thrift API enqueue mutate call.  This call is similar to the Lucene NRT
> updates in that it will index for 5 seconds (configurable) and then
> commit and refresh.  Something to note about this method that is different
> from the default Lucene implementation is that Blur will not return results
> to the user that are not committed to the index.  The way this call is
> implemented is by placing an in-memory queue in front of the indexing
> process.  Currently the queue is not backed to disk, but it is something we
> want to add.
> http://incubator.apache.org/blur/docs/0.2.3/Blur.html#Fn_Blur_enqueueMutate
>
> 3. The last method is not NRT but is worth mentioning.  MapReduce batch
> processing can produce a bulk incremental load for Blur.
>
> All of the index changes are performed per shard through a single internal
> API.
>
>
> https://github.com/apache/incubator-blur/blob/master/blur-core/src/main/java/org/apache/blur/manager/writer/IndexAction.java
>
> And the writer that handles all mutates.
>
>
> https://github.com/apache/incubator-blur/blob/master/blur-core/src/main/java/org/apache/blur/manager/writer/BlurIndexSimpleWriter.java
>
> There will also be a 4th method for index mutations soon.  We will be
> implementing a write API in our new command platform.  In concept, commands
> are similar to stored procedures, which allow developers to embed their own
> methods, indexing and query models into Blur.
>
> Does this answer your question?
>
> Aaron
>
>
> > Thank you.
> >
> > Sincerely,
> > Mark
> >
>
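
As a rough illustration of method 2 in the quoted message (an in-memory queue in front of the indexing process, with searches seeing only committed data), here is a minimal Java sketch. It is a toy model, not Blur's implementation: the class `QueuedIndexSketch` and all of its methods are hypothetical names, the "index" is just a list of strings, and the periodic timer is replaced by an explicit `commitAndRefresh()` call. The thread does not say whether Blur lets a client inspect its queue, so `pendingCount()` is purely illustrative.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical model of a queue-fronted index; none of these names come from Blur. */
final class QueuedIndexSketch {
    // Mutations accepted by enqueue() but not yet committed to the index.
    private final Queue<String> pending = new ConcurrentLinkedQueue<>();
    // Immutable snapshot that searches read; replaced as a whole on each commit.
    private volatile List<String> committed = Collections.emptyList();

    /** Non-blocking: accept the document and return immediately. */
    void enqueue(String doc) {
        pending.add(doc);
    }

    /** Number of accepted documents that searches cannot see yet. */
    int pendingCount() {
        return pending.size();
    }

    /**
     * Drain the queue into a new snapshot and publish it.
     * In a real system a timer would call this periodically
     * (for example every 5 seconds); here the caller does it explicitly.
     */
    synchronized int commitAndRefresh() {
        List<String> next = new ArrayList<>(committed);
        int drained = 0;
        String doc;
        while ((doc = pending.poll()) != null) {
            next.add(doc);
            drained++;
        }
        committed = Collections.unmodifiableList(next);
        return drained;
    }

    /** Search only the committed snapshot, never the queue. */
    List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        for (String doc : committed) {
            if (doc.contains(term)) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```

In this model, a document passed to `enqueue` is invisible to `search` until the next `commitAndRefresh`, which matches the behavior described above: results that are not committed to the index are not returned to the user.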
