incubator-blur-dev mailing list archives

From "Aaron McCurry (JIRA)" <>
Subject [jira] [Updated] (BLUR-61) Remove sessions from the code
Date Mon, 07 Oct 2013 14:07:44 GMT


Aaron McCurry updated BLUR-61:

    Summary: Remove sessions from the code  (was: Remove sessions from the 0.2 code)

> Remove sessions from the code
> -----------------------------
>                 Key: BLUR-61
>                 URL:
>             Project: Apache Blur
>          Issue Type: Bug
>    Affects Versions: experimental-dev
>            Reporter: Aaron McCurry
> There was a discussion on the mailing list about maintaining sessions in the 0.2 code.
> I would like to remove the need for sessions from the code.  I propose that we accomplish
this by including the segment in the document location throughout the API.
> Background: this is really an issue with Lucene and how it deals with mutations on the
index.  Let me provide an example:
> 1. Document A gets added to the index, and let's say it lands in the Lucene
segment "aa".  Through a bit of math it becomes document id 3570586 in the overall
index, but it is actually document id 304 in the "aa" segment.
> 2. A search gets executed, an index snapshot is created, and Document A is reported in
the search results as a hit at 3570586.
> 3. Now say that the document id is reported to another system, and later that system
wants to fetch the data for the hit.
> 4. In the meantime a merge occurs, and segment "aa" is merged with one or more other
segments.
> 5. Then the other system asks to fetch document 3570586.  A new snapshot of the
index is created and document id 3570586 is requested.  It is very likely that this
fetches the wrong document; it would be the right one only by blind luck.
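
The five steps above can be sketched with a toy model. This is not Lucene's or Blur's actual code: `ToyIndex` is a hypothetical class, and every segment name and size except "aa", 304 and 3570586 is invented so that the arithmetic lines up with the example. It assumes an index-wide id is the number of documents in all earlier segments plus the id within the segment, and that the merge drops some deleted documents, which shifts the ids that follow.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a segmented index. Not Lucene's or Blur's real classes. */
final class ToyIndex {
    private final List<String> names = new ArrayList<>();
    private final List<Integer> docCounts = new ArrayList<>();

    /** Appends a segment; returns this so calls can be chained. */
    ToyIndex add(String segmentName, int docCount) {
        names.add(segmentName);
        docCounts.add(docCount);
        return this;
    }

    /** Index-wide id = documents in all earlier segments + id within the segment. */
    int globalId(String segmentName, int localId) {
        int base = 0;
        for (int i = 0; i < names.size(); i++) {
            if (names.get(i).equals(segmentName)) {
                return base + localId;
            }
            base += docCounts.get(i);
        }
        throw new IllegalArgumentException("unknown segment: " + segmentName);
    }

    /** Maps an index-wide id back to "segmentName/localId". */
    String resolve(int globalId) {
        int base = 0;
        for (int i = 0; i < names.size(); i++) {
            int count = docCounts.get(i);
            if (globalId < base + count) {
                return names.get(i) + "/" + (globalId - base);
            }
            base += count;
        }
        throw new IllegalArgumentException("no such document: " + globalId);
    }
}

final class ToyIndexDemo {
    public static void main(String[] args) {
        // Before the merge (sizes are made up): Document A is "aa"/304.
        ToyIndex before = new ToyIndex().add("a8", 3570000).add("a9", 282).add("aa", 1000);
        System.out.println(before.globalId("aa", 304)); // prints 3570586
        System.out.println(before.resolve(3570586));    // prints aa/304

        // Hypothetical merge: "a9" (100 deleted docs dropped) + "aa" become "ab".
        // Document A is now local id 182 + 304 = 486 in "ab".
        ToyIndex after = new ToyIndex().add("a8", 3570000).add("ab", 1182);
        System.out.println(after.globalId("ab", 486));  // prints 3570486
        System.out.println(after.resolve(3570586));     // prints ab/586, a different document
    }
}
```

In this sketch the stale id 3570586 still resolves to something after the merge, just not to Document A, which is the failure described in step 5.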
> Currently in the Blur 0.2 code we get around this problem by storing the index snapshot
in a session on each server.  So during a session the index cannot change.
> Back to my proposed change of adding the segment to the document location.  The new document
location will include [ shard index / segment name / document id in the segment (not the overall
index document id) ].  On the server side, keep old segments around for a certain amount of
time after their last access, basically an LRU cache.  That way, if a segment is deleted and
another system still asks for data from an old segment, the data can still be retrieved.
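
A minimal sketch of the proposal, under stated assumptions: `DocLocation` and `RetiredSegmentCache` are hypothetical names, not part of the Blur API, and the "shard/segment/docId" string form is only one possible encoding. The cache keeps a retired segment readable until it has gone unaccessed for a configurable time. The clock value is passed in explicitly to keep the example deterministic, and the class is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical location: shard index / segment name / id within that segment. */
final class DocLocation {
    final int shard;
    final String segment;
    final int docId;

    DocLocation(int shard, String segment, int docId) {
        this.shard = shard;
        this.segment = segment;
        this.docId = docId;
    }

    /** Parses the assumed "shard/segment/docId" form. */
    static DocLocation parse(String s) {
        String[] parts = s.split("/", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected shard/segment/docId: " + s);
        }
        return new DocLocation(Integer.parseInt(parts[0]), parts[1], Integer.parseInt(parts[2]));
    }

    @Override
    public String toString() {
        return shard + "/" + segment + "/" + docId;
    }
}

/** Keeps merged-away segments readable until they have been idle for ttlMillis. */
final class RetiredSegmentCache<S> {
    private static final class Entry<S> {
        final S segment;
        long lastAccess;

        Entry(S segment, long lastAccess) {
            this.segment = segment;
            this.lastAccess = lastAccess;
        }
    }

    private final Map<String, Entry<S>> entries = new HashMap<>();
    private final long ttlMillis;

    RetiredSegmentCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Called when a merge replaces the segment in the live index. */
    void retire(String name, S segment, long nowMillis) {
        entries.put(name, new Entry<>(segment, nowMillis));
    }

    /** Returns the segment and refreshes its last access, or null if unknown or expired. */
    S get(String name, long nowMillis) {
        Entry<S> e = entries.get(name);
        if (e == null) {
            return null;
        }
        if (nowMillis - e.lastAccess > ttlMillis) {
            entries.remove(name);
            return null;
        }
        e.lastAccess = nowMillis;
        return e.segment;
    }

    /** Drops every segment that has been idle for longer than the TTL. */
    void evictIdle(long nowMillis) {
        entries.values().removeIf(e -> nowMillis - e.lastAccess > ttlMillis);
    }
}
```

A fetch for a location such as `7/aa/304` would first look for segment "aa" in the live index and fall back to the cache if a merge has already replaced it. Once the entry expires, the request has to fail explicitly, because the segment-local id has no meaning in any other segment.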

This message was sent by Atlassian JIRA
