incubator-blur-dev mailing list archives

From "Aaron McCurry (JIRA)" <j...@apache.org>
Subject [jira] [Created] (BLUR-61) Remove sessions from the 0.2 code
Date Thu, 21 Feb 2013 03:20:12 GMT
Aaron McCurry created BLUR-61:
---------------------------------

             Summary: Remove sessions from the 0.2 code
                 Key: BLUR-61
                 URL: https://issues.apache.org/jira/browse/BLUR-61
             Project: Apache Blur
          Issue Type: Bug
            Reporter: Aaron McCurry


There was a discussion on the mailing list about maintaining sessions in the 0.2 code.

http://mail-archives.apache.org/mod_mbox/incubator-blur-dev/201302.mbox/%3CCAG_bHoy3_vDTV1JMfBScU-7Mob4i9pm6dLoF5Di6oUsgMpJgMg@mail.gmail.com%3E

I would like to remove the need for sessions from the code.  I propose that we accomplish
this by including the segment in the document location throughout the API.

Background: this is really an issue with Lucene and how it handles mutations to the index.
Let me provide an example:

1. Document A gets added to the index, and let's say that it gets added into the Lucene segment
"aa".  Through a bit of math (the segment's starting offset in the index plus the document's
position within the segment) it becomes document id 3570586 in the overall index, but it is
actually document id 304 in the "aa" segment.

2. A search gets executed, an index snapshot is created, and Document A is reported in the search
results as a hit at 3570586.

3. Now say that the document id is reported to another system, and later that system
wants to fetch the data for the hit.

4. Now a merge occurs, and the "aa" segment is merged with one or more other segments.

5. Then the other system wants to fetch document 3570586.  A new snapshot of the index
is created and document id 3570586 is requested.  Because the merge changed the segment
layout, it is very likely that the wrong document will be fetched (only blind luck would
make it the right one).
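
The id arithmetic in steps 1 to 5 can be illustrated with a small, hypothetical model (plain Python, not Lucene or Blur code).  It assumes each segment's documents are numbered from a base offset equal to the total size of the segments before it, which mirrors how Lucene maps per-segment ids to overall ids.  The `Segment`, `Snapshot`, and `merge` names, the toy ids (4 instead of 3570586), and the merge behavior (deleted documents are dropped and the merged segment is appended after the untouched ones) are all assumptions made for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    name: str
    docs: tuple  # stored documents, indexed by local (in-segment) doc id


class Snapshot:
    """Hypothetical point-in-time view over an ordered list of segments."""

    def __init__(self, segments):
        self.segments = list(segments)
        self.starts = []  # starts[i] = overall id of the first doc in segments[i]
        base = 0
        for seg in self.segments:
            self.starts.append(base)
            base += len(seg.docs)
        self.max_doc = base

    def to_global(self, seg_name, local_id):
        # Overall id = segment's base offset + id within the segment.
        for seg, start in zip(self.segments, self.starts):
            if seg.name == seg_name:
                return start + local_id
        raise KeyError(seg_name)

    def fetch(self, global_id):
        if not 0 <= global_id < self.max_doc:
            raise IndexError(global_id)
        # Find the last segment whose base offset is <= global_id.
        i = bisect_right(self.starts, global_id) - 1
        return self.segments[i].docs[global_id - self.starts[i]]


def merge(snapshot, names, new_name, deleted=frozenset()):
    """Hypothetical merge: combine the named segments, drop deleted docs,
    and append the merged segment after the untouched ones."""
    merged_docs = tuple(
        d
        for seg in snapshot.segments if seg.name in names
        for d in seg.docs if d not in deleted
    )
    kept = [seg for seg in snapshot.segments if seg.name not in names]
    return Snapshot(kept + [Segment(new_name, merged_docs)])


before = Snapshot([
    Segment("a0", ("d0", "d1", "d2")),
    Segment("aa", ("x", "A", "y")),  # Document A is local id 1 in "aa"
    Segment("ab", ("z1", "z2")),
])
hit = before.to_global("aa", 1)  # step 2: reported as overall id 4
after = merge(before, {"a0", "aa"}, "ac", deleted={"d1"})  # step 4
print(before.fetch(hit), after.fetch(hit))  # prints: A x
```

Either change made by the merge (the dropped deleted document or the reordered segments) is enough on its own to shift overall ids, so the id saved in step 2 no longer points at Document A.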

Currently in the Blur 0.2 code we get around this problem by storing the index snapshot in
a session on each server, so for the duration of a session the index it sees cannot change.
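
A minimal sketch of the session approach, assuming a single server whose index view is modeled as an immutable tuple of documents.  `SessionServer` and its methods are hypothetical names, not Blur 0.2's actual classes, and `search` is a toy lookup by document value.

```python
import itertools


class SessionServer:
    """Hypothetical sketch of session-pinned snapshots (not Blur's classes).

    Each index view is an immutable tuple; a merge replaces the current
    tuple with a new one, so a view pinned by a session never changes.
    """

    def __init__(self, docs):
        self.current = tuple(docs)
        self.sessions = {}  # session id -> pinned view
        self._ids = itertools.count(1)

    def open_session(self):
        sid = next(self._ids)
        self.sessions[sid] = self.current  # pin the view as of now
        return sid

    def close_session(self, sid):
        # Releasing the pin lets the old view be garbage collected.
        del self.sessions[sid]

    def replace_view(self, docs):
        # Stand-in for a merge: later sessions see a renumbered view.
        self.current = tuple(docs)

    def search(self, sid, doc):
        return self.sessions[sid].index(doc)

    def fetch(self, sid, doc_id):
        return self.sessions[sid][doc_id]


server = SessionServer(["d0", "d1", "d2", "x", "A", "y"])
sid = server.open_session()
hit = server.search(sid, "A")  # overall id 4 in the pinned view
server.replace_view(["d0", "d2", "x", "A", "y"])  # merge dropped deleted "d1"
print(server.fetch(sid, hit))  # prints: A (pinned view still resolves id 4)
print(server.current[hit])     # prints: y (a fresh view returns the wrong doc)
server.close_session(sid)
```

The cost is that every open session pins an old view in server memory and every client call has to carry a session id; that per-server state is what this issue proposes to remove.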

Back to my proposed change of adding the segment to the document location.  The new document
location will include [ shard index / segment name / document id in the segment (not the overall
index document id) ].  On the server side, we would keep old segments around for a certain amount
of time after their last access, basically an LRU cache.  That way, if a segment is deleted (for
example, after a merge) and another system still asks for data from that old segment, the data
can still be retrieved.
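
The proposed location and retention scheme might look roughly like this.  It is a sketch under stated assumptions: `DocLocation`, `SegmentStore`, the `ttl` parameter, and the injectable `clock` are hypothetical names; one store models a single shard (the `shard` field would first be used to route the request to the right shard); and `retire` stands for the moment a merge replaces a segment.

```python
import time
from collections import OrderedDict
from dataclasses import dataclass


@dataclass(frozen=True)
class DocLocation:
    shard: int    # used to route to the right shard (not modeled here)
    segment: str  # segment name
    doc: int      # id within the segment, not the overall index id


class SegmentStore:
    """Hypothetical single-shard store: live segments, plus retired
    segments kept for `ttl` seconds after their last access (LRU order)."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self.live = {}                # segment name -> docs
        self.retired = OrderedDict()  # name -> (docs, last access), oldest first

    def add(self, name, docs):
        self.live[name] = tuple(docs)

    def retire(self, name):
        # A merge replaced this segment; keep its data around for a while.
        self.retired[name] = (self.live.pop(name), self.clock())

    def _expire(self):
        now = self.clock()
        # Entries are ordered by last access, so stop at the first fresh one.
        while self.retired:
            oldest_access = next(iter(self.retired.values()))[1]
            if now - oldest_access <= self.ttl:
                break
            self.retired.popitem(last=False)

    def fetch(self, loc):
        self._expire()
        if loc.segment in self.live:
            return self.live[loc.segment][loc.doc]
        if loc.segment in self.retired:
            docs, _ = self.retired.pop(loc.segment)
            # Refresh the access time and move the entry to the newest end.
            self.retired[loc.segment] = (docs, self.clock())
            return docs[loc.doc]
        raise KeyError(f"segment {loc.segment!r} has expired")


now = [0.0]
store = SegmentStore(ttl=60, clock=lambda: now[0])  # fake clock for the demo
store.add("aa", ["x", "A", "y"])
loc = DocLocation(shard=0, segment="aa", doc=1)  # handed out at t=0
store.retire("aa")                      # a merge replaces "aa" ...
store.add("ac", ["d0", "x", "A", "y"])  # ... with "ac"
now[0] = 30.0
print(store.fetch(loc))  # prints: A (last access refreshed to t=30)
now[0] = 80.0
print(store.fetch(loc))  # prints: A (80s after retirement, only 50s idle)
now[0] = 200.0
try:
    store.fetch(loc)
except KeyError:
    print("segment expired")  # 120s idle exceeds the 60s ttl
```

Because expiry is measured from the last access rather than from retirement, a segment that is still being read stays available, while segments nobody asks for are dropped after `ttl` seconds.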



--
This message is automatically generated by JIRA.
