incubator-blur-dev mailing list archives

From Aaron McCurry <amccu...@gmail.com>
Subject Re: sessions
Date Tue, 05 Feb 2013 02:48:05 GMT
I had an idea today on how to remove sessions and still get the same
behavior.

Currently the issue is that document ids change as the index is
updated/compacted, and for efficiency we only want to fetch the documents
that will actually be returned.  For example:

A search arrives and only the first 10 hits are requested.  The query gets
sprayed to all the other servers, and the top 10 document locations are
returned to the issuing server.  The hits are then merged and sorted, and
the top 10 of those are returned.  The client will then issue a request for
the document data to be retrieved.  However, if the index changes in between
those calls, the document id in a document location may no longer point at
the same document, so the wrong data could be returned or nothing found.
The sessions were introduced to snapshot the indexes during these 2 actions.

Proposal:

Change the document location, which currently contains the shard id and the
Lucene document id (referenced from the composition of all the segments in
the index), so that it instead contains the shard id + segment name +
document id within the given segment.  This will ensure that the correct
document is located between search and fetch.  The only other thing we have
to do is keep the old segments around for a reasonable amount of time
(configurable per table, maybe with a default of 30 seconds?).  At the very
least, if the referenced segment is not available, the error can be detected
and properly handled in the form of a nice exception.
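
To make the proposal concrete, here is a rough sketch of the idea, not a
worked-out design.  All names (DocumentLocation, SegmentRegistry,
SegmentUnavailableException) are made up for illustration and are not existing
Blur or Lucene classes.  Plain in-memory lists stand in for Lucene segments,
and the clock is passed in explicitly so the retention behavior is easy to
follow.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical names throughout; nothing here is existing Blur or Lucene API.
final class DocumentLocation {
    final String shardId;
    final String segmentName;
    final int segmentDocId; // relative to the segment, not to the whole index

    DocumentLocation(String shardId, String segmentName, int segmentDocId) {
        this.shardId = shardId;
        this.segmentName = segmentName;
        this.segmentDocId = segmentDocId;
    }
}

class SegmentUnavailableException extends RuntimeException {
    SegmentUnavailableException(String message) {
        super(message);
    }
}

// Per-shard registry that keeps retired segments readable for a grace period.
final class SegmentRegistry {
    private static final class Entry {
        final List<String> docs; // stand-in for the segment's stored documents
        volatile long retiredAtMillis = -1; // -1 means the segment is still live

        Entry(List<String> docs) {
            this.docs = docs;
        }
    }

    private final Map<String, Entry> segments = new ConcurrentHashMap<>();
    private final long retentionMillis;

    SegmentRegistry(long retentionMillis) {
        this.retentionMillis = retentionMillis;
    }

    void addSegment(String name, List<String> docs) {
        segments.put(name, new Entry(docs));
    }

    // Called when a merge replaces the segment; it stays fetchable until purged.
    void retireSegment(String name, long nowMillis) {
        Entry e = segments.get(name);
        if (e != null) {
            e.retiredAtMillis = nowMillis;
        }
    }

    // Drops retired segments whose grace period has elapsed.
    void purgeExpired(long nowMillis) {
        segments.values().removeIf(e ->
            e.retiredAtMillis >= 0 && nowMillis - e.retiredAtMillis >= retentionMillis);
    }

    // Resolves a location, failing loudly if the segment is already gone.
    String fetch(DocumentLocation loc) {
        Entry e = segments.get(loc.segmentName);
        if (e == null) {
            throw new SegmentUnavailableException(
                "Segment " + loc.segmentName + " in shard " + loc.shardId
                    + " is no longer available; re-run the search");
        }
        if (loc.segmentDocId < 0 || loc.segmentDocId >= e.docs.size()) {
            throw new IllegalArgumentException("Bad doc id " + loc.segmentDocId);
        }
        return e.docs.get(loc.segmentDocId);
    }
}
```

The point is that a location stays valid for as long as its segment is
retained, and a fetch against a purged segment fails with an explicit
exception rather than silently returning a different document.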

We should also create a simple single method that searches and fetches
documents in a single call.
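
Something along these lines is what I have in mind for the combined call.
Again this is only a sketch: Hit, ShardServer, Controller and searchAndFetch
are hypothetical names, not the current Blur interfaces, and the sketch
assumes every hit's shard id maps to a known server.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical types for illustration; not Blur's actual API.
final class Hit {
    final String shardId;
    final String segmentName;
    final int segmentDocId;
    final float score;

    Hit(String shardId, String segmentName, int segmentDocId, float score) {
        this.shardId = shardId;
        this.segmentName = segmentName;
        this.segmentDocId = segmentDocId;
        this.score = score;
    }
}

interface ShardServer {
    List<Hit> search(String query, int topN);

    String fetch(Hit hit);
}

final class Controller {
    private final Map<String, ShardServer> shards; // keyed by shard id

    Controller(Map<String, ShardServer> shards) {
        this.shards = shards;
    }

    // Scatter the query, merge the hits by score, then fetch only the top N.
    List<String> searchAndFetch(String query, int topN) {
        List<Hit> all = new ArrayList<>();
        for (ShardServer server : shards.values()) {
            all.addAll(server.search(query, topN));
        }
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());

        List<String> docs = new ArrayList<>();
        for (Hit hit : all.subList(0, Math.min(topN, all.size()))) {
            docs.add(shards.get(hit.shardId).fetch(hit));
        }
        return docs;
    }
}
```

Because the search and the fetch happen inside one call, the window in which
a segment can disappear is much shorter than with two client round trips.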

Thoughts?

Aaron



On Sun, Feb 3, 2013 at 8:52 AM, Aaron McCurry <amccurry@gmail.com> wrote:

> -It seems to me if we promote long-lived sessions, then the QueryStatusContainer
> lookup map grows unbounded?
>
> Currently yes, it would.  We will need to place a cap on how long the
> query status can live after it's complete.
>
> -On the other hand, if short sessions are promoted, the BlurServer
> session map grows unbounded?
>
> Currently yes, if the user doesn't close the session on each of the
> servers.  This is bad and incomplete: if close is called on the session on
> one server then it needs to be pushed to all.  Also there really needs to
> be a session manager to clean up old sessions that are not in use.
>
> -I also don't yet see how the sessions work with multiple controllers?
>
> Yep, you are right.  Overall I don't like the session idea.  The only
> reason it exists is because of the way Lucene indexes change.
>
> The problem it attempts to solve is the following.  Search arrives at the
> controller, then it gets sprayed to all the other servers.  Those servers
> respond with the document locations (shard id + lucene doc id) and the
> controller picks the top N documents to respond to the client with.
> The client then in turn fetches the corresponding documents from the
> servers; however, if the indexes have changed on those servers the document
> location id may be invalid.  With the session object a user can be assured
> of a static view of the data and the indexes throughout the lifetime of the
> session.
>
> I really would love to hear about any other thoughts on how to deal with
> this issue.  No matter how radical.
>
> Aaron
>
>
> On Sat, Feb 2, 2013 at 10:03 PM, Tim Williams <williamstw@gmail.com> wrote:
>
>> It seems to me if we promote long-lived sessions, then the
>> QueryStatusContainer lookup map grows unbounded?  On the other hand,
>> if short sessions are promoted, the BlurServer session map grows
>> unbounded? I also don't yet see how the sessions work with multiple
>> controllers?
>>
>> --tim
>>
>
>
