lucene-java-user mailing list archives

From "Andy Liu" <andyliu1...@gmail.com>
Subject Re: Using ParallelReader over large immutable index and small updatable index
Date Wed, 07 Mar 2007 16:17:58 GMT
From my understanding, MultiSearcher is used to combine two indexes that
have the same fields but different documents.  ParallelReader is used to
combine two indexes that have the same documents but different fields.
I'm trying to do the latter.  Is my understanding correct?  For example,
what I'm trying to do is have one immutable index that has these fields:

field1
field2
field3

and my "update" index that has one field

field4

Both indexes have the same documents, and the docIds are synchronized.
This allows me to execute searches like:

+field1:foo +field4:bar

field4 is a field that would be updated frequently, in as close to real
time as possible.  However, once I update field4, the docIds are no
longer synchronized, and ParallelReader fails.
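
For reference, the search side currently looks roughly like this (a minimal
sketch against the 2.x API; the index paths are placeholders):

  import java.io.IOException;

  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.index.ParallelReader;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.*;

  public class ParallelSearch {
    public static void main(String[] args) throws IOException {
      // One reader per index; both indexes must contain the same documents
      // in the same order so that the docIds line up.
      ParallelReader pr = new ParallelReader();
      pr.add(IndexReader.open("/path/to/big-index"));     // field1, field2, field3
      pr.add(IndexReader.open("/path/to/update-index"));  // field4

      IndexSearcher searcher = new IndexSearcher(pr);

      // +field1:foo +field4:bar
      BooleanQuery q = new BooleanQuery();
      q.add(new TermQuery(new Term("field1", "foo")), BooleanClause.Occur.MUST);
      q.add(new TermQuery(new Term("field4", "bar")), BooleanClause.Occur.MUST);

      Hits hits = searcher.search(q);
      System.out.println(hits.length() + " matching documents");
      searcher.close();
    }
  }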

Andy

On 3/6/07, Alexey Lef <alexey@sciquest.com> wrote:
>
> We use MultiSearcher for a similar scenario. This way you can keep the
> Searcher/Reader for the read-only index alive and refresh the small index
> Searcher whenever an update is made. If you have any cached filters, they
> are mapped to a Reader, so the cached filters for the big index will stay
> alive as well. The only (small) problem I have found so far is how
> MultiSearcher handles custom Similarity (see
> https://issues.apache.org/jira/browse/LUCENE-789).
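>
> In code the pattern is roughly this (just a sketch; the paths are
> placeholders and the snippet assumes the usual exception handling
> around it):
>
>   // Opened once and kept for the lifetime of the application; filters
>   // cached against this searcher's reader remain valid across updates.
>   IndexSearcher bigSearcher = new IndexSearcher("/path/to/big-index");
>
>   // Reopened whenever the small index is updated.
>   IndexSearcher smallSearcher = new IndexSearcher("/path/to/small-index");
>
>   Searcher searcher =
>       new MultiSearcher(new Searchable[] { bigSearcher, smallSearcher });
>
>   // After an update to the small index:
>   smallSearcher.close();
>   smallSearcher = new IndexSearcher("/path/to/small-index");
>   searcher =
>       new MultiSearcher(new Searchable[] { bigSearcher, smallSearcher });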
>
> Hope this helps,
>
> Alexey
>
> -----Original Message-----
> From: Andy Liu [mailto:andyliu1227@gmail.com]
> Sent: Tuesday, March 06, 2007 3:34 PM
> To: java-user@lucene.apache.org
> Subject: Using ParallelReader over large immutable index and small
> updatable index
>
> Is there a working solution out there that would let me use
> ParallelReader to search over a large, immutable index and a smaller,
> auxiliary index that is updated frequently?  Currently, from my
> understanding, ParallelReader fails when one of the indexes is updated
> because the document IDs get out of sync.  Using ParallelReader in this
> way is attractive for me because it would allow me to quickly make
> updates to only the fields that change.
>
> The alternative is to use one index.  However, an update would require
> me to delete the entire document (which is quite large in my
> application) and reinsert it after making updates.  This requires a lot
> more I/O and is a lot slower, and I'd like to avoid this if possible.
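>
> Concretely, that update path would look something like this (a sketch;
> the "id" key field, the paths, and the rebuilt Document are
> placeholders):
>
>   // Remove the old copy of the document by its unique key...
>   IndexReader reader = IndexReader.open("/path/to/index");
>   reader.deleteDocuments(new Term("id", docKey));
>   reader.close();
>
>   // ...then re-add the entire document, not just the changed field.
>   IndexWriter writer = new IndexWriter("/path/to/index", analyzer, false);
>   writer.addDocument(rebuiltDocument);
>   writer.close();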
>
> I can think of other alternatives, but all involve storing data and/or
> bitsets in memory, which is not very scalable.  I need to be able to
> handle millions of documents.
>
> I'm also open to any solution that doesn't involve ParallelReader and
> would help me make quick updates in the most non-disruptive and
> scalable fashion.  But it just seems that ParallelReader would be
> perfect for my needs, if I can get past this issue.
>
> I've seen posts about this issue on the list, but nothing pointing to a
> solution.  Can somebody help me out?
>
> Andy
>
