lucene-java-user mailing list archives

From Jake Mannix <jake.man...@gmail.com>
Subject Re: Realtime search best practices
Date Mon, 12 Oct 2009 21:47:22 GMT
I still see some things we might want to document or explain:

We still need to be careful about what a call to "isCurrent()"
will mean for IndexReaders in the future, as there is now another
kind of "current": "current even up to uncommitted changes".

Imagine the following set of IndexReaders floating around an
application:
------
1)  IndexReader reader = IndexReader.open(diskDir);

// this reader is certainly current.
2)  assert(reader.isCurrent());

3)  IndexWriter writer = new IndexWriter(diskDir);
4)  writer.addDocument(doc);

// this reader has access to that doc
5)  IndexReader nrtReader = writer.getReader();

6)  writer.addDocument(doc2);

// now for the isCurrent() semantics... the disk reader is
// still current, as of last commit:
7)  assert(reader.isCurrent());

// as is the nrtReader, even though it has information
// *past* the most recent commit, but not all of it!
8) assert(nrtReader.isCurrent());

// reopen the nrtReader and get access to doc2
9) nrtReader = writer.getReader();

// now nrtReader is not only current, but "maximally current"
10) assert(nrtReader.isCurrent());

// but what about now?
11)  writer.commit();

// the disk index reader follows the old ways:
12)  assert(!reader.isCurrent());

// but what does the nrtReader say?
// it does not have access to the most recent commit
// state, as there's been a commit (with documents)
// since it was opened.  But the nrtReader *has* those
// documents.

13)  assert(!nrtReader.isCurrent());
-----

The results of lines 8 and 13 especially seem to show how
one could get confused about what is meant by "current", but
maybe it is just a naming issue (although line 13 seems
to be more than that: the nrtReader in that case really is
up-to-date with disk at this point, and would show exactly
the results that a freshly opened reader would).
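To make the two senses of "current" concrete, here is a hypothetical toy model in plain Java. ToyIndex, ToyReader, openFromDisk, getNrtReader and seesAllChanges are made-up names, not Lucene API, and this is not how Lucene implements isCurrent(). The model assumes a purely commit-version-based check, and that assumption is enough to reproduce the results asserted in the numbered example:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, NOT Lucene's implementation: it only illustrates
// the two meanings of "current" from the example above.
final class ToyIndex {
    private final List<String> committed = new ArrayList<>();
    private final List<String> pending = new ArrayList<>(); // added but not yet committed
    private long commitVersion = 0;                          // bumped by every commit

    void addDocument(String doc) { pending.add(doc); }

    void commit() {
        committed.addAll(pending);
        pending.clear();
        commitVersion++;
    }

    // Plays the role of IndexReader.open(dir): sees committed docs only.
    ToyReader openFromDisk() {
        return new ToyReader(this, new ArrayList<>(committed));
    }

    // Plays the role of writer.getReader(): sees committed plus pending docs.
    ToyReader getNrtReader() {
        List<String> all = new ArrayList<>(committed);
        all.addAll(pending);
        return new ToyReader(this, all);
    }

    long commitVersion() { return commitVersion; }
    int docCount() { return committed.size() + pending.size(); }
}

final class ToyReader {
    private final ToyIndex index;
    private final List<String> docs;
    private final long openedAtVersion;

    ToyReader(ToyIndex index, List<String> docs) {
        this.index = index;
        this.docs = docs;
        this.openedAtVersion = index.commitVersion();
    }

    // Commit-based "current": no commit has happened since this reader opened.
    boolean isCurrent() { return openedAtVersion == index.commitVersion(); }

    // "Maximally current": sees every document added so far (adds only in this toy).
    boolean seesAllChanges() { return docs.size() == index.docCount(); }
}
```

Walking steps 1 through 13 through this model shows the same confusion. After the commit at step 11, both readers report "not current", but the NRT reader still sees every document.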

Maybe people should be advised not to mix and match
disk readers and IndexWriter-supplied ones, and, if they
want NRT search with Lucene 2.9+, to grab a reader from
the IndexWriter upon opening said writer and then just
continually call reopen() on it as queries come in
throughout the life of their application (being careful not
to close() their writer and thus trigger an
AlreadyClosedException)?
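Below is a minimal sketch of the "reopen as queries come in" pattern suggested above. It uses a made-up CountedReader in place of a real IndexReader so that it runs without Lucene. CountedReader, SharedReader, acquire() and release() are all hypothetical names.

With the real 2.9 API, the core step is the documented reopen idiom:

    IndexReader newReader = reader.reopen();
    if (newReader != reader) { reader.close(); reader = newReader; }

The reference counting below goes beyond what the thread discusses. Closing the stale reader outright is only safe if no other thread is still searching it, and IndexReader offers incRef()/decRef() for exactly that case.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

// Hypothetical stand-in for an IndexReader (NOT Lucene's class). It mimics
// two Lucene contracts: reopen() may return this same instance when nothing
// changed, and incRef()/decRef() reference counting.
final class CountedReader {
    final int version;
    private final IntSupplier latestVersion;            // what "the writer" has published
    private final AtomicInteger refCount = new AtomicInteger(1);

    CountedReader(int version, IntSupplier latestVersion) {
        this.version = version;
        this.latestVersion = latestVersion;
    }

    CountedReader reopen() {
        int latest = latestVersion.getAsInt();
        return latest == version ? this : new CountedReader(latest, latestVersion);
    }

    void incRef() { refCount.incrementAndGet(); }
    void decRef() { refCount.decrementAndGet(); }       // a real reader would close files at 0
    boolean isReleased() { return refCount.get() == 0; }
}

// Hypothetical helper: one long-lived reader, reopened as each query arrives.
final class SharedReader {
    private CountedReader current;                       // holds one reference of its own

    SharedReader(CountedReader initial) { current = initial; }

    // Call at the start of every query; always pair with release().
    synchronized CountedReader acquire() {
        CountedReader newer = current.reopen();
        if (newer != current) {
            current.decRef();    // drop the holder's reference; in-flight queries keep theirs
            current = newer;
        }
        current.incRef();        // reference owned by the caller's query
        return current;
    }

    void release(CountedReader reader) { reader.decRef(); }
}
```

In a real application, the writer that produced the initial reader must stay open for as long as this holder is in use. Per the getReader() javadoc, reopening after the writer is closed throws AlreadyClosedException.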

  -jake


On Mon, Oct 12, 2009 at 1:56 PM, Michael McCandless <lucene@mikemccandless.com> wrote:

> I agree, the javadocs could be improved.  How about something like
> this for the first 2 paragraphs:
>
>   * Returns a readonly reader, covering all committed as
>   * well as un-committed changes to the index.  This
>   * provides "near real-time" searching, in that changes
>   * made during an IndexWriter session can be quickly made
>   * available for searching without closing the writer or
>   * calling {@link #commit}.
>   *
>   * <p>Note that this is functionally equivalent to calling
>   * {@link #commit} and then using {@link IndexReader#open} to
>   * open a new reader.  But the turnaround time of this
>   * method should be faster since it avoids the potentially
>   * costly {@link #commit}.</p>
>
> Mike
>
> On Mon, Oct 12, 2009 at 4:35 PM, Jake Mannix <jake.mannix@gmail.com> wrote:
> > Thanks Yonik,
> >
> >  It may be surprising, but in fact I have read that
> > javadoc.  It talks about not needing to close the
> > writer, but doesn't specifically talk about what
> > the relationship between commit() calls and
> > getReader() calls is.  I suppose I should have
> > interpreted:
> >
> > "@returns a new reader which contains all
> > changes..."
> >
> > to mean "all uncommitted changes", but why
> > is it so obvious that what could be happening
> > is that it only "returns all changes since the last
> > commit, but without touching disk because it
> > has docs in memory as well"?
> >
> >  -jake
> >
> > On Mon, Oct 12, 2009 at 1:26 PM, Yonik Seeley <yonik@lucidimagination.com> wrote:
> >
> >> Guys, please - you're not new at this... this is what JavaDoc is for:
> >>
> >>  /**
> >>   * Returns a readonly reader containing all
> >>   * current updates.  Flush is called automatically.  This
> >>   * provides "near real-time" searching, in that changes
> >>   * made during an IndexWriter session can be made
> >>   * available for searching without closing the writer.
> >>   *
> >>   * <p>It's near real-time because there is no hard
> >>   * guarantee on how quickly you can get a new reader after
> >>   * making changes with IndexWriter.  You'll have to
> >>   * experiment in your situation to determine if it's
> >>   * fast enough.  As this is a new and experimental
> >>   * feature, please report back on your findings so we can
> >>   * learn, improve and iterate.</p>
> >>   *
> >>   * <p>The resulting reader supports {@link
> >>   * IndexReader#reopen}, but that call will simply forward
> >>   * back to this method (though this may change in the
> >>   * future).</p>
> >>   *
> >>   * <p>The very first time this method is called, this
> >>   * writer instance will make every effort to pool the
> >>   * readers that it opens for doing merges, applying
> >>   * deletes, etc.  This means additional resources (RAM,
> >>   * file descriptors, CPU time) will be consumed.</p>
> >>   *
> >>   * <p>For lower latency on reopening a reader, you should
> >>   * call {@link #setMergedSegmentWarmer} to
> >>   * pre-warm a newly merged segment before it's committed
> >>   * to the index.  This is important for minimizing
> >>   * index-to-search delay after a large merge.  </p>
> >>   *
> >>   * <p>If an addIndexes* call is running in another thread,
> >>   * then this reader will only search those segments from
> >>   * the foreign index that have been successfully copied
> >>   * over so far.</p>
> >>   *
> >>   * <p><b>NOTE</b>: Once the writer is closed, any
> >>   * outstanding readers may continue to be used.  However,
> >>   * if you attempt to reopen any of those readers, you'll
> >>   * hit an {@link AlreadyClosedException}.</p>
> >>   *
> >>   * <p><b>NOTE:</b> This API is experimental and might
> >>   * change in incompatible ways in the next release.</p>
> >>   *
> >>   * @return IndexReader that covers entire index plus all
> >>   * changes made so far by this IndexWriter instance
> >>   *
> >>   * @throws IOException
> >>   */
> >>  public IndexReader getReader() throws IOException {
> >>
> >>
> >> -Yonik
> >> http://www.lucidimagination.com
> >>
> >>
> >> On Mon, Oct 12, 2009 at 4:18 PM, John Wang <john.wang@gmail.com> wrote:
> >> > Oh, that is really good to know!
> >> > Is this deterministic? E.g., as long as writer.addDocument() is called,
> >> > does the next getReader reflect the change? Does it work with deletes,
> >> > e.g. writer.deleteDocuments()?
> >> > Thanks Mike for clarifying!
> >> >
> >> > -John
> >> >
> >> > On Mon, Oct 12, 2009 at 12:11 PM, Michael McCandless <
> >> > lucene@mikemccandless.com> wrote:
> >> >
> >> >> Just to clarify: IndexWriter.newReader returns a reader that searches
> >> >> uncommitted changes as well.  I.e., you need not call IndexWriter.commit
> >> >> to make the changes visible.
> >> >>
> >> >> However, if you're opening a reader the "normal" way
> >> >> (IndexReader.open) then it is necessary to first call
> >> >> IndexWriter.commit.
> >> >>
> >> >> Mike
> >> >>
> >> >> On Mon, Oct 12, 2009 at 5:24 AM, melix <cedric.champeau@lingway.com>
> >> >> wrote:
> >> >> >
> >> >> > Hi,
> >> >> >
> >> >> > I'm going to replace an old reader/writer synchronization mechanism
> >> >> > we had implemented with the new near-realtime search facilities in
> >> >> > Lucene 2.9. However, it's still a bit unclear to me how to do it
> >> >> > efficiently.
> >> >> >
> >> >> > Is the following implementation a good way to achieve it? The
> >> >> > context is concurrent reads/writes on an index:
> >> >> >
> >> >> > 1. create a Directory instance
> >> >> > 2. create a writer on this directory
> >> >> > 3. on each write request, add document to the writer
> >> >> > 4. on each read request,
> >> >> >  a. use writer.getReader() to obtain an up-to-date reader
> >> >> >  b. create an IndexSearcher with that reader
> >> >> >  c. perform Query
> >> >> >  d. close IndexSearcher
> >> >> > 5. on application close
> >> >> >  a. close writer
> >> >> >  b. close directory
> >> >> >
> >> >> > While this seems to be OK, I'm really wondering about the
> >> >> > performance of opening a searcher for each request. I could
> >> >> > introduce some kind of delay and cache a searcher for a few
> >> >> > seconds, but I'm not sure it's the best thing to do.
> >> >> >
> >> >> > Thanks,
> >> >> >
> >> >> > Cedric
> >> >> >
> >> >> >
> >> >> > --
> >> >> > View this message in context:
> >> >> > http://www.nabble.com/Realtime-search-best-practices-tp25852756p25852756.html
> >> >> > Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> >> >> >
> >> >> >
> >> >> >
> >> >> > ---------------------------------------------------------------------
> >> >> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> >> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >> >> >
> >> >> >
> >> >>
> >> >
> >>
> >
>
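Cedric's idea of caching a searcher for a few seconds, instead of opening one per request, can be sketched generically as a time-based cache. TimedCache and its parameters are hypothetical names, not Lucene API. In real use, the opener would build an IndexSearcher over writer.getReader() (with IOException handling added), and the clock would be System::nanoTime. The sketch deliberately does not close the replaced searcher, since another thread may still be searching it. Pairing it with IndexReader's incRef()/decRef() reference counting would be one way to release old readers safely.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical helper: hands out the same cached value (e.g. a searcher)
// until it is at least maxAgeNanos old, then opens a fresh one.
final class TimedCache<T> {
    private final Supplier<T> opener;     // would open a searcher over writer.getReader()
    private final long maxAgeNanos;
    private final LongSupplier clock;     // System::nanoTime in real use
    private T value;
    private long openedAt;

    TimedCache(Supplier<T> opener, long maxAgeNanos, LongSupplier clock) {
        this.opener = opener;
        this.maxAgeNanos = maxAgeNanos;
        this.clock = clock;
    }

    synchronized T get() {
        long now = clock.getAsLong();
        if (value == null || now - openedAt >= maxAgeNanos) {
            value = opener.get();   // NOTE: the replaced value is not closed here
            openedAt = now;
        }
        return value;
    }
}
```

The trade-off is bounded staleness: a document added right after a searcher is cached becomes searchable only once maxAgeNanos has elapsed.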
