lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Lucene newbie in need of a hint
Date Thu, 14 Aug 2014 22:58:15 GMT
3.6 is quite old by now ... but that behavior (100s pause on reopen)
is strange. Can you capture a stack dump of all Java threads (e.g. with
jstack) during that time and post it back?
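If running `jstack <pid>` against the webapp is awkward, here is a minimal in-process sketch (Java 8+) that captures the dump only when a call runs too long. `ThreadDumps` and `callWithDump` are hypothetical helpers, not part of Lucene. The idea is to wrap the slow `openIfChanged()` or `commit()` call so a full dump is taken while it is still stuck.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical helper: capture every live thread's stack while a slow call is still running.
final class ThreadDumps {
  private ThreadDumps() {}

  // Formats Thread.getAllStackTraces() roughly the way jstack does.
  static String dumpAllThreads() {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
      Thread t = e.getKey();
      sb.append('"').append(t.getName()).append("\" ").append(t.getState()).append('\n');
      for (StackTraceElement frame : e.getValue()) {
        sb.append("    at ").append(frame).append('\n');
      }
      sb.append('\n');
    }
    return sb.toString();
  }

  // Runs `work` on the calling thread. If it has not finished after
  // `thresholdMillis`, a watchdog thread hands one full dump to `sink`.
  static <T> T callWithDump(Callable<T> work, long thresholdMillis, Consumer<String> sink)
      throws Exception {
    ScheduledExecutorService watchdog = Executors.newSingleThreadScheduledExecutor();
    ScheduledFuture<?> pending = watchdog.schedule(
        () -> sink.accept(dumpAllThreads()), thresholdMillis, TimeUnit.MILLISECONDS);
    try {
      return work.call();
    } finally {
      pending.cancel(false); // no dump if the work finished in time
      watchdog.shutdown();
      watchdog.awaitTermination(5, TimeUnit.SECONDS);
    }
  }
}
```

For example, `ThreadDumps.callWithDump(() -> IndexReader.openIfChanged(old, true), 10000, System.err::println)` (with `old` standing for the current reader) would print a dump only for reopens that take longer than 10 seconds.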

It looks like you're reopening the reader correctly, though be careful
if you have in-flight searches running in other threads: closing the old
reader while they are still using it will make those searches fail. Use
SearcherManager to help with that.
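A minimal sketch of that pattern, assuming the Lucene 3.6 `SearcherManager` API (the constructor taking an `IndexWriter`, an `applyAllDeletes` flag and a `SearcherFactory`, plus `acquire()`, `release()` and `maybeRefresh()`). `SharedSearch` is a hypothetical wrapper name. Request threads acquire and release a searcher around each search, and the indexing thread calls `maybeRefresh()` instead of closing readers itself.

```java
import java.io.IOException;

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.SearcherFactory;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;

// Hypothetical wrapper; assumes the Lucene 3.6 SearcherManager API.
final class SharedSearch {
  private final SearcherManager manager;

  SharedSearch(IndexWriter writer) throws IOException {
    // applyAllDeletes=true, like IndexReader.open(writer, true);
    // the default SearcherFactory builds a plain IndexSearcher (no warming).
    this.manager = new SearcherManager(writer, true, new SearcherFactory());
  }

  // Called by request threads. acquire() increments the reader's refCount,
  // so a concurrent refresh cannot close the reader out from under this search.
  int countMatches(String field, String value) throws IOException {
    IndexSearcher searcher = manager.acquire();
    try {
      TopDocs hits = searcher.search(new TermQuery(new Term(field, value)), 1);
      return hits.totalHits;
    } finally {
      manager.release(searcher); // never use `searcher` after this point
    }
  }

  // Called by the background indexing thread after it changes the index.
  void refresh() throws IOException {
    manager.maybeRefresh();
  }

  void close() throws IOException {
    manager.close();
  }
}
```

The design choice is that nobody calls `IndexReader.close()` directly: the manager reference-counts readers, and an old reader is closed only when its last in-flight searcher has been released.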

You don't need to call IW.commit for searches to see your changes; the
NRT reader already sees them. Commit is only for durability (having your
index survive an OS crash, power loss, etc.).
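To illustrate the point about commit, here is a small sketch against the Lucene 3.6 API. The in-memory `RAMDirectory`, the `KeywordAnalyzer` and the class name `NrtWithoutCommit` are assumptions chosen only to keep it self-contained. A document is updated and never committed, yet the near-real-time reader reopened from the writer already counts it.

```java
import java.io.IOException;

import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Sketch (Lucene 3.6 API): an NRT reader sees uncommitted updates.
final class NrtWithoutCommit {
  private NrtWithoutCommit() {}

  // Returns how many documents the reopened NRT reader sees after one update.
  static int visibleWithoutCommit(String id) throws IOException {
    RAMDirectory dir = new RAMDirectory();
    IndexWriter writer = new IndexWriter(dir,
        new IndexWriterConfig(Version.LUCENE_36, new KeywordAnalyzer()));
    IndexReader reader = IndexReader.open(writer, true); // NRT reader, applyAllDeletes=true
    try {
      Document doc = new Document();
      doc.add(new Field("the_id", id, Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
      writer.updateDocument(new Term("the_id", id), doc); // note: no writer.commit()

      // Ask the writer for a fresh NRT view; null would mean nothing changed.
      IndexReader newer = IndexReader.openIfChanged(reader, writer, true);
      if (newer != null) {
        reader.close();
        reader = newer;
      }
      return reader.numDocs();
    } finally {
      reader.close();
      writer.close(); // close() commits by default, but only after the read above
      dir.close();
    }
  }
}
```

In a long-running webapp, commit() can then be reserved for infrequent checkpoints (or shutdown) rather than being called on every 1.2-second sync.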

Mike McCandless

http://blog.mikemccandless.com


On Thu, Aug 14, 2014 at 6:16 PM, Michael Jennings
<mike.c.jennings@gmail.com> wrote:
> Hi everyone,
>
> I'm a bit of a Lucene newb, but a fairly experienced Java developer. Hope
> someone can give me some clues as to what I may be doing wrong.
>
> In essence I've got a Lucene index built off of a database table that gets
> updated at a rate of about 1 row changing every 2 seconds or so. I've got a
> webapp whose sole purpose in life is to provide a simple front end for
> searching this table.
>
> The table in question lives in an Oracle db (not that Java cares) and it
> has 2 datetime/timestamp columns: ent_dtm and upd_dtm. When a new row gets
> inserted into the table, a trigger sets the ent_dtm to be "right now". When
> a row gets updated, a trigger sets the upd_dtm to be "right now".
>
> Queries like:
>
> SELECT COL1, COL2,... COLn FROM THE_TABLE WHERE ENT_DTM > (some timestamp)
>
> are very fast, as are queries like:
>
> SELECT COL1, COL2,... COLn FROM THE_TABLE WHERE UPD_DTM > (some timestamp)
>
> These are the sorts of queries I use to keep my Lucene index "in sync"
> with the table, and they are fast with no issues.
>
> As you would expect, each Document in my Lucene index roughly corresponds
> to a row in THE_TABLE, including 2 fields called "ent_dtm" and "upd_dtm".
>
> THE_TABLE has a primary key which I will call THE_ID. Correspondingly, a
> Document in the Lucene index has a field called "the_id"
>
> Values of "the_id" are typically numbers (Field.Store.YES,
> Field.Index.NOT_ANALYZED_NO_NORMS), with the exception of a "special" value
> of "newest". The Document whose "the_id" field has the value "newest"
> contains just 2 more fields, ent_dtm and upd_dtm.
>
> This Document is just used to keep track of "what's the newest thing in
> Lucene's world".
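For concreteness, a sketch of how such documents might be built with the Lucene 3.6 `Field` API. Storing `ent_dtm`/`upd_dtm` as `DateTools` strings (GMT, millisecond resolution) is an assumption, since the post doesn't say how the timestamps are encoded; that format has the convenient property that string order matches time order. `RowDocs` and its methods are hypothetical.

```java
import org.apache.lucene.document.DateTools;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;

// Hypothetical builders for the document layout described above (Lucene 3.6 API).
final class RowDocs {
  static final String ID_FIELD = "the_id";
  static final String NEWEST = "newest";

  private RowDocs() {}

  // Stored, indexed as a single un-analyzed token, no norms.
  private static Field keyword(String name, String value) {
    return new Field(name, value, Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS);
  }

  // Assumed encoding: GMT, millisecond resolution,
  // e.g. "20140814225815000" for 2014-08-14 22:58:15 GMT.
  private static String stamp(long millis) {
    return DateTools.timeToString(millis, DateTools.Resolution.MILLISECOND);
  }

  private static Document build(String id, long entMillis, long updMillis) {
    Document doc = new Document();
    doc.add(keyword(ID_FIELD, id));
    doc.add(keyword("ent_dtm", stamp(entMillis)));
    doc.add(keyword("upd_dtm", stamp(updMillis)));
    return doc;
  }

  // One Document per THE_TABLE row (COL1..COLn omitted here).
  static Document rowDoc(long theId, long entMillis, long updMillis) {
    return build(Long.toString(theId), entMillis, updMillis);
  }

  // The bookkeeping Document: the_id="newest" plus the two timestamps only.
  static Document newestDoc(long entMillis, long updMillis) {
    return build(NEWEST, entMillis, updMillis);
  }

  // Key for IndexWriter.updateDocument(term, doc), which deletes then re-adds.
  static Term idTerm(String id) {
    return new Term(ID_FIELD, id);
  }
}
```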
>
> So this is what my webapp is doing:
>
> In a background thread, every 1.2 seconds it checks the Lucene index for
> "what's the newest thing in my world" (call that X) and uses that to ask
> the database, in essence, "have you got anything newer in your world than
> X?" If it returns, say, 3 rows newer than X, call the newest of those rows
> Y.
>
> Then, this background thread updates the Document with the_id="newest" with
> Y then goes to sleep again for 1.2 seconds. Lather, rinse, repeat.
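The loop just described could look roughly like this, again assuming the Lucene 3.6 API. `SyncLoop`, `RowSource` and `rowDoc` are hypothetical names; `RowSource` stands in for the JDBC query against THE_TABLE, and only `upd_dtm` is tracked (as a sortable timestamp string) to keep the sketch short. Each changed row replaces its old Document via `updateDocument` keyed on `the_id`, and the `newest` Document is replaced the same way.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;

// Hypothetical sketch of the background sync loop (Lucene 3.6 API).
final class SyncLoop {
  // Hypothetical stand-in for "SELECT ... FROM THE_TABLE WHERE UPD_DTM > ?".
  interface RowSource {
    List<Document> changedSince(String sinceStamp) throws Exception;
  }

  private final IndexWriter writer;
  private final SearcherManager manager;
  private final RowSource source;

  SyncLoop(IndexWriter writer, SearcherManager manager, RowSource source) {
    this.writer = writer;
    this.manager = manager;
    this.source = source;
  }

  static Document rowDoc(String id, String updStamp) {
    Document doc = new Document();
    doc.add(new Field("the_id", id, Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
    doc.add(new Field("upd_dtm", updStamp, Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
    return doc;
  }

  // X: upd_dtm of the the_id="newest" Document, or "" if it doesn't exist yet.
  String newestStamp() throws IOException {
    IndexSearcher searcher = manager.acquire();
    try {
      TopDocs hits = searcher.search(new TermQuery(new Term("the_id", "newest")), 1);
      return hits.totalHits == 0 ? "" : searcher.doc(hits.scoreDocs[0].doc).get("upd_dtm");
    } finally {
      manager.release(searcher);
    }
  }

  // One iteration: fetch rows newer than X, replace each row's Document,
  // then record Y (the largest stamp seen) in the "newest" Document.
  void syncOnce() throws Exception {
    String since = newestStamp();
    List<Document> rows = source.changedSince(since);
    if (rows.isEmpty()) {
      return;
    }
    String max = since;
    for (Document row : rows) {
      writer.updateDocument(new Term("the_id", row.get("the_id")), row);
      String upd = row.get("upd_dtm");
      if (upd.compareTo(max) > 0) {
        max = upd;
      }
    }
    writer.updateDocument(new Term("the_id", "newest"), rowDoc("newest", max));
    manager.maybeRefresh(); // makes the changes searchable; no commit() required
  }

  // Runs syncOnce() every 1.2 s. Exceptions are caught and printed, because an
  // uncaught one would suppress all later runs of scheduleWithFixedDelay.
  ScheduledExecutorService start() {
    ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
    exec.scheduleWithFixedDelay(() -> {
      try {
        syncOnce();
      } catch (Exception e) {
        e.printStackTrace();
      }
    }, 0, 1200, TimeUnit.MILLISECONDS);
    return exec;
  }
}
```

Because `syncOnce()` ends with `maybeRefresh()`, the next iteration's `newestStamp()` already sees Y without any `commit()`.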
>
> Incoming search requests attempt to use a "Near Real Time" IndexReader
> (with an IndexSearcher wrapped around it) to search the index.
>
> Again, everything seems to do what it says on the box.
>
> My problem is that I can't seem to avoid the occasional 100 second pause
> while IndexReader "refreshes itself".
>
> I create my one-and-only shared IndexReader thusly:
>
> indexReader = IndexReader.open(indexWriter, true);
>
> and I check if it needs to be refreshed by calling indexReader.isCurrent()
>
> and I "refresh" it with the following method:
>
>   public static IndexReader freshVersionOf(IndexReader indexReader)
>       throws IOException {
>     // Time the reopen with System.nanoTime() (no StopWatch dependency).
>     final long start = System.nanoTime();
>     final IndexReader newReader = IndexReader.openIfChanged(indexReader, true);
>     logger.info("IndexReader.openIfChanged() took "
>         + (System.nanoTime() - start) / 1e9 + " seconds");
>     if (newReader == null) {
>       return indexReader; // index unchanged: keep using the current reader
>     } else {
>       // Only safe if no other thread is still searching the old reader.
>       indexReader.close();
>       return newReader;
>     }
>   }
>
> Which is basically a Lucene method moved into a static method in my own
> code (my method closes the old indexReader; that's the only difference).
>
>
> Sometimes IndexReader.openIfChanged(indexReader, true) takes what seems
> like a crapload of time. If I don't "freshen" the IndexReader, it doesn't
> see the latest-and-greatest timestamp (i.e. what is newest in the Lucene
> world). I've tried doing indexWriter.commit() in my background thread, but
> that can take on the order of 100 seconds as well.
>
> Anyway, all the searching and updating of the index is working just
> fine; it's just that I'm seeing these occasional long pauses, which
> seem to be unavoidable.
>
> Any suggestions of things to try would be appreciated!
>
> PS. I'm using Lucene 3.6, which it seems lots of people have used
> successfully in the past, so I'm guessing "use the newer Lucene" won't
> necessarily help me.
>
>
> --
> Mike Jennings
