lucene-java-user mailing list archives

From parnab kumar <parnab.2...@gmail.com>
Subject Re: Lucene newbie in need of a hint
Date Thu, 14 Aug 2014 22:41:12 GMT
Have a look at this article if you have not already gone through it.
http://blog.mikemccandless.com/2011/06/lucenes-near-real-time-search-is-fast.html
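
One thing that may be worth checking alongside the article: the refresh method quoted below closes the old reader as soon as a new one is opened. If that refresh can run on the request path, searches may wait on it, and a search still using the old reader could find it closed. A common pattern is to let only the background thread open new readers and to reference-count the shared one, so an old reader is closed only after its last in-flight search finishes. The following is a minimal sketch of that pattern in plain Java, not Lucene code: `SwappableResource`, `Ref`, `acquire`, `release`, and `swap` are hypothetical names invented for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Hypothetical helper, not part of Lucene: shares one resource and swaps it safely. */
final class SwappableResource<T> {

    /** A counted handle on one version of the resource. */
    static final class Ref<T> {
        final T value;
        // Starts at 1: the holder itself owns one reference until the next swap.
        private final AtomicInteger count = new AtomicInteger(1);

        Ref(T value) { this.value = value; }

        /** Adds a reference unless the count has already dropped to zero. */
        boolean tryIncrement() {
            while (true) {
                int c = count.get();
                if (c == 0) return false;
                if (count.compareAndSet(c, c + 1)) return true;
            }
        }
    }

    private final Consumer<T> closer;   // how to close a retired version
    private volatile Ref<T> current;

    SwappableResource(T initial, Consumer<T> closer) {
        this.current = new Ref<>(initial);
        this.closer = closer;
    }

    /** Called by search threads: never blocks on a refresh in progress. */
    Ref<T> acquire() {
        while (true) {
            Ref<T> ref = current;
            if (ref.tryIncrement()) return ref;
            // Lost a race with swap(): loop and pick up the new version.
        }
    }

    /** Called by search threads when done; the last release closes a retired version. */
    void release(Ref<T> ref) {
        if (ref.count.decrementAndGet() == 0) closer.accept(ref.value);
    }

    /** Called by the background thread once a fresh version is ready. */
    synchronized void swap(T fresh) {
        Ref<T> old = current;
        current = new Ref<>(fresh);
        release(old); // drops the holder's own reference to the old version
    }
}
```

In the setup described below, the background thread would call IndexReader.openIfChanged and pass any non-null result to swap(), while each search would acquire() a reader, search, and release() it in a finally block. Lucene 3.5 and later also ship a SearcherManager class built around the same acquire/release idea; its constructor and method names differ between releases, so check the Javadoc for your version.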


On Thu, Aug 14, 2014 at 11:16 PM, Michael Jennings <mike.c.jennings@gmail.com> wrote:

> Hi everyone,
>
> I'm a bit of a Lucene newb, but a fairly experienced Java developer. Hope
> someone can give me some clues as to what I may be doing wrong.
>
> In essence I've got a Lucene index built from a database table that gets
> updated at a rate of about 1 row changing every 2 seconds or so. I've got a
> webapp whose sole purpose in life is to provide a simple front end for
> searching this table.
>
> The table in question lives in an Oracle db (not that Java cares) and it
> has 2 datetime/timestamp columns; ent_dtm and upd_dtm. When a new row gets
> inserted into the table, a trigger sets the ent_dtm to be "right now". When
> a row gets updated, a trigger sets the upd_dtm to be "right now".
>
> Queries like: SELECT COL1, COL2,... COLn from THE_TABLE where ENT_DTM >
> (some timestamp) are very fast, as are queries like:
>
> SELECT COL1, COL2,... COLn from THE_TABLE where UPD_DTM > (some timestamp)
>
> These are the sorts of queries I use to keep my Lucene index in sync with
> the table; they are fast and cause no issues.
>
> As you would expect, each Document in my Lucene index roughly corresponds
> to a row in THE_TABLE, including 2 fields called "ent_dtm" and "upd_dtm".
>
> THE_TABLE has a primary key which I will call THE_ID. Correspondingly, a
> Document in the Lucene index has a field called "the_id".
>
> Values of "the_id" are typically numbers (Field.Store.YES,
> Field.Index.NOT_ANALYZED_NO_NORMS) with the exception of a "special" value
> of "newest". The Document whose "the_id" field has the value "newest"
> contains just 2 more fields, ent_dtm and upd_dtm.
>
> This Document is just used to keep track of "what's the newest thing in
> Lucene's world".
>
> So this is what my webapp is doing:
>
> In a background thread, every 1.2 seconds, it checks the Lucene index for
> "what's the newest thing in my world" (call that X) and uses that to ask
> the database, in essence, "have you got anything newer in your world than
> X?" If it returns, say, 3 rows newer than X, call the newest of those rows
> Y.
>
> Then, this background thread updates the Document with the_id="newest" with
> Y then goes to sleep again for 1.2 seconds. Lather, rinse, repeat.
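
For readers following along, the polling cycle described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the poster's code: `Row`, `WatermarkSync`, and `poll` are hypothetical names, an in-memory list stands in for the database query on UPD_DTM, and the step that writes the new rows into the index is left out.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one row of THE_TABLE. */
final class Row {
    final String id;
    final Instant updDtm;

    Row(String id, Instant updDtm) {
        this.id = id;
        this.updDtm = updDtm;
    }
}

/** Hypothetical sketch of the "newest thing in my world" bookkeeping. */
final class WatermarkSync {
    private Instant watermark; // X: the newest timestamp already indexed

    WatermarkSync(Instant initial) {
        this.watermark = initial;
    }

    Instant watermark() {
        return watermark;
    }

    /**
     * Returns the rows strictly newer than X and advances X to Y,
     * the newest timestamp among them. The list filter stands in for
     * "SELECT ... WHERE UPD_DTM > X".
     */
    List<Row> poll(List<Row> table) {
        List<Row> newer = new ArrayList<>();
        Instant newest = watermark;
        for (Row row : table) {
            if (row.updDtm.isAfter(watermark)) {
                newer.add(row);
                if (row.updDtm.isAfter(newest)) {
                    newest = row.updDtm;
                }
            }
        }
        watermark = newest; // unchanged when nothing newer was found
        return newer;
    }
}
```

A scheduler such as java.util.concurrent.ScheduledExecutorService could call poll() every 1.2 seconds in place of a hand-written sleep loop.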
>
> Incoming search requests attempt to use a "Near Real Time" IndexReader
> (with an IndexSearcher wrapped around it) to search the index.
>
> Again, everything seems to do what it says on the box.
>
> My problem is that I can't seem to avoid the occasional 100-second pause
> while the IndexReader "refreshes itself".
>
> I create my one-and-only shared IndexReader thusly:
>
> indexReader = IndexReader.open(indexWriter, true);
>
> and I check if it needs to be refreshed by calling indexReader.isCurrent()
>
> and I "refresh" it with the following method:
>
>   public static IndexReader freshVersionOf(IndexReader indexReader)
>       throws IOException {
>     StopWatch stopWatch = new StopWatch();
>     final IndexReader newReader =
>         IndexReader.openIfChanged(indexReader, true);
>     logger.info("IndexReader.openIfChanged() took "
>         + stopWatch.elapsedSeconds() + " seconds");
>     if (newReader == null) {
>       return indexReader;
>     } else {
>       indexReader.close();
>       return newReader;
>     }
>   }
>
> This is basically a Lucene method wrapped in a static method in my own
> code; the only difference is that my method closes the old indexReader.
>
>
> Sometimes IndexReader.openIfChanged(indexReader, true) takes what seems
> like a crapload of time. If I don't "freshen" the IndexReader, it doesn't
> see the latest-and-greatest timestamp (i.e. what is newest in the Lucene
> world). I've tried calling indexWriter.commit() in my background thread,
> but that can take on the order of 100 seconds as well.
>
> Anyway, the searching and updating of the index are working just fine;
> it's just that I'm seeing these occasional long pauses, which seem to be
> unavoidable.
>
> Any suggestions of things to try would be appreciated!
>
> PS. I'm using Lucene 3.6, which it seems lots of people have used
> successfully in the past, so I'm guessing that "use a newer Lucene" won't
> necessarily help me.
>
>
> --
> Mike Jennings
>
