lucene-java-user mailing list archives

From Michael Jennings <mike.c.jenni...@gmail.com>
Subject Lucene newbie in need of a hint
Date Thu, 14 Aug 2014 22:16:57 GMT
Hi everyone,

I'm a bit of a Lucene newb, but a fairly experienced Java developer. Hope
someone can give me some clues as to what I may be doing wrong.

In essence I've got a Lucene index built off of a database table that gets
updated at a rate of about 1 row changing every 2 seconds or so. I've got a
webapp whose sole purpose in life is to provide a simple front end for
searching this table.

The table in question lives in an Oracle db (not that Java cares) and it
has 2 datetime/timestamp columns: ent_dtm and upd_dtm. When a new row gets
inserted into the table, a trigger sets the ent_dtm to be "right now". When
a row gets updated, a trigger sets the upd_dtm to be "right now".

Queries like:

SELECT COL1, COL2,... COLn from THE_TABLE where ENT_DTM > (some timestamp)

are very fast, as are queries like:

SELECT COL1, COL2,... COLn from THE_TABLE where UPD_DTM > (some timestamp)

These are the sorts of queries I use to keep my Lucene index "in sync"
with the table. They are fast and I have no issues with them.

As you would expect, each Document in my Lucene index roughly corresponds
to a row in THE_TABLE, including 2 fields called "ent_dtm" and "upd_dtm".

THE_TABLE has a primary key which I will call THE_ID. Correspondingly, a
Document in the Lucene index has a field called "the_id".

Values of "the_id" are typically numbers (Field.Store.YES,
Field.Index.NOT_ANALYZED_NO_NORMS), with the exception of a "special" value
of "newest". The Document whose "the_id" field has the value "newest"
contains just 2 more fields, ent_dtm and upd_dtm.

This Document is just used to keep track of "what's the newest thing in
Lucene's world".

So this is what my webapp is doing:

In a background thread, every 1.2 seconds, it checks the Lucene index for
"what's the newest thing in my world" (call that X) and uses that to hit the
database, asking it in essence "have you got anything newer in your world
than X?" If it returns, say, 3 rows newer than X, call the newest of those
rows Y.

Then this background thread updates the Document with the_id="newest" with
Y and goes to sleep again for 1.2 seconds. Lather, rinse, repeat.
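
For illustration, the polling cycle described above can be sketched with only the JDK (Java 8 or later). This is a sketch under assumptions, not the webapp's actual code: SyncLoop, MarkerStore, and RowSource are hypothetical names standing in for the "newest" Document and the database query, and timestamps are simplified to long values. Only the 1.2 second period is taken from the description.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class SyncLoop {
    /** Hypothetical stand-in for the Document whose the_id is "newest". */
    interface MarkerStore {
        long readNewest();          // X: newest timestamp the index knows about
        void writeNewest(long y);   // store Y as the new marker
    }

    /** Hypothetical stand-in for "SELECT ... where UPD_DTM > (some timestamp)". */
    interface RowSource {
        List<Long> timestampsNewerThan(long x);
    }

    private final MarkerStore marker;
    private final RowSource rows;

    SyncLoop(MarkerStore marker, RowSource rows) {
        this.marker = marker;
        this.rows = rows;
    }

    /** One pass of the cycle; returns how many rows were newer than X. */
    int runOnce() {
        long x = marker.readNewest();
        List<Long> newer = rows.timestampsNewerThan(x);
        if (newer.isEmpty()) {
            return 0;               // nothing newer than X, marker unchanged
        }
        long y = x;
        for (long t : newer) {
            y = Math.max(y, t);     // Y is the newest of the returned rows
        }
        marker.writeNewest(y);
        return newer.size();
    }

    /** Runs the cycle with a 1.2 second pause between passes. */
    ScheduledExecutorService start() {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        // Note: an uncaught exception in runOnce() cancels all further runs.
        executor.scheduleWithFixedDelay(this::runOnce, 0, 1200, TimeUnit.MILLISECONDS);
        return executor;
    }
}
```

Using a fixed delay rather than a fixed rate means a slow pass pushes the next one back instead of letting passes pile up.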

Incoming search requests attempt to use a "Near Real Time" IndexReader
(with an IndexSearcher wrapped around it) to search the index.

Again, everything seems to do what it says on the box.

My problem is that I can't seem to avoid the occasional 100 second pause
while IndexReader "refreshes itself".

I create my one-and-only shared IndexReader thusly:

indexReader = IndexReader.open(indexWriter, true);

and I check if it needs to be refreshed by calling indexReader.isCurrent()

and I "refresh" it with the following method:

  public static IndexReader freshVersionOf(IndexReader indexReader) throws IOException {
    StopWatch stopWatch = new StopWatch();
    final IndexReader newReader = IndexReader.openIfChanged(indexReader, true);
    logger.info("IndexReader.openIfChanged() took " + stopWatch.elapsedSeconds() + " seconds");
    if (newReader == null) {
      return indexReader;
    } else {
      indexReader.close();
      return newReader;
    }
  }

This is basically a Lucene method moved into a static method in my own
code (the only difference is that my method closes the old indexReader).
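
For illustration, the "reopen if changed, then swap" pattern can be shown without Lucene. The sketch below is an assumption-laden stand-in, not the code from this post: RefreshingHolder is a hypothetical class, the reopen function follows the same contract as the method above (it returns null when nothing changed), and refresh() is assumed to be called from a single background thread. With this arrangement, callers of get() never wait for a refresh; they see the previous value until the swap completes.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

final class RefreshingHolder<T> {
    private final AtomicReference<T> current;
    private final UnaryOperator<T> reopenIfChanged; // returns null when nothing changed
    private final Consumer<T> closer;               // releases a replaced value

    RefreshingHolder(T initial, UnaryOperator<T> reopenIfChanged, Consumer<T> closer) {
        this.current = new AtomicReference<>(initial);
        this.reopenIfChanged = reopenIfChanged;
        this.closer = closer;
    }

    /** Called by search requests; returns the current value without waiting. */
    T get() {
        return current.get();
    }

    /** Called from one background thread; returns true if a new value was swapped in. */
    boolean refresh() {
        T old = current.get();
        T fresh = reopenIfChanged.apply(old); // the potentially slow step
        if (fresh == null) {
            return false;                     // unchanged, keep the old value
        }
        current.set(fresh);
        // Closing immediately is only safe if no caller still uses the old
        // value; tracking in-flight users is deliberately not shown here.
        closer.accept(old);
        return true;
    }
}
```

This does not make the reopen itself any faster; it only moves the wait off the threads that serve searches.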


Sometimes IndexReader.openIfChanged(indexReader, true) takes a very long
time (the 100 second pauses mentioned above). If I don't "freshen" the
IndexReader, it doesn't see the latest-and-greatest timestamp (i.e. what is
newest in the Lucene world). I've tried doing indexWriter.commit() in my
background thread, but that can take on the order of 100 seconds as well.

Anyway, the searching and updating of the index are working just fine; it's
just that I'm seeing these occasional long pauses, which seem to be
unavoidable.

Any suggestions of things to try would be appreciated!

PS. I'm using Lucene 3.6, which it seems lots of people have used
successfully in the past, so I'm guessing that "use a newer Lucene" won't
necessarily help me.


-- 
Mike Jennings
