From: Michael McCandless
Date: Thu, 14 Aug 2014 18:58:15 -0400
Subject: Re: Lucene newbie in need of a hint
To: Lucene Users, mike.c.jennings@gmail.com

3.6 is quite old by now ... but that behavior (100s pause on reopen) is
strange. Can you capture all Java threads during that time and post
back?

It looks like you're reopening the reader correctly, though be careful
if you have in-flight searches running in other threads; use
SearcherManager to help for that.

You don't need to IW.commit: that's only for durability (having your
index survive an OS crash, power loss, etc.).

Mike McCandless

http://blog.mikemccandless.com

On Thu, Aug 14, 2014 at 6:16 PM, Michael Jennings wrote:
> Hi everyone,
>
> I'm a bit of a Lucene newb, but a fairly experienced Java developer. Hope
> someone can give me some clues as to what I may be doing wrong.
>
> In essence I've got a lucene index built off of a database table that gets
> updated at a rate of about 1 row changing every 2 seconds or so. I've got a
> webapp whose sole purpose in life is to provide a simple front end for
> searching this table.
>
> The table in question lives in an Oracle db (not that Java cares) and it
> has 2 datetime/timestamp columns; ent_dtm and upd_dtm. When a new row gets
> inserted into the table, a trigger sets the ent_dtm to be "right now". When
> a row gets updated, a trigger sets the upd_dtm to be "right now".
>
> queries like: SELECT COL1, COL2,... COLn from THE_TABLE where ENT_DTM >
> (some timestamp) are very fast, as are queries like:
>
> SELECT COL1, COL2,...
> COLn from THE_TABLE where UPD_DTM > (some timestamp)
>
> These are the sorts of queries I use to keep my lucene index "in synch"
> with the table and these queries are fast and there are no issues with them.
>
> As you would expect, each Document in my lucene index roughly corresponds
> to a row in THE_TABLE, including 2 fields called "ent_dtm" and "upd_dtm"
>
> THE_TABLE has a primary key which I will call THE_ID. Correspondingly, a
> Document in the Lucene index has a field called "the_id"
>
> values of "the_id" are typically numbers (Field.Store.YES,
> Field.Index.NOT_ANALYZED_NO_NORMS) with the exception of a "special" value
> of "newest". The Document with the field "the_id" with the value of
> "newest" contains just 2 more fields, ent_dtm and upd_dtm.
>
> This Document is just used to keep track of "what's the newest thing in
> Lucene's world"
>
> So this is what my webapp is doing:
>
> In a background thread, every 1.2 seconds it checks the Lucene index for
> "what's the newest thing in my world" (call that X) uses that to hit the
> database asking it in essence "have you got anything newer in your world
> than X", if it returns say 3 rows newer than X, call the newest of those
> rows Y.
>
> Then, this background thread updates the Document with the_id="newest" with
> Y then goes to sleep again for 1.2 seconds. Lather, rinse, repeat.
>
> Incoming search requests attempt to use a "Near Real Time" IndexReader
> (with an IndexSearcher wrapped around it) to search the index.
>
> Again, everything seems to do what it says on the box.
>
> My problem is that I can't seem to avoid the occasional 100 second pause
> while IndexReader "refreshes itself".
>
> I create my one-and-only shared IndexReader thusly:
>
> indexReader = IndexReader.open(indexWriter, true);
>
> and I check if it needs to be refreshed by calling indexReader.isCurrent()
>
> and I "refresh" it with the following method:
>
> public static IndexReader freshVersionOf(IndexReader indexReader)
>         throws IOException {
>     StopWatch stopWatch = new StopWatch();
>     final IndexReader newReader =
>             IndexReader.openIfChanged(indexReader, true);
>     logger.info("IndexReader.openIfChanged() took "
>             + stopWatch.elapsedSeconds() + " seconds");
>     if (newReader == null) {
>         return indexReader;
>     } else {
>         indexReader.close();
>         return newReader;
>     }
> }
>
> Which is basically a Lucene method moved into a static method in my own
> code (my method closes the old indexReader, that's the only difference)
>
> Sometimes IndexReader.openIfChanged(indexReader, true); takes what seems
> like a crapload of time. If I don't "freshen" the IndexReader, it doesn't
> see the latest-and-greatest timestamp (ie. what is newest in the Lucene
> world). I've tried doing indexWriter.commit() in my background thread, but
> that can take on the order of 100 seconds as well.
>
> Anyway, all the searching and updating of the index is all working just
> fine, it's just that I'm seeing these occasional long periods of time which
> seem to be unavoidable.
>
> Any suggestions of things to try would be appreciated!
>
> PS. I'm using Lucene 3.6 which it seems lots of people have used
> successfully in the past, so I'm guessing the "use the newer Lucene" won't
> necessarily help me.
>
> --
> Mike Jennings
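
On the request above to capture all Java threads during the pause: the
usual way is to run jstack <pid> against the webapp's JVM (or send it
kill -3 <pid> on Unix) while the reopen is stuck. If attaching a tool is
awkward, a rough sketch like the one below produces a similar dump from
inside the process using only the JDK's java.lang.management API. The
class name ThreadDumper and the method dumpAllThreads() are made up for
illustration; they are not part of Lucene or of the code in this thread.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper (not a Lucene class): builds a jstack-like dump
// of every live thread in the current JVM.
final class ThreadDumper {

    static String dumpAllThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Ask for lock details only where this JVM supports them;
        // passing true when unsupported would throw.
        ThreadInfo[] infos = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(),
                mx.isSynchronizerUsageSupported());

        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : infos) {
            if (info == null) {
                continue; // defensive: skip any missing entry
            }
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState());
            if (info.getLockName() != null) {
                // Shows what a BLOCKED or WAITING thread is stuck on
                sb.append(" on ").append(info.getLockName());
            }
            if (info.getLockOwnerName() != null) {
                sb.append(" owned by \"")
                  .append(info.getLockOwnerName()).append('"');
            }
            sb.append('\n');
            // Print every frame rather than a truncated summary
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dumpAllThreads());
    }
}
```

For the dump to show where the reopen is blocked, it has to be taken
while openIfChanged() is still running, so it would need to be called
from a different thread (for example a watchdog that fires when the
refresh has been running for more than a few seconds), not from the
logging line that runs after the call returns.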