Date: Wed, 9 Aug 2006 13:49:25 -0400
From: "Erick Erickson"
To: java-user@lucene.apache.org
Subject: Re: Lucene hits.length()

I think, but am not certain (chime in here, guys), that this is expected behavior. As I remember from various threads, internally, indexing uses a RAMdir to accumulate data until it merges it with the FSDir. Since the searcher and indexer are separate, I assume that the searcher is looking at the snapshot that is on disk and missing what is in the RAMdir. After you merge, the RAMdir data has been added to what is on disk, and the two are "in synch".

So I guess my real question is "why do you care?" Is this affecting your application, or is this an anomaly that you want to understand so you don't get surprised? If the latter, I think you're OK: if you open your index after merging, you'll have the data available....

BTW, I assume that when you say hits.length() is not correct, you're getting fewer hits than you *know* are in the index (including the stuff you're currently indexing but haven't merged yet).

Best
Erick

On 8/9/06, Marcus Falck wrote:
>
> Still worried =)
> You see, hits.length() isn't updated correctly when I create a new
> searcher. It is only updated correctly when the merges happen.
> =/
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: 9 August 2006 15:34
> To: java-user@lucene.apache.org
> Subject: Re: Lucene hits.length()
>
> Then you won't see anything added to your index between times. Does this
> identify your problem, or are you still worried?
>
> Erick
>
> On 8/9/06, Marcus Falck wrote:
> >
> > I'm opening a new searcher every third minute.
> >
> > -----Original Message-----
> > From: Erick Erickson [mailto:erickerickson@gmail.com]
> > Sent: 8 August 2006 18:58
> > To: java-user@lucene.apache.org
> > Subject: Re: Lucene hits.length()
> >
> > I'll take a stab at it.... When are you opening/closing your searcher?
> > When you open a searcher, you get a snapshot of the index at that
> > instant, and subsequent modifications aren't visible until you open a
> > new searcher (at least I think I've got this right).
> >
> > And I'm sure this also interacts with the writer merge settings
> > "interestingly".
> >
> > Personally, I'd worry about this a lot more if it happened after I'd
> > closed my writer and opened a new reader...
> > Of course, my app has an index that is updated rarely (every two
> > weeks), so I haven't dug into too many details in this area...
> >
> > Best
> > Erick
> >
> > On 8/8/06, Marcus Falck wrote:
> > >
> > > I have noticed some strange behavior when searching my Lucene index.
> > >
> > > I'm adding 500,000 docs to an index.
> > >
> > > MergeFactor = 10
> > > MinMerge = 5000
> > >
> > > When 49,999 have been added (just before the first 10 * 5000 merge),
> > > hits.length() reports around 1000 hits for a keyword (which, by the
> > > way, is about the same count as with 5000 docs added). After the
> > > 10 * 5000 merge, hits.length() returns around 8000 hits, which seems
> > > a lot more reasonable.
> > > Since I'm adding content in date order (oldest first), I have also
> > > tried sorting the hits (newest date first) and displaying the top 10.
> > >
> > > According to that output, it seems that the documents are added
> > > correctly.
> > >
> > > I'm using a MultiSearcher on top of a RAMDir and an FSDir, with
> > > Lucene 1.4.3.
> > >
> > > Does anybody have any idea why the hit count is so misleading?
> > >
> > > Regards
> > > Marcus
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
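Erick's explanation (a newly opened searcher sees only a snapshot of what has already been written to disk, while recently added documents may still be buffered in RAM) can be illustrated with a toy model. This is a hypothetical sketch, not Lucene's API and not the actual Lucene 1.4.3 IndexWriter logic: the class SnapshotModel and all of its methods are invented for illustration. It assumes that documents are flushed to disk in batches of MinMerge (5000) and that MergeFactor (10) equal-sized segments are merged into one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, NOT Lucene's API: documents are buffered "in RAM"
// and only become visible to a newly opened searcher once they are flushed
// to "disk" as a segment. The flush/merge rules are simplified assumptions.
public class SnapshotModel {
    private final int minMergeDocs;   // assumed flush size (Marcus's MinMerge)
    private final int mergeFactor;    // assumed number of segments per merge (MergeFactor)
    private final List<Integer> diskSegments = new ArrayList<>(); // doc count of each on-disk segment
    private int bufferedDocs = 0;     // added but not yet flushed, invisible to new searchers
    private int merges = 0;           // number of segment merges performed so far

    public SnapshotModel(int minMergeDocs, int mergeFactor) {
        this.minMergeDocs = minMergeDocs;
        this.mergeFactor = mergeFactor;
    }

    public void addDocument() {
        bufferedDocs++;
        if (bufferedDocs == minMergeDocs) {
            // Flush the RAM buffer to disk as one new segment.
            diskSegments.add(bufferedDocs);
            bufferedDocs = 0;
            maybeMerge();
        }
    }

    // Merge the last mergeFactor segments while they all have the same size,
    // cascading upward (10 x 5000 -> 50,000, then 10 x 50,000 -> 500,000).
    private void maybeMerge() {
        while (diskSegments.size() >= mergeFactor) {
            int n = diskSegments.size();
            int size = diskSegments.get(n - 1);
            List<Integer> tail = diskSegments.subList(n - mergeFactor, n);
            for (int s : tail) {
                if (s != size) return;      // not enough equal-sized segments yet
            }
            tail.clear();                   // removes those segments from diskSegments
            diskSegments.add(size * mergeFactor);
            merges++;
        }
    }

    // Documents a searcher opened right now would see: only flushed segments.
    public int visibleToNewSearcher() {
        int total = 0;
        for (int s : diskSegments) total += s;
        return total;
    }

    public int totalAdded() { return visibleToNewSearcher() + bufferedDocs; }
    public int merges() { return merges; }
    public int segmentCount() { return diskSegments.size(); }

    public static void main(String[] args) {
        SnapshotModel model = new SnapshotModel(5000, 10);
        for (int i = 0; i < 49_999; i++) model.addDocument();
        System.out.println("after 49,999 adds: visible=" + model.visibleToNewSearcher()
                + ", segments=" + model.segmentCount() + ", merges=" + model.merges());
        model.addDocument(); // the 50,000th document triggers the first 10 x 5000 merge
        System.out.println("after 50,000 adds: visible=" + model.visibleToNewSearcher()
                + ", segments=" + model.segmentCount() + ", merges=" + model.merges());
    }
}
```

Under these assumptions, main prints "visible=45000, segments=9, merges=0" after 49,999 additions and "visible=50000, segments=1, merges=1" after the 50,000th. Note that in this model a fresh searcher never lags by more than MinMerge - 1 documents, so the model alone does not reproduce the much larger discrepancy Marcus reports.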