lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Lucene hits.length()
Date Wed, 09 Aug 2006 17:49:25 GMT
I think, but am not certain (chime in here, guys), that this is expected
behavior. As I remember from various threads, indexing internally uses a
RAMDirectory to accumulate documents until it merges them into the
FSDirectory. Since the searcher and indexer are separate, I assume that the
searcher is looking at the snapshot that is on disk and missing whatever is
still in the RAMDirectory. After you merge, the RAMDirectory data has been
added to what is on disk, and the two are "in sync".
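
To make the idea concrete, here is a toy model in plain Java. It is not
Lucene's API and does not try to reproduce the numbers reported in this
thread; ToyIndex, ToyWriter, ToySearcher and the flush threshold of 5 are all
made up for illustration. It only sketches the behavior I'm assuming: the
writer buffers documents in memory, and a searcher sees whatever was on
"disk" at the moment it was opened.

```java
import java.util.ArrayList;
import java.util.List;

public class SnapshotDemo {
    /** Toy stand-in for an on-disk index: just a list of flushed documents. */
    static class ToyIndex {
        final List<String> onDisk = new ArrayList<>();
    }

    /** Toy writer: buffers documents in memory, flushes every flushThreshold adds. */
    static class ToyWriter {
        private final ToyIndex index;
        private final int flushThreshold;
        private final List<String> ramBuffer = new ArrayList<>();

        ToyWriter(ToyIndex index, int flushThreshold) {
            this.index = index;
            this.flushThreshold = flushThreshold;
        }

        void addDocument(String doc) {
            ramBuffer.add(doc);
            if (ramBuffer.size() >= flushThreshold) {
                flush();
            }
        }

        /** Moves everything buffered in memory to the "disk" list. */
        void flush() {
            index.onDisk.addAll(ramBuffer);
            ramBuffer.clear();
        }
    }

    /** Toy searcher: copies the on-disk state when opened, never sees later changes. */
    static class ToySearcher {
        private final List<String> snapshot;

        ToySearcher(ToyIndex index) {
            this.snapshot = new ArrayList<>(index.onDisk);
        }

        int hitCount(String term) {
            int hits = 0;
            for (String doc : snapshot) {
                if (doc.contains(term)) {
                    hits++;
                }
            }
            return hits;
        }
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        ToyWriter writer = new ToyWriter(index, 5);

        // Add 9 documents: the first 5 get flushed, the last 4 stay buffered.
        for (int i = 0; i < 9; i++) {
            writer.addDocument("keyword doc " + i);
        }

        ToySearcher early = new ToySearcher(index);
        System.out.println("before flush: " + early.hitCount("keyword"));

        // The 10th document fills the buffer and triggers the second flush.
        writer.addDocument("keyword doc 9");

        System.out.println("old searcher: " + early.hitCount("keyword"));
        System.out.println("new searcher: " + new ToySearcher(index).hitCount("keyword"));
    }
}
```

In this model the searcher opened after 9 adds counts only the 5 flushed
documents, keeps reporting 5 even after the next flush, and only a searcher
opened after that flush reports all 10.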

So I guess my real question is "why do you care"? Is this affecting your
application, or is this an anomaly that you want to understand so you don't
get surprised? If the latter, I think you're OK: if you open your index after
merging, you'll have the data available.

BTW, I assume that when you say hits.length() is not correct, you're getting
fewer hits than you *know* are in the index (including the stuff you're
currently indexing but haven't merged yet).

Best
Erick



On 8/9/06, Marcus Falck <marcus.falck@observer.se> wrote:
>
> Still worried =)
> You see, hits.length() isn't updated correctly when I create a new
> searcher. The count only becomes correct at the merges. =/
>
> -----Original message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: 9 August 2006 15:34
> To: java-user@lucene.apache.org
> Subject: Re: Lucene hits.length()
>
> Then you won't see anything added to your index between times. Does this
> identify your problem or are you still worried?
>
> Erick
>
> On 8/9/06, Marcus Falck <marcus.falck@observer.se> wrote:
> >
> > I'm opening a new searcher every third minute.
> >
> > -----Original message-----
> > From: Erick Erickson [mailto:erickerickson@gmail.com]
> > Sent: 8 August 2006 18:58
> > To: java-user@lucene.apache.org
> > Subject: Re: Lucene hits.length()
> >
> > I'll take a stab at it.... When are you opening/closing your searcher?
> > When you open a searcher, you get a snapshot of the index at that
> > instant, and subsequent modifications aren't visible until you open a
> > new searcher (at least I think I've got this right).
> >
> > And I'm sure this also interacts with the writer merge settings
> > "interestingly".
> >
> > Personally, I'd worry about this a lot more if it happened after I'd
> > closed my writer and opened a new reader <G>...
> > Of course, my app has an index that is updated rarely (every two weeks),
> > so I haven't dug into too many details in this area...
> >
> > Best
> > Erick
> >
> > On 8/8/06, Marcus Falck <marcus.falck@observer.se> wrote:
> > >
> > > I have noticed some strange behavior when searching my lucene index.
> > >
> > >
> > >
> > > I'm adding 500.000 docs to an index.
> > >
> > >
> > >
> > > MergeFactor = 10
> > >
> > > MinMerge = 5000
> > >
> > >
> > >
> > > > When 49,999 have been added (just before the first 10 * 5000 merge),
> > > > hits.length() reports around 1000 hits for a keyword (which, by the
> > > > way, is around the same count as with 5000 docs added). After the
> > > > 10 * 5000 merge, hits.length() returns around 8000 hits, which seems
> > > > a lot more reasonable. Since I'm adding content in date order (oldest
> > > > first), I have also tried sorting the hits (newest date first) and
> > > > displaying the top 10 hits.
> > >
> > >
> > >
> > > According to that output it seems that the documents are added
> > > correctly.
> > >
> > >
> > >
> > > I'm using a multisearcher on top of a RAMDir and an FSDir. Using
> > > Lucene1.4.3
> > >
> > >
> > >
> > > Anybody that has any idea about why the hit count is so misleading?
> > >
> > >
> > >
> > > /
> > >
> > > Regards
> > >
> > > Marcus
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
