lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: synchronize hits variable?
Date Tue, 29 May 2007 17:36:35 GMT
Except pay heed to the HitCollector documentation, which warns that
calling Searcher.doc() or IndexReader.document() inside the collect loop
is expensive. Although lazy loading is, I think, designed to help with
this....

Erick
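
For readers following the thread, here is a minimal sketch of the approach discussed: tally categories from document ids as they are collected, without loading any stored document. It is a stand-in, not Lucene code. The Collector interface below only mirrors the shape of Lucene 2.x's HitCollector.collect(int doc, float score); the search() helper, the class names, and the categoryByDoc array are hypothetical. In a real application the array might come from something like FieldCache.DEFAULT.getStrings(reader, "category"), where the field name "category" is an assumption.

```java
import java.util.Map;
import java.util.TreeMap;

public class CategoryTally {

    /** Same shape as Lucene 2.x HitCollector.collect(int doc, float score); not the real class. */
    interface Collector {
        void collect(int doc, float score);
    }

    /** Counts one category per matching doc id; never loads a stored document. */
    static final class CategoryCounter implements Collector {
        // Hypothetical per-document lookup table, indexed by doc id.
        private final String[] categoryByDoc;
        private final Map<String, Integer> counts = new TreeMap<>();

        CategoryCounter(String[] categoryByDoc) {
            this.categoryByDoc = categoryByDoc;
        }

        @Override
        public void collect(int doc, float score) {
            String category = categoryByDoc[doc];
            if (category != null) {
                counts.merge(category, 1, Integer::sum);
            }
        }

        Map<String, Integer> counts() {
            return counts;
        }
    }

    /** Hypothetical stand-in for a search: calls the collector once per matching doc id. */
    static void search(int[] matchingDocs, Collector collector) {
        for (int doc : matchingDocs) {
            collector.collect(doc, 1.0f);
        }
    }

    public static void main(String[] args) {
        String[] categoryByDoc = {"chairs", "desks", "chairs", null, "lamps"};
        int[] matchingDocs = {0, 2, 3, 4};

        CategoryCounter counter = new CategoryCounter(categoryByDoc);
        search(matchingDocs, counter);

        System.out.println(counter.counts()); // prints {chairs=2, lamps=1}
    }
}
```

Only the small counts map needs to outlive the search, so nothing the size of a Hits object would have to be kept in the user session.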

On 5/29/07, John Powers <jpowers@configureone.com> wrote:
>
> If I need all the documents that the query returns to the Hits object,
> does a HitCollector work?
>
> I take it that each time I call Hits.doc(i) I get the full document; so
> if that's the problem, can I just get the single column I need for
> all those docs?
>
> Size: the directory I'm dealing with on a Windows XP machine is 5M.  How
> can I be getting a 20M Hits variable?
>
> I've looked in the LIA book and it seems to for-loop through all the hits
> just like I am doing.
>
> -----Original Message-----
> From: Mark Miller [mailto:markrmiller@gmail.com]
> Sent: Monday, May 28, 2007 11:46 AM
> To: java-user@lucene.apache.org
> Subject: Re: synchronize hits variable?
>
> You do not want to be using Hits. Frankly, the way pagination should
> normally be done, Hits caching means almost nobody really wants to be
> using Hits, but in your case it's even worse. Look into a HitCollector
> -- not only are you caching every single document in your search
> results, but you are re-querying many times for a single query by
> running through all of the results in Hits! I don't have the time at the
> moment to explain further (I am sure someone else will) but you need to
> drop Hits like a bad habit and look into HitCollector.
>
> - Mark
>
> John Powers wrote:
> > Thanks for the response.  It's definitely the user search object's
> > search().  I have to iterate through all the hits that come back to
> > get all the categories used in the results, so the number that Hits
> > gets really doesn't matter -- I'll need them all.
> >
> > -----Original Message-----
> > From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> > Sent: Sunday, May 27, 2007 10:26 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: synchronize hits variable?
> >
> > Hi John,
> >
> > 20M sounds suspicious.  Without seeing the code, it's hard to tell.  My
> > guess is the problem lies elsewhere or some piece of Lucene is being
> > incorrectly used.  Or maybe your Lucene Documents are just very large.
> > Are they?  You could go modify Hits source and change the number of
> > hits that Hits instance loads.  I think it caches either 100 or 200
> > (haven't looked at it in a while).  There is no API for changing it, so
> > you could change it in the sources, recompile, redeploy, and see if that
> > helps you.
> >
> > Otis
> > . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
> > Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share
> >
> > ----- Original Message ----
> > From: John Powers <jpowers@configureone.com>
> > To: java-user@lucene.apache.org
> > Sent: Sunday, May 27, 2007 9:17:59 PM
> > Subject: synchronize hits variable?
> >
> > In a J2EE webapp we have a search object that stores a user's search
> > preferences (items/page, detail level, etc.).  It has a search() that
> > calls a static method getSearcher() that returns a static IndexSearcher
> > that all these user search objects use.  Searching with that gives us
> > a Hits object that this user object iterates through to find out what
> > categories are used, which ones to display, etc.  This Hits object is
> > huge.  It's fluffing up each user session by 20M in some cases.  This
> > is unacceptable of course.  I am sure many have run into this issue,
> > and I was curious what you did to solve it.  If I use a local
> > variable in that search method, null it after I'm done and call gc(),
> > it is still bad.  I can't put the hits variable into a static object or
> > attribute because of course there are multiple of these user search
> > objects using it at any time.  Even if I isolated the user search
> > part to a singleton, others may use it at the same time as well.  I
> > imagine some sort of synchronization of that static variable is in
> > order but am not quite sure.  Does someone have an example of this?  I
> > would imagine everyone has had to deal with this problem.  20M is just
> > too big.  When we use a profiler we find that it's a char[] that seems
> > to be holding a lot of the data, like 16M of it.  For different indexes
> > on different servers, this size is different, but for the most part it's
> > way too big everywhere.
> >
> > Thoughts?  I appreciate any help on this.
> >
> > Thanks
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
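
As a rough illustration of the "re-querying many times" point made in the thread, the sketch below counts how many searches would run while iterating over every result. It is a model only, not Lucene code: it assumes the first search fetches a fixed batch and that each time the iteration runs past what has been fetched, the search is re-run asking for twice as many results. The starting batch of 100 follows the "either 100 or 200" guess in the thread; the doubling rule, the class name, and the method name are assumptions.

```java
public class HitsRequeryModel {

    /**
     * Number of searches run when iterating over totalHits results,
     * assuming the fetched window starts at initialBatch and doubles
     * on every re-query (an assumed growth rule, see the lead-in).
     */
    static int searchesNeeded(int totalHits, int initialBatch) {
        if (initialBatch <= 0) {
            throw new IllegalArgumentException("initialBatch must be positive");
        }
        int searches = 1;            // the initial search
        long fetched = initialBatch; // results available so far
        while (fetched < totalHits) {
            fetched *= 2;            // re-run the query for a larger window
            searches++;
        }
        return searches;
    }

    public static void main(String[] args) {
        System.out.println(searchesNeeded(50000, 100)); // prints 10
    }
}
```

Under these assumptions, walking 50,000 results costs ten searches instead of one, which is the overhead a single pass with a collector avoids.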
