lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "John Powers" <jpow...@configureone.com>
Subject RE: synchronize hits variable?
Date Tue, 29 May 2007 16:18:42 GMT
If I need all the documents returned by the query to to the Hits object,
does a hitcollector work?    

I take it that each time I Hits.doc(i) I get the full document; so if
that's the problem, can I just get all of the single column I need for
all those docs?

Size: the directory I'm dealing with on a windowsXP machine is 5M.   how
can I be getting 20M Hits variable?   

I've looked in the LIA book and it seems to forloop through all the hits
just like I am doing.   

-----Original Message-----
From: Mark Miller [mailto:markrmiller@gmail.com] 
Sent: Monday, May 28, 2007 11:46 AM
To: java-user@lucene.apache.org
Subject: Re: synchronize hits variable?

You do not want to be using Hits. Frankly, the way pagination should 
normally be done, Hits caching means almost nobody really wants to be 
using Hits, but in your case it's even worse. Look into a Hit collector 
-- not only are you caching every single document in your search 
results, but you are re-querying many times for a single query by 
running through all of the result in Hits! I don't have the time at the 
moment to explain further (I am sure someone else will) but you need to 
drop Hits like a bad habit and look into HitCollector.

- Mark

John Powers wrote:
> Thanks for the response.  Its definitely the user search object's
> search().   I have to iterate through all the hits that come back to
get
> all the categories used in the results, so the number that hits gets
> really doesn't matter--ill need them all.     
>
> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com] 
> Sent: Sunday, May 27, 2007 10:26 PM
> To: java-user@lucene.apache.org
> Subject: Re: synchronize hits variable?
>
> Hi John,
>
> 20M sounds suspicious.  Without seeing the code, it's hard to tell.
My
> guess is the problem lies elsewhere or some piece of Lucene is being
> incorrectly used.  Or maybe your Lucene Documents are just very large.
> Are they?  You could go modify Hits source and change the number of
hits
> that Hits instance loads.  I think it caches either 100 or 200
(haven't
> looked at it in a while).  There is no API for changing it, so you
could
> change it in the sources, recompile, redeploy, and see if that helps
> you.
>
> Otis
> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
> Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share
>
> ----- Original Message ----
> From: John Powers <jpowers@configureone.com>
> To: java-user@lucene.apache.org
> Sent: Sunday, May 27, 2007 9:17:59 PM
> Subject: synchronize hits variable?
>
> In a j2ee webapp we have a search object that stores a user's search
> preferences (items/page, detail level, etc).  it has a search() that
> calls a static method getSearcher() that returns a static
IndexSearcher
> that all these user search objects use.....searching with that gives
us
> a Hits object that this user object iterates through to find out what
> categories are used, which ones to display, etc.    this hits object
is
> huge.   Its fluffing up each user session by 20M in some cases.
This
> is unacceptable of course.    I am sure many have run into this issue,
> and I was curious what you did to solve it.     If I use a local
> variable to that search method, null it after I'm done and call gc(),
it
> still is bad.    I can't put the hits variable into a static object or
> attribute cause of course there are multiple of these user search
> objects using it at any time.     Even if I isolated the user search
> part to a singleton, others may use it at the same time as well.
I
> imagine some sort of synchronization of that static variable is in
order
> but am not quite sure.   does someone have an example of this?  I
would
> image everyone has had to deal with this problem.   20M is just to
big.
> When we use a profiler we find that it's a char[] that seems to be
> holding a lot of the data..like 16M of it.    For different indexes in
> different servers, this size is different, but for the most part its
way
> to big everywhere.
>
>  
>
> Thoughts?  I appreciate any help on this.
>
>  
>
> Thanks
>
>  
>
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>   

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message