lucene-java-user mailing list archives

From "yueyu lin" <popeye...@gmail.com>
Subject Re: 7GB index taking forever to return hits
Date Tue, 15 Aug 2006 00:40:02 GMT
To avoid "TooManyClauses", you can try a Filter instead of a Query, but
that will be slower.
From what I see, so many terms match your query that it will be tough
for Lucene.
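
To illustrate the difference, here is a minimal sketch in plain Java. It is a toy model, not Lucene's API: the class WildcardFilterSketch, its methods, and the in-memory "index" are all hypothetical. The only Lucene fact it borrows is that BooleanQuery's default clause limit is 1024. A wildcard rewritten as a query turns every matching term into one clause, so it fails once too many terms match. A filter instead ORs the documents of every matching term into a single bit set, so there is no clause limit to hit, although it still has to visit every term.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: a sorted term dictionary that maps each term to the set of
// document ids containing it. None of this is Lucene's actual API.
class WildcardFilterSketch {

    // Mirrors the default BooleanQuery clause limit in Lucene.
    static final int MAX_CLAUSES = 1024;

    // Query-style expansion: each matching term becomes one clause, so the
    // expansion fails as soon as more than MAX_CLAUSES terms match.
    static int countClauses(Map<String, BitSet> index, String infix) {
        int clauses = 0;
        for (String term : index.keySet()) {
            if (term.contains(infix)) {
                clauses++;
                if (clauses > MAX_CLAUSES) {
                    throw new IllegalStateException("too many clauses for *" + infix + "*");
                }
            }
        }
        return clauses;
    }

    // Filter-style evaluation: OR the documents of every matching term into
    // one bit set. The result size depends on maxDoc, not on the term count.
    static BitSet filterBits(Map<String, BitSet> index, String infix, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (Map.Entry<String, BitSet> entry : index.entrySet()) {
            if (entry.getKey().contains(infix)) {
                bits.or(entry.getValue());
            }
        }
        return bits;
    }

    // Builds `terms` distinct terms that contain "hat" (term i occurs only in
    // document i), plus one unrelated term in document `terms`.
    static Map<String, BitSet> buildIndex(int terms) {
        Map<String, BitSet> index = new TreeMap<>();
        for (int i = 0; i < terms; i++) {
            BitSet docs = new BitSet();
            docs.set(i);
            index.put("hat" + i, docs);
        }
        BitSet other = new BitSet();
        other.set(terms);
        index.put("glove", other);
        return index;
    }

    public static void main(String[] args) {
        Map<String, BitSet> index = buildIndex(5000);

        BitSet hits = filterBits(index, "hat", 5001);
        System.out.println("filter matched " + hits.cardinality() + " docs");

        try {
            countClauses(index, "hat");
        } catch (IllegalStateException ex) {
            System.out.println("query expansion failed: " + ex.getMessage());
        }
    }
}
```

Note that both methods walk the whole term dictionary, because a pattern with a leading wildcard cannot seek into the sorted terms. That scan is likely a large part of the cost on a big index, with or without a filter.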

On 8/14/06, Van Nguyen <vnguyen@ur.com> wrote:
>
> It was how I was implementing the search.
>
> I am using a boolean query.  Prior to the 7GB index, I was searching
> over a 150MB index that consists of a very small part of the bigger
> index.  I was able to call
> BooleanQuery.setMaxClauseCount(Integer.MAX_VALUE) and that worked fine.
> But I think that's the cause of my problem with this bigger index.
> Commenting that out, I get a TooManyClauses exception.  A typical query
> would look something like this:
>
> +CONTENTS:*white* +CONTENTS:*hard* +CONTENTS:*hat* +COMPANY_CODE:u1
> +LANGUAGE:enu -SKU_DESC_ID:0 +IS_DC:d +LOCATION:b72
>
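> import org.apache.lucene.index.Term;
> import org.apache.lucene.search.*;
>
> BooleanQuery q = new BooleanQuery();
>
> // WildcardQuery and TermQuery take a Term(field, text), not two strings.
> // Each pattern has a leading wildcard, so every term in CONTENTS is scanned.
> WildcardQuery wc1 = new WildcardQuery(new Term("CONTENTS", "*white*"));
> WildcardQuery wc2 = new WildcardQuery(new Term("CONTENTS", "*hard*"));
> WildcardQuery wc3 = new WildcardQuery(new Term("CONTENTS", "*hat*"));
> q.add(wc1, BooleanClause.Occur.MUST);
> q.add(wc2, BooleanClause.Occur.MUST);
> q.add(wc3, BooleanClause.Occur.MUST);
>
> // Exact-match restrictions on the remaining fields.
> TermQuery t1 = new TermQuery(new Term("COMPANY_CODE", "u1"));
> q.add(t1, BooleanClause.Occur.MUST);
>
> TermQuery t2 = new TermQuery(new Term("LANGUAGE", "enu"));
> q.add(t2, BooleanClause.Occur.MUST);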
> .
> .
> .
>
> I take it this is not the best way to go about this.
>
> So that leads me to my next question... What is the best way to go
> about this?
>
> Van
>
> -----Original Message-----
> From: yueyu lin [mailto:popeyelin@gmail.com]
> Sent: Monday, August 14, 2006 11:30 AM
> To: java-user@lucene.apache.org
> Subject: Re: 7GB index taking forever to return hits
>
> The 2GB limitation only applies when you want to load the index into
> memory on a 32-bit box.
> Our index is larger than 13 gigabytes, and it works fine.
> I think there must be an error in your design. You can use Luke to
> see what is going on in your index.
>
> On 8/14/06, Van Nguyen <vnguyen@ur.com> wrote:
> >
> >  Hi,
> >
> >
> >
> > I have a 7GB index (about 45 fields per document X roughly 5.5 million
> > docs) running on a Windows 2003 32bit machine (dual proc, 2GB memory).
> > The index is optimized.  Performing a search on this index will just
> > "hang" when performing the search (wild card query with a sort).  At
> > first the CPU usage is 100%, then drops down to 50% after a minute or
> > so, and then no CPU utilization... but the thread is still trying to
> > perform the search.  I've tried this in my J2EE app and in a main
> > program.  Is this due to the 2GB limitation of the 32bit OS (I didn't
> > realize the index would be this big... just let it run over the
> > weekend).
> >
> >
> >
> > If this is due to the 2GB limitation of the 32bit OS and since I have
> > this 7GB index built already (and optimized), is there a way to split
> > this into 2GB indices w/o having to re-index?  Or is this due to
> > another factor?
> >
> >
> >
> > Van
> >
> > United Rentals
> > Consider it done.(tm)
> > 800-UR-RENTS
> > unitedrentals.com
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>
>
> --
> --
> Yueyu Lin


-- 
--
Yueyu Lin
