lucene-java-user mailing list archives

From Santa Clause <noclu...@yahoo.com>
Subject Re: Running out of memory while doing a search
Date Mon, 26 Mar 2007 15:37:06 GMT
Here are the queries being run:

  +spanFirst(FIELD1:bfc, 2) spanNear([FIELD1:bfc,FIELD1:51], 2, true)
  This works, with 603 matches.

  +spanFirst(FIELD1:bfc, 2) spanNear([FIELD1:bfc,FIELD1:51], 2, true) +YNFIELD:y
  This runs out of memory (it should have ~300 matches).

  +(+spanFirst(FIELD1:bfc, 2) spanNear([FIELD1:bfc,FIELD1:51], 2, true))
  This works.

  +(+spanFirst(FIELD1:bfc, 2) spanNear([FIELD1:bfc,FIELD1:51], 2, true)) +(+YNFIELD:y)
  This runs out of memory.

Santa Clause <noclueu2@yahoo.com> wrote:

Hello All,
I am running out of memory on certain searches. I cannot do the obvious
and give my JVM more memory; I have no more to give. Here is my situation.
I have ~150 million documents in my index, with around 5 indexed fields. Each
field is ~100 characters and, when tokenized, has at most around 10 tokens,
usually fewer. I have 2 fields that are either y or n. These are the fields
causing issues. Currently about 50% of documents are (y, n) and the rest are
(n, y). In the future we will be adding more y,n fields, and they will overlap
in values, for example y,y or n,n.
  
Now when I add a TermQuery to my BooleanQuery (making the y,n value required),
I get an out-of-memory error. Without that TermQuery, all my queries run fine.
  
I thought that maybe it was because the rest of my query was optional, so
adding the y,n query was producing too many hits (~75 million). So I tried
keeping my original BooleanQuery (without the y,n field), adding it to another
BooleanQuery as a required clause, and then adding the y,n query to that.
Still no joy.
  
My other thought was to have multiple indexes: use one index when the value
is y, another when it is n, and both when the user wants all information. The
problem I ran into there is that once a document can have y,y or n,n, the
indexes overlap, and that is causing a bit of an issue.
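
The overlap problem with per-flag indexes can be pictured with a minimal sketch. This is only an illustration, not the poster's code. It assumes one index per flag, holding the documents where that flag is y. The class name, method name, and index names are all hypothetical, and no Lucene API is involved.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical routing helper (not from the original message): decides which
 * per-flag index a document would be written to, assuming one index per flag
 * that holds the documents where that flag is y.
 */
final class FlagIndexRouter {

    // Hypothetical index names, chosen only for this illustration.
    static final String INDEX_A = "index-flagA";
    static final String INDEX_B = "index-flagB";

    /** Returns the indexes that would have to hold a document with these flags. */
    static List<String> targetIndexes(boolean flagA, boolean flagB) {
        List<String> targets = new ArrayList<>();
        if (flagA) {
            targets.add(INDEX_A);
        }
        if (flagB) {
            targets.add(INDEX_B);
        }
        return targets;
    }
}
```

Under this assumption a (y, n) or (n, y) document lands in exactly one index, so searching both indexes returns each document once. A (y, y) document lands in both indexes and would come back twice from a combined search, while an (n, n) document lands in neither and would be missed.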
  
Any thoughts? Or can someone explain why I run out of memory when I add a
required TermQuery that matches half of my documents? The first part of my
query should narrow the results down to a few hundred.
  
  Thanks,
  Richard K.
  
 