lucene-java-user mailing list archives

From John Z <zjavie...@yahoo.com>
Subject unlimited wildcard term expansion
Date Wed, 30 Jun 2004 17:50:43 GMT
Hi,
 
I am trying to find a way to handle wildcard queries in Lucene without running out of memory,
and I have been having some problems with it.
 
I have modified parts of the search code in Lucene to keep only about 1000 terms in
memory and write the remaining terms to a file (this is done in the getQuery() method of
MultiTermQuery.java, PrefixQuery.java, etc.).
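
To illustrate the idea of capping the in-memory terms and spilling the rest, here is a minimal sketch. It is not the actual change made to MultiTermQuery or PrefixQuery; the class name SpillingTermBuffer and all of its methods are hypothetical, and it assumes terms are plain strings without line breaks, written one per line.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: keeps the first maxInMemory terms in a list and spills the rest to a file. */
public class SpillingTermBuffer implements AutoCloseable {
    private final int maxInMemory;
    private final Path spillFile;
    private final List<String> inMemory = new ArrayList<>();
    private BufferedWriter writer; // opened lazily, only if something is spilled
    private int spilled = 0;

    public SpillingTermBuffer(int maxInMemory, Path spillFile) {
        this.maxInMemory = maxInMemory;
        this.spillFile = spillFile;
    }

    /** Adds a term: to memory while under the cap, otherwise to the spill file. */
    public void add(String term) throws IOException {
        if (inMemory.size() < maxInMemory) {
            inMemory.add(term);
            return;
        }
        if (writer == null) {
            writer = Files.newBufferedWriter(spillFile, StandardCharsets.UTF_8);
        }
        writer.write(term);
        writer.newLine();
        spilled++;
    }

    public List<String> inMemoryTerms() {
        return inMemory;
    }

    public int spilledCount() {
        return spilled;
    }

    @Override
    public void close() throws IOException {
        if (writer != null) {
            writer.close();
        }
    }
}
```

With a cap of 1000 and 1972 expanded terms, this would hold 1000 terms in the list and write 972 lines to the file.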
 
Then, when the scorer objects are created and the scores are collected for each clause in the
score() method of BooleanScorer.java, I wait until all the clauses that are in memory have been
processed and then continue by reading from the file that I created earlier.  I read each term
from the file, create a TermQuery for it, get the scorer object from that TermQuery, and
collect its score.
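
The read-back step can be sketched roughly as follows. This is only an outline of the control flow, not Lucene code: the class SpilledTermScoring is hypothetical, and the scoreTerm function is an assumed stand-in for "create a TermQuery, get its scorer, and collect the scores", returning a map from document number to score for one term.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.function.Function;

public class SpilledTermScoring {
    /**
     * Streams terms from spillFile (one per line) and adds each term's
     * per-document scores into totals, so only one term is held at a time.
     * scoreTerm is a hypothetical stand-in for running a TermQuery's scorer.
     */
    public static void accumulate(Path spillFile,
                                  Function<String, Map<Integer, Float>> scoreTerm,
                                  Map<Integer, Float> totals) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(spillFile, StandardCharsets.UTF_8)) {
            String term;
            while ((term = in.readLine()) != null) {
                for (Map.Entry<Integer, Float> e : scoreTerm.apply(term).entrySet()) {
                    // Sum the score into whatever the document has accumulated so far.
                    totals.merge(e.getKey(), e.getValue(), Float::sum);
                }
            }
        }
    }
}
```

The totals map here plays the role that the bucket table plays in the real scorer; it is used only to keep the sketch self-contained.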
 
After that, bucketTable.collectHits() collects the hits for everything.
 
I tested my changes on small indexes, with about 2 terms in memory and about 2 or
3 terms in the file, and they worked fine.
 
However, when I tried this with bigger indexes (> 1 million docs), with 1000 terms in
memory and 972 in the file, I got into an infinite loop in bucketTable.collectHits().
I printed out the doc in each bucket and noticed that, about halfway through the bucket list,
the same 4 - 5 docs started repeating for the rest of the list, and there was no null
at the end of the list to terminate it.
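
A list that repeats the same few entries and never reaches null is what a cyclic linked list looks like when traversed. The sketch below is not Lucene's BooleanScorer; it is a simplified model with hypothetical names (BucketCycleDemo, collect, hasCycle) showing one way such a cycle can form, under the assumption that buckets live in fixed slots chosen from the doc number and are pushed onto the front of a list when first hit. If a second doc maps to a slot whose bucket is already linked, pushing that bucket again can close a loop. Whether this is what happens in the modified code is not confirmed.

```java
public class BucketCycleDemo {
    static final class Bucket {
        int doc = -1;
        Bucket next;
    }

    private final Bucket[] slots;
    Bucket first; // head of the list of buckets that currently hold a hit

    BucketCycleDemo(int size) {
        slots = new Bucket[size];
    }

    /** Records a hit for doc, reusing the bucket in slot (doc % size). */
    void collect(int doc) {
        int i = doc % slots.length;
        Bucket b = slots[i];
        if (b == null) {
            slots[i] = b = new Bucket();
        }
        if (b.doc != doc) {
            b.doc = doc;
            b.next = first; // if b is already in the list, this can close a loop
            first = b;
        }
    }

    /** Floyd's tortoise-and-hare check: true if following next never reaches null. */
    boolean hasCycle() {
        Bucket slow = first;
        Bucket fast = first;
        while (fast != null && fast.next != null) {
            slow = slow.next;
            fast = fast.next.next;
            if (slow == fast) {
                return true;
            }
        }
        return false;
    }
}
```

In this model, with 4 slots, collecting docs 1, 2, 3 gives a normal list, while collecting docs 1, 2, 5 links the slot-1 bucket twice and produces a loop. A check like hasCycle(), or a cap on the number of steps while printing the list, can help confirm whether the real list is cyclic before looking for the place where a bucket is linked a second time.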
 
I have looked everywhere and even tried increasing the bucket table size to the sum
of the number of terms in memory and the number of terms in the file, but that still did not
work.
 
I would really appreciate any suggestions/ideas/help on this.
 
Thanks.
Javier

		
---------------------------------
Do you Yahoo!?
Read only the mail you want - Yahoo! Mail SpamGuard.
Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message