lucene-java-user mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: WildCard search replacement
Date Thu, 17 Mar 2005 21:11:59 GMT
That's a great technique - thanks for sharing it!

	Erik

On Mar 14, 2005, at 6:54 AM, Volodymyr Bychkoviak wrote:

> Hi all.
>
>
> I have a large index of documents (about 1.6 million).
>
> One field (called “number”, for example) contains a string of digits.
>
> I need to do a wildcard search on this field such as “*expression*”
> (i.e. find all documents that contain “expression” in this field).
>
> When I run such a search with a very short expression (e.g. "*321"), I
> get an OutOfMemoryError or a TooManyClauses exception (which one depends
> on the BooleanQuery.maxClauseCount setting).
>
> So I found the following workaround. I index this field as a sequence
> of terms, each containing a single digit of the value. (For example,
> the value “123214213” is indexed as the sequence of terms “1”, “2”,
> “3”, “2”, “1”, “4”, “2”, “1”, “3”.) This can be done with a custom
> Analyzer class.
>
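
To make the tokenization step concrete, here is a minimal plain-Java sketch of the token sequence such an analyzer would emit. It deliberately does not use the Lucene API; the class name DigitTokenizerSketch and the choice to skip non-digit characters are assumptions made for illustration, not part of the original post.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration (no Lucene dependency) of the token sequence a
// single-digit analyzer would produce for the "number" field.
// The class name is hypothetical.
class DigitTokenizerSketch {

    // Emits one token per digit. The list index of each token plays the
    // role of its position in the index.
    static List<String> tokenize(String value) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            // Assumption: characters that are not digits are skipped.
            if (Character.isDigit(c)) {
                tokens.add(String.valueOf(c));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [1, 2, 3, 2, 1, 4, 2, 1, 3]
        System.out.println(tokenize("123214213"));
    }
}
```

In a real index the same splitting would live inside the custom Analyzer, so that each digit becomes its own term at its own position.
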
> To run the “wildcard” search against this field, I use a PhraseQuery
> made up of single-digit terms.
>
> For example, to find documents that contain “321” in the field named
> “number”, I create the following PhraseQuery:
>
>    PhraseQuery phraseQuery = new PhraseQuery();
>
>    // One term per digit, in order; with the default slop of 0 the
>    // terms must be adjacent.
>    phraseQuery.add(new Term("number", "3"));
>
>    phraseQuery.add(new Term("number", "2"));
>
>    phraseQuery.add(new Term("number", "1"));
>
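
As a rough illustration of why an exact phrase over single-digit terms behaves like a "contains" search, the sketch below checks whether the query digits occur at consecutive positions in a document's token list. It is plain Java rather than Lucene code; the class and method names are hypothetical, and treating an empty phrase as matching nothing is an assumption.

```java
import java.util.List;

// Plain-Java illustration (no Lucene dependency) of an exact phrase match
// over single-digit tokens. The class name is hypothetical.
class PhraseMatchSketch {

    // Returns true if the query tokens occur at consecutive positions in
    // the document tokens, which is what a phrase with slop 0 requires.
    static boolean matchesPhrase(List<String> docTokens, List<String> queryTokens) {
        // Assumption: an empty phrase matches nothing.
        if (queryTokens.isEmpty()) {
            return false;
        }
        int n = queryTokens.size();
        for (int start = 0; start + n <= docTokens.size(); start++) {
            if (docTokens.subList(start, start + n).equals(queryTokens)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> doc = List.of("1", "2", "3", "2", "1", "4", "2", "1", "3");
        // "321" occurs inside "123214213", so this prints true.
        System.out.println(matchesPhrase(doc, List.of("3", "2", "1")));
        // "312" does not occur, so this prints false.
        System.out.println(matchesPhrase(doc, List.of("3", "1", "2")));
    }
}
```

The query only ever needs as many terms as the expression has digits, regardless of how many distinct values the field holds.
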
> This approach is faster when you need to search for a very short
> expression, and it never runs out of memory or throws a TooManyClauses
> exception.
>
> I think this can be useful for someone who needs similar functionality.
>
> Any comments are appreciated.
>
>
> Regards,
>
> Volodymyr Bychkoviak
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org

