lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From heritrix.lucene <heritrix.luc...@gmail.com>
Subject Re: searching for the part of a term.
Date Tue, 26 Sep 2006 12:47:15 GMT
Hi,
While i was searching forum for my problem for searching a substring, i got
few very good links.

http://www.gossamer-threads.com/lists/lucene/java-user/39753?search_string=Bitset%20filter;#39753
http://www.gossamer-threads.com/lists/lucene/java-user/7813?search_string=substring;#7813
http://www.gossamer-threads.com/lists/lucene/java-user/5931?search_string=substring;#5931

In first, WildcardTermEnum is used.
>>I tried with this but this takes a lot of time in searching.

The other solution i found was to create a tokenstream which splits a token
into multiple tokens, and then index those token. like : google into google,
oogle, ogle....
And then while searching make a prefix query , then search.
>>But here it seems to create a lot of tokens from one token resulting index
size multiple times bigger then if we index a single token.

Since the overhead in first is the speed of the system, i think adopting
second method will be better.

Is there any other solution for this problem?? Am i going in right
direction??

It'll be great to see your response...

Regards,









On 9/23/06, heritrix. lucene <heritrix.lucene@gmail.com> wrote:
>
> Hi All,
>
> How can i make my search so that if i am looking for the term "counting"
> the documents containing "accounting" must also come up.
>
> Similarly if i am looking for term "workload", document s containing work
> also come up as a search result.
>
> Wildcard query seems to work in the first case, but if the index size is
> very big, it throws TooManyClausesException.
>
> Is there a way to resolve this issue, apart from indexing n-grams of each
> term.
>
>
> Regards,
>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message