lucene-java-user mailing list archives

From Rob Young <bubble...@gmail.com>
Subject Re: Searching across spaces
Date Thu, 11 May 2006 15:38:43 GMT
That sounds like just what I'm looking for. Do you know whether this is
covered in Lucene in Action, or where I can find more information about it?

Eric Isakson wrote:

>You might consider using overlapping bi-gram tokenization with stripped out whitespace and a PhraseQuery.
>
>So your tokenized content, "spongebob squarepants", would look like:
>
>sp po on ng ge eb bo ob bs sq qu ua ar re ep pa an nt ts
>
>and your tokens for your query, "sponge bob", would look like:
>
>sp po on ng ge eb bo ob
>
>Add each token, in order, to the PhraseQuery and you should match.
>
>This is very similar to the techniques used for searching in Asian languages, which do not separate words with spaces. There are probably some side effects for compound words that you didn't mean to do this to, but without knowing the exact domain of compound words that you wish to support, this is probably the best you will be able to do.
>  
>
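
A minimal sketch of the tokenization described in the quoted message, in plain Java with no Lucene dependency. The class and method names (BigramSketch, bigrams, phraseMatches) are hypothetical, the lowercasing step is an assumption not mentioned in the thread, and the contiguous-sublist check only stands in for what an exact PhraseQuery would do against indexed token positions. It is not Lucene's API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class BigramSketch {

    // Strip all whitespace, lowercase (an assumption, not from the thread),
    // then emit overlapping two-character tokens.
    static List<String> bigrams(String text) {
        String s = text.replaceAll("\\s+", "").toLowerCase(Locale.ROOT);
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i + 2 <= s.length(); i++) {
            tokens.add(s.substring(i, i + 2));
        }
        return tokens;
    }

    // Stand-in for an exact phrase match: the query bigrams must appear
    // as one contiguous run inside the content bigrams.
    static boolean phraseMatches(String content, String query) {
        List<String> q = bigrams(query);
        return !q.isEmpty()
                && Collections.indexOfSubList(bigrams(content), q) >= 0;
    }

    public static void main(String[] args) {
        // [sp, po, on, ng, ge, eb, bo, ob, bs, sq, qu, ua, ar, re, ep, pa, an, nt, ts]
        System.out.println(bigrams("spongebob squarepants"));
        // [sp, po, on, ng, ge, eb, bo, ob]
        System.out.println(bigrams("sponge bob"));
        // true: the query bigrams form a contiguous run in the content
        System.out.println(phraseMatches("spongebob squarepants", "sponge bob"));
        // false: "bob sponge" yields "bs sp", which the content does not contain
        System.out.println(phraseMatches("spongebob squarepants", "bob sponge"));
    }
}
```

In an actual index, the same bigrams would presumably be produced by a custom analyzer at both index and query time, with each query bigram added to the PhraseQuery in order.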

