lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Maxym Mykhalchuk" <ma...@dit.unitn.it>
Subject Re: Searching across spaces
Date Thu, 11 May 2006 14:03:55 GMT
Eric,

IMHO the number of side-effects can be reduced by requiring "phrases":
tokens for your query, "sponge bob", would look like
"sp po on ng ge" eb "bo ob"

Maxym

==================================
Maxym Mykhalchuk
(+39) 320 8593170
PhD student at University of Trento, ITALY
==================================
----- Original Message ----- 
From: "Eric Isakson" <Eric.Isakson@sas.com>
To: <java-user@lucene.apache.org>
Sent: Thursday, May 11, 2006 3:54 PM
Subject: RE: Searching across spaces


> You might consider using overlapping bi-gram tokenization with stripped 
> out whitespace and a PhraseQuery.
>
> So your tokenized content, "spongebob squarepants", would look like:
>
> sp po on ng ge eb bo ob bs sq qu ua ar re ep pa an nt ts
>
> and your tokens for your query, "sponge bob", would look like
>
> sp po on ng ge eb bo ob
>
> Add each token to the PhraseQuery and you should match.
>
> This is very similar to the techniques used for searching in Asian 
> languages which do not seperate words with spaces. There are probably some 
> side effects for compound words that you didn't mean to do this too, but 
> without knowing the exact domain of compound words that you wish to 
> support, this is probably the best you will be able to do.
>
> -----Original Message-----
> From: Robert Young [mailto:bubblenut@gmail.com]
> Sent: Wednesday, May 10, 2006 2:09 PM
> To: java-user@lucene.apache.org
> Subject: Searching across spaces
>
> Hi,
>
> How can I search accross spaces in the document when the spaces aren't 
> present in the search. For example, if the document contains "spongebob 
> squarepants" but the user searches on "sponge bob" I would like to get the 
> result.
>
> Thanks
> Rob
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message