lucene-java-user mailing list archives

From "Eric Isakson" <Eric.Isak...@sas.com>
Subject RE: Searching across spaces
Date Thu, 11 May 2006 13:54:21 GMT
You might consider overlapping bigram tokenization, with the whitespace stripped out, combined
with a PhraseQuery.

So your tokenized content, "spongebob squarepants", would look like:

sp po on ng ge eb bo ob bs sq qu ua ar re ep pa an nt ts

and the tokens for your query, "sponge bob", would look like:

sp po on ng ge eb bo ob

Add each token to the PhraseQuery and you should match.
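
To make the matching rule concrete, here is a minimal sketch in plain Java. It does not use the Lucene API at all: the class name BigramPhraseSketch and the methods bigrams and phraseMatches are hypothetical helpers, and the consecutive-run check only approximates what an exact phrase match over the bigram tokens would require. It also does no case folding, which a real analyzer probably would.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustration only: hypothetical helper class, not part of Lucene.
class BigramPhraseSketch {

    /** Strips all whitespace, then emits every overlapping two-character token. */
    static List<String> bigrams(String text) {
        String compact = text.replaceAll("\\s+", "");
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i + 2 <= compact.length(); i++) {
            tokens.add(compact.substring(i, i + 2));
        }
        return tokens;
    }

    /**
     * Returns true if the query's bigrams occur as one consecutive run
     * inside the document's bigrams. This stands in for an exact phrase
     * match; it is an assumption about the behavior, not Lucene code.
     */
    static boolean phraseMatches(String document, String query) {
        List<String> queryTokens = bigrams(query);
        if (queryTokens.isEmpty()) {
            return false; // fewer than two non-space characters: nothing to match
        }
        return Collections.indexOfSubList(bigrams(document), queryTokens) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(bigrams("sponge bob"));
        System.out.println(phraseMatches("spongebob squarepants", "sponge bob"));
    }
}
```

Because the whitespace is gone before tokenizing, a query such as "bobs" also matches "spongebob squarepants" in this sketch (its bigrams bo, ob, bs run across the original word boundary), which is the kind of side effect to watch for.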

This is very similar to the techniques used for searching in Asian languages, which do not
separate words with spaces. There are probably some side effects for compound words that you
did not mean this to apply to, but without knowing the exact domain of compound words that you
wish to support, this is probably the best you will be able to do.

-----Original Message-----
From: Robert Young [mailto:bubblenut@gmail.com] 
Sent: Wednesday, May 10, 2006 2:09 PM
To: java-user@lucene.apache.org
Subject: Searching across spaces

Hi,

How can I search across spaces in the document when the spaces aren't present in the search?
For example, if the document contains "spongebob squarepants" but the user searches on "sponge
bob" I would like to get the result.

Thanks
Rob

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

