lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Tokenize a dictionary of phrases
Date Mon, 22 Aug 2011 13:45:20 GMT
Hmm, would synonyms work for your case? If you set
expand=false

and have this in your synonyms file:
quick brown => quickbrown

it might do what you want.
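
To make the idea concrete, here is a minimal plain-Java sketch of what such a
mapping does to a token stream. It does not use Lucene's or Solr's synonym
filter; the class PhraseMerger and its methods addRule and apply are
hypothetical names made up for illustration, and it assumes the tokens are
already lowercased with stopwords removed. At each position it replaces the
longest matching phrase with a single token:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only: a stand-in for a "phrase => token" synonym
// mapping, not Lucene's SynonymFilter.
public class PhraseMerger {
    // Maps a phrase (as a list of lowercased words) to its single-token replacement.
    private final Map<List<String>, String> rules = new HashMap<>();
    private int maxLen = 0;

    // Registers one rule, e.g. addRule("brown fox", "brownfox").
    public void addRule(String phrase, String replacement) {
        List<String> key = Arrays.asList(
                phrase.trim().toLowerCase(Locale.ROOT).split("\\s+"));
        rules.put(key, replacement);
        maxLen = Math.max(maxLen, key.size());
    }

    // Scans left to right, preferring the longest phrase that starts at each position.
    public List<String> apply(List<String> tokens) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            boolean matched = false;
            for (int len = Math.min(maxLen, tokens.size() - i); len >= 1; len--) {
                String replacement = rules.get(tokens.subList(i, i + len));
                if (replacement != null) {
                    out.add(replacement);
                    i += len;
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                // No rule starts here; pass the token through unchanged.
                out.add(tokens.get(i));
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        PhraseMerger merger = new PhraseMerger();
        merger.addRule("brown fox", "brownfox");
        List<String> tokens =
                Arrays.asList("quick", "brown", "fox", "jumps", "lazy", "dog");
        System.out.println(merger.apply(tokens));
    }
}
```

With the rule brown fox => brownfox, main prints
[quick, brownfox, jumps, lazy, dog]. In a real index the same mapping would
normally need to be applied at both index and query time so that the merged
token matches.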

Best
Erick

On Sun, Aug 21, 2011 at 3:53 PM, Xiyang Chen <settinghead@gmail.com> wrote:
> Hi,
>
> I have a dictionary of multi-word phrases and I'd like to analyze documents such that
> anything that appears in the dictionary will be treated as one single token.
> For example, if the dictionary contains "brown fox", then the sentence:
> The quick brown fox jumps over the lazy dog.
>
> will be tokenized as (with stopwords stripped):
> quick | brown fox | jumps | lazy | dog
>
> What is the best way to achieve this?
>
> Thanks,
> Xiyang
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
