lucene-java-user mailing list archives

From govind bhardwaj <govins...@gmail.com>
Subject Re: Tokenize a dictionary of phrases
Date Sun, 21 Aug 2011 20:23:45 GMT
Hi Xiyang,

You should use KeywordAnalyzer, which treats the entire string (a multi-word
phrase, in your case) as a single token, without splitting it into its
constituent words.
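
One caveat: KeywordAnalyzer emits the whole field value as a single token, so
it fits indexing the dictionary entries themselves. Applied to a full sentence,
it would return that sentence as one token. To get the mixed output you
describe, the phrases have to be merged while the document text is tokenized.
Below is a rough plain-Java sketch of that idea using greedy longest-match
lookup. It is not Lucene API: the class name, the method signature, the
maxPhraseWords parameter and the stopword set are all made up for
illustration, and the same logic would still need to be wrapped in a custom
TokenFilter to run inside an analyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, NOT part of Lucene: groups dictionary phrases into
// single tokens using greedy longest-match lookup.
final class PhraseTokenizerSketch {

    // Assumes every entry in `phrases` is lowercase, with its words separated
    // by exactly one space. `maxPhraseWords` is the word count of the longest
    // dictionary phrase.
    static List<String> tokenize(String text, Set<String> phrases,
                                 Set<String> stopwords, int maxPhraseWords) {
        // Lowercase, then split on anything that is not a letter or digit.
        String[] raw = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        List<String> words = new ArrayList<>();
        for (String w : raw) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }

        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < words.size()) {
            int matched = 0;
            // Try the longest candidate phrase starting at word i first.
            for (int n = Math.min(maxPhraseWords, words.size() - i); n >= 2; n--) {
                String candidate = String.join(" ", words.subList(i, i + n));
                if (phrases.contains(candidate)) {
                    tokens.add(candidate);
                    matched = n;
                    break;
                }
            }
            if (matched > 0) {
                i += matched;   // skip past the words consumed by the phrase
                continue;
            }
            // No phrase starts here: emit the single word unless it is a stopword.
            String w = words.get(i);
            if (!stopwords.contains(w)) {
                tokens.add(w);
            }
            i++;
        }
        return tokens;
    }
}
```

With the dictionary {"brown fox"}, an assumed stopword set {"the", "over"} and
maxPhraseWords = 2, calling tokenize on "The quick brown fox jumps over the
lazy dog." gives the tokens quick, brown fox, jumps, lazy, dog. Phrases are
checked before stopword removal, so a dictionary phrase that contains a
stopword would still be kept whole.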

Thanks,
Govind

On Mon, Aug 22, 2011 at 1:23 AM, Xiyang Chen <settinghead@gmail.com> wrote:

> Hi,
>
> I have a dictionary of multi-word phrases and I'd like to analyze documents
> such that anything that appears in the dictionary will be treated as one
> single token.
> For example, if the dictionary contains "brown fox", then the sentence
> The quick brown fox jumps over the lazy dog.
>
> will be tokenized as (with stopwords stripped):
> quick | brown fox | jumps | lazy | dog
>
> What is the best way to achieve this?
>
> Thanks,
> Xiyang


-- 
No trees were harmed in the creation of this message, but several thousand
electrons were mildly inconvenienced.
