lucene-java-user mailing list archives

From Ahmet Arslan <iori...@yahoo.com.INVALID>
Subject Re: any analyzer will keep punctuation?
Date Mon, 06 Mar 2017 14:26:13 GMT
Hi Zhao,

A whitespace tokeniser followed by a customised word delimiter filter would be a solution.
Please see the types attribute of the word delimiter filter factory, which lets you customise how individual characters are classified.
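
As a rough sketch of what that could look like: the types attribute names a mapping file in which each line assigns a character to one of the filter's character classes (LOWER, UPPER, ALPHA, DIGIT, ALPHANUM, SUBWORD_DELIM). The file name wdfftypes.txt and the characters chosen here are only illustrative assumptions; pick whichever punctuation your singer and album names need.

```text
# wdfftypes.txt (hypothetical file name)
# Treat these punctuation characters as letters, so the word delimiter
# filter neither splits on them nor strips them.
! => ALPHA
& => ALPHA
+ => ALPHA
# A leading hash sign starts a comment, so the hash character itself
# is written as a Unicode escape.
\u0023 => ALPHA
```

In a Solr schema the file would be referenced as types="wdfftypes.txt" on the word delimiter filter factory. From plain Lucene, the same factory should be reachable through CustomAnalyzer's builder, for example withTokenizer("whitespace") followed by addTokenFilter("wordDelimiterGraph", "types", "wdfftypes.txt"), or "wordDelimiter" on releases that predate the graph variant. Note that the filter also splits on case changes and letter/digit transitions by default, so those options may need to be switched off to get behaviour closer to the standard analyzer.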

ahmet



On Monday, March 6, 2017 12:22 PM, Yonghui Zhao <zhaoyonghui@gmail.com> wrote:
Yes, the whitespace analyzer will keep punctuation, but it only breaks words on
whitespace.


I didn’t explain my requirement clearly.

I want an analyzer like the standard analyzer, but one that can keep some
configured punctuation.


2017-03-06 18:03 GMT+08:00 Ahmet Arslan <iorixxx@yahoo.com.invalid>:

> Hi,
>
> The whitespace analyser/tokenizer, for example.
>
> Ahmet
>
>
>
> On Monday, March 6, 2017 10:21 AM, Yonghui Zhao <zhaoyonghui@gmail.com>
> wrote:
> Lucene's standard analyzer removes almost all punctuation.
> In some cases we want to keep some punctuation; for example, in music
> search, some singer names and album names consist of punctuation.
>
> Is there any analyzer that lets us customize which punctuation is removed?
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


