lucene-java-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: WhiteSpaceTokenizer
Date Fri, 15 Aug 2014 12:24:56 GMT
Sure, that should be a configurable option.

Oh, and I neglected to mention a workaround: use the pattern tokenizer, 
which doesn't have a limit (yet.) But it might be slower.
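
[Editor's sketch] To illustrate the difference the workaround relies on, here is a minimal plain-Java comparison. It does not use Lucene: the class and method names are hypothetical, and the capped variant only assumes that an over-long token is cut once it reaches the limit, which may not match what the real tokenizer does in every case. The regex-based split stands in for the pattern-tokenizer approach and has no length cap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of Lucene.
class WhitespaceSplitSketch {
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Regex-based splitting: a token is whatever lies between whitespace runs,
    // so there is no cap on token length.
    static List<String> splitOnWhitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : WHITESPACE.split(text.trim())) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Assumed model of a capped tokenizer: a token is emitted as soon as it
    // reaches maxLen characters, and the remainder starts a new token.
    static List<String> splitWithCap(String text, int maxLen) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
                if (current.length() >= maxLen) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

Under these assumptions, a 300-character word comes back as one token from splitOnWhitespace, but as two tokens (255 and 45 characters) from splitWithCap with a cap of 255.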

-- Jack Krupansky

-----Original Message----- 
From: Sheng
Sent: Friday, August 15, 2014 8:13 AM
To: java-user@lucene.apache.org
Subject: Re: WhiteSpaceTokenizer

Thanks, Jack. I haven't added myself to the contributor list yet; I will do
that, then log in and comment on that ticket. One quick comment:
wouldn't it be more reasonable to throw an exception if a token's length is
more than 255, if relaxing that limit is still debatable? That way the user
would know immediately that something is wrong.
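
[Editor's sketch] The fail-fast behavior suggested above could look roughly like the following. This is plain Java, not Lucene code: the class name, the constant, and the choice of IllegalArgumentException are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Lucene.
class FailFastTokenizerSketch {
    // The limit discussed in this thread.
    static final int MAX_TOKEN_LENGTH = 255;

    // Splits on whitespace and throws as soon as a token grows past the limit,
    // instead of handling the over-long token silently.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int start = -1; // index where the current token began, or -1 if none
        for (int i = 0; i <= text.length(); i++) {
            boolean boundary = i == text.length()
                    || Character.isWhitespace(text.charAt(i));
            if (!boundary) {
                if (start < 0) {
                    start = i;
                }
                if (i - start + 1 > MAX_TOKEN_LENGTH) {
                    throw new IllegalArgumentException(
                            "token starting at offset " + start
                            + " is longer than " + MAX_TOKEN_LENGTH + " characters");
                }
            } else if (start >= 0) {
                tokens.add(text.substring(start, i));
                start = -1;
            }
        }
        return tokens;
    }
}
```

With this sketch, a token of exactly 255 characters is accepted and a token of 256 characters raises the exception, so the caller learns about the problem at indexing time.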

On Friday, August 15, 2014, Jack Krupansky <jack@basetechnology.com> wrote:

> Yeah, it should be documented better, and configurable.
>
> Some discussion of related issues here:
> https://issues.apache.org/jira/browse/LUCENE-1118
> https://issues.apache.org/jira/browse/SOLR-4148
>
> I actually filed a Jira for this already. No action so far, but PLEASE
> feel free to comment on it:
> https://issues.apache.org/jira/browse/LUCENE-5785
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Sheng
> Sent: Thursday, August 14, 2014 11:38 PM
> To: java-user@lucene.apache.org
> Subject: WhiteSpaceTokenizer
>
> A token has to be shorter than 255 characters; otherwise this tokenizer
> behaves unpredictably. I see that 255 is set as a private final in the
> source code, but no documentation explicitly addresses it. Can we either
> make that number configurable (if that is not an option, I'd like to know
> why), or add a note to its Javadoc? I had a hard time figuring that
> out...
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
> 

