lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: FreeTextSuggester throwing error "token must not contain separator byte"
Date Tue, 25 Jul 2017 03:31:09 GMT
The shingle filter may use a space as the separator between the shingles it
generates. The admin/analysis page is your friend.
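
To make the byte dump in the quoted error easier to read, here is a minimal Python sketch that decodes it and counts the grams. It assumes the separator in effect is an ASCII space (0x20) and that the gram count is the number of separator bytes plus one, which is consistent with gramCount=6 in the message. The names count_grams, SEPARATOR and MAX_NGRAMS are made up for illustration; this is not the Lucene code itself.

```python
# Token bytes copied from the quoted error message.
TOKEN_HEX = "30 20 30 20 32 20 72 20 61 6c 6c 65 6e 20 72"

SEPARATOR = 0x20  # assumption: the separator byte in effect is an ASCII space
MAX_NGRAMS = 5    # "expected max ngram size=5" in the error message


def count_grams(token: bytes, separator: int) -> int:
    """Count grams as (number of separator bytes) + 1 (assumed rule)."""
    return 1 + sum(1 for b in token if b == separator)


token = bytes.fromhex(TOKEN_HEX)       # fromhex skips the spaces between hex pairs
text = token.decode("ascii")           # human-readable form of the token
grams = text.split(" ")                # break the buffer at spaces
gram_count = count_grams(token, SEPARATOR)

print(repr(text))
print(grams)
print(gram_count, "grams; limit is", MAX_NGRAMS)
```

If the assumptions hold, the single "token" that reached the suggester is really several space-joined terms, which is what you would expect if a shingle filter in the analysis chain had already glued them together.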

On Jul 24, 2017 2:45 PM, "Angel Todorov" <attodorov@gmail.com> wrote:

> Hi Rick,
>
> Yep, that's really weird, because I am using the StandardTokenizerFactory,
> which is supposed to remove whitespace. Also tried the
> WhitespaceTokenizerFactory. I'll have a look at other analyzers or if
> nothing works maybe implement my own.
>
> I am using a Shingle filter right after the StandardTokenizer, not sure if
> that has anything to do with it.
>
>
> Thanks,
> Angel
>
>
> On Tue, Jul 25, 2017 at 12:09 AM Rick Leir <rleir@leirtech.com> wrote:
>
> > Angel,
> > The byte 20 (hex) is an ASCII space character, which is a separator in most
> > contexts. Breaking the buffer at spaces, you can see 6 non-space tokens.
> >
> > Have a look at your analysis chain and see why you are getting this.
> > Cheers -- Rick
> >
> > On July 24, 2017 4:27:00 PM EDT, Angel Todorov <attodorov@gmail.com>
> > wrote:
> > >Hi guys,
> > >
> > >I am trying to set up the FreeTextSuggester/Lookup Factory in a suggester
> > >definition in Solr. Unfortunately, while the index is building, I am
> > >encountering the following errors:
> > >
> > >*"msg":"tokens must not contain separator byte; got token=[30 20 30 20 32 20 72 20 61 6c 6c 65 6e 20 72] but gramCount=6, which is greater than expected max ngram size=5",
> > >"trace":"java.lang.IllegalArgumentException: tokens must not contain separator byte; got token=[30 20 30 20 32 20 72 20 61 6c 6c 65 6e 20 72] but gramCount=6, which is greater than expected max ngram size=5\r\n\tat org.apache.lucene.search.suggest.analyzing.FreeTextSuggester.build(FreeTextSuggester.java:362)\r\n\tat*
> > >
> > >I've also opened the following issue, because I don't think it's right
> > >not to handle this exception:
> > >
> > >https://issues.apache.org/jira/browse/SOLR-11139
> > >
> > >But my question is about the error in general - why is it occurring? I
> > >only have English text, nothing special.
> > >
> > >Thanks,
> > >Angel
> >
> > --
> > Sorry for being brief. Alternate email is rickleir at yahoo dot com
>
