lucene-dev mailing list archives

From "Robert Muir (JIRA)" <>
Subject [jira] [Commented] (LUCENE-4216) Token X exceeds length of provided text sized X
Date Sun, 05 Aug 2012 13:04:02 GMT


Robert Muir commented on LUCENE-4216:

> For Tashkeel, we should not adjust the offset, since it is part of the word but does not
> need to be written when searching/indexing. It is simply how Arabic is written.

It has nothing to do with Arabic. Offsets will be wrong for the rest of your document if you
don't fix this.
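
To illustrate the point about offsets, here is a minimal, Lucene-free sketch in Java. It is not the reporter's code and not Lucene's implementation: the class `OffsetSketch`, its methods, and the assumed Tashkeel range U+064B..U+0652 are all hypothetical names chosen for illustration. It strips the marks before tokenizing, records where each kept character came from, and maps token offsets back to the original text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: shows why offsets must be corrected when characters
// are removed before tokenization. Not part of Lucene.
final class OffsetSketch {

    /** A token with start (inclusive) and end (exclusive) offsets into some text. */
    static final class Token {
        final String term;
        final int start;
        final int end;

        Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /** Assumption for this sketch: Tashkeel marks are the range U+064B..U+0652. */
    static boolean isTashkeel(char c) {
        return c >= '\u064B' && c <= '\u0652';
    }

    /** Removes Tashkeel; origIndex receives the original index of every kept char. */
    static String strip(String original, List<Integer> origIndex) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < original.length(); i++) {
            char c = original.charAt(i);
            if (!isTashkeel(c)) {
                sb.append(c);
                origIndex.add(i);
            }
        }
        return sb.toString();
    }

    /** Splits on whitespace; offsets are relative to the text passed in. */
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) tokens.add(new Token(text.substring(start, i), start, i));
        }
        return tokens;
    }

    /** Maps offsets in the stripped text back to offsets in the original text. */
    static Token correct(Token t, List<Integer> origIndex) {
        int start = origIndex.get(t.start);
        int end = origIndex.get(t.end - 1) + 1; // end stays exclusive
        return new Token(t.term, start, end);
    }

    /** The kind of bounds check a highlighter has to make before slicing text. */
    static boolean fits(Token t, String text) {
        return t.start >= 0 && t.start <= t.end && t.end <= text.length();
    }
}
```

With a text whose first word carries marks, the uncorrected offsets of every later token point at the wrong characters in the original, while `correct` restores them. In the other direction, offsets computed against the longer marked text fail `fits` when applied to the shorter stripped text, which is the kind of mismatch the reported message describes. Note that this sketch leaves trailing marks outside the corrected token span, which is one possible design choice among several.
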
> Token X exceeds length of provided text sized X
> -----------------------------------------------
>                 Key: LUCENE-4216
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/highlighter
>    Affects Versions: 4.0-ALPHA
>         Environment: Windows 7, jdk1.6.0_27
>            Reporter: Ibrahim
>         Attachments:
> I'm facing this exception:
> Token رأيكم exceeds length of provided text sized 170
> 	at
> 	at classes.myApp$16$
> I looked for anything wrong in my code when I started migrating from Lucene 3.6 to 4.0,
> without success. I found similar issues with HTMLStripCharFilter, e.g. LUCENE-3690 and
> LUCENE-2208, but not with SimpleHTMLFormatter, so I'm raising this here to see whether there
> is really a bug or whether something is wrong in my code with v4. The code that I'm using:
> final Highlighter highlighter = new Highlighter(new SimpleHTMLFormatter("<font color=red>", "</font>"), new QueryScorer(query));
> .......
> final TokenStream tokenStream = TokenSources.getAnyTokenStream(defaultSearcher.getIndexReader(), j, "Line", analyzer);
> final TextFragment[] frag = highlighter.getBestTextFragments(tokenStream, doc.get("Line"), false, 10);
> Please note that this works fine with v3.6.
