lucene-java-user mailing list archives

From Markus Jelsma <markus.jel...@openindex.io>
Subject RE: Highlighter IOOBE with modified HyphenationCompoundWordTokenFilter
Date Fri, 05 Oct 2012 10:52:38 GMT
You're right, but it's fixed now. In Solr's analysis page I spotted an incorrect endOffset
for some tokens. After correcting the endOffset, the error no longer appears. I should check
the emitted offsets more carefully.

Thanks for your time anyway :) 
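
The thread does not show the corrected filter code, so the following is only a minimal sketch of the kind of offset bookkeeping involved, not the actual fix. It assumes each subtoken's offsets are derived from the original token's start offset plus the subtoken's position inside that token, and that no upstream filter changed the token's length. The names SubtokenOffsets, Part, split and offsetsAreConsistent are hypothetical, and the sketch deliberately uses plain Java rather than the Lucene attribute API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: computes absolute offsets for
// the parts of a decompounded token and sanity-checks them.
class SubtokenOffsets {

    /** A subtoken with absolute character offsets into the field text. */
    static final class Part {
        public final String text;
        public final int startOffset;
        public final int endOffset; // exclusive

        Part(String text, int startOffset, int endOffset) {
            this.text = text;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /**
     * Splits token at the given split points (indices relative to the token)
     * so that the parts do not overlap and together cover the whole token.
     * tokenStartOffset is the token's start offset in the field text.
     */
    static List<Part> split(String token, int tokenStartOffset, int... splitPoints) {
        List<Part> parts = new ArrayList<>();
        int begin = 0;
        for (int i = 0; i <= splitPoints.length; i++) {
            int end = (i < splitPoints.length) ? splitPoints[i] : token.length();
            if (end <= begin || end > token.length()) {
                throw new IllegalArgumentException("bad split point: " + end);
            }
            // Both offsets are shifted by the original token's start offset.
            parts.add(new Part(token.substring(begin, end),
                               tokenStartOffset + begin,
                               tokenStartOffset + end));
            begin = end;
        }
        return parts;
    }

    /**
     * Returns true if every part lies inside the field text and its offset
     * span has the same length as its text (assumes length-preserving
     * analysis before the split).
     */
    static boolean offsetsAreConsistent(List<Part> parts, String fieldText) {
        for (Part p : parts) {
            if (p.startOffset < 0 || p.endOffset > fieldText.length()) {
                return false;
            }
            if (p.endOffset - p.startOffset != p.text.length()) {
                return false;
            }
        }
        return true;
    }
}
```

With the example from the original message, splitting verzekeringmaatschappij at index 11 for a token that starts at offset 3 gives verzekering with offsets 3 to 14 and maatschappij with offsets 14 to 26. A check along these lines in an analysis test might flag an end offset that runs past the field text before the highlighter tries to cut a fragment with it.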
 
-----Original message-----
> From:Thomas Matthijs <lists@selckin.be>
> Sent: Thu 04-Oct-2012 15:55
> To: java-user@lucene.apache.org
> Subject: Re: Highlighter IOOBE with modified HyphenationCompoundWordTokenFilter
> 
> And to include the code
> 
> On Thu, Oct 4, 2012 at 3:52 PM, Markus Jelsma
> <markus.jelsma@openindex.io> wrote:
> > I forgot to add that this is with today's build of trunk.
> >
> > -----Original message-----
> >> From:Markus Jelsma <markus.jelsma@openindex.io>
> >> Sent: Thu 04-Oct-2012 15:42
> >> To: java-user@lucene.apache.org
> >> Subject: Highlighter IOOBE with modified HyphenationCompoundWordTokenFilter
> >>
> >> Hi,
> >>
> >> I've modified the HyphenationCompoundWordTokenFilter to emit fewer subtokens,
> >> because the original filter can emit all kinds of subtokens that have a very
> >> different meaning on their own. I've modified it so that no overlapping subtokens
> >> are emitted, and no subtokens are emitted that are contained within another
> >> subtoken. I've also modified it to require that the generated subtokens together
> >> make up the original token; if they don't, the subtokens are discarded. It also
> >> no longer returns the original token; the original filter produces a duplicate
> >> of the original input token. For example: verzekeringmaatschappij now becomes
> >> verzekering and maatschappij, and not verzekeringmaatschappij, ver, zeker,
> >> verzeker, zekering, ringmaat, maat and more.
> >>
> >> But it seems that I have done something wrong, because my modified version
> >> sometimes causes the Highlighter to throw the following IOOBE:
> >>
> >> java.lang.StringIndexOutOfBoundsException: String index out of range: -14
> >>         at java.lang.String.substring(String.java:1937)
> >>         at org.apache.lucene.search.vectorhighlight.BaseFragmentsBuilder.makeFragment(BaseFragmentsBuilder.java:172)
> >>         at org.apache.lucene.search.vectorhighlight.BaseFragmentsBuilder.createFragments(BaseFragmentsBuilder.java:138)
> >>         at org.apache.lucene.search.vectorhighlight.FastVectorHighlighter.getBestFragments(FastVectorHighlighter.java:186)
> >>         at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByFastVectorHighlighter(DefaultSolrHighlighter.java:571)
> >>         at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:401)
> >>         at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:136)
> >>         at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:214)
> >>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> >>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1750)
> >>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
> >>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
> >>         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
> >>         .....
> >>
> >> Can anyone point me in the right direction? I've checked the LIA book on how
> >> to manipulate the token stream and thought it should be alright. My analysis
> >> tests also yield good results, with nothing strange to be found. Or could it be
> >> an error in the highlighter that only now shows up?
> >>
> >> Thanks,
> >> Markus
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org