lucene-dev mailing list archives

From "Steve Rowe (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (SOLR-5983) HTMLStripCharFilter is treating CDATA sections incorrectly
Date Thu, 17 Apr 2014 06:27:18 GMT

     [ https://issues.apache.org/jira/browse/SOLR-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Rowe resolved SOLR-5983.
------------------------------

    Resolution: Fixed
      Assignee: Steve Rowe

Committed to trunk, branch_4x, and the lucene_solr_4_8 branch.

Thanks Dan!

> HTMLStripCharFilter is treating CDATA sections incorrectly
> ----------------------------------------------------------
>
>                 Key: SOLR-5983
>                 URL: https://issues.apache.org/jira/browse/SOLR-5983
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 4.7.1
>         Environment: Rhat - running in AWS Large Instance (4 processors, 16 GB RAM) working in attached storage.
>            Reporter: Dan
>            Assignee: Steve Rowe
>             Fix For: 4.8, 4.9, 5.0
>
>         Attachments: SOLR-5983.patch, temp.txt
>
>
> I'm hammering on this Solr instance. I've got three cores that I'm using to store millions
> of small bits of reference data. I'm using a heavily tweaked Tika to parse XML files and
> ingest them into Solr, while referencing this data. So I'm making hundreds of query requests
> against Solr while also making some substantial posts. (In general, I queue up the posts,
> sending in 100 documents at a time.)
> Stack Trace:
> 4099640 [qtp39890933-24] WARN  org.eclipse.jetty.servlet.ServletHandler  – Error for /solr/us_patent_grant/update
> java.lang.AssertionError: Attempting to read past the end of a segment.
>         at org.apache.lucene.analysis.charfilter.HTMLStripCharFilter$TextSegment.nextChar(HTMLStripCharFilter.java:30885)
>         at org.apache.lucene.analysis.charfilter.HTMLStripCharFilter.zzDoEOF(HTMLStripCharFilter.java:31150)
>         at org.apache.lucene.analysis.charfilter.HTMLStripCharFilter.nextChar(HTMLStripCharFilter.java:31802)
>         at org.apache.lucene.analysis.charfilter.HTMLStripCharFilter.read(HTMLStripCharFilter.java:30829)
>         at org.apache.lucene.analysis.charfilter.HTMLStripCharFilter.read(HTMLStripCharFilter.java:30842)
>         at org.apache.lucene.analysis.standard.std40.StandardTokenizerImpl40.zzRefill(StandardTokenizerImpl40.java:916)
>         at org.apache.lucene.analysis.standard.std40.StandardTokenizerImpl40.getNextToken(StandardTokenizerImpl40.java:1123)
>         at org.apache.lucene.analysis.standard.StandardTokenizer.incrementToken(StandardTokenizer.java:175)
>         at org.apache.lucene.analysis.payloads.TokenOffsetPayloadTokenFilter.incrementToken(TokenOffsetPayloadTokenFilter.java:45)
>         at org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:54)
>         at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:182)
>         at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:248)
>         at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:253)
>         at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:455)
>         at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1534)
>         at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:236)
>         at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:160)
>         at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
>         at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
>         at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:704)
>         at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:858)
>         at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:557)
>         at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
>         at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:247)
>         at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
>         at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
>         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
>         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

