lucene-solr-user mailing list archives

From Sean Timm <tim...@aol.com>
Subject Re: problem index accented character with release version of solr 1.3
Date Thu, 18 Sep 2008 13:24:07 GMT
From the XML 1.0 spec: "Legal characters are tab, carriage return, 
line feed, and the legal graphic characters of Unicode and ISO/IEC 
10646."  So \005 (the control character U+0005) is not a legal XML 
character.  It appears the old StAX implementation was more lenient than 
it should have been, and Woodstox is doing the correct thing by rejecting it.
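
For anyone who needs to clean documents on the client side before posting them, here is a minimal sketch of a filter based on the Char production in the XML 1.0 spec (#x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF). The class and method names (XmlCharFilter, isLegalXmlChar, stripIllegalXmlChars) are made up for illustration; this is not part of Solr or Woodstox.

```java
// Hypothetical helper, not part of Solr or Woodstox.
final class XmlCharFilter {
    private XmlCharFilter() {}

    /** Returns true if the code point matches the XML 1.0 Char production. */
    static boolean isLegalXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns a copy of the input with every illegal XML 1.0 character dropped. */
    static String stripIllegalXmlChars(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            // Work on code points so supplementary characters stay intact;
            // an unpaired surrogate falls outside the legal ranges and is dropped.
            int cp = input.codePointAt(i);
            if (isLegalXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

Running the field value through stripIllegalXmlChars before building the add request would drop the \005 and leave the é alone. Silently dropping characters may hide the upstream bug, so logging what gets removed is probably a good idea.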

-Sean

Ryan McKinley wrote:
> My guess is that it has to do with switching the StAX implementation to 
> the Geronimo API and the Woodstox implementation.
>
> https://issues.apache.org/jira/browse/SOLR-770
>
> I'm not sure what the solution is though...
>
>
> On Sep 17, 2008, at 10:02 PM, Joshua Reedy wrote:
>
>> I have been using a stable dev version of 1.3 for a few months.
>> Today, I began testing the final release version, and I encountered a
>> strange problem.
>> The only thing that has changed in my setup is the solr code (I didn't
>> make any config change or change the schema).
>>
>> A document has a text field with a value that contains:
>> "Andr\005é 3000"
>>
>> Indexing the document by itself or as part of a batch, produces the
>> following error:
>> Sep 17, 2008 5:00:27 PM org.apache.solr.common.SolrException log
>> SEVERE: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character ((CTRL-CHAR, code 5))
>> at [row,col {unknown-source}]: [5,205]
>>        at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675)
>>        at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4668)
>>        at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
>>        at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
>>        at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
>>        at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
>>        at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:327)
>>        at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:195)
>>        at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
>>        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
>>        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
>>        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
>>        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
>>        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>>        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
>>        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
>>        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>>        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
>>        at java.lang.Thread.run(Thread.java:595)
>>
>> The latest version of Solr doesn't seem to like control characters
>> (\005, in this case), but previous versions handled them (or at least
>> ignored them).
>>
>> These characters shouldn't be in my documents, so there's a bug on my
>> end to track down.  However, I'm wondering if this was an expected
>> change or an unintended consequence of recent work . . .
>>
>>
>>
>>
>> -- 
>> -------------------------------------------------------------------------------------------------
>>
>> Be who you are and say what you feel,
>> because those who mind don't matter and
>> those who matter don't mind.
>> -- Dr. Seuss
