jackrabbit-users mailing list archives

From Stefan Guggisberg <stefan.guggisb...@day.com>
Subject Re: jr 2.1 and xml content
Date Fri, 23 Jul 2010 07:48:26 GMT
On Thu, Jul 22, 2010 at 10:48 PM, John Langley
<John.Langley@mathworks.com> wrote:
> Thanks, that definitely helped!
>
> But now I get the following errors / warnings:
> [#|2010-07-22T20:27:48.722+0000|WARNING|sun-appserver2.1|org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField|_ThreadID=16;_ThreadName=jackrabbit-pool-4;_RequestID=7c74399a-6822-4f8b-b6e1-fcc54e5c37f8;|Failed to extract text from a binary property
> org.apache.tika.exception.TikaException: TIKA-237: Illegal SAXException from org.apache.tika.parser.xml.DcXMLParser@85deafc
>        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:130)
>        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
>        at org.apache.jackrabbit.core.query.lucene.JackrabbitParser.parse(JackrabbitParser.java:189)
>        at org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField$ParsingTask.run(LazyTextExtractorField.java:174)
>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
>        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:207)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:619)
> Caused by: org.xml.sax.SAXParseException: The version is required in the XML declaration.
>        at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
>        at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
>        at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
>        at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
>        at org.apache.xerces.impl.XMLScanner.reportFatalError(Unknown Source)
>        at org.apache.xerces.impl.XMLScanner.scanXMLDeclOrTextDecl(Unknown Source)
>        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanXMLDeclOrTextDecl(Unknown Source)
>        at org.apache.xerces.impl.XMLDocumentScannerImpl$XMLDeclDispatcher.dispatch(Unknown Source)
>        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
>        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
>        at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
>        at javax.xml.parsers.SAXParser.parse(SAXParser.java:198)
>        at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:72)
>        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
>        ... 11 more
> |#]
>
> Note: this only happens when I put a "file" in via webdav and the file
> has an .xml extension but is empty (which is a temporary state in our
> application)
>
> Is there anything I can or should do (other than tweaking the logging
> properties) to turn off this warning?

the warning is IMO legitimate (it's trying to index a zero-length file).
however, o.a.jackrabbit.core.query.lucene.LazyTextExtractorField
could probably check the length of the file before handing it
over to Tika, and log a shorter warning if the file is empty.
feel free to create a jira issue if it really bugs you.

cheers
stefan

>
> Thanks in advance; the first suggestion was great!
>
> -- Langley
>
>
> On Thu, 2010-07-22 at 10:36 -0400, Stefan Guggisberg wrote:
>
>> this might help:
>> http://markmail.org/message/hctkq6looial7xzr
>>
>> cheers
>> stefan
>>
>> On Thu, Jul 22, 2010 at 4:08 PM, John Langley
>> <John.Langley@mathworks.com> wrote:
>> > We recently upgraded from jackrabbit 2.0 to jackrabbit 2.1 and
>> > discovered, much to our chagrin, that storing xml content in the
>> > repository has been significantly changed. In fact, from our point of
>> > view it has been broken!
>> >
>> > Previously, we had been storing xml content via a webdav client into the
>> > repository and everything was fine. Now when we try to do this, the
>> > stored xml files end up with a content length of 0 (regardless of
>> > whether the "file" has a .xml extension or not)!
>> >
>> > Certainly there MUST be a configuration that can help us "turn off" any
>> > special processing of xml content that we store in the repository. Could
>> > someone please point this out?
>> >
>> > Thanks in advance!
>> >
>> > -- Langley
>> >
>> >
>> >
>
