jackrabbit-dev mailing list archives

From "Jukka Zitting (JIRA)" <j...@apache.org>
Subject [jira] Updated: (JCR-2576) DbInputStream does not support mark()/reset() when exhausted.
Date Tue, 13 Apr 2010 16:24:53 GMT

     [ https://issues.apache.org/jira/browse/JCR-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting updated JCR-2576:
-------------------------------

    Fix Version/s: 2.1.0
                       (was: 2.0.1)

> DbInputStream does not support mark()/reset() when exhausted.
> -------------------------------------------------------------
>
>                 Key: JCR-2576
>                 URL: https://issues.apache.org/jira/browse/JCR-2576
>             Project: Jackrabbit Content Repository
>          Issue Type: Bug
>          Components: jackrabbit-core
>    Affects Versions: 2.0.0
>            Reporter: Julian Sedding
>            Assignee: Thomas Mueller
>             Fix For: 2.1.0
>
>         Attachments: DbInputStream.patch
>
>
> The DbDataStore implementation uses a DbInputStream to read binary
> properties from the database. When a new binary property is created,
> Jackrabbit attempts to index it. Tika's CharsetDetector is used in the
> process, which marks the input stream, reads the first 8000 bytes and then
> resets the stream.
> This results in the stack trace shown at the end of the issue if both of
> the following conditions hold:
> * the property is larger than the minRecordLength configuration of the
>   data store, and
> * the property is smaller than 8000 bytes, so the stream is exhausted
>   before reset() is called.
> The DbInputStream needs to have the following properties:
> 1. lazy instantiation of the underlying stream
> 2. auto-close of the underlying stream when EOF is reached
> 3. full support for mark()/reset(), even if the underlying stream has
>    been auto-closed due to 2.
> 12.03.2010 15:53:28 *WARN * LazyTextExtractorField: Failed to extract text from a binary property (LazyTextExtractorField.java, line 165)
> java.io.EOFException
>         at org.apache.jackrabbit.core.data.db.DbInputStream.reset(DbInputStream.java:180)
>         at org.apache.tika.io.ProxyInputStream.reset(ProxyInputStream.java:156)
>         at org.apache.tika.io.ProxyInputStream.reset(ProxyInputStream.java:156)
>         at org.apache.tika.parser.txt.CharsetDetector.setText(CharsetDetector.java:131)
>         at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:77)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
>         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
>         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:114)
>         at org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField$ParsingTask.run(LazyTextExtractorField.java:160)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:207)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:619)
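
The three requirements listed in the quoted issue can be illustrated with a
minimal Java sketch. This is not the attached DbInputStream.patch and not
Jackrabbit's actual code: the class name LazyMarkableInputStream and the
Callable-based opener are hypothetical, chosen only for illustration. The
sketch opens the underlying stream on first read, closes it at EOF, and
buffers the bytes read since mark() so that reset() can still replay them
after the stream is exhausted.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.concurrent.Callable;

// Hypothetical sketch; not the Jackrabbit DbInputStream implementation.
class LazyMarkableInputStream extends InputStream {
    private final Callable<InputStream> opener; // assumed factory for the real stream
    private InputStream in;            // opened lazily on first read
    private boolean exhausted;         // EOF seen (or closed); 'in' already released
    private byte[] buf = new byte[0];  // bytes read since the last mark()
    private int count;                 // valid bytes in buf
    private int pos;                   // replay position; pos < count means replaying
    private boolean marked;
    private int readLimit;

    LazyMarkableInputStream(Callable<InputStream> opener) {
        this.opener = opener;
    }

    @Override
    public int read() throws IOException {
        if (pos < count) {
            return buf[pos++] & 0xff;  // replay buffered bytes after reset()
        }
        if (exhausted) {
            return -1;
        }
        if (in == null) {
            in = open();               // requirement 1: lazy instantiation
        }
        int b = in.read();
        if (b < 0) {
            in.close();                // requirement 2: auto-close at EOF
            in = null;
            exhausted = true;
            return -1;
        }
        remember(b);
        return b;
    }

    private InputStream open() throws IOException {
        try {
            return opener.call();
        } catch (IOException e) {
            throw e;
        } catch (Exception e) {
            throw new IOException(e);
        }
    }

    // Record a byte for later replay; drop the mark once readLimit is exceeded.
    private void remember(int b) {
        if (!marked) {
            return;
        }
        if (count >= readLimit) {
            marked = false;
            count = pos = 0;
            return;
        }
        if (count == buf.length) {
            buf = Arrays.copyOf(buf, Math.max(16, count * 2));
        }
        buf[count++] = (byte) b;
        pos = count;
    }

    @Override
    public boolean markSupported() {
        return true;
    }

    @Override
    public void mark(int limit) {
        int remaining = count - pos;   // unread replay bytes stay available
        System.arraycopy(buf, pos, buf, 0, remaining);
        count = remaining;
        pos = 0;
        marked = true;
        readLimit = limit;
    }

    @Override
    public void reset() throws IOException {
        if (!marked) {
            throw new IOException("mark not set or read limit exceeded");
        }
        pos = 0;                       // requirement 3: works even after EOF
    }

    @Override
    public void close() throws IOException {
        if (in != null) {
            in.close();
            in = null;
        }
        exhausted = true;
        marked = false;
        count = pos = 0;
    }
}
```

In this sketch, calling mark(8000) on a source shorter than 8000 bytes,
reading to EOF and then calling reset() replays the buffered bytes rather
than failing, which is the access pattern the issue attributes to Tika's
CharsetDetector. A real fix would likely also override read(byte[], int, int)
for efficiency; that is omitted here for brevity.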

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
