Message-ID: <1900617951.339761268912187254.JavaMail.jira@brutus.apache.org>
Date: Thu, 18 Mar 2010 11:36:27 +0000 (UTC)
From: "Thomas Mueller (JIRA)"
To: dev@jackrabbit.apache.org
Reply-To: dev@jackrabbit.apache.org
Subject: [jira] Commented: (JCR-2576) DbInputStream does not support mark()/reset() when exhausted.
In-Reply-To: <1124092361.336611268902347220.JavaMail.jira@brutus.apache.org>

    [ https://issues.apache.org/jira/browse/JCR-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846855#action_12846855 ]

Thomas Mueller commented on JCR-2576:
-------------------------------------

Thanks a lot for the patch! I think the only remaining issue is that closeOriginalStream() should not set originalStream to null.

However, I would like to simplify things a bit by implementing the mark()/reset() features at a different layer (using BufferedInputStream if possible).

A similar issue exists with TempFileInputStream, by the way.

> DbInputStream does not support mark()/reset() when exhausted.
> -------------------------------------------------------------
>
>                 Key: JCR-2576
>                 URL: https://issues.apache.org/jira/browse/JCR-2576
>             Project: Jackrabbit Content Repository
>          Issue Type: Bug
>          Components: jackrabbit-core
>    Affects Versions: 2.0.0
>            Reporter: Julian Sedding
>            Assignee: Thomas Mueller
>         Attachments: DbInputStream.patch
>
>
> The DbDataStore implementation uses a DbInputStream to read binary properties from the database. When a new binary property is created, Jackrabbit attempts to index it. Tika's CharsetDetector is used in the process, which marks the input stream, reads the first 8000 bytes and then resets the stream.
> This results in the stacktrace shown at the end of the issue, if the following two conditions hold true:
> * the property is larger than the minRecordLength configuration of the Datastore and
> * the property is smaller than 8000 bytes
> The DbInputStream needs to have the following properties:
> 1. lazy instantiation of the underlying stream
> 2. auto-close underlying stream when EOF is reached
> 3. fully support mark()/reset() even if the underlying stream is auto-closed due to 2.
> 12.03.2010 15:53:28 *WARN * LazyTextExtractorField: Failed to extract text from a binary property (LazyTextExtractorField.java, line 165)
> java.io.EOFException
>     at org.apache.jackrabbit.core.data.db.DbInputStream.reset(DbInputStream.java:180)
>     at org.apache.tika.io.ProxyInputStream.reset(ProxyInputStream.java:156)
>     at org.apache.tika.io.ProxyInputStream.reset(ProxyInputStream.java:156)
>     at org.apache.tika.parser.txt.CharsetDetector.setText(CharsetDetector.java:131)
>     at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:77)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:114)
>     at org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField$ParsingTask.run(LazyTextExtractorField.java:160)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:207)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
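
For illustration only, here is a minimal sketch of the idea from the comment above: handle mark()/reset() at a different layer by wrapping a lazily opened, auto-closing stream in java.io.BufferedInputStream. The classes MarkResetSketch and LazyAutoCloseStream and the helper roundTrip are hypothetical stand-ins, not Jackrabbit code, and a ByteArrayInputStream plays the role of the database record. The sketch assumes the wrapped stream keeps returning -1 after it has auto-closed (rather than throwing), and that the data fits within the mark read limit.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

class MarkResetSketch {

    /** Hypothetical stand-in for a lazily opened, auto-closing stream. */
    static class LazyAutoCloseStream extends InputStream {
        private final byte[] source;      // plays the role of the stored record
        private InputStream in;           // opened on the first read (lazy)
        private boolean eof;              // set once EOF was seen and the stream closed
        boolean closedUnderlying;         // exposed so the sketch can check the auto-close

        LazyAutoCloseStream(byte[] source) {
            this.source = source;
        }

        @Override
        public int read() throws IOException {
            if (eof) {
                return -1;                // keep reporting EOF after the auto-close
            }
            if (in == null) {
                in = new ByteArrayInputStream(source);
            }
            int b = in.read();
            if (b < 0) {                  // EOF reached: close the underlying stream
                eof = true;
                in.close();
                closedUnderlying = true;
            }
            return b;
        }
    }

    /** Reads the stream until EOF and returns everything that was read. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    /** Marks, reads to EOF, resets, and reads again; true if both passes match. */
    static boolean roundTrip(int size) throws IOException {
        byte[] data = new byte[size];
        Arrays.fill(data, (byte) 'a');
        LazyAutoCloseStream raw = new LazyAutoCloseStream(data);
        InputStream in = new BufferedInputStream(raw);

        in.mark(8000);                    // same read limit as in the issue description
        byte[] first = readAll(in);       // exhausts and auto-closes the raw stream
        in.reset();                       // served from BufferedInputStream's own buffer
        byte[] second = readAll(in);

        return raw.closedUnderlying
                && Arrays.equals(data, first)
                && Arrays.equals(first, second);
    }

    public static void main(String[] args) throws IOException {
        System.out.println("reset after EOF works: " + roundTrip(5000));
    }
}
```

The buffering layer keeps the marked bytes itself, so the wrapped stream does not need mark()/reset() support of its own. Whether this fits DbInputStream or TempFileInputStream depends on details not shown in this thread.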