From: "Jukka Zitting (JIRA)"
To: dev@jackrabbit.apache.org
Date: Thu, 24 Mar 2011 17:53:05 +0000 (UTC)
Message-ID: <127090111.8923.1300989185843.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <322958538.4888.1296647369923.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (JCR-2873) Add a way to locate full text extraction problems

    [
https://issues.apache.org/jira/browse/JCR-2873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010800#comment-13010800 ]

Jukka Zitting commented on JCR-2873:
------------------------------------

Yes, to the search index such documents look like simple text documents that contain just the string "TextExtractionError". You can query for that token and include any other constraints (path, etc.) just like when searching for normal documents.

PS. In revision 1085050 I excluded extraction errors caused by linkage problems from being reported. They are caused by required extraction libraries not being present, which is a configuration/deployment choice instead of any inherent problems with the documents being parsed.

> Add a way to locate full text extraction problems
> -------------------------------------------------
>
>                 Key: JCR-2873
>                 URL: https://issues.apache.org/jira/browse/JCR-2873
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: indexing, jackrabbit-core
>            Reporter: Jukka Zitting
>            Assignee: Jukka Zitting
>            Priority: Minor
>             Fix For: 2.3.0
>
>
> Full text indexing of a binary document can fail for various reasons. Currently we just log a generic error message in such cases, which makes it difficult for the user to locate such problems for review and reindexing. We should improve this by making the logs more informative or by adding some other mechanism for locating troublesome documents.
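
For illustration, the lookup described in the comment (query for the "TextExtractionError" token, plus a path constraint) might be expressed as a JCR-SQL2 statement along the following lines. This is only a sketch: the helper class `ExtractionErrorQuery`, the choice of `nt:base` as the node type, and the example path are assumptions, not something taken from the issue or from the Jackrabbit code base.

```java
// Hypothetical helper (not part of Jackrabbit): builds a JCR-SQL2 statement
// that looks for the error token mentioned in the comment.
final class ExtractionErrorQuery {

    // Token that, per the comment, is what the index sees for a failed document.
    static final String ERROR_TOKEN = "TextExtractionError";

    // Returns a statement matching nodes below absPath whose full text
    // index contains the error token.
    static String sql2(String absPath) {
        // Keep the sketch simple: only plain absolute paths are supported,
        // because the path is embedded in the bracketed SQL2 path form.
        if (!absPath.startsWith("/") || absPath.contains("]")) {
            throw new IllegalArgumentException("unsupported path: " + absPath);
        }
        return "SELECT * FROM [nt:base] AS n"
                + " WHERE CONTAINS(n.*, '" + ERROR_TOKEN + "')"
                + " AND ISDESCENDANTNODE(n, [" + absPath + "])";
    }
}
```

With a live session, the resulting string would presumably be passed to the standard JCR 2.0 query API, for example `session.getWorkspace().getQueryManager().createQuery(ExtractionErrorQuery.sql2("/content"), Query.JCR_SQL2).execute()`. Whether the token needs a narrower node type or further constraints depends on the repository content.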