Date: Thu, 14 Oct 2010 14:12:51 -0400
From: Justin Edelson
To: users@jackrabbit.apache.org
Subject: Re: Is there any way to set up a custom Extractor in Jackrabbit 2?

Perhaps you should vote for https://issues.apache.org/jira/browse/JCR-2642 and/or provide a patch for that issue.

Justin

On 10/14/10 1:14 PM, taha ben salah wrote:
> Hi all,
>
> Is there any way to set up a custom Extractor in Jackrabbit 2?
>
> What I ended up guessing is that JR2 uses Tika for all text extraction,
> but it does not provide a way to specify textFilterClasses as previous
> versions did. When looking into JackrabbitParser.java (in the JR
> implementation), I found the following:
>   new AutoDetectParser(new
>   TikaConfig(JackrabbitParser.class.getResourceAsStream("tika-config.xml")))
> which rules out any way of plugging in custom extractors.
> Furthermore, purely for backward compatibility, textFilterClasses values
> (in workspace.xml) are handled solely for Apache-implemented classes; for
> all other classes it simply does
>   logger.warn("Ignoring unknown text extractor class: {}", name);
>
> Thanks for the help.
> Taha Ben Salah
>
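The textFilterClasses fallback described in the question can be modelled in a few lines. This is a minimal, hypothetical simplification, not Jackrabbit's actual code. It assumes that textFilterClasses is a comma-separated list of class names and that only a fixed set of built-in extractor classes is recognised. The two names in KNOWN_EXTRACTORS are illustrative, and com.example.MyExtractor is a made-up custom class. Any unrecognised name is dropped with the warning quoted above, which is why a custom extractor configured this way never takes effect.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical model of the described behaviour: configured textFilterClasses
 * names are matched against a fixed whitelist, and every other name is skipped
 * with a warning. Class, method and whitelist contents are illustrative only.
 */
public class TextFilterClassesSketch {

    // Assumed whitelist standing in for the Apache-provided extractor classes.
    static final Set<String> KNOWN_EXTRACTORS = Set.of(
            "org.apache.jackrabbit.extractor.PlainTextExtractor",
            "org.apache.jackrabbit.extractor.HTMLTextExtractor");

    /** Returns the accepted class names; each rejected name adds a warning. */
    static List<String> filter(String textFilterClasses, List<String> warnings) {
        List<String> accepted = new ArrayList<>();
        // Assumption: the workspace.xml value is a comma-separated list.
        for (String raw : textFilterClasses.split(",")) {
            String name = raw.trim();
            if (name.isEmpty()) {
                continue;
            }
            if (KNOWN_EXTRACTORS.contains(name)) {
                accepted.add(name);
            } else {
                // Same message text as the quoted logger.warn call.
                warnings.add("Ignoring unknown text extractor class: " + name);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<String> warnings = new ArrayList<>();
        List<String> accepted = filter(
                "org.apache.jackrabbit.extractor.PlainTextExtractor, com.example.MyExtractor",
                warnings);
        System.out.println("accepted: " + accepted);
        System.out.println("warnings: " + warnings);
    }
}
```

In this model, the only ways for a custom class to be used are to extend the whitelist or to bypass it entirely. That mirrors the point of the question: the configuration path is closed to anything the implementation does not already know about.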