From: Jukka Zitting
Date: Tue, 24 Nov 2009 17:50:02 +0100
Subject: Re: How can I access to the TextExtractor result?
To: users@jackrabbit.apache.org

Hi,

On Tue, Nov 24, 2009 at 5:37 PM, Paco Avila wrote:
> I wonder if I can access the text produced by the TextExtractor from a
> document file (like a PDF, for example)

Jackrabbit doesn't store the extracted text anywhere; it is only used to
add the document to the inverted Lucene index.

You can always use the text extractor directly to get the text content.
Check out http://lucene.apache.org/tika/ for more details about the Tika
toolkit that we nowadays use for text extraction.

BR,

Jukka Zitting
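Because Jackrabbit discards the extracted text after indexing, getting it back means running the extractor yourself. The following is a minimal sketch of that, assuming Apache Tika and its parser modules are on the classpath. `org.apache.tika.Tika` is Tika's facade class; the `ExtractText` class and its `extract` method are hypothetical names used only for illustration. For a document stored in the repository as an `nt:file` node, the input stream would come from the `jcr:data` property of its `jcr:content` child rather than from a local file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;

/**
 * Hypothetical helper (not part of Jackrabbit): runs Apache Tika directly
 * on a document stream and returns the plain text Tika extracts.
 */
public class ExtractText {

    // The Tika facade auto-detects the media type and picks a matching parser.
    private static final Tika TIKA = new Tika();

    /** Extracts text from any stream Tika can parse (PDF, Word, plain text, ...). */
    public static String extract(InputStream in) throws IOException, TikaException {
        // parseToString reads the whole stream; the returned string is capped
        // at TIKA.getMaxStringLength() characters (adjustable via setMaxStringLength).
        return TIKA.parseToString(in);
    }

    public static void main(String[] args) throws Exception {
        if (args.length > 0) {
            // Extract from a local file given on the command line, e.g. a PDF.
            try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
                System.out.println(extract(in));
            }
        } else {
            // No file given: extract from a small in-memory plain-text document.
            InputStream in = new ByteArrayInputStream(
                    "Hello from Tika".getBytes(StandardCharsets.UTF_8));
            System.out.println(extract(in).trim());
        }
    }
}
```

Running it as `java ExtractText some.pdf` prints the text of the PDF. Note that this re-parses the binary on every call; if the text is needed often, caching it (for example in a separate property) is an application-level choice, not something Jackrabbit does for you.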