Subject: Re: TermVectors from Jackrabbit Queries
From: Ian Boston
Date: Wed, 16 Dec 2009 14:21:19 +0000
To: users@jackrabbit.apache.org
Message-Id: <4AA0BC45-0EB5-4F81-831D-3BDD675ED553@tfd.co.uk>
In-Reply-To: <510143ac0912160225s1d30760by8c30de9c1f8fdc8c@mail.gmail.com>
References: <9C4ED2E3-F7EB-477B-AD5C-E4F653E9EBE7@tfd.co.uk> <510143ac0912160225s1d30760by8c30de9c1f8fdc8c@mail.gmail.com>

On 16 Dec 2009, at 10:25, Jukka Zitting wrote:

> Hi,
>
> On Tue, Dec 15, 2009 at 6:11 PM, Ian Boston wrote:
>> Is there any other way of getting to the SearchIndex, so that I can get
>> to the Lucene Document and the TermVector (other than AspectJ or cglib)?
>
> Instead of reaching down to the underlying Lucene index, I would
> recommend reading the original document data stored in the JCR node
> and passing it through the Jackrabbit text extractors and the
> configured Lucene Analyzer to get the terms stored in the index.

That can be quite expensive, especially for poor-quality PDFs and some
Word .docx documents.

I expect to do this for between 25 and 100 nodes at a time, aggregating
the results.

Ian

>
> BR,
>
> Jukka Zitting
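
A minimal sketch of the re-analysis approach suggested in the quoted reply, assuming the text has already been extracted from each node. The `analyze` method is a hypothetical plain-Java stand-in for the configured Lucene Analyzer (it only lower-cases and splits on non-alphanumeric characters), and `TermAggregator` is an illustrative name, not a Jackrabbit or Lucene class. It shows how term counts from a batch of nodes might be merged into one map:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class TermAggregator {

    // ASSUMPTION: stand-in for the configured Lucene Analyzer. It lower-cases
    // the text and splits on any run of characters that are not letters or digits.
    static List<String> analyze(String text) {
        String[] parts = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+");
        List<String> terms = new ArrayList<>();
        for (String part : parts) {
            if (!part.isEmpty()) { // split() can yield a leading empty string
                terms.add(part);
            }
        }
        return terms;
    }

    // Sums term frequencies over the extracted text of every node in the batch.
    static Map<String, Integer> aggregate(List<String> extractedTexts) {
        Map<String, Integer> totals = new TreeMap<>();
        for (String text : extractedTexts) {
            for (String term : analyze(text)) {
                totals.merge(term, 1, Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Hypothetical extracted text for two nodes.
        List<String> texts = Arrays.asList(
                "Jackrabbit stores nodes",
                "Lucene indexes Jackrabbit nodes");
        // prints {indexes=1, jackrabbit=2, lucene=1, nodes=2, stores=1}
        System.out.println(aggregate(texts));
    }
}
```

In a real setup the stand-in would be replaced by the repository's configured analyzer, and the text extraction cost raised in the reply would still be paid once per node before this step.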