From: Grant Ingersoll
Subject: Re: [jira] Commented: (LUCENE-868) Making Term Vectors more accessible
Date: Tue, 10 Jul 2007 16:32:43 -0400
To: java-dev@lucene.apache.org
In-Reply-To: <13180055.1184075104816.JavaMail.jira@brutus>
References: <13180055.1184075104816.JavaMail.jira@brutus>

OK, I can wait.

On Jul 10, 2007, at 9:45 AM, Karl Wettin (JIRA) wrote:

>
> [ https://issues.apache.org/jira/browse/LUCENE-868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511442 ]
>
> Karl Wettin commented on LUCENE-868:
> ------------------------------------
>
> Grant Ingersoll - [09/Jul/07 02:05 PM ]
>> Anyone have any comments on this approach for Term Vectors?
>>
>> I'm not sure if the patch still applies to trunk, but I will
>> update it and commit on Wednesday or Thursday unless I hear
>> other comments.
>
> I can give the code an overview over the weekend if you want. I'll
> definitely be using this stuff when I get back from vacation.
>
>
>> Making Term Vectors more accessible
>> -----------------------------------
>>
>>          Key: LUCENE-868
>>          URL: https://issues.apache.org/jira/browse/LUCENE-868
>>      Project: Lucene - Java
>>   Issue Type: New Feature
>>   Components: Store
>>     Reporter: Grant Ingersoll
>>     Assignee: Grant Ingersoll
>>     Priority: Minor
>>  Attachments: LUCENE-868-v1.patch
>>
>>
>> One of the big issues with term vector usage is that the
>> information is put into parallel arrays as it is loaded, and those
>> arrays are then often manipulated again for use in the application
>> (for instance, they are sorted by frequency).
>> Adding a callback mechanism that allows the vector loading to be
>> handled by the application would make this a lot more efficient.
>> I propose to add to IndexReader:
>>     abstract public void getTermFreqVector(int docNumber, String field, TermVectorMapper mapper) throws IOException;
>> and a similar one for the all-fields version,
>> where TermVectorMapper is an interface with a single method:
>>     void map(String term, int frequency, int offset, int position);
>> The TermVectorReader will be modified to just call the
>> TermVectorMapper. The existing getTermFreqVectors will be
>> reimplemented to use an implementation of TermVectorMapper that
>> creates the parallel arrays. Additionally, some simple
>> implementations that automatically sort vectors will also be created.
>> This is my first draft of this API and is subject to change. I
>> hope to have a patch soon.
>> See http://www.gossamer-threads.com/lists/lucene/java-user/48003?search_string=get%20the%20total%20term%20frequency;#48003
>> for related information.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>

------------------------------------------------------
Grant Ingersoll
http://www.grantingersoll.com/
http://lucene.grantingersoll.com
http://www.paperoftheweek.com/
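
For readers of the archive, the callback idea in the quoted LUCENE-868 description can be sketched roughly as follows. This is only an illustration, not the patch or Lucene's actual API: the one thing taken from the issue text is the map(String term, int frequency, int offset, int position) signature. The names TermVectorSketch, Entry, FrequencySortedMapper and loadVector are hypothetical, and loadVector merely stands in for whatever the reader side would do when it calls the mapper.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TermVectorSketch {

    /** Callback with the single method quoted in the issue description. */
    interface TermVectorMapper {
        void map(String term, int frequency, int offset, int position);
    }

    /** Hypothetical holder for the values this sketch cares about. */
    static final class Entry {
        final String term;
        final int frequency;

        Entry(String term, int frequency) {
            this.term = term;
            this.frequency = frequency;
        }
    }

    /** Hypothetical mapper: collects terms and returns them by descending frequency. */
    static class FrequencySortedMapper implements TermVectorMapper {
        private final List<Entry> entries = new ArrayList<>();

        @Override
        public void map(String term, int frequency, int offset, int position) {
            // Offset and position are ignored here; only term and frequency are kept.
            entries.add(new Entry(term, frequency));
        }

        List<String> termsByFrequency() {
            List<Entry> sorted = new ArrayList<>(entries);
            sorted.sort(Comparator.comparingInt((Entry e) -> e.frequency).reversed());
            List<String> terms = new ArrayList<>();
            for (Entry e : sorted) {
                terms.add(e.term);
            }
            return terms;
        }
    }

    /** Assumed stand-in for the reader side: hands each stored term to the mapper. */
    static void loadVector(String[] terms, int[] freqs, TermVectorMapper mapper) {
        for (int i = 0; i < terms.length; i++) {
            // -1 marks "no offset/position stored" in this sketch only.
            mapper.map(terms[i], freqs[i], -1, -1);
        }
    }

    public static void main(String[] args) {
        FrequencySortedMapper mapper = new FrequencySortedMapper();
        loadVector(new String[] {"lucene", "term", "vector"}, new int[] {2, 5, 3}, mapper);
        System.out.println(mapper.termsByFrequency()); // prints [term, vector, lucene]
    }
}
```

The point of the design, as described in the issue, is that the application decides what structure the loaded values go into, so no intermediate parallel arrays have to be built and then rearranged.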