Date: Wed, 18 Jul 2007 18:25:05 -0700 (PDT)
From: "Grant Ingersoll (JIRA)"
To: java-dev@lucene.apache.org
Subject: [jira] Updated: (LUCENE-868) Making Term Vectors more accessible

[ https://issues.apache.org/jira/browse/LUCENE-868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated LUCENE-868:
-----------------------------------

Attachment: LUCENE-868-v3.patch

Added the start of a Position
based Mapper. This would allow indexing directly (almost) into the vector by position. It still needs a little more testing, but I wanted to put it out there for others to see.

> Making Term Vectors more accessible
> -----------------------------------
>
> Key: LUCENE-868
> URL: https://issues.apache.org/jira/browse/LUCENE-868
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Store
> Reporter: Grant Ingersoll
> Assignee: Grant Ingersoll
> Priority: Minor
> Attachments: LUCENE-868-v2.patch, LUCENE-868-v3.patch
>
>
> One of the big issues with term vector usage is that the information is loaded into parallel arrays as it is read, and those arrays are then often manipulated again for use in the application (for instance, they are sorted by frequency).
> Adding a callback mechanism that lets the application handle the vector loading would make this a lot more efficient.
> I propose to add to IndexReader:
> abstract public void getTermFreqVector(int docNumber, String field, TermVectorMapper mapper) throws IOException;
> and a similar one for the all-fields version,
> where TermVectorMapper is an interface with a single method:
> void map(String term, int frequency, int offset, int position);
> The TermVectorReader will be modified to just call the TermVectorMapper. The existing getTermFreqVectors will be reimplemented to use an implementation of TermVectorMapper that creates the parallel arrays. Additionally, some simple implementations that automatically sort vectors will also be created.
> This is my first draft of this API and is subject to change. I hope to have a patch soon.
> See http://www.gossamer-threads.com/lists/lucene/java-user/48003?search_string=get%20the%20total%20term%20frequency;#48003 for related information.
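A minimal Java sketch of the proposed callback, assuming the single-method signature `map(String term, int frequency, int offset, int position)` from the description above, and assuming the reader calls it once per term (passing -1 for an offset or position that was not stored). The class names `ParallelArrayMapper` and `SortedByFrequencyMapper` are hypothetical illustrations, not Lucene API. The first rebuilds the existing parallel-array view; the second keeps terms in frequency order as they arrive, so the application never has to re-sort parallel arrays afterwards.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical callback interface mirroring the signature proposed in LUCENE-868.
// Assumption: map() is called once per term; offset/position are -1 when not stored.
interface TermVectorMapper {
    void map(String term, int frequency, int offset, int position);
}

// Hypothetical mapper that reproduces the existing terms[] / freqs[] parallel arrays.
class ParallelArrayMapper implements TermVectorMapper {
    private final List<String> terms = new ArrayList<>();
    private final List<Integer> freqs = new ArrayList<>();

    @Override
    public void map(String term, int frequency, int offset, int position) {
        terms.add(term);
        freqs.add(frequency);
    }

    String[] getTerms() {
        return terms.toArray(new String[0]);
    }

    int[] getTermFrequencies() {
        int[] out = new int[freqs.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = freqs.get(i);
        }
        return out;
    }
}

// Hypothetical mapper that inserts each term into a frequency-ordered set as it arrives.
class SortedByFrequencyMapper implements TermVectorMapper {
    static final class Entry {
        final String term;
        final int frequency;

        Entry(String term, int frequency) {
            this.term = term;
            this.frequency = frequency;
        }
    }

    // Highest frequency first; ties broken alphabetically so distinct terms never collide.
    private final TreeSet<Entry> entries = new TreeSet<>(
            Comparator.comparingInt((Entry e) -> e.frequency).reversed()
                      .thenComparing(e -> e.term));

    @Override
    public void map(String term, int frequency, int offset, int position) {
        entries.add(new Entry(term, frequency)); // lands directly in sorted position
    }

    List<Entry> sorted() {
        return new ArrayList<>(entries);
    }
}
```

Under this sketch, an IndexReader implementing the proposed getTermFreqVector(docNumber, field, mapper) would stream each term into whichever mapper the caller passes, and the caller picks the data structure it actually needs.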