lucene-dev mailing list archives

From Grant Ingersoll <grant.ingers...@gmail.com>
Subject Re: [jira] Commented: (LUCENE-868) Making Term Vectors more accessible
Date Tue, 10 Jul 2007 20:32:43 GMT
OK, I can wait

On Jul 10, 2007, at 9:45 AM, Karl Wettin (JIRA) wrote:

>
>     [ https://issues.apache.org/jira/browse/LUCENE-868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511442 ]
>
> Karl Wettin commented on LUCENE-868:
> ------------------------------------
>
> Grant Ingersoll - [09/Jul/07 02:05 PM ]
>> Anyone have any comments on this approach for Term Vectors?
>>
>> I'm not sure if the patch still applies to trunk, but I will  
>> update it
>> and commit on Wednesday or Thursday unless I hear other comments.
>
> I can give the code a look over the weekend if you want. I'll
> definitely be using this stuff when I get back from vacation.
>
>
>> Making Term Vectors more accessible
>> -----------------------------------
>>
>>                 Key: LUCENE-868
>>                 URL: https://issues.apache.org/jira/browse/LUCENE-868
>>             Project: Lucene - Java
>>          Issue Type: New Feature
>>          Components: Store
>>            Reporter: Grant Ingersoll
>>            Assignee: Grant Ingersoll
>>            Priority: Minor
>>         Attachments: LUCENE-868-v1.patch
>>
>>
>> One of the big issues with term vector usage is that the information
>> is read into parallel arrays as it is loaded, and the application
>> often then manipulates those arrays again (for instance, sorting them
>> by frequency).
>> Adding a callback mechanism that allows the vector loading to be
>> handled by the application would make this a lot more efficient.
>> I propose to add to IndexReader:
>>     abstract public void getTermFreqVector(int docNumber, String field, TermVectorMapper mapper) throws IOException;
>> and a similar one for the all-fields version,
>> where TermVectorMapper is an interface with a single method:
>>     void map(String term, int frequency, int offset, int position);
>> The TermVectorReader will be modified to just call the
>> TermVectorMapper. The existing getTermFreqVectors will be
>> reimplemented to use an implementation of TermVectorMapper that
>> creates the parallel arrays. Additionally, some simple
>> implementations that automatically sort vectors will also be created.
>> This is my first draft of this API and is subject to change. I
>> hope to have a patch soon.
>> See http://www.gossamer-threads.com/lists/lucene/java-user/48003?search_string=get%20the%20total%20term%20frequency;#48003
>> for related information.
>
> -- 
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
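
For anyone following the thread, the callback idea in the quoted issue can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in LUCENE-868-v1.patch: the map(...) signature is taken from the issue text, while TermVectorMapperSketch, ParallelArrayMapper, FrequencySortedMapper, TermEntry and readTermVector are hypothetical names, and readTermVector is an in-memory stand-in for whatever TermVectorReader ends up doing. Offsets and positions are passed as -1 here, which is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class TermVectorMapperSketch {

    /** Callback from the issue text: the reader calls map(...) as it loads the vector. */
    interface TermVectorMapper {
        void map(String term, int frequency, int offset, int position);
    }

    /** Hypothetical mapper that rebuilds the existing parallel-array view. */
    static final class ParallelArrayMapper implements TermVectorMapper {
        final List<String> terms = new ArrayList<>();
        final List<Integer> frequencies = new ArrayList<>();

        @Override
        public void map(String term, int frequency, int offset, int position) {
            terms.add(term);
            frequencies.add(frequency);
        }
    }

    /** Hypothetical holder for one term and its frequency. */
    static final class TermEntry {
        final String term;
        final int frequency;

        TermEntry(String term, int frequency) {
            this.term = term;
            this.frequency = frequency;
        }
    }

    /** Hypothetical mapper that hands back entries sorted by descending frequency. */
    static final class FrequencySortedMapper implements TermVectorMapper {
        private final List<TermEntry> entries = new ArrayList<>();

        @Override
        public void map(String term, int frequency, int offset, int position) {
            entries.add(new TermEntry(term, frequency));
        }

        List<TermEntry> sorted() {
            List<TermEntry> copy = new ArrayList<>(entries);
            // Highest frequency first; ties broken alphabetically by term.
            copy.sort((a, b) -> a.frequency != b.frequency
                    ? Integer.compare(b.frequency, a.frequency)
                    : a.term.compareTo(b.term));
            return copy;
        }
    }

    /**
     * Stand-in for the reader (assumption): replays in-memory data through
     * the mapper instead of reading term vectors from an index.
     */
    static void readTermVector(String[] terms, int[] freqs, TermVectorMapper mapper) {
        for (int i = 0; i < terms.length; i++) {
            mapper.map(terms[i], freqs[i], -1, -1); // -1: offset/position not modeled here
        }
    }

    public static void main(String[] args) {
        String[] terms = {"apache", "lucene", "vector"};
        int[] freqs = {2, 5, 1};

        FrequencySortedMapper mapper = new FrequencySortedMapper();
        readTermVector(terms, freqs, mapper);
        for (TermEntry e : mapper.sorted()) {
            System.out.println(e.term + " " + e.frequency);
        }
    }
}
```

The point of the design is that the application picks the data structure: a caller that wants frequency order collects entries directly, rather than receiving parallel arrays and re-sorting them afterwards.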

------------------------------------------------------
Grant Ingersoll
http://www.grantingersoll.com/
http://lucene.grantingersoll.com
http://www.paperoftheweek.com/


