lucene-dev mailing list archives

From "Grant Ingersoll (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-868) Making Term Vectors more accessible
Date Thu, 19 Jul 2007 20:40:06 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513991 ]

Grant Ingersoll commented on LUCENE-868:
----------------------------------------

The TermVectorOffsetInfo and position arrays are only created if storeOffsets and storePositions
are turned on.  But we could also add mapper methods like:
boolean isIgnoringOffsets()
and
boolean isIgnoringPositions()

Then, in TermVectorsReader, it could become:

if (storePositions && mapper.isIgnoringPositions() == false)

and likewise for isIgnoringOffsets.  This way a mapper could say that it does not want these
arrays constructed, even when positions and offsets are stored.  The reader then just needs to
skip ahead by the appropriate amount.
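
A minimal sketch of that check-and-skip logic follows.  It is not the code from the
patch: SketchMapper, SketchReader, FrequencyOnlyMapper and the in-memory int layout are
all hypothetical stand-ins, and only the two isIgnoring* method names come from the
suggestion above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical mapper; only the two isIgnoring* names come from the comment above. */
interface SketchMapper {
    boolean isIgnoringOffsets();
    boolean isIgnoringPositions();
    /** offsets/positions are null when not stored or when this mapper ignores them. */
    void map(String term, int frequency, int[] offsets, int[] positions);
}

/** Toy reader over an in-memory int stream; the layout is invented for this sketch. */
final class SketchReader {
    // Assumed per-term layout: freq, then freq positions (if stored),
    // then 2 * freq offset values as start/end pairs (if stored).
    static void read(String[] terms, int[] data, boolean storePositions,
                     boolean storeOffsets, SketchMapper mapper) {
        int cursor = 0;
        for (String term : terms) {
            int freq = data[cursor++];

            int[] positions = null;
            if (storePositions) {
                if (mapper.isIgnoringPositions() == false) {
                    positions = Arrays.copyOfRange(data, cursor, cursor + freq);
                }
                cursor += freq; // skip ahead whether or not the array was built
            }

            int[] offsets = null;
            if (storeOffsets) {
                if (mapper.isIgnoringOffsets() == false) {
                    offsets = Arrays.copyOfRange(data, cursor, cursor + 2 * freq);
                }
                cursor += 2 * freq; // skip ahead whether or not the array was built
            }

            mapper.map(term, freq, offsets, positions);
        }
    }
}

/** Example mapper that only wants term frequencies, so no arrays are allocated for it. */
final class FrequencyOnlyMapper implements SketchMapper {
    final List<String> seen = new ArrayList<>();
    boolean receivedArrays = false;

    public boolean isIgnoringOffsets() { return true; }
    public boolean isIgnoringPositions() { return true; }

    public void map(String term, int frequency, int[] offsets, int[] positions) {
        if (offsets != null || positions != null) {
            receivedArrays = true;
        }
        seen.add(term + "=" + frequency);
    }
}
```

In this sketch the cursor advances by the same amount in both branches, so a mapper that
ignores positions or offsets saves the array allocation without leaving the reader
misaligned for the next term.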


> Making Term Vectors more accessible
> -----------------------------------
>
>                 Key: LUCENE-868
>                 URL: https://issues.apache.org/jira/browse/LUCENE-868
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Store
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: LUCENE-868-v2.patch, LUCENE-868-v3.patch
>
>
> One of the big issues with term vector usage is that the information is loaded into
> parallel arrays as it is read, and these arrays are then often manipulated again for use
> in the application (for instance, they are sorted by frequency).
> Adding a callback mechanism that allows the vector loading to be handled by the
> application would make this a lot more efficient.
> I propose to add to IndexReader:
> abstract public void getTermFreqVector(int docNumber, String field, TermVectorMapper mapper) throws IOException;
> and a similar one for the all-fields version,
> where TermVectorMapper is an interface with a single method:
> void map(String term, int frequency, int offset, int position);
> The TermVectorsReader will be modified to just call the TermVectorMapper.  The existing
> getTermFreqVectors will be reimplemented to use an implementation of TermVectorMapper
> that creates the parallel arrays.  Additionally, some simple implementations that
> automatically sort vectors will also be created.
> This is my first draft of this API and is subject to change.  I hope to have a patch
> soon.
> See http://www.gossamer-threads.com/lists/lucene/java-user/48003?search_string=get%20the%20total%20term%20frequency;#48003
> for related information.
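
The callback described in the quoted issue text could look roughly like the sketch below.
The map signature is copied from the description, which calls itself a first draft
subject to change.  ParallelArrayMapper and FrequencySortedMapper are hypothetical names
for the two kinds of implementation the description mentions; they are not the classes
from the attached patches.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Callback with the single method given in the issue description (draft API). */
interface TermVectorMapper {
    void map(String term, int frequency, int offset, int position);
}

/** Hypothetical mapper that rebuilds the parallel term/frequency arrays. */
final class ParallelArrayMapper implements TermVectorMapper {
    private final List<String> terms = new ArrayList<>();
    private final List<Integer> freqs = new ArrayList<>();

    public void map(String term, int frequency, int offset, int position) {
        terms.add(term);
        freqs.add(frequency);
    }

    String[] getTerms() {
        return terms.toArray(new String[0]);
    }

    int[] getTermFrequencies() {
        int[] out = new int[freqs.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = freqs.get(i);
        }
        return out;
    }
}

/** Hypothetical mapper that hands back entries ordered by descending frequency. */
final class FrequencySortedMapper implements TermVectorMapper {
    static final class Entry {
        final String term;
        final int frequency;

        Entry(String term, int frequency) {
            this.term = term;
            this.frequency = frequency;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    public void map(String term, int frequency, int offset, int position) {
        entries.add(new Entry(term, frequency));
    }

    List<Entry> sorted() {
        List<Entry> copy = new ArrayList<>(entries);
        // List.sort is stable, so terms with equal frequency keep their arrival order.
        copy.sort(Comparator.comparingInt((Entry e) -> e.frequency).reversed());
        return copy;
    }
}
```

With this shape, an application that wants frequency-sorted terms collects them directly
in the callback, rather than receiving parallel arrays and re-sorting them afterwards.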


