lucene-dev mailing list archives

From "Grant Ingersoll (JIRA)" <>
Subject [jira] Created: (LUCENE-868) Making Term Vectors more accessible
Date Sun, 22 Apr 2007 23:22:15 GMT
Making Term Vectors more accessible

                 Key: LUCENE-868
             Project: Lucene - Java
          Issue Type: New Feature
          Components: Store
            Reporter: Grant Ingersoll
         Assigned To: Grant Ingersoll
            Priority: Minor

One of the big issues with term vector usage is that the information is loaded into parallel
arrays, which the application then often has to manipulate again before it can use them
(for instance, sorting them by frequency).

Adding a callback mechanism that lets the application handle the vector loading itself would
avoid that second pass and make this a lot more efficient.

I propose to add to IndexReader:

  abstract public void getTermFreqVector(int docNumber, String field, TermVectorMapper mapper) throws IOException;

and a similar method for the all-fields version.

Where TermVectorMapper is an interface with a single method:
void map(String term, int frequency, int offset, int position);
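To illustrate the callback idea, here is a minimal, self-contained sketch with no Lucene dependency. It declares an interface with the draft map() signature above. The classes MapperDemo and FrequencyMapper and the load() method are hypothetical: load() only stands in for the reader pushing each entry to the mapper, rather than filling parallel arrays. The draft passes a single offset and position per call, and it does not say how repeated occurrences of a term are reported, so this sketch simply ignores those two arguments.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class; load() stands in for the reader calling the mapper.
class MapperDemo {
    // Pushes each (term, frequency) entry to the mapper instead of building arrays.
    static void load(String[] terms, int[] freqs, TermVectorMapper mapper) {
        for (int i = 0; i < terms.length; i++) {
            mapper.map(terms[i], freqs[i], -1, -1); // -1: no offset/position in this sketch
        }
    }

    public static void main(String[] args) {
        FrequencyMapper m = new FrequencyMapper();
        load(new String[] {"apache", "lucene", "vector"}, new int[] {1, 3, 2}, m);
        System.out.println(m.freqs); // {apache=1, lucene=3, vector=2}
    }
}

// Stand-in for the proposed interface; the signature is copied from the draft.
interface TermVectorMapper {
    void map(String term, int frequency, int offset, int position);
}

// Hypothetical mapper: stores term -> frequency directly in an application structure.
class FrequencyMapper implements TermVectorMapper {
    final Map<String, Integer> freqs = new LinkedHashMap<>();

    @Override
    public void map(String term, int frequency, int offset, int position) {
        // offset and position are ignored; the draft leaves their per-call semantics open
        freqs.put(term, frequency);
    }
}
```

The application decides the data structure, so no intermediate arrays are built and then copied.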

The TermVectorReader will be modified to just call the TermVectorMapper.  The existing getTermFreqVectors
will be reimplemented to use an implementation of TermVectorMapper that creates the parallel
arrays.  Additionally, some simple implementations that automatically sort vectors will also
be created.
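The two kinds of mapper described above can be sketched as follows. ParallelArrayMapper rebuilds the existing parallel-array view, and SortedByFrequencyMapper keeps entries ordered by descending frequency as they arrive. Both class names, their accessor methods, and the insertion-sort approach are illustrative assumptions, not the eventual Lucene code. The interface is redeclared with the draft signature so that the block compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical driver showing both mappers fed by the same callback sequence.
class MapperSketches {
    public static void main(String[] args) {
        SortedByFrequencyMapper sorted = new SortedByFrequencyMapper();
        ParallelArrayMapper parallel = new ParallelArrayMapper();
        String[] terms = {"apache", "lucene", "vector"};
        int[] freqs = {1, 3, 2};
        for (int i = 0; i < terms.length; i++) {
            sorted.map(terms[i], freqs[i], -1, -1);
            parallel.map(terms[i], freqs[i], -1, -1);
        }
        System.out.println(sorted.terms());                        // [lucene, vector, apache]
        System.out.println(String.join(",", parallel.getTerms())); // apache,lucene,vector
    }
}

// Stand-in for the proposed interface (draft signature).
interface TermVectorMapper {
    void map(String term, int frequency, int offset, int position);
}

// Recreates the current behaviour: terms[i] and freqs[i] describe the same term.
class ParallelArrayMapper implements TermVectorMapper {
    private final List<String> terms = new ArrayList<>();
    private final List<Integer> freqs = new ArrayList<>();

    @Override
    public void map(String term, int frequency, int offset, int position) {
        terms.add(term);
        freqs.add(frequency);
    }

    String[] getTerms() { return terms.toArray(new String[0]); }

    int[] getTermFrequencies() {
        return freqs.stream().mapToInt(Integer::intValue).toArray();
    }
}

// Keeps entries sorted by descending frequency while loading (simple insertion sort;
// ties keep arrival order), so no separate sort pass is needed afterwards.
class SortedByFrequencyMapper implements TermVectorMapper {
    private final List<String> terms = new ArrayList<>();
    private final List<Integer> freqs = new ArrayList<>();

    @Override
    public void map(String term, int frequency, int offset, int position) {
        int i = 0;
        while (i < freqs.size() && freqs.get(i) >= frequency) {
            i++; // skip past entries with equal or higher frequency
        }
        terms.add(i, term);
        freqs.add(i, frequency);
    }

    List<String> terms() { return terms; }
}
```

Insertion into an ArrayList costs O(n) per call. That is fine for a sketch, but a real implementation might collect the entries first and sort them once at the end.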

This is my first draft of this API, and it is subject to change.  I hope to have a patch soon.


This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
