lucene-dev mailing list archives

From "Pablo Castellanos (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-2482) Index sorter
Date Thu, 02 Feb 2012 20:46:58 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199201#comment-13199201 ]

Pablo Castellanos commented on LUCENE-2482:
-------------------------------------------

Hi, I wanted to implement some early termination strategies over my Lucene index, so I started
playing with the 4.0 patch, since I need to reorder the index.

I found that many APIs have changed over the past year, so I had to make
some modifications, mainly:

{code}
/*@Override
public TermFreqVector[] getTermFreqVectors(int docNumber)
        throws IOException {
  return super.getTermFreqVectors(newToOld[docNumber]);
}*/

@Override
public Fields getTermVectors(int docID) throws IOException {
  return super.getTermVectors(newToOld[docID]);
}

/*@Override
public Document document(int n, FieldSelector fieldSelector)
        throws CorruptIndexException, IOException {
  return super.document(newToOld[n], fieldSelector);
}*/

@Override
public void document(int docID, StoredFieldVisitor visitor)
        throws CorruptIndexException, IOException {
  super.document(newToOld[docID], visitor);
}
{code}

There is also a getDeletedDocs function, and I haven't found a good replacement for it:

{code}
    /*@Override
    public Bits getDeletedDocs() {
      final Bits deletedDocs = super.getDeletedDocs();

      if (deletedDocs == null)
        return null;

      return new Bits() {
        @Override
        public boolean get(int index) {
          return deletedDocs.get(newToOld[index]);
        }

        @Override
        public int length() {
          return deletedDocs.length();
        }
      };
    }*/
{code}

After applying these changes and running the code against my Lucene index, I get some weird results.
The new sort order seems to have been applied, but the posting lists that point to the documents
still refer to the old data.

Imagine that I have 2 documents in my index and I want to sort them by price (so the
most expensive item should get the lowest docId):

Document 1
{panel}docId:1, name: iPod, price: 100${panel}

Document 2
{panel}docId:2, name: iPhone, price: 300${panel}

I run my modified version of IndexSorter over the index and then query the new index.
If I query for _name:iPhone_, I get:
{panel}docId:2, name: iPod, price: 100${panel}

That leads me to believe that the documents have been sorted but the new index is still using the
old posting lists.
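
The symptom can be reproduced in a toy model that does not use Lucene at all. The sketch below assumes 0-based doc IDs and uses hypothetical names (DocIdRemapSketch, invert, remapPostings, storedName) that are not part of the patch or of the Lucene API. It only illustrates the idea that stored fields are read through newToOld, while posting lists have to be rewritten through the inverse mapping (oldToNew) before they agree with the new order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Toy model of a reordered index: stored fields plus one posting list. Not Lucene API. */
class DocIdRemapSketch {

    /** Inverts a newToOld permutation into oldToNew. */
    static int[] invert(int[] newToOld) {
        int[] oldToNew = new int[newToOld.length];
        for (int newId = 0; newId < newToOld.length; newId++) {
            oldToNew[newToOld[newId]] = newId;
        }
        return oldToNew;
    }

    /** Rewrites a posting list into the new doc ID space and restores ascending order. */
    static List<Integer> remapPostings(List<Integer> oldPostings, int[] oldToNew) {
        List<Integer> remapped = new ArrayList<>();
        for (int oldId : oldPostings) {
            remapped.add(oldToNew[oldId]);
        }
        Collections.sort(remapped);
        return remapped;
    }

    /** Stored-field lookup through the sorted view (index by newToOld[newId]). */
    static String storedName(String[] namesByOldId, int[] newToOld, int newId) {
        return namesByOldId[newToOld[newId]];
    }

    public static void main(String[] args) {
        String[] names = {"iPod", "iPhone"};               // indexed by old doc ID
        int[] newToOld = {1, 0};                           // iPhone (price 300) moves to the front
        List<Integer> iPhonePostings = Arrays.asList(1);   // old doc ID of the iPhone document

        // Postings left in the old ID space: the hit resolves to the wrong stored fields.
        int staleHit = iPhonePostings.get(0);
        System.out.println("stale: " + storedName(names, newToOld, staleHit));       // stale: iPod

        // Postings rewritten into the new ID space: the hit resolves correctly.
        int remappedHit = remapPostings(iPhonePostings, invert(newToOld)).get(0);
        System.out.println("remapped: " + storedName(names, newToOld, remappedHit)); // remapped: iPhone
    }
}
```

In this model, a query for name:iPhone returns the iPod document whenever the postings are left untouched, which matches the result described above. Whether the real patch fails for the same reason is an assumption that would need checking against its postings-enumeration code.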

So I have two questions: are you planning to update this code for newer versions of Lucene
4.0, or am I on my own to get it to work? And if I am on my own, where should I look for
a solution to my problem?

Thanks in advance for your help.
                
> Index sorter
> ------------
>
>                 Key: LUCENE-2482
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2482
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: modules/other
>    Affects Versions: 3.1, 4.0
>            Reporter: Andrzej Bialecki 
>            Assignee: Andrzej Bialecki 
>             Fix For: 3.6, 4.0
>
>         Attachments: LUCENE-2482-4.0.patch, indexSorter.patch
>
>
> A tool to sort index according to a float document weight. Documents with high weight
> are given low document numbers, which means that they will be first evaluated. When using
> a strategy of "early termination" of queries (see TimeLimitedCollector) such sorting significantly
> improves the quality of partial results.
> (Originally this tool was created by Doug Cutting in Nutch, and used norms as document
> weights - thus the ordering was limited by the limited resolution of norms. This is a pure
> Lucene version of the tool, and it uses arbitrary floats from a specified stored field).
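
The ordering described in the issue summary can be sketched as a permutation computed from per-document weights. This is a minimal illustration, not the code from the attached patches: the class and method names (WeightSortSketch, newToOldByWeight) are hypothetical, doc IDs are assumed to be 0-based, and ties are assumed to keep their original relative order.

```java
import java.util.Arrays;

/** Builds a newToOld doc ID mapping from float weights. Illustrative only, not Lucene API. */
class WeightSortSketch {

    /**
     * Returns newToOld, where newToOld[newId] is the old doc ID placed at newId.
     * Higher weights receive lower new doc IDs.
     */
    static int[] newToOldByWeight(float[] weights) {
        Integer[] ids = new Integer[weights.length];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i;
        }
        // Arrays.sort on object arrays is stable, so equal weights keep their old order.
        Arrays.sort(ids, (a, b) -> Float.compare(weights[b], weights[a]));

        int[] newToOld = new int[ids.length];
        for (int newId = 0; newId < ids.length; newId++) {
            newToOld[newId] = ids[newId];
        }
        return newToOld;
    }

    public static void main(String[] args) {
        float[] weights = {100f, 300f, 300f, 50f};  // weight of old docs 0..3
        System.out.println(Arrays.toString(newToOldByWeight(weights))); // [1, 2, 0, 3]
    }
}
```

With such a mapping, a collector that stops early sees the highest-weight documents first, which is the property the summary relies on for better partial results.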

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        


