lucene-dev mailing list archives

From "Pablo Castellanos (Commented) (JIRA)" <>
Subject [jira] [Commented] (LUCENE-2482) Index sorter
Date Thu, 02 Feb 2012 20:46:58 GMT


Pablo Castellanos commented on LUCENE-2482:

Hi, I want to implement some early-termination strategies over my Lucene index, which requires
reordering it, so I started playing with the 4.0 patch.

I found that many methods have changed over the past year, so I had to make some
modifications, mainly:

// as in the patch:
public TermFreqVector[] getTermFreqVectors(int docNumber)
        throws IOException {
  return super.getTermFreqVectors(newToOld[docNumber]);
}

// changed to:
public Fields getTermVectors(int docID) throws IOException {
  return super.getTermVectors(newToOld[docID]);
}

// as in the patch:
public Document document(int n, FieldSelector fieldSelector)
        throws CorruptIndexException, IOException {
  return super.document(newToOld[n], fieldSelector);
}

// changed to:
public void document(int docID, StoredFieldVisitor visitor)
        throws CorruptIndexException, IOException {
  super.document(newToOld[docID], visitor);
}

There is also a getDeletedDocs method, for which I haven't found a good replacement:

    public Bits getDeletedDocs() {
      final Bits deletedDocs = super.getDeletedDocs();

      if (deletedDocs == null)
        return null;

      return new Bits() {
        public boolean get(int index) {
          return deletedDocs.get(newToOld[index]);
        }

        public int length() {
          return deletedDocs.length();
        }
      };
    }

After applying these changes and running the code against my Lucene index, I get some weird results.
The new sort order seems to have been applied, but the posting lists still point to the old
document numbers.

Imagine that I have two documents in my index and I want to sort them by price, so that the
most expensive item gets the lowest docId:

Document 1
{panel}docId:1, name: iPod, price: 100${panel}

Document 2
{panel}docId:2, name: iPhone, price: 300${panel}

I run my modified version of IndexSorter over it and then query the new index.
If I query for _name:iPhone_ I get:
{panel}docId:2, name: iPod, price: 100${panel}

That leads me to believe that the stored documents have been reordered but the new index is
still using the old posting lists.
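
To make the suspected mismatch concrete, here is a minimal sketch in plain Java. It uses no Lucene classes; the class and method names (DocIdRemapSketch, buildNewToOld, invert, remapPostings) are hypothetical, and docIds are 0-based here rather than the 1-based ids in the example above. It assumes stored fields are read through newToOld, and shows that a posting list left with old ids returns the wrong document, while one remapped through the inverse array returns the right one.

```java
import java.util.Arrays;

public class DocIdRemapSketch {

    // newToOld[newId] = oldId, highest weight first (ties keep their old order).
    static int[] buildNewToOld(float[] weightByOldId) {
        Integer[] order = new Integer[weightByOldId.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Float.compare(weightByOldId[b], weightByOldId[a]));
        int[] newToOld = new int[order.length];
        for (int i = 0; i < order.length; i++) {
            newToOld[i] = order[i];
        }
        return newToOld;
    }

    // oldToNew[oldId] = newId, the inverse permutation of newToOld.
    static int[] invert(int[] newToOld) {
        int[] oldToNew = new int[newToOld.length];
        for (int newId = 0; newId < newToOld.length; newId++) {
            oldToNew[newToOld[newId]] = newId;
        }
        return oldToNew;
    }

    // Translate every old docId in a posting list and restore ascending order.
    static int[] remapPostings(int[] oldPostings, int[] oldToNew) {
        int[] remapped = new int[oldPostings.length];
        for (int i = 0; i < oldPostings.length; i++) {
            remapped[i] = oldToNew[oldPostings[i]];
        }
        Arrays.sort(remapped);
        return remapped;
    }

    public static void main(String[] args) {
        String[] names = {"iPod", "iPhone"};   // stored fields, indexed by old docId
        float[] prices = {100f, 300f};         // sort weights, indexed by old docId
        int[] newToOld = buildNewToOld(prices);
        int[] oldToNew = invert(newToOld);
        int[] iPhonePostings = {1};            // old posting list for name:iPhone

        // Posting list not remapped: the old id is interpreted as a new id.
        int staleHit = iPhonePostings[0];
        System.out.println("stale posting    -> " + names[newToOld[staleHit]]);

        // Posting list remapped with oldToNew: the hit matches the stored fields.
        int hit = remapPostings(iPhonePostings, oldToNew)[0];
        System.out.println("remapped posting -> " + names[newToOld[hit]]);
    }
}
```

With the two documents above, the stale lookup prints iPod and the remapped lookup prints iPhone, which matches the result I am seeing.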

So I have two questions: are you planning to update this code for newer versions of Lucene
4.0, or am I on my own to get it to work? And if I am on my own, where should I look for
a solution to my problem?

Thanks in advance for your help.
> Index sorter
> ------------
>                 Key: LUCENE-2482
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: modules/other
>    Affects Versions: 3.1, 4.0
>            Reporter: Andrzej Bialecki 
>            Assignee: Andrzej Bialecki 
>             Fix For: 3.6, 4.0
>         Attachments: LUCENE-2482-4.0.patch, indexSorter.patch
> A tool to sort index according to a float document weight. Documents with high weight
> are given low document numbers, which means that they will be first evaluated. When using
> a strategy of "early termination" of queries (see TimeLimitedCollector) such sorting significantly
> improves the quality of partial results.
> (Originally this tool was created by Doug Cutting in Nutch, and used norms as document
> weights - thus the ordering was limited by the limited resolution of norms. This is a pure
> Lucene version of the tool, and it uses arbitrary floats from a specified stored field).
