lucene-java-user mailing list archives

From Lokesh Bajaj <lokesh_ba...@yahoo.com>
Subject "docMap" array in SegmentMergeInfo
Date Wed, 13 Jul 2005 18:42:57 GMT

I noticed the following code that builds the "docMap" array in SegmentMergeInfo.java for the
case where some documents might be deleted from an index:

    // build array which maps document numbers around deletions 
    if (reader.hasDeletions()) {
      int maxDoc = reader.maxDoc();
      docMap = new int[maxDoc];
      int j = 0;
      for (int i = 0; i < maxDoc; i++) {
        if (reader.isDeleted(i))
          docMap[i] = -1;
        else
          docMap[i] = j++;
      }
    }

For a very large index where we might want to delete or replace some documents, this requires
a lot of memory: the array holds one 4-byte int per document, so an index of 100 million
documents needs about 381 MB (100,000,000 x 4 bytes), even if only a few documents are
deleted. Is there any reason why this was implemented this way?

It seems this could be implemented as a much smaller array that tracks only the deleted
document numbers; computing the new document number from that array would still be very
efficient. Has anyone else done this, or has it been considered as a change to the Lucene
code?
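
To make the idea concrete, here is a minimal sketch, assuming the deleted document numbers
can be collected into an int array up front. The class SparseDocMap and its map method are
hypothetical names for illustration and are not part of Lucene. The new number of a live
document is its old number minus the count of deleted documents before it, which a binary
search over the sorted array gives directly:

```java
import java.util.Arrays;

// Hypothetical helper, not part of Lucene: maps old document numbers to new
// ones using only the list of deleted document numbers.
final class SparseDocMap {
    // Deleted document numbers in ascending order.
    private final int[] deleted;

    SparseDocMap(int[] deletedDocs) {
        // Copy and sort so that binary search is valid regardless of input order.
        this.deleted = deletedDocs.clone();
        Arrays.sort(this.deleted);
    }

    // Returns the new document number, or -1 if oldDoc is deleted,
    // matching the convention of the docMap array above.
    int map(int oldDoc) {
        int pos = Arrays.binarySearch(deleted, oldDoc);
        if (pos >= 0) {
            return -1; // oldDoc is itself in the deleted list
        }
        // For a missing key, binarySearch returns -(insertionPoint) - 1, and the
        // insertion point equals the number of deleted docs smaller than oldDoc.
        int deletedBefore = -(pos + 1);
        return oldDoc - deletedBefore;
    }
}
```

This uses 4 bytes per deleted document instead of 4 bytes per document, at the cost of an
O(log d) lookup (d = number of deletions) instead of a single array access.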

Lokesh

