lucene-dev mailing list archives

From "Toke Eskildsen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1990) Add unsigned packed int impls in oal.util
Date Thu, 01 Apr 2010 21:52:27 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852568#action_12852568 ]

Toke Eskildsen commented on LUCENE-1990:
----------------------------------------

I've located the bug and fixed it. As expected, it was in the write masks. Unfortunately I'm
running out of time, so I can't make a patch right now. The corrected code for Packed64 is:
{code}
  // BLOCK_SIZE, ENTRY_SIZE, FAC_BITPOS and SHIFTS are defined elsewhere in Packed64.
  // For each bit position, [base] and [base+1] are AND masks for the first and
  // second block; [base+2] masks the shifted value before it is OR'ed into the
  // second block.
  private static final long[][] WRITE_MASKS =
          new long[ENTRY_SIZE][ENTRY_SIZE * FAC_BITPOS];
  static {
    for (int elementBits = 1 ; elementBits <= BLOCK_SIZE ; elementBits++) {
      long elementPosMask = ~(~0L << elementBits);
      int[] currentShifts = SHIFTS[elementBits];
      long[] currentMasks = WRITE_MASKS[elementBits];
      for (int bitPos = 0 ; bitPos < BLOCK_SIZE ; bitPos++) {
        int base = bitPos * FAC_BITPOS;
        currentMasks[base  ] = ~((elementPosMask
                                  << currentShifts[base + 1])
                                 >>> currentShifts[base]);
        currentMasks[base+1] =
            ~(elementPosMask << currentShifts[base + 2]);
        currentMasks[base+2] = currentShifts[base + 2] == 0 ? 0L : ~0L;
        // New check: if the value fits in the first block, the second block
        // must be left untouched.
        if (bitPos <= BLOCK_SIZE - elementBits) { // Second block not used
          currentMasks[base+1] = ~0L; // Keep all bits
          currentMasks[base+2] = 0L;  // Or with 0
        }
      }
    }
  }
{code}
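To make the effect of the new check concrete, here is a minimal, self-contained sketch that computes the write masks for a single value with the formulas from the Packed64 snippet, once without and once with the check. It is hypothetical and not part of Packed64: the class name SecondBlockMaskDemo, the set helper, and the shift layout (shift0 = bitPos, shift1 = 64 - elementBits, shift2 = 0 when the value fits in the first block) are assumptions, since the SHIFTS table itself is not shown in this comment. Bit positions are counted from the most significant bit.

```java
// Hypothetical demo, not Lucene code. Mask formulas follow the Packed64 snippet;
// the shift values are an assumed layout:
//   shift0 = bitPos, shift1 = 64 - elementBits,
//   shift2 = 64 - spilledBits, or 0 when the value fits in the first block.
public class SecondBlockMaskDemo {
    static final int BLOCK_SIZE = 64;

    // Writes an elementBits-wide value at bit offset bitPos (counted from the
    // most significant bit) of blocks[0]; any overflow goes into blocks[1].
    // Assumes 1 <= elementBits < 64 and 0 <= bitPos < 64.
    static void set(long[] blocks, int elementBits, int bitPos, long value, boolean withFix) {
        long elementPosMask = ~(~0L << elementBits);
        int shift0 = bitPos;
        int shift1 = BLOCK_SIZE - elementBits;
        int spilledBits = bitPos + elementBits - BLOCK_SIZE;
        int shift2 = spilledBits > 0 ? BLOCK_SIZE - spilledBits : 0;

        long mask0 = ~((elementPosMask << shift1) >>> shift0);
        long mask1 = ~(elementPosMask << shift2); // clears low bits of blocks[1] when shift2 == 0
        long mask2 = shift2 == 0 ? 0L : ~0L;
        if (withFix && bitPos <= BLOCK_SIZE - elementBits) { // second block not used
            mask1 = ~0L; // keep all bits
            mask2 = 0L;  // or with 0
        }
        blocks[0] = (blocks[0] & mask0) | ((value << shift1) >>> shift0);
        blocks[1] = (blocks[1] & mask1) | ((value << shift2) & mask2);
    }

    public static void main(String[] args) {
        for (boolean withFix : new boolean[] {false, true}) {
            long[] blocks = {0L, 0xFL}; // low 4 bits of blocks[1] hold some other value
            set(blocks, 4, 0, 0x5L, withFix); // this value fits entirely in blocks[0]
            System.out.printf("withFix=%b blocks[0]=%016x blocks[1]=%016x%n",
                    withFix, blocks[0], blocks[1]);
        }
    }
}
```

Without the check, shift2 is 0 for a value that fits in one block, so the second-block AND mask becomes ~elementPosMask and wipes the low elementBits bits of the following block. Running main prints blocks[1]=0000000000000000 for withFix=false and blocks[1]=000000000000000f for withFix=true.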

The only change is the final check for whether the second block is used. Likewise, the fix
for Packed32 is:

{code}
  private static final int[][] WRITE_MASKS =
          new int[ENTRY_SIZE][ENTRY_SIZE * FAC_BITPOS];
  static {
    for (int elementBits = 1 ; elementBits <= BLOCK_SIZE ; elementBits++) {
      int elementPosMask = ~(~0 << elementBits);
      int[] currentShifts = SHIFTS[elementBits];
      int[] currentMasks = WRITE_MASKS[elementBits];
      for (int bitPos = 0 ; bitPos < BLOCK_SIZE ; bitPos++) {
        int base = bitPos * FAC_BITPOS;
        currentMasks[base  ] =~((elementPosMask
                << currentShifts[base + 1])
                >>> currentShifts[base]);
        currentMasks[base+1] = ~(elementPosMask
                << currentShifts[base + 2]);
        currentMasks[base+2] = currentShifts[base + 2] == 0 ? 0 : ~0;
        if (bitPos <= BLOCK_SIZE - elementBits) { // Second block not used
          currentMasks[base+1] = ~0; // Keep all bits
          currentMasks[base+2] = 0;  // Or with 0
        }
      }
    }
  }
{code}

Without checking thoroughly, I'd expect the two pieces of code to be equivalent, as Packed32
and Packed64 differ only in using int vs. long and in some constants. The unit test from above
can be used for Packed32 by explicitly creating a Packed32 instead of calling the factory.
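The unit test referred to above is not reproduced here, so the following is only a hypothetical stand-in for checking the int variant. MiniPacked32, its SHIFTS layout, and roundTrip are assumptions; the WRITE_MASKS loop mirrors the Packed32 snippet, restricted to 1..31 bits per value. Values are written in reverse index order, so a second-block mask that wrongly cleared the low bits of the following block would corrupt values that were already written.

```java
import java.util.Random;

// Hypothetical round-trip check, not Lucene's test. WRITE_MASKS follows the
// Packed32 snippet (including the "second block not used" check); the SHIFTS
// layout is an assumption.
public class MiniPacked32 {
    static final int BLOCK_SIZE = 32;
    static final int ENTRY_SIZE = BLOCK_SIZE + 1;
    static final int FAC_BITPOS = 3;

    // Per (elementBits, bitPos): [base] = bitPos, [base+1] = BLOCK_SIZE - elementBits,
    // [base+2] = BLOCK_SIZE - spilledBits, or 0 when the value fits in one block.
    static final int[][] SHIFTS = new int[ENTRY_SIZE][ENTRY_SIZE * FAC_BITPOS];
    static final int[][] WRITE_MASKS = new int[ENTRY_SIZE][ENTRY_SIZE * FAC_BITPOS];

    static {
        // 1..31 bits only: Java uses just the low 5 bits of an int shift distance,
        // so ~0 << 32 would equal ~0 and the element mask would come out as 0.
        for (int elementBits = 1; elementBits < BLOCK_SIZE; elementBits++) {
            int elementPosMask = ~(~0 << elementBits);
            int[] s = SHIFTS[elementBits];
            int[] m = WRITE_MASKS[elementBits];
            for (int bitPos = 0; bitPos < BLOCK_SIZE; bitPos++) {
                int base = bitPos * FAC_BITPOS;
                int spilledBits = bitPos + elementBits - BLOCK_SIZE;
                s[base] = bitPos;
                s[base + 1] = BLOCK_SIZE - elementBits;
                s[base + 2] = spilledBits > 0 ? BLOCK_SIZE - spilledBits : 0;
                m[base] = ~((elementPosMask << s[base + 1]) >>> s[base]);
                m[base + 1] = ~(elementPosMask << s[base + 2]);
                m[base + 2] = s[base + 2] == 0 ? 0 : ~0;
                if (bitPos <= BLOCK_SIZE - elementBits) { // second block not used
                    m[base + 1] = ~0; // keep all bits
                    m[base + 2] = 0;  // or with 0
                }
            }
        }
    }

    private final int[] blocks;
    private final int bits;

    MiniPacked32(int valueCount, int bits) {
        if (bits < 1 || bits >= BLOCK_SIZE) throw new IllegalArgumentException("bits: " + bits);
        // +2 leaves room for the unconditional write to blocks[elementPos + 1].
        this.blocks = new int[(int) ((long) valueCount * bits / BLOCK_SIZE) + 2];
        this.bits = bits;
    }

    void set(int index, int value) {
        long majorBitPos = (long) index * bits;
        int elementPos = (int) (majorBitPos / BLOCK_SIZE);
        int base = (int) (majorBitPos % BLOCK_SIZE) * FAC_BITPOS;
        int[] s = SHIFTS[bits];
        int[] m = WRITE_MASKS[bits];
        blocks[elementPos] = (blocks[elementPos] & m[base]) | ((value << s[base + 1]) >>> s[base]);
        blocks[elementPos + 1] = (blocks[elementPos + 1] & m[base + 1])
                | ((value << s[base + 2]) & m[base + 2]);
    }

    int get(int index) {
        long majorBitPos = (long) index * bits;
        int elementPos = (int) (majorBitPos / BLOCK_SIZE);
        int base = (int) (majorBitPos % BLOCK_SIZE) * FAC_BITPOS;
        int[] s = SHIFTS[bits];
        int value = (blocks[elementPos] << s[base]) >>> s[base + 1];
        if (s[base + 2] != 0) { // value spills into the next block
            value |= blocks[elementPos + 1] >>> s[base + 2];
        }
        return value;
    }

    // Writes random values in reverse index order, then reads them all back.
    static boolean roundTrip(int bits, int count, long seed) {
        Random random = new Random(seed);
        MiniPacked32 packed = new MiniPacked32(count, bits);
        int[] expected = new int[count];
        for (int i = count - 1; i >= 0; i--) {
            expected[i] = random.nextInt() >>> (BLOCK_SIZE - bits); // unsigned, `bits` wide
            packed.set(i, expected[i]);
        }
        for (int i = 0; i < count; i++) {
            if (packed.get(i) != expected[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        for (int bits = 1; bits < BLOCK_SIZE; bits++) {
            System.out.println(bits + " bits: " + (roundTrip(bits, 1000, 42L) ? "ok" : "FAILED"));
        }
    }
}
```

Every width should report ok with the check in place; deleting the if block from the static initializer is expected to make the reverse-order writes fail.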

I'll be back at my computer in a few days and can make a patch then, but you are more than
welcome to roll the patch yourself if it is more convenient to get it in immediately.

> Add unsigned packed int impls in oal.util
> -----------------------------------------
>
>                 Key: LUCENE-1990
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1990
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: Flex Branch
>            Reporter: Michael McCandless
>            Priority: Minor
>             Fix For: Flex Branch
>
>         Attachments: generated_performance-te20100226.txt, LUCENE-1990-te20100122.patch,
>                      LUCENE-1990-te20100210.patch, LUCENE-1990-te20100212.patch, LUCENE-1990-te20100223.patch,
>                      LUCENE-1990-te20100226.patch, LUCENE-1990-te20100226b.patch, LUCENE-1990-te20100226c.patch,
>                      LUCENE-1990-te20100301.patch, LUCENE-1990.patch, LUCENE-1990.patch, LUCENE-1990_PerformanceMeasurements20100104.zip,
>                      perf-mkm-20100227.txt, performance-20100301.txt, performance-te20100226.txt
>
>
> There are various places in Lucene that could take advantage of an
> efficient packed unsigned int/long impl.  E.g., the terms dict index in
> the standard codec in LUCENE-1458 could substantially reduce its RAM
> usage.  FieldCache.StringIndex could as well.  And I think "load into
> RAM" codecs like the one in TestExternalCodecs could use this too.
> I'm picturing something very basic like:
> {code}
> interface PackedUnsignedLongs  {
>   long get(long index);
>   void set(long index, long value);
> }
> {code}
> Plus maybe an iterator for getting and maybe also for setting.  If it
> helps, most of the usages of this inside Lucene will be "write once",
> so e.g. the set could make that an assumption/requirement.
> And a factory somewhere:
> {code}
>   PackedUnsignedLongs create(int count, long maxValue);
> {code}
> I think we should simply autogen the code (we can start from the
> autogen code in LUCENE-1410), or, if there is a good existing impl
> that has a compatible license, that'd be great.
> I don't have time near-term to do this... so if anyone has the itch,
> please jump!
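For readers new to the issue, the interface and factory from the description can be sketched very simply. This is a hypothetical illustration, not the Packed32/Packed64 code from this issue: the interface name and method signatures come from the description, while SimplePackedLongs, bitsRequired, the LSB-first bit layout, and silently masking values that are too wide are assumptions. The factory picks the fewest bits that can hold maxValue, which is where the RAM saving over a plain long[] comes from.

```java
// Hypothetical sketch of the interface and factory described in the issue.
// Values are packed LSB-first into a long[]; this is simpler than, and not the
// same layout as, the table-driven Packed32/Packed64 code in this issue.
interface PackedUnsignedLongs {
    long get(long index);
    void set(long index, long value);
}

public final class SimplePackedLongs implements PackedUnsignedLongs {
    private final long[] blocks;
    private final int bits;   // bits per value, 1..64
    private final long mask;  // low `bits` bits set

    private SimplePackedLongs(long count, int bits) {
        this.bits = bits;
        this.mask = bits == 64 ? -1L : (1L << bits) - 1;
        this.blocks = new long[Math.toIntExact((count * bits + 63) / 64)];
    }

    // Factory in the spirit of create(count, maxValue) from the description.
    static PackedUnsignedLongs create(int count, long maxValue) {
        return new SimplePackedLongs(count, bitsRequired(maxValue));
    }

    // Fewest bits that can hold maxValue, read as unsigned (negative means 64).
    static int bitsRequired(long maxValue) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    @Override
    public long get(long index) {
        long bitIndex = index * bits;
        int block = (int) (bitIndex >>> 6);   // bitIndex / 64
        int offset = (int) (bitIndex & 63);   // bitIndex % 64
        long value = blocks[block] >>> offset;
        int bitsInFirst = 64 - offset;
        if (bitsInFirst < bits) {             // value continues in the next block
            value |= blocks[block + 1] << bitsInFirst;
        }
        return value & mask;
    }

    @Override
    public void set(long index, long value) {
        long bitIndex = index * bits;
        int block = (int) (bitIndex >>> 6);
        int offset = (int) (bitIndex & 63);
        value &= mask;                        // assumption: drop bits that do not fit
        blocks[block] = (blocks[block] & ~(mask << offset)) | (value << offset);
        int bitsInFirst = 64 - offset;
        if (bitsInFirst < bits) {             // only touch the next block on overflow
            long highMask = mask >>> bitsInFirst;
            blocks[block + 1] = (blocks[block + 1] & ~highMask) | (value >>> bitsInFirst);
        }
    }

    public static void main(String[] args) {
        PackedUnsignedLongs values = create(1000, 100); // 100 needs 7 bits
        for (int i = 999; i >= 0; i--) {
            values.set(i, i % 101);
        }
        System.out.println(bitsRequired(100) + " " + values.get(500) + " " + values.get(999));
    }
}
```

Here main prints "7 96 90". Note that set only touches the next block when the value actually overflows into it, which is the same property the write-mask fix above restores for Packed64 and Packed32.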

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

