I think you should do something with the read result in that microbenchmark
(a sum is good); otherwise there is a risk of the JIT optimizing away most of
that code.
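For illustration, here is a minimal sketch of that guard (plain Java over a long[] array standing in for the reader; names like BlackholeSum and sumReads are made up for this example): accumulating the reads into a sum and printing it keeps the result observable, so the JIT cannot prove the loop is dead and remove it.

```java
import java.util.Random;

// Minimal sketch of the dead-code-elimination guard suggested above:
// accumulate the read results into a sum and print it, so the JIT
// cannot prove the reads are unused and eliminate the loop.
public class BlackholeSum {
    static long sumReads(long[] values, int[] offsets) {
        long sum = 0;
        for (int offset : offsets) {
            sum += values[offset]; // stand-in for reader.get(offset)
        }
        return sum;
    }

    public static void main(String[] args) {
        int valueCount = 1_000_000;
        long[] values = new long[valueCount];
        int[] offsets = new int[valueCount];
        Random random = new Random(42);
        for (int i = 0; i < valueCount; ++i) {
            values[i] = random.nextInt(1 << 21);
            offsets[i] = random.nextInt(valueCount);
        }
        long start = System.nanoTime();
        long sum = sumReads(values, offsets);
        long end = System.nanoTime();
        // Printing the sum makes the reads observable side effects.
        System.out.println((end - start) + " ns, sum=" + sum);
    }
}
```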
On May 22, 2012 6:18 PM, "Adrien Grand (JIRA)" <jira@apache.org> wrote:
>
> [
> https://issues.apache.org/jira/browse/LUCENE-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
>
> Adrien Grand updated LUCENE-4062:
> 
>
> Attachment: LUCENE-4062.patch
>
> Oh right, cases are swapped, I uploaded the wrong patch. Here is a correct
> version.
>
> I guess there are two options:
> 1. always writing the most compact form to disk and deserializing it in a
> fast {{PackedInts.Mutable}} implementation (depending on
> acceptableOverheadRatio),
> 2. writing a possibly aligned version on disk (depending on
> acceptableOverheadRatio) and then selecting the fastest PackedInts.Mutable
> implementation that exactly matches bitsPerValue (with no padding bits).
>
> A third option could be to write padding bits ({{Packed64SingleBlock}}
> subclasses may have such padding bits) as well, but I really dislike the
> fact that the on-disk format is implementation-dependent.
>
> Option 1 is likely to make deserialization slower since some decoding
> might occur (the copyData method), but on the other hand, option 2 would
> prevent us from using implementations that add padding bits (such as
> Packed64SingleBlock21, which has one padding bit every 3 21-bit integers, or
> Packed64SingleBlock12, which has 4 padding bits every 5 12-bit integers, but
> not Packed64SingleBlock{1,2,4}, since 64%{1,2,4}=0).
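Those padding figures follow directly from 64-bit block arithmetic; a quick illustration (plain Java, not Lucene code — the class and method names here are invented for the example):

```java
// Padding bits per 64-bit block for a given bitsPerValue, as in the
// Packed64SingleBlock layouts discussed above: each block holds
// floor(64 / bitsPerValue) values, and the remainder bits are padding.
public class PaddingBits {
    static int valuesPerBlock(int bitsPerValue) {
        return 64 / bitsPerValue;
    }

    static int paddingBits(int bitsPerValue) {
        return 64 % bitsPerValue;
    }

    public static void main(String[] args) {
        System.out.println(paddingBits(21)); // 3 values per block, 1 padding bit
        System.out.println(paddingBits(12)); // 5 values per block, 4 padding bits
        System.out.println(paddingBits(4));  // 16 values per block, no padding
    }
}
```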
>
> I initially chose option 1 because I think it is nice to have
> Packed64SingleBlock21 when bitsPerValue is close to 21, since it might be
> significantly faster than Packed64.
>
> In order to know how much slower deserialization is (option 1), I ran a
> microbenchmark which:
> - loads a PackedInts.Reader from a ByteArrayDataInput,
> - performs n random reads on the reader.
>
> Here is the code for the microbenchmark in case you would like to run it.
> {code}
> int valueCount = 10000000;
> int bitsPerValue = 21;
> int[] offsets = new int[valueCount];
> Random random = new Random();
> for (int i = 0; i < valueCount; ++i) {
>   offsets[i] = random.nextInt(valueCount);
> }
> byte[] bytes = new byte[valueCount * 4];
> DataOutput out = new ByteArrayDataOutput(bytes);
> PackedInts.Writer writer = PackedInts.getWriter(out, valueCount, bitsPerValue);
> for (int i = 0; i < valueCount; ++i) {
>   writer.add(random.nextInt(1 << bitsPerValue));
> }
> writer.finish();
> for (int i = 0; i < 50; ++i) {
>   long start = System.nanoTime();
>   DataInput in = new ByteArrayDataInput(bytes);
>   PackedInts.Reader reader = PackedInts.getReader(in, 0f); // Packed64
>   // PackedInts.Reader reader = PackedInts.getReader(in, 0.1f); // Packed64SingleBlock
>   for (int j = 0, n = valueCount; j < n; ++j) {
>     reader.get(offsets[j]);
>   }
>   long end = System.nanoTime();
>   System.out.println(end - start);
> }
> {code}
>
> I ran this microbenchmark for bitsPerValue=21 and valueCount in (1 000
> 000, 10 000 000). The loading time (n = 0) is 2x to 3x slower with
> Packed64SingleBlock21. However, as soon as you perform valueCount/4 or more
> random reads (n >= valueCount/4), the total time is better for
> Packed64SingleBlock21. When n=valueCount, the total time is even 2x better
> for Packed64SingleBlock21.
>
> The loading overhead doesn't seem too bad.
>
> I guess that some people might still be more interested in the loading
> time than in the query time (for write-intensive applications), but in this
> case they could still request a reader in its compact form (Packed64 or
> Direct* when bitsPerValue is 8, 16, 32 or 64). If they want to be sure to
> have a fast reader too, they can also make sure to pass 8, 16, 32 or 64 as
> the bitsPerValue parameter of {{PackedInts.getWriter}}.
>
> Mike, what do you think?
>
> > More fine-grained control over the packed integer implementation that is
> chosen
> >
> 
> >
> > Key: LUCENE-4062
> > URL: https://issues.apache.org/jira/browse/LUCENE-4062
> > Project: Lucene - Java
> > Issue Type: Improvement
> > Components: core/other
> > Reporter: Adrien Grand
> > Assignee: Michael McCandless
> > Priority: Minor
> > Labels: performance
> > Fix For: 4.1
> >
> > Attachments: LUCENE-4062.patch, LUCENE-4062.patch,
> LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch, LUCENE-4062.patch
> >
> >
> > In order to save space, Lucene has two main PackedInts.Mutable
> implementations: one that is very fast and is based on a
> byte/short/integer/long array (Direct*), and another one which packs bits in
> a memory-efficient manner (Packed*).
> > The packed implementation tends to be much slower than the direct one,
> which discourages some Lucene components from using it. On the other hand,
> if you store 21-bit integers in a Direct32, this is a space loss of
> (32-21)/32≈34%.
> > If you are willing to trade some space for speed, you can store 3 of
> these 21-bit integers in a long, resulting in an overhead of 1/3 bit per
> value. One advantage of this approach is that you never need to read more
> than one block to read or write a value, so it can be significantly faster
> than Packed32 and Packed64, which always need to read/write two blocks in
> order to avoid costly branches.
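As an illustration of the single-block idea, here is a simplified sketch (not Lucene's actual Packed64SingleBlock code; the class name and layout details are assumptions for the example) where a 21-bit value is read with one array access, one shift and one mask:

```java
// Simplified sketch of a single-block 21-bit store: 3 values per long,
// with the top bit of each 64-bit block left as padding. Every get/set
// touches exactly one block, so no cross-block logic is needed.
public class SingleBlock21 {
    private static final long MASK = (1L << 21) - 1;
    private final long[] blocks;

    SingleBlock21(int valueCount) {
        blocks = new long[(valueCount + 2) / 3]; // ceil(valueCount / 3)
    }

    long get(int index) {
        int block = index / 3;
        int shift = (index % 3) * 21;
        return (blocks[block] >>> shift) & MASK; // one read, one shift, one mask
    }

    void set(int index, long value) {
        int block = index / 3;
        int shift = (index % 3) * 21;
        // clear the 21-bit slot, then OR in the new value
        blocks[block] = (blocks[block] & ~(MASK << shift)) | ((value & MASK) << shift);
    }
}
```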
> > I ran some tests, and for 10,000,000 21-bit values, this implementation
> takes less than 2% more space and has 44% faster writes and 30% faster
> reads. The 12-bit version (5 values per block) has the same performance
> improvement and a 6% memory overhead compared to the packed implementation.
> > In order to select the best implementation for a given integer size, I
> wrote the {{PackedInts.getMutable(valueCount, bitsPerValue,
> acceptableOverheadPerValue)}} method. This method selects the fastest
> implementation that has fewer than {{acceptableOverheadPerValue}} wasted
> bits per value. For example, if you accept an overhead of 20%
> ({{acceptableOverheadPerValue = 0.2f * bitsPerValue}}), which is pretty
> reasonable, here are the implementations that would be selected:
> > * 1: Packed64SingleBlock1
> > * 2: Packed64SingleBlock2
> > * 3: Packed64SingleBlock3
> > * 4: Packed64SingleBlock4
> > * 5: Packed64SingleBlock5
> > * 6: Packed64SingleBlock6
> > * 7: Direct8
> > * 8: Direct8
> > * 9: Packed64SingleBlock9
> > * 10: Packed64SingleBlock10
> > * 11: Packed64SingleBlock12
> > * 12: Packed64SingleBlock12
> > * 13: Packed64
> > * 14: Direct16
> > * 15: Direct16
> > * 16: Direct16
> > * 17: Packed64
> > * 18: Packed64SingleBlock21
> > * 19: Packed64SingleBlock21
> > * 20: Packed64SingleBlock21
> > * 21: Packed64SingleBlock21
> > * 22: Packed64
> > * 23: Packed64
> > * 24: Packed64
> > * 25: Packed64
> > * 26: Packed64
> > * 27: Direct32
> > * 28: Direct32
> > * 29: Direct32
> > * 30: Direct32
> > * 31: Direct32
> > * 32: Direct32
> > * 33: Packed64
> > * 34: Packed64
> > * 35: Packed64
> > * 36: Packed64
> > * 37: Packed64
> > * 38: Packed64
> > * 39: Packed64
> > * 40: Packed64
> > * 41: Packed64
> > * 42: Packed64
> > * 43: Packed64
> > * 44: Packed64
> > * 45: Packed64
> > * 46: Packed64
> > * 47: Packed64
> > * 48: Packed64
> > * 49: Packed64
> > * 50: Packed64
> > * 51: Packed64
> > * 52: Packed64
> > * 53: Packed64
> > * 54: Direct64
> > * 55: Direct64
> > * 56: Direct64
> > * 57: Direct64
> > * 58: Direct64
> > * 59: Direct64
> > * 60: Direct64
> > * 61: Direct64
> > * 62: Direct64
> > Under 32 bits per value, only 13, 17 and 22-26 bits per value would
> still choose the slower Packed64 implementation. Allowing a 50% overhead
> would prevent the packed implementation from being selected for bits per
> value under 32. Allowing an overhead of 32 bits per value would ensure that
> a Direct* implementation is always selected.
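The wasted-bits arithmetic behind that table can be sketched as follows (hypothetical helpers for illustration, not the actual getMutable code):

```java
// Hypothetical helpers mirroring the overhead reasoning above: a Direct*
// implementation wastes (storageBits - bitsPerValue) bits per value, while
// a Packed64SingleBlock spreads its per-block padding over the values
// stored in each 64-bit block.
public class OverheadPerValue {
    static float direct(int storageBits, int bitsPerValue) {
        return storageBits - bitsPerValue;
    }

    static float singleBlock(int bitsPerValue) {
        int valuesPerBlock = 64 / bitsPerValue;
        return (64 - valuesPerBlock * bitsPerValue) / (float) valuesPerBlock;
    }

    public static void main(String[] args) {
        // 21-bit values in a Direct32 waste 11 bits per value, whereas
        // a single-block 21-bit layout wastes only 1/3 bit per value.
        System.out.println(direct(32, 21));
        System.out.println(singleBlock(21));
        System.out.println(singleBlock(12)); // 4 padding bits over 5 values
    }
}
```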
> > Next steps would be to:
> > * make Lucene components use this {{getMutable}} method and let users
> decide which trade-off suits them best,
> > * write a Packed32SingleBlock implementation if necessary (I didn't do
> it because I have no 32-bit computer to test the performance improvements on).
> > I think this would allow more fine-grained control over the speed/space
> trade-off; what do you think?
>
> 
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA
> administrators:
> https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>