ignite-dev mailing list archives

From "Vladimir Ozerov (JIRA)" <j...@apache.org>
Subject [jira] [Created] (IGNITE-1549) Optimize portable object fields write in non-raw mode.
Date Fri, 25 Sep 2015 11:21:04 GMT
Vladimir Ozerov created IGNITE-1549:
---------------------------------------

             Summary: Optimize portable object fields write in non-raw mode.
                 Key: IGNITE-1549
                 URL: https://issues.apache.org/jira/browse/IGNITE-1549
             Project: Ignite
          Issue Type: Task
          Components: general
    Affects Versions: 1.1.4
            Reporter: Vladimir Ozerov
            Priority: Blocker
             Fix For: ignite-1.5


Currently we write user fields as follows:
0 .. 3 - field ID;
4 - field type;
5 .. 8 - field length;
9 .. - the field itself.
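
For illustration, the current layout for an int field could be sketched as follows. The class name, the type code value and the little-endian byte order are assumptions made for this example only; none of them is stated in this issue.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class CurrentFieldLayout {
    /** Hypothetical type code for int; the real value is not given in this issue. */
    static final byte TYPE_INT = 3;

    /** Writes an int field as: field ID, field type, field length, value. */
    static byte[] writeIntField(int fieldId, int val) {
        // Byte order is an assumption of this sketch.
        ByteBuffer buf = ByteBuffer.allocate(4 + 1 + 4 + 4).order(ByteOrder.LITTLE_ENDIAN);

        buf.putInt(fieldId); // bytes 0 .. 3: field ID
        buf.put(TYPE_INT);   // byte 4: field type
        buf.putInt(4);       // bytes 5 .. 8: field length (always 4 for an int)
        buf.putInt(val);     // bytes 9 .. 12: the field itself

        return buf.array();
    }
}
```

Under these assumptions an int field always takes 13 bytes, 4 of which are a length that could be inferred from the type.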

It can be optimized as follows:
1) Field length can usually be inferred from the type. E.g., for int it is 4.
2) Frequently used constants can be written as separate types. E.g. INT - normal int, INT_0
- zero, etc.
3) Last, but not least, values should be encoded using a "variable bytes" (and possibly ZigZag)
algorithm. This should save about 2 bytes for ints and longs on average (I assume here
that longs are usually bigger than 4 bytes, e.g. timestamps).
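
As a point of reference for item 3, below is a minimal sketch of the common ZigZag plus 7-bit variable-length ("varint") encoding for ints. It shows the general technique only and is not necessarily the exact scheme this issue will end up with; the class and method names are made up for the example.

```java
import java.io.ByteArrayOutputStream;

final class VarInts {
    /** ZigZag: maps small negative and positive values to small unsigned values. */
    static int zigZag(int v) {
        return (v << 1) ^ (v >> 31);
    }

    /** Inverse of zigZag. */
    static int unZigZag(int v) {
        return (v >>> 1) ^ -(v & 1);
    }

    /** Writes 7 bits per byte, lowest bits first; the high bit marks "more bytes follow". */
    static byte[] writeVarInt(int v) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(5);

        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }

        out.write(v);

        return out.toByteArray();
    }

    /** Reads a value produced by writeVarInt from the start of the buffer. */
    static int readVarInt(byte[] buf) {
        int res = 0;
        int shift = 0;

        for (byte b : buf) {
            res |= (b & 0x7F) << shift;

            if ((b & 0x80) == 0)
                break;

            shift += 7;
        }

        return res;
    }
}
```

With this encoding values in the range -64 .. 63 take a single byte, while the largest magnitudes take 5 bytes, so the gain depends on how small typical values are.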

*New types will be introduced:*
1) Booleans: BOOL_FALSE, BOOL_TRUE;
2) Bytes: BYTE_C0 => zero, BYTE_C1 => 1, BYTE_C1N => -1;
3) Shorts, chars: SHORT_C0, SHORT_C1, SHORT_C1N;
4) Ints: INT_C0, INT_C1, INT_C1N, INT_1 - int which fits into 1 byte, INT_1N - same for negative
value, INT_2, INT_2N, INT_3, INT_3N, INT_4, INT_4N.
5) Longs: same as ints, but with only 2, 4, 6 and 8 byte count discriminators to avoid excessive
calculations.

It means that instead of the previous 6 integer types, we will have 2 + 3 + 3 + 3 + 11 + 11
= 33 types.
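
A possible way to pick the int type on write is sketched below. It assumes that negative values store the bitwise complement of the value as their payload (so that Integer.MIN_VALUE still fits into 4 bytes); the issue does not specify this, and the class, the method names and the use of strings for type names are purely illustrative.

```java
final class IntTypeSelector {
    /** Number of payload bytes (1 .. 4) needed for a non-negative int. */
    static int byteCount(int v) {
        if ((v >>> 8) == 0)
            return 1;

        if ((v >>> 16) == 0)
            return 2;

        if ((v >>> 24) == 0)
            return 3;

        return 4;
    }

    /** Returns the name of the type that would be written for the given value. */
    static String select(int v) {
        if (v == 0)
            return "INT_C0";

        if (v == 1)
            return "INT_C1";

        if (v == -1)
            return "INT_C1N";

        if (v > 0)
            return "INT_" + byteCount(v);

        // Assumption: the payload of a negative value is its bitwise complement.
        return "INT_" + byteCount(~v) + "N";
    }

    /** Total field size: 4 bytes of field ID, 1 byte of type, then the payload. */
    static int fieldSize(int v) {
        boolean constant = v == 0 || v == 1 || v == -1;

        return 4 + 1 + (constant ? 0 : byteCount(v < 0 ? ~v : v));
    }
}
```

For example, select(200) gives INT_1 and select(-200) gives INT_1N, and fieldSize ranges from 5 to 9 bytes.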

To avoid excessive switches or (even worse) array/map lookups to understand what the type
is, we can divide the whole type space (256 values) into two parts: optimized and non-optimized.
Types in the optimized space will have the MSB set to 1, and the ~30 optimized types mentioned
above (or some of them) will be located there.
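
The check itself can be a single mask operation on the type byte, roughly as in the sketch below (the class, constant and method names are hypothetical):

```java
final class TypeSpace {
    /** Most significant bit of the type byte marks the optimized space. */
    static final int OPTIMIZED_MASK = 0x80;

    /** Returns true if the type byte belongs to the optimized half of the type space. */
    static boolean isOptimized(byte type) {
        return (type & OPTIMIZED_MASK) != 0;
    }
}
```

Since Java bytes are signed, the same test could also be written as type < 0.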

For floats and doubles we simply infer length. 

For primitive arrays we do not write the field length followed by the array length, but only the array length.
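
A rough sketch of an int array field written this way is shown below. The type code, the byte order and the fixed 4-byte array length are assumptions of the example, not part of this issue.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class IntArrayField {
    /** Hypothetical type code for int[]; the real value is not given in this issue. */
    static final byte TYPE_INT_ARR = 14;

    /** Writes: field ID, field type, array length, elements (no separate field length). */
    static byte[] write(int fieldId, int[] arr) {
        // Byte order is an assumption of this sketch.
        ByteBuffer buf = ByteBuffer.allocate(4 + 1 + 4 + 4 * arr.length)
            .order(ByteOrder.LITTLE_ENDIAN);

        buf.putInt(fieldId);    // field ID
        buf.put(TYPE_INT_ARR);  // field type
        buf.putInt(arr.length); // array length only

        for (int v : arr)
            buf.putInt(v);      // elements

        return buf.array();
    }
}
```

For a 3-element int array this gives 21 bytes, 4 bytes fewer than when a separate field length is also written.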

*Expected compaction*:
bool: 10 -> 5 bytes (50%);
byte: 10 -> 5-6 bytes (45%);
short, char: 11 -> 5-7 bytes, 7 on average (35%);
int: 13 -> 5-9 bytes, 7 on average (45%);
long: 17 -> 5-13 bytes, 11 on average (35%);
float: 13 -> 9 bytes (30%);
double: 17 -> 13 bytes (25%).

*Expected CPU overhead on writes:*
Bool, float, double: none;
Byte, short, char: zero check, sign check;
Int, long: two (shift + OR)s to determine the byte count; if small - "zero" and "one" checks,
if big - sign check.

*Expected CPU overhead on reads:*
One additional branch between the optimized and non-optimized spaces.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
