ignite-issues mailing list archives

From "Vladimir Ozerov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (IGNITE-1549) Optimize portable object fields write in non-raw mode.
Date Mon, 28 Sep 2015 07:51:04 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14910129#comment-14910129 ]

Vladimir Ozerov commented on IGNITE-1549:
-----------------------------------------

Implementation plan:
1) Switch field type ID and field length;
2) Implement field length inference where possible;
3) Implement special types for constants;
4) Implement var-length compression - need to evaluate whether we will benefit from it or
not.

> Optimize portable object fields write in non-raw mode.
> ------------------------------------------------------
>
>                 Key: IGNITE-1549
>                 URL: https://issues.apache.org/jira/browse/IGNITE-1549
>             Project: Ignite
>          Issue Type: Task
>          Components: general
>    Affects Versions: 1.1.4
>            Reporter: Vladimir Ozerov
>            Assignee: Vladimir Ozerov
>            Priority: Blocker
>             Fix For: ignite-1.5
>
>
> Currently we write user fields as follows:
> 0 .. 3 - field ID;
> 4 - field type;
> 5 .. 8 - field len;
> 9 .. - the field itself.
> It can be optimized as follows:
> 1) Field len usually can be inferred from type. E.g., for int it is 4.
> 2) Frequently used constants can be written as separate types. E.g. INT - normal int,
> INT_0 - zero, etc.
> 3) Last, but not least, values should be encoded using a "variable bytes" (and possibly
> ZigZag) algorithm. This should save about 2 bytes for ints and longs on average (I assume
> here that longs are usually bigger than 4 bytes, e.g. timestamps).
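
A minimal sketch of what "variable bytes" plus ZigZag encoding could look like for an int, assuming the common base-128 scheme (7 payload bits per byte, high bit set when more bytes follow). The class and method names are hypothetical and are not taken from the Ignite code base; the issue does not specify which variant would be used.

```java
import java.io.ByteArrayOutputStream;

final class VarIntSketch {
    /** ZigZag maps small-magnitude signed ints to small unsigned values: 0->0, -1->1, 1->2, -2->3. */
    static int zigZag(int v) {
        return (v << 1) ^ (v >> 31);
    }

    /** Inverse of zigZag. */
    static int unZigZag(int z) {
        return (z >>> 1) ^ -(z & 1);
    }

    /** Writes 7 bits per byte, lowest group first; a set high bit means another byte follows. */
    static byte[] writeVarInt(int v) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(5);

        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }

        out.write(v);

        return out.toByteArray();
    }

    /** Reads a value produced by writeVarInt from the start of the buffer. */
    static int readVarInt(byte[] buf) {
        int res = 0;
        int shift = 0;

        for (byte b : buf) {
            res |= (b & 0x7F) << shift;

            if ((b & 0x80) == 0)
                break;

            shift += 7;
        }

        return res;
    }
}
```

With this scheme small values of either sign take 1 byte, while values that use all 32 bits take 5 bytes, which is why the benefit needs to be evaluated on realistic data.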
> *New types will be introduced:*
> 1) Booleans: BOOL_FALSE, BOOL_TRUE;
> 2) Bytes: BYTE_C0 => zero, BYTE_C1 => 1, BYTE_C1N => -1;
> 3) Shorts, chars: SHORT_C0, SHORT_C1, SHORT_C1N;
> 4) Ints: INT_C0, INT_C1, INT_C1N, INT_1 - int which fits into 1 byte, INT_1N - same for
> a negative value, INT_2, INT_2N, INT_3, INT_3N, INT_4, INT_4N.
> 5) Longs: same as ints, but with only 2-, 4-, 6- and 8-byte count discriminators to avoid
> excessive calculations.
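
A rough sketch of how a writer might pick one of the int types listed above. Two things here are assumptions rather than part of the proposal: negative values are represented by their bitwise complement (so that Integer.MIN_VALUE still fits into 4 bytes), and the byte count is computed with Integer.numberOfLeadingZeros instead of the shift-and-OR approach the issue has in mind. The class name is hypothetical.

```java
final class IntTypeSketch {
    /** The 11 int types named in the issue. */
    enum IntType {
        INT_C0, INT_C1, INT_C1N,
        INT_1, INT_1N, INT_2, INT_2N, INT_3, INT_3N, INT_4, INT_4N
    }

    /** Number of bytes (1..4) needed to hold a non-negative magnitude. */
    static int byteCount(int magnitude) {
        int bits = 32 - Integer.numberOfLeadingZeros(magnitude);

        return Math.max(1, (bits + 7) / 8);
    }

    /** Chooses the type for a value: constants first, then byte count and sign. */
    static IntType choose(int v) {
        if (v == 0)
            return IntType.INT_C0;

        if (v == 1)
            return IntType.INT_C1;

        if (v == -1)
            return IntType.INT_C1N;

        boolean neg = v < 0;

        // Assumption: negatives are stored as their bitwise complement, which is non-negative.
        int magnitude = neg ? ~v : v;

        switch (byteCount(magnitude)) {
            case 1:
                return neg ? IntType.INT_1N : IntType.INT_1;
            case 2:
                return neg ? IntType.INT_2N : IntType.INT_2;
            case 3:
                return neg ? IntType.INT_3N : IntType.INT_3;
            default:
                return neg ? IntType.INT_4N : IntType.INT_4;
        }
    }
}
```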
> It means that instead of 6 integer types previously, we will have 2 + 3 + 3 + 3 + 11
> + 11 = 33 types.
> To avoid excessive switches or (even worse) array/map lookups to understand what the
> type is, we can divide the whole type space (256 values) into two parts: optimized and
> non-optimized. The optimized space will have the MSB set to 1, and the ~30 optimized types
> mentioned above (or some of them) will be located there.
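
The split of the type space could be checked with a single mask test, roughly as sketched below. The concrete type ID values and all names are made up for illustration; the issue does not assign any IDs.

```java
final class TypeSpaceSketch {
    /** MSB of the one-byte type ID marks the optimized half of the type space. */
    static final int OPTIMIZED_FLAG = 0x80;

    // Hypothetical IDs, chosen only for this example.
    static final byte INT = 3;
    static final byte INT_C0 = (byte) (OPTIMIZED_FLAG | 0x0B);

    /** Single branch that separates optimized from non-optimized types. */
    static boolean isOptimized(byte typeId) {
        return (typeId & OPTIMIZED_FLAG) != 0;
    }
}
```

Because Java bytes are signed, the same check can also be written as `typeId < 0`.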
> For floats and doubles we simply infer length. 
> For primitive arrays we do not write field length and then array length, but only array
> length.
> *Expected compaction*:
> bool: 10 -> 5 bytes (50%);
> byte: 10 -> 5-6 bytes (45%);
> short, char: 11 -> 5-7 bytes, 7 on average (35%);
> int: 13 -> 5-9 bytes, 7 on average (45%);
> long: 17 -> 5-13 bytes, 11 on average (35%);
> float: 13 -> 9 bytes (30%);
> double: 17 -> 13 bytes (25%).
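
The byte counts above follow from the two layouts: 4 (field ID) + 1 (type) + 4 (length) + value currently, versus 4 + 1 + payload once the length is inferred from the type. A small sketch of that arithmetic, with hypothetical names:

```java
final class FieldSizeSketch {
    static final int ID_LEN = 4;   // bytes 0 .. 3: field ID
    static final int TYPE_LEN = 1; // byte 4: field type
    static final int LEN_LEN = 4;  // bytes 5 .. 8: field length (current layout only)

    /** Size of a field in the current layout. */
    static int currentSize(int valueLen) {
        return ID_LEN + TYPE_LEN + LEN_LEN + valueLen;
    }

    /** Size of a field when its length is inferred from the type. */
    static int optimizedSize(int payloadLen) {
        return ID_LEN + TYPE_LEN + payloadLen;
    }
}
```

For example, an int costs currentSize(4) = 13 bytes today, and between optimizedSize(0) = 5 bytes (a constant type such as INT_C0) and optimizedSize(4) = 9 bytes (INT_4) in the proposed layout.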
> *Expected CPU overhead on writes:*
> Bool, float, double: -
> Byte, short, char: zero check, sign check;
> Int, long: two (shift + OR)s to determine the byte count; if small - "zero" and "one" checks,
> if big - sign check.
> *Expected CPU overhead on reads:*
> One additional branch between optimized and non-optimized spaces.



