helix-commits mailing list archives

From "kishore gopalakrishna (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HELIX-573) Add support to compress/uncompress data on ZK
Date Sun, 08 Mar 2015 23:37:38 GMT

    [ https://issues.apache.org/jira/browse/HELIX-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352348#comment-14352348 ]

kishore gopalakrishna commented on HELIX-573:

Yeah, we still need to support it, but we can go a long way without bucketing if we compress
the data. We know we can support 1k partitions with raw JSON and no bucketing. By adding compression,
we can probably go up to 10k partitions per resource without bucketing (this needs to be validated).

I plan to use GZIP to compress/uncompress. Let me know if there is something better.

This is what I am planning to do. We have a common ZNRecordSerializer to serialize/deserialize
the data. We can simply check for an "enableCompression" entry in the simpleFields and, if it is true,
apply compression. On deserializing, we can check for the GZIP magic header and, if it
matches, automatically decompress the data.
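
The flow described above can be sketched roughly as follows. This is a minimal illustration and not the actual ZNRecordSerializer code: the class CompressionSketch, its methods, and the use of a plain Map to stand in for simpleFields are all assumptions made for the example. Only the "enableCompression" flag name and the choice of GZIP come from this comment.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of Helix.
class CompressionSketch {
    // Flag name as proposed in this comment; the constant itself is an assumption.
    static final String ENABLE_COMPRESSION = "enableCompression";

    // A GZIP stream always starts with the two magic bytes 0x1f 0x8b.
    static boolean isGzip(byte[] bytes) {
        return bytes != null && bytes.length >= 2
                && (bytes[0] & 0xff) == 0x1f
                && (bytes[1] & 0xff) == 0x8b;
    }

    // Serialize JSON text, compressing only when the record asks for it.
    static byte[] serialize(String json, Map<String, String> simpleFields) throws IOException {
        byte[] raw = json.getBytes(StandardCharsets.UTF_8);
        if (!Boolean.parseBoolean(simpleFields.get(ENABLE_COMPRESSION))) {
            return raw; // flag missing or false: store raw JSON as before
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(raw);
        }
        return out.toByteArray();
    }

    // Deserialize without consulting any flag: the magic header decides.
    static String deserialize(byte[] bytes) throws IOException {
        if (!isGzip(bytes)) {
            return new String(bytes, StandardCharsets.UTF_8);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(bytes))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gzip.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
        return new String(out.toString("UTF-8"));
    }
}
```

Since a raw JSON object starts with '{' (0x7b), it cannot be mistaken for the GZIP header, which is why the read path in this sketch needs no flag and existing uncompressed znodes stay readable.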

The advantage of this is that we don't need to change the API of ZNRecordSerializer or how it is set
in various places. When a resource is created with compression turned on, we set enableCompression=true
in the IdealState. This takes care of compressing the IdealState. We then have to copy this flag
when creating the CurrentState and the External View. We should carry it over to the External View, since
the controller creates it. For the CurrentState it's not straightforward, since it is created
by the participants and they don't read the IdealState. We can punt on the CurrentState, hoping
that the size of each CurrentState is inversely proportional to the number of nodes in the system:
if there is a large number of partitions, the number of nodes might also be large (this
is not necessarily true). The other option is to set enableCompression=true the first
time the CurrentState ZNode is created by the participant.
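
The two ways of carrying the flag could look something like the sketch below. Everything here is hypothetical: CompressionFlagSketch, both methods, and the use of plain maps in place of the simpleFields of IdealState, External View, and CurrentState are assumptions for illustration, not existing Helix code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Helix.
class CompressionFlagSketch {
    // Flag name as proposed in this comment; the constant itself is an assumption.
    static final String ENABLE_COMPRESSION = "enableCompression";

    // Controller side: the controller reads the IdealState, so it can copy
    // the flag into the simple fields of the External View it creates.
    static Map<String, String> newExternalViewFields(Map<String, String> idealStateFields) {
        Map<String, String> fields = new HashMap<>();
        String flag = idealStateFields.get(ENABLE_COMPRESSION);
        if (flag != null) {
            fields.put(ENABLE_COMPRESSION, flag);
        }
        return fields;
    }

    // Participant side: participants do not read the IdealState, so in this
    // sketch the flag is simply set when the CurrentState is first created.
    static Map<String, String> newCurrentStateFields() {
        Map<String, String> fields = new HashMap<>();
        fields.put(ENABLE_COMPRESSION, "true");
        return fields;
    }
}
```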

> Add support to compress/uncompress data on ZK
> ---------------------------------------------
>                 Key: HELIX-573
>                 URL: https://issues.apache.org/jira/browse/HELIX-573
>             Project: Apache Helix
>          Issue Type: Improvement
>            Reporter: kishore gopalakrishna
>            Assignee: kishore gopalakrishna
> Currently we have bucketing as one of the options when the number of partitions is large.
> We have a couple of bugs in the handling of bucketized resources (one of them is fatal).
> One of the reasons to split the znode is that we use JSON to store the data in the ZNode.
> While JSON is good for debugging, it is space-inefficient.
> A better option before going to bucketing is to support compression of Ideal State, Current
> State and External View. This also gives good performance.

This message was sent by Atlassian JIRA
