phoenix-dev mailing list archives

From "Lars Hofhansl (Jira)" <>
Subject [jira] [Created] (PHOENIX-5595) Use ROW_INDEX_V1 block encoding and zSTD compression by default
Date Wed, 27 Nov 2019 16:42:00 GMT
Lars Hofhansl created PHOENIX-5595:

             Summary: Use ROW_INDEX_V1 block encoding and zSTD compression by default
                 Key: PHOENIX-5595
             Project: Phoenix
          Issue Type: Wish
            Reporter: Lars Hofhansl

Phoenix defaults to FAST_DIFF block encoding and no compression (not needed with FAST_DIFF).

I blogged about this extensively here:

We should switch the default to ROW_INDEX_V1 block encoding and zSTD compression for all newly
created tables (including global indexes). Local indexes can stay with FAST_DIFF, but perhaps
for completeness we should just switch everything.

One wrinkle is that FAST_DIFF also does compression (i.e. the diff encoding), whereas ROW_INDEX_V1
actually increases the block size a little bit, since it keeps an index of the rows in each block so that
it can do a binary search inside an HFile block. Hence it needs to be paired with compression.
Every test I did suggests that zSTD is the best choice for that.
The other wrinkle is that zSTD needs a Hadoop/HBase build compiled with native zSTD support.
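For illustration, here is a minimal Java sketch of the idea behind ROW_INDEX_V1, under loudly stated assumptions: the class `RowIndexBlockSketch` and its methods are hypothetical, a "block" is modeled as sorted row keys stored in full and back to back, and the row index is an array of 4-byte offsets, one per row. This is not HBase's actual on-disk format. The offsets are what let a seek binary-search inside the block, and they are also the extra bytes that make pairing with compression worthwhile.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of a row-indexed block: every key is stored
// in full, plus one 4-byte offset per row. Not HBase's real ROW_INDEX_V1 layout.
public class RowIndexBlockSketch {
    private final byte[] data;    // all keys, stored in full, back to back
    private final int[] offsets;  // the "row index": start of each key in data
    int lastComparisons;          // key comparisons done by the most recent seek

    RowIndexBlockSketch(List<String> sortedKeys) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        offsets = new int[sortedKeys.size()];
        for (int i = 0; i < sortedKeys.size(); i++) {
            offsets[i] = out.size();
            byte[] k = sortedKeys.get(i).getBytes(StandardCharsets.UTF_8);
            out.write(k, 0, k.length);
        }
        data = out.toByteArray();
    }

    // Any key can be read directly, without decoding its neighbors.
    byte[] keyAt(int i) {
        int end = (i + 1 < offsets.length) ? offsets[i + 1] : data.length;
        return Arrays.copyOfRange(data, offsets[i], end);
    }

    int keyBytes() { return data.length; }

    // Extra bytes the index adds to the block: one int per row.
    int indexOverheadBytes() { return Integer.BYTES * offsets.length; }

    // Binary search through the row index; returns the first row whose key is >= target.
    int seek(String target) {
        byte[] t = target.getBytes(StandardCharsets.UTF_8);
        int lo = 0, hi = offsets.length;
        lastComparisons = 0;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            lastComparisons++;
            // Unsigned lexicographic byte comparison, as HBase orders row keys.
            if (Arrays.compareUnsigned(keyAt(mid), t) < 0) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 1024; i++) keys.add(String.format("row%05d", i));
        RowIndexBlockSketch block = new RowIndexBlockSketch(keys);
        int idx = block.seek("row00700");
        System.out.println("found row " + idx + " after " + block.lastComparisons
                + " comparisons; index adds " + block.indexOverheadBytes()
                + " bytes to " + block.keyBytes() + " bytes of keys");
    }
}
```

With 1024 rows the seek needs at most 11 key comparisons (floor(log2 1024) + 1). The relative index overhead looks large here only because the toy keys are 8 bytes with no values; with real cells the same 4 bytes per row is a much smaller fraction of the block.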

I marked this as a Wish... Perhaps we can discuss here.

What I do know is that FAST_DIFF has outgrown its usefulness. Seeking into a FAST_DIFF block is (naturally)
slow, since it has to seek to the last known fully stored key and then replay all the
diffs forward from there to the row we actually want to seek to. This hurts GET performance.
zSTD also offers better compression and thus reduced IO even when paired with ROW_INDEX_V1.
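To show why that seek is linear, here is a minimal Java sketch of a diff-encoded block. It is a simplified prefix/diff scheme in the spirit of FAST_DIFF, not HBase's actual FAST_DIFF byte format, and the class `DiffBlockSketch`, its methods, and the 1-byte prefix-length header are all hypothetical assumptions. The first key of the block is the only one stored in full; every later key stores only the length of the prefix it shares with the previous key plus the differing suffix.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified diff encoding in the spirit of FAST_DIFF: the first
// key is stored in full, each later key only as (shared prefix length, suffix)
// relative to the key before it. Not HBase's real FAST_DIFF byte format.
public class DiffBlockSketch {
    static final class Entry {
        final int shared;     // bytes shared with the previous key
        final byte[] suffix;  // bytes that differ from the previous key
        Entry(int shared, byte[] suffix) { this.shared = shared; this.suffix = suffix; }
    }

    private final List<Entry> entries = new ArrayList<>();
    private int rawBytes, encodedBytes;
    int lastDecodes; // keys reconstructed by the most recent seek

    DiffBlockSketch(List<String> sortedKeys) {
        byte[] prev = new byte[0];
        for (String s : sortedKeys) {
            byte[] k = s.getBytes(StandardCharsets.UTF_8);
            int p = Arrays.mismatch(prev, k);
            if (p < 0) p = k.length; // identical to the previous key
            entries.add(new Entry(p, Arrays.copyOfRange(k, p, k.length)));
            rawBytes += k.length;
            encodedBytes += 1 + (k.length - p); // assumes a 1-byte prefix-length header
            prev = k;
        }
    }

    int rawBytes() { return rawBytes; }
    int encodedBytes() { return encodedBytes; }

    // No key can be decoded on its own, so a seek replays every diff from the
    // start of the block until it reaches the first key >= target.
    int seek(String target) {
        byte[] t = target.getBytes(StandardCharsets.UTF_8);
        byte[] cur = new byte[0];
        lastDecodes = 0;
        for (int i = 0; i < entries.size(); i++) {
            Entry e = entries.get(i);
            byte[] next = Arrays.copyOf(cur, e.shared + e.suffix.length); // keep shared prefix
            System.arraycopy(e.suffix, 0, next, e.shared, e.suffix.length);
            cur = next;
            lastDecodes++;
            if (Arrays.compareUnsigned(cur, t) >= 0) return i;
        }
        return entries.size();
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 1024; i++) keys.add(String.format("row%05d", i));
        DiffBlockSketch block = new DiffBlockSketch(keys);
        int idx = block.seek("row00700");
        System.out.println("found row " + idx + " after decoding " + block.lastDecodes
                + " keys; keys take " + block.encodedBytes() + " of " + block.rawBytes() + " raw bytes");
    }
}
```

Seeking `row00700` reconstructs 701 keys, one per row up to the target, whereas a binary search over a row index of the same 1024 rows needs at most 11 comparisons. The upside, shown by `encodedBytes()` being far below `rawBytes()`, is the built-in "compression" of the diff encoding mentioned above.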

[~apurtell] What we discussed a while back.

This message was sent by Atlassian Jira
