hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5778) Turn on WAL compression by default
Date Thu, 08 Nov 2012 19:56:12 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493453#comment-13493453 ]

stack commented on HBASE-5778:
------------------------------

Adding compression context to the general HLog interface seems incorrect to me. This kind of thing will not make sense for all implementations of HLog. As it stands, this patch works against the effort to turn HLog into an interface.
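To make the point concrete, here is a minimal sketch assuming a hypothetical, simplified `LogReader` interface (not HBase's actual HLog API) in which positions are entry indices rather than file offsets. Nothing in the interface mentions compression, so a ReplicationSource-style caller (the hypothetical `Shipper`) works against any implementation, compressed or not.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified WAL reader interface. Positions are entry
// indices here (real WAL positions are file offsets). Nothing in these
// signatures mentions compression.
interface LogReader {
    /** Returns the next edit, or null at end of log. */
    String next();

    /** Positions the reader so the next call to next() returns entry `index`. */
    void seek(int index);
}

// Simplest implementation: an uncompressed, in-memory log. A compressed
// implementation would keep its dictionaries as private fields.
final class PlainLogReader implements LogReader {
    private final List<String> entries;
    private int position = 0;

    PlainLogReader(List<String> entries) {
        this.entries = new ArrayList<>(entries);
    }

    @Override
    public String next() {
        return position < entries.size() ? entries.get(position++) : null;
    }

    @Override
    public void seek(int index) {
        position = index;
    }
}

// ReplicationSource-style caller: resumes from a saved position using only
// the interface, so it never needs to import any compression class.
final class Shipper {
    static List<String> shipFrom(LogReader reader, int lastPosition) {
        reader.seek(lastPosition);
        List<String> shipped = new ArrayList<>();
        for (String edit = reader.next(); edit != null; edit = reader.next()) {
            shipped.add(edit);
        }
        return shipped;
    }
}
```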

Ditto on ReplicationSource having to know anything about HLog compression and carrying a compression context (it seems 'off' to have to do this in ReplicationSource: +import org.apache.hadoop.hbase.regionserver.wal.CompressionContext;). What happens if HLog gets a different kind of compression than our current type? Will everything break?
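One way to avoid that breakage, sketched under stated assumptions: the log header records a codec name, and a lookup returns the matching decoder, so callers only ever call `decode`. `EntryCodec`, `Codecs`, and the header field are hypothetical names rather than HBase classes, and gzip merely stands in for whatever compression a future HLog might use. An unknown codec fails loudly instead of silently misreading entries.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical per-entry codec: callers see only stored bytes in, plain bytes out.
interface EntryCodec {
    byte[] decode(byte[] stored);
}

final class Codecs {
    // Codec name (assumed to be recorded in the log header) -> implementation.
    private static final Map<String, EntryCodec> REGISTRY = Map.of(
        "none", stored -> stored,
        "gzip", Codecs::gunzip);

    static EntryCodec forHeader(String codecName) {
        EntryCodec codec = REGISTRY.get(codecName);
        if (codec == null) {
            // Fail loudly rather than mis-decode entries from an unknown codec.
            throw new IllegalArgumentException("unknown WAL codec: " + codecName);
        }
        return codec;
    }

    // Writer-side helper, used here only to produce test data.
    static byte[] gzip(byte[] plain) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(plain);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    private static byte[] gunzip(byte[] stored) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(stored))) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Adding a new compression type then means registering one more entry; ReplicationSource and other callers stay unchanged.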

It also seems wrong to have to do this over in ReplicationSource:

{code}
+        // If we're compressing logs and the oldest recovered log's last position is greater
+        // than 0, we need to rebuild the dictionary up to that point without replicating
+        // the edits again. The rebuilding part is simply done by reading the log.
{code}

Why can't the internal implementation do the skipping when the dictionary is empty and we are at an offset > 0?

Rather than passing a compression context to SequenceFileLogReader, could we have a CompressedSequenceLogReader that manages compression contexts internally and never lets them outside of CSLR?
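A toy sketch of the reader-owns-its-dictionary idea, assuming a made-up encoding (the first occurrence of a word is stored as a literal `L:word`, repeats as a dictionary reference `R:slot`) that is not HBase's actual WAL compression format. The hypothetical `ToyDictionaryReader` rebuilds its dictionary inside `seek` whenever it is asked to resume past what it has already decoded. That is the skipping the patch comment describes doing from ReplicationSource, moved inside the reader. Positions are entry indices for simplicity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy encoder (NOT the HBase WAL format): the first occurrence of a word is
// stored as a literal "L:word" and gets the next dictionary slot; repeats are
// stored as a reference "R:slot".
final class ToyDictionaryWriter {
    private final Map<String, Integer> dict = new HashMap<>();

    List<String> encode(List<String> words) {
        List<String> out = new ArrayList<>();
        for (String w : words) {
            Integer slot = dict.get(w);
            if (slot != null) {
                out.add("R:" + slot);
            } else {
                dict.put(w, dict.size());
                out.add("L:" + w);
            }
        }
        return out;
    }
}

// Reader that owns its dictionary; callers see only next() and seek().
final class ToyDictionaryReader {
    private final List<String> records;
    private final List<String> dict = new ArrayList<>();
    private int decodedUpTo = 0; // records [0, decodedUpTo) are reflected in dict
    private int position = 0;

    ToyDictionaryReader(List<String> records) {
        this.records = new ArrayList<>(records);
    }

    /** Returns the next decoded entry, or null at end of log. */
    String next() {
        if (position >= records.size()) {
            return null;
        }
        return decodeAt(position++);
    }

    /** Resumes at entry `index`, rebuilding the dictionary internally if needed. */
    void seek(int index) {
        // A fresh reader resuming at index > 0 has an empty dictionary:
        // decode (and discard) the earlier records to rebuild it.
        while (decodedUpTo < index && decodedUpTo < records.size()) {
            decodeAt(decodedUpTo);
        }
        position = index;
    }

    private String decodeAt(int p) {
        String record = records.get(p);
        boolean firstVisit = (p == decodedUpTo);
        if (firstVisit) {
            decodedUpTo++;
        }
        if (record.startsWith("L:")) {
            String word = record.substring(2);
            if (firstVisit) {
                dict.add(word); // slots are assigned in the same order as the writer's
            }
            return word;
        }
        return dict.get(Integer.parseInt(record.substring(2)));
    }
}
```

Encoding `row1, cf, row1, cf, row2, cf` gives `L:row1, L:cf, R:0, R:1, L:row2, R:1`. Calling `seek(3)` on a fresh reader replays records 0 to 2 to populate the dictionary, so the next `R:1` decodes to `cf` without the caller ever touching a compression context.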

                
> Turn on WAL compression by default
> ----------------------------------
>
>                 Key: HBASE-5778
>                 URL: https://issues.apache.org/jira/browse/HBASE-5778
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>            Priority: Blocker
>             Fix For: 0.96.0
>
>         Attachments: 5778.addendum, 5778-addendum.txt, HBASE-5778-0.94.patch, HBASE-5778-0.94-v2.patch, HBASE-5778.patch
>
>
> I ran some tests to verify whether WAL compression should be turned on by default.
> For a use case where it's not very useful (values two orders of magnitude bigger than the keys), the insert time wasn't different and CPU usage was 15% higher (150% CPU usage vs. 130% when not compressing the WAL).
> When values are smaller than the keys, I saw a 38% improvement in insert run time and CPU usage was 33% higher (600% CPU usage vs. 450%). I'm not sure WAL compression accounts for all the additional CPU usage; it might just be that we're able to insert faster and spend more time in the MemStore per second (because our MemStores are bad when they contain tens of thousands of values).
> Those are two extremes, but they show that for the price of some CPU we can save a lot. My machines have 2 quad-core CPUs with HT, so I still had a lot of idle CPUs.
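A quick check of the relative CPU increases quoted above, assuming the figures are top-style per-core percentages (150% means one and a half cores busy). The class and method names are illustrative only.

```java
// Relative CPU increase between the compressed and uncompressed runs.
final class CpuOverhead {
    static double relativeIncrease(double withCompression, double without) {
        return (withCompression - without) / without * 100.0;
    }

    public static void main(String[] args) {
        // Large values: (150 - 130) / 130, about 15.4%
        System.out.printf("large values: %.1f%%%n", relativeIncrease(150, 130));
        // Small values: (600 - 450) / 450, about 33.3%
        System.out.printf("small values: %.1f%%%n", relativeIncrease(600, 450));
    }
}
```

Both results match the 15% and 33% stated in the description.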

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
