hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "jiraposter@reviews.apache.org (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4608) HLog Compression
Date Tue, 27 Dec 2011 17:44:31 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176240#comment-13176240
] 

jiraposter@reviews.apache.org commented on HBASE-4608:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2740/#review4121
-----------------------------------------------------------


good start.

A general design thing: rather than using these static readCompressed/writeCompressed methods,
we can introduce an interface something like:
interface WALCompression {
  public int encodeKeyValue(KeyValue kv, byte[] out, int offset);
  ...
}

and then have the current non-compressed code path just be the default implementation of WALCompression
-- and add a configuration which specifies the class to use as the implementor of this interface.
We can also store the class name in the WAL metadata so that you can read compressed HLogs
even if you are writing non-compressed ones (useful for replication if one cluster uses compression
and the other doesn't, for example)


src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java
<https://reviews.apache.org/r/2740/#comment9296>

    no need to wrap lines



src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java
<https://reviews.apache.org/r/2740/#comment9295>

    this should be // comments inside the function, rather than javadoc style comments above



src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java
<https://reviews.apache.org/r/2740/#comment9299>

    we should probably use vints here - most keys and many values are <100bytes long, so
we could store the lengths in 1 byte instead of the 4 used here



src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java
<https://reviews.apache.org/r/2740/#comment9298>

    extra space



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment9301>

    extra word "designed"?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment9302>

    example should use arguments like "-u compressed-hlog uncompressed-hlog" rather than "filename"
twice



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment9303>

    check args.length first and print help if it's not got 3 args



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment9304>

    should be an 'else if' -- and have a final 'else' clause that gives usage



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment9305>

    TODO: need to change this config key to match our others



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment9306>

    this assumes the whole log's content fits in memory, which shouldn't be necessary... why
not loop reading one record from reader and writing one to writer?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment9307>

    should have a finally { in.close(); } probably



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment9308>

    should go in finally clause. Also use IOUtils.closeStream as long as "out" implements
Closeable (I think it does?)



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment9309>

    why not combine this with the if/else above?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment9312>

    most of this byte is wasted - we're only using 2 of the 6 bits... and I think we could
actually get rid of EMPTY as well.
    
    If we limit the dictionaries to 32k entries, then we could use the following:
    
    If bit 0 == 0: dictionary reference
      bits 1 through 15: the dictionary index
    if bit 0 == 1: new value
      start a varint encoding in this byte
    
    but let's leave this as is for now just to get the rest of the code-level issues cleaned
up



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment9310>

    rather than this, why not use varints here so you don't have to specify up front what
the size is?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java
<https://reviews.apache.org/r/2740/#comment9311>

    use constant



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<https://reviews.apache.org/r/2740/#comment9313>

    since we have several methods that take all these parameters, and we might want to change
the compression scheme in the future, I think it makes sense to introduce a class WALCompressionContext
with getters for each of the dictionaries



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
<https://reviews.apache.org/r/2740/#comment9314>

    indentation



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
<https://reviews.apache.org/r/2740/#comment9315>

    again the Context object here would make things a little cleaner to integrate:
    - you can drop "compression" boolean and just check "if (compressionContext != null)"
    - you only add one integration point to the existing code instead of lots of new member
vars



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java
<https://reviews.apache.org/r/2740/#comment9316>

    this should be all caps -- but also probably something from the configuration



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java
<https://reviews.apache.org/r/2740/#comment9317>

    private final



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java
<https://reviews.apache.org/r/2740/#comment9318>

    LOG.isDebugEnabled -- or maybe this should even be TRACE level



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java
<https://reviews.apache.org/r/2740/#comment9319>

    hashCode() on a byte[] is identity-based - you should use Bytes.hashCode()



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java
<https://reviews.apache.org/r/2740/#comment9320>

    equals is identity based here... should use Bytes.equals()
    
    Also Bytes.equals I believe handles nulls, so you can collapse two of these three clauses
together



src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java
<https://reviews.apache.org/r/2740/#comment9321>

    I'd call this clear()



src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java
<https://reviews.apache.org/r/2740/#comment9322>

    does it have to be public?


- Todd


On 2011-12-23 06:00:24, Li Pi wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2740/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-12-23 06:00:24)
bq.  
bq.  
bq.  Review request for hbase, Eli Collins and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Heres what I have so far. Things are written, and "should work". I need to rework the
test cases to test this, and put something in the config file to enable/disable. Obviously
this isn't ready for commit at the moment, but I can get those two things done pretty quickly.
bq.  
bq.  Obviously the dictionary is incredibly simple at the moment, I'll come up with something
cooler sooner. Let me know how this looks.
bq.  
bq.  
bq.  This addresses bug HBase-4608.
bq.      https://issues.apache.org/jira/browse/HBase-4608
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/CompressedKeyValue.java PRE-CREATION

bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/Compressor.java PRE-CREATION

bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 24407af 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java f067221 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java d9cd6de

bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java cbef70f

bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/SimpleDictionary.java PRE-CREATION

bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALDictionary.java PRE-CREATION

bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java e1117ef 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 59910bf 
bq.  
bq.  Diff: https://reviews.apache.org/r/2740/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Li
bq.  
bq.


                
> HLog Compression
> ----------------
>
>                 Key: HBASE-4608
>                 URL: https://issues.apache.org/jira/browse/HBASE-4608
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Li Pi
>            Assignee: Li Pi
>         Attachments: 4608v1.txt
>
>
> The current bottleneck to HBase write speed is replicating the WAL appends across different
datanodes. We can speed up this process by compressing the HLog. Current plan involves using
a dictionary to compress table name, region id, cf name, and possibly other bits of repeated
data. Also, HLog format may be changed in other ways to produce a smaller HLog.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message