hbase-issues mailing list archives

From "Karthick Sankarachary (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-3732) New configuration option for client-side compression
Date Wed, 01 Jun 2011 20:16:47 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042397#comment-13042397 ]

Karthick Sankarachary commented on HBASE-3732:

bq. Just to say that the notion of adding a compressed flag to KV is pretty invasive with
ripples across the code base. Messy is how we know what codec to use when undoing the value.
This info will not be in the KV.

I agree. In fact, the {{Type}} flag in the KV does not even get persisted in the {{HFile}},
IIUC. Given that, our best bet might be to prepend a "magic number" in the value to indicate
that it is compressed. In this case, the onus would lie on the put (get) operation to compress
(decompress) the value, as J-D proposed initially. As far as the server is concerned, the
value will remain an opaque byte array.
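
To make the idea concrete, here is a minimal sketch of what the client-side put/get path might
do, assuming java.util.zip's Deflater as the codec. The class name {{MagicValueCodec}} and the
4-byte marker are hypothetical placeholders, not existing HBase names or constants.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper, not part of HBase: prefixes compressed values with a marker.
final class MagicValueCodec {
    // Assumed marker bytes, chosen only for illustration.
    static final byte[] MAGIC = {(byte) 0xC0, (byte) 0xDE, (byte) 0xCA, (byte) 0xFE};

    // Put side: returns MAGIC followed by the deflated value.
    static byte[] encode(byte[] value) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(value);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            out.write(MAGIC, 0, MAGIC.length);
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    // True if the stored bytes start with the marker.
    static boolean isCompressed(byte[] stored) {
        if (stored.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (stored[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // Get side: inflates marked values and returns unmarked values untouched.
    static byte[] decode(byte[] stored) {
        if (!isCompressed(stored)) {
            return stored;
        }
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(stored, MAGIC.length, stored.length - MAGIC.length);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && !inflater.finished()) {
                    throw new IllegalArgumentException("truncated or invalid compressed value");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("invalid compressed value", e);
        } finally {
            inflater.end();
        }
    }
}
```

One caveat with this sketch: an uncompressed value that happens to begin with the same four
bytes would be mistaken for a compressed one, so the marker alone is a heuristic.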

The motivation behind the magic number is to be able to determine whether or not the value
being read needs to be decompressed. Note that most codecs (including GZIP and LZO) prefix
the compressed stream with some sort of a magic number. However, instead of relying on the
algorithm-specific number, it might be more convenient to introduce a magic number of our own.
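
For comparison, this is roughly what relying on a codec's own number looks like for GZIP,
using the {{GZIP_MAGIC}} constant from java.util.zip. The class and method names are
hypothetical; the sketch only illustrates that such a check is tied to one codec.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of HBase.
final class GzipMagicCheck {
    // Compresses a value into a GZIP stream held in memory.
    static byte[] gzip(byte[] value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // True if the first two bytes, read little-endian, equal GZIP's header magic.
    static boolean hasGzipMagic(byte[] stored) {
        if (stored.length < 2) {
            return false;
        }
        int header = (stored[0] & 0xFF) | ((stored[1] & 0xFF) << 8);
        return header == GZIPInputStream.GZIP_MAGIC;
    }
}
```

A reader built this way needs one such check per supported codec, which is the
inconvenience a single marker of our own would avoid.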

bq. That would make sense, or it could be in the HCD.

I like the idea of using the HCD, considering that we want all clients to be on the same page,
as far as compressing values goes.

Does the above approach sound reasonable? If so, may I take a stab at it?

> New configuration option for client-side compression
> ----------------------------------------------------
>                 Key: HBASE-3732
>                 URL: https://issues.apache.org/jira/browse/HBASE-3732
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Jean-Daniel Cryans
>             Fix For: 0.92.0
>         Attachments: compressed_streams.jar
> We have a case here where we have to store very fat cells (arrays of integers) which
> can amount to hundreds of KBs, and which we need to read often, concurrently, and possibly
> keep in cache. Compressing the values on the client using java.util.zip's Deflater before
> sending them to HBase proved, in our case, to be almost an order of magnitude faster.
> The reasons are evident: less data sent to HBase, memstore contains compressed data,
> block cache contains compressed data too, etc.
> I was thinking that it might be something useful to add to a family schema, so that Put/Result
> do the conversion for you. The actual compression algo should also be configurable.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
