hbase-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7413) Convert WAL to pb
Date Wed, 17 Apr 2013 21:01:17 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13634435#comment-13634435 ]

Sergey Shelukhin commented on HBASE-7413:

bq. Look in IPCUtils in the ipc package. See how it is used. We should figure out what is
tough grokking Cell and CellBlock since if it is hard for you, it is going to be really hard
for everyone else. Basically, we want to move away from KeyValue and instead use an Interface
instead. The Interface is named Cell. rpc then has a notion of passing lots of Cells together
in CellBlocks w/ the metadata on the block kept outside in an associated protobuf.
Hmm, never mind, I was looking at it wrong. It seems straightforward, although the usage of
cells for both request and response is not very obvious :)
It does seem to result in an extra copy when you build the cellblock, though, which is something
we want to avoid on the WAL path. Why is the cell block a ByteBuffer?
Currently the patch does a similar thing, minus the fancy stuff and the copy: it writes an optional
protobuf field with a number at the end of HLogKey (the number of KVs, not the size, so we don't
need to serialize KVs in advance), then writes the KVs directly to the output stream.
We can add an encoder and the rest there. Aside from the task itself, that would require adding
a way to build the cellblock directly into the output without a copy (and getting the count from
the cellscanner in advance?), as well as redoing the way compression is currently done for
HLog.Entry in general (moving it into the compressor).
Should converting the WAL to cells be a separate JIRA? The format would be binary compatible, or
nearly so.

bq. Is HLogKey a protobuf now (Haven't looked at patch)? If so, it is customizable? If it
is not pb, should it be?
It is pb.
bq. I am still looking for my high level outline on this project with goal, and how you are
going about it. I look at rb and it points here which has stuff distributed across multiple
comments coming and going; hard to follow.
Here's the summary of the v0 patch.

h3. Current state
The WAL is currently a Hadoop sequence file whose key is HLogKey and whose value is WALEdit, both
writables. A Hadoop sequence file contains a magic prefix, followed by a metadata dictionary,
and then alternating key and value writables, each prefixed by its size.
HLogKey contains table name, encoded region name, seqId, write time, and cluster ID for replication
(only set in non-default, i.e. non original, clusters).
WALEdit contains KVs and replication scopes. The existing peculiarities in HLogKey and WALEdit
format make it hard or impossible to make changes to them.
The sequence file metadata indicates whether the file has compression. Compression
uses dictionary encoding for table names and region names in HLogKey, as well as for rows,
families and qualifiers in KVs.
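To illustrate the dictionary encoding mentioned above, here is a minimal sketch, not the actual HBase compressor. The class name, the marker byte and the 4-byte length are assumptions made for the example: a value is written in full the first time it is seen and as a 2-byte index afterwards.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of dictionary encoding; not the actual HBase compressor.
class DictionarySketch {
    // Marker byte meaning "a literal value follows" (assumed for this sketch).
    static final byte NOT_IN_DICTIONARY = -1;

    // ByteBuffer keys compare by content, so equal byte arrays map to the same entry.
    private final Map<ByteBuffer, Short> index = new HashMap<>();

    /** Writes the value in full the first time, and as a 2-byte index on repeats. */
    void write(DataOutputStream out, byte[] value) {
        try {
            Short known = index.get(ByteBuffer.wrap(value));
            if (known != null) {
                out.writeShort(known);
                return;
            }
            out.writeByte(NOT_IN_DICTIONARY);
            out.writeInt(value.length);
            out.write(value, 0, value.length);
            // Cap the dictionary so the high byte of an index never equals the marker.
            if (index.size() < Short.MAX_VALUE) {
                index.put(ByteBuffer.wrap(value.clone()), (short) index.size());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Helper for experimenting: encodes the values in order and returns the byte count. */
    static int encodedSize(String... values) {
        DictionarySketch dict = new DictionarySketch();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (String v : values) {
            dict.write(out, v.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.size();
    }
}
```

Names that repeat in every record, such as the table name, shrink to two bytes after their first occurrence, which is why this kind of encoding suits HLogKey fields.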
h3. Goals
Make WAL extensible without sacrificing too much perf.
h3. New WAL format
The new WAL format is logically similar to the old WAL format. It starts with a 4-byte magic that
allows us to tell the old and new files apart.
That is followed by an extensible PB WALHeader (file metadata), written using writeDelimited.
It currently contains the flag indicating whether the file has compression.
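As a rough sketch of the start of the file described above: a 4-byte magic, then a length-delimited header. To stay self-contained it does not use the protobuf library; it hand-rolls the varint length prefix that protobuf's delimited writes put before a message. The magic value and all names here are placeholders, not taken from the patch.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical sketch of the file start; the header is treated as opaque bytes.
class WalHeaderSketch {
    // Placeholder magic; this message does not state the real value.
    static final byte[] MAGIC = {'P', 'W', 'A', 'L'};

    /** Lays out: magic, varint length of the header, header bytes. */
    static byte[] writeFileStart(byte[] headerBytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(MAGIC, 0, MAGIC.length);
        writeVarint(out, headerBytes.length);
        out.write(headerBytes, 0, headerBytes.length);
        return out.toByteArray();
    }

    /** Base-128 varint for non-negative ints: 7 bits per byte, high bit means "more". */
    static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    /** Checks the magic, decodes the length prefix and returns the header bytes. */
    static byte[] readHeader(byte[] file) {
        for (int i = 0; i < MAGIC.length; i++) {
            if (file.length <= i || file[i] != MAGIC[i]) {
                throw new IllegalArgumentException("not a new-format WAL");
            }
        }
        int pos = MAGIC.length;
        int length = 0;
        int shift = 0;
        while (true) {
            byte b = file[pos++];
            length |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
        }
        return Arrays.copyOfRange(file, pos, pos + length);
    }
}
```

Because the header is length-prefixed, a reader can skip or parse it without knowing its fields in advance, which is what leaves room for adding fields later.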
After that follow pairs of WAL record header ("HLogKey" for compat with existing code) and
WALEdit. Replication scopes have been moved to HLogKey from WALEdit, so the latter only has KVs.
The WAL record header is an extensible PB structure. Compression for it is supported as before
(byte arrays can contain dictionary encoding), with additional compression for replication
scope column family names (using exactly the same approach).
"WALEdit" essentially becomes just KVs. One of the fields of the WAL record header is the number
of KVs that the WALEdit contains.
To avoid memory copies of KVs that are potentially large, KVs are written directly into the file
without an intervening protobuf step. PB handles bytes only in the form of ByteString, which
is immutable in such a way that a memory copy cannot be avoided, unless the KV itself were backed
by a single ByteString with no need to serialize; that is not such a bad idea, but it is out of
the scope of this JIRA.
KVs are written using a previously existing mechanism: either a VInt with the length, followed
by the backing byte array (which technically has nothing to do with writables, it's just a raw
format :)), or the compressed format.
The reader reads the number of KVs indicated by the WAL record header field, and assumes that
these are followed by the next WAL record header.
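The record layout above can be sketched as follows. This is an illustration under simplifying assumptions, not the patch's code: the record header is reduced to just the KV count, and a fixed 4-byte length stands in for the VInt. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "header with KV count, then raw KVs"; not the patch's code.
class WalRecordSketch {
    /** Writes one record: the KV count, then each KV as length + raw bytes. */
    static byte[] writeRecord(List<byte[]> kvs) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        try {
            // Stand-in for the PB record header field holding the number of KVs.
            out.writeInt(kvs.size());
            for (byte[] kv : kvs) {
                out.writeInt(kv.length);      // stand-in for the VInt length
                out.write(kv, 0, kv.length);  // backing array goes straight to the stream
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads records back to back: after 'count' KVs the next record header is expected. */
    static List<List<byte[]>> readRecords(byte[] data) {
        List<List<byte[]>> records = new ArrayList<>();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        try {
            while (in.available() > 0) {
                int count = in.readInt();
                List<byte[]> kvs = new ArrayList<>(count);
                for (int i = 0; i < count; i++) {
                    byte[] kv = new byte[in.readInt()];
                    in.readFully(kv);
                    kvs.add(kv);
                }
                records.add(kvs);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }
}
```

Writing a count rather than a total size means the writer never has to serialize the KVs into a buffer just to measure them; the trade-off is that the reader cannot skip a record without walking its KVs.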
h3. Supporting legacy WALs
Writing legacy WALs is no longer supported. The writer class is moved to test code. The HLogKey
and WALEdit writable write methods are preserved so that the backward compat test can write
a legacy WAL, and they output a warning.
Reading legacy WALs is supported through HLogFactory::createReader. This method opens the stream
and tries to read the PB magic. If the PB magic is present, the new PB reader is returned; if
it's not, the method falls back to the old SequenceFile-based reader.
Both readers derive from a common class that contains some shared functionality and interface
(e.g. ::hasCompression()).
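The magic-based fallback can be pictured with the sketch below. It is not the code of HLogFactory::createReader: the magic value, the enum and the method name are placeholders, and it only decides which kind of reader would be used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical sketch of choosing a reader by peeking at the file magic.
class ReaderSelectionSketch {
    // Placeholder magic; this message does not state the real value.
    static final byte[] PB_MAGIC = {'P', 'W', 'A', 'L'};

    enum Format { PROTOBUF, LEGACY_SEQUENCE_FILE }

    /** Peeks at the first bytes; rewinds the stream if they are not the PB magic. */
    static Format detect(InputStream in) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        try {
            in.mark(PB_MAGIC.length);
            byte[] head = new byte[PB_MAGIC.length];
            int read = 0;
            while (read < head.length) {
                int n = in.read(head, read, head.length - read);
                if (n < 0) {
                    break; // file shorter than the magic
                }
                read += n;
            }
            if (read == head.length && Arrays.equals(head, PB_MAGIC)) {
                return Format.PROTOBUF; // stream is now positioned just after the magic
            }
            in.reset(); // the legacy reader needs to see the file from the start
            return Format.LEGACY_SEQUENCE_FILE;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For instance, a stream beginning with the bytes `SEQ` (the prefix Hadoop sequence files start with) would be classified as legacy and left positioned at offset 0.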
h3. Future improvement
Ideally, given that they are used together for all practical purposes, we want to get rid
of HLogKey and WALEdit (except for backward compat-related usage) and move to the notion of
a WAL record as a single thing (HLog.Entry?). However, refactoring that all over the place is
out of the scope of this JIRA.

> Convert WAL to pb
> -----------------
>                 Key: HBASE-7413
>                 URL: https://issues.apache.org/jira/browse/HBASE-7413
>             Project: HBase
>          Issue Type: Sub-task
>          Components: wal
>            Reporter: stack
>            Assignee: Sergey Shelukhin
>            Priority: Critical
>             Fix For: 0.95.1
>         Attachments: HBASE-7413-v0.patch
> From HBASE-7201
