hbase-issues mailing list archives

From "Jingcheng Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-11339) HBase MOB
Date Thu, 19 Jun 2014 06:19:26 GMT

    [ https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14037015#comment-14037015 ]

Jingcheng Du commented on HBASE-11339:
--------------------------------------

[~jmhsieh] and [~zhihyu@ebaysf.com], thanks for the comments.
I have thought about the suggestions carefully and have some ideas to share with all of you;
please kindly provide comments. Following the suggestion, I'll refer to Lob as Mob from now on.

We don't use the MemStore to save the mob data; instead we write it directly to the mob file,
and only once.

In the coprocessor's prePut, each KV is split into two KVs: one (KV0) holds the offset+path,
the other (KV1) is the mob KV. KV0 is written to the HLog and MemStore, and KV1 is written
to the mob file.
Before the mob data are synced to disk, they sit in the mob writer's buffer, and they are not
seekable until the buffer fills up or is synced to disk.
To avoid this, we have to sync the mob data to disk on every put (is it OK to sync the mob
on each put? The mob data are usually pictures of around 1-5MB).
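A minimal Java sketch of this single-writer prePut path, assuming an in-memory stand-in for the mob file. The class `MobFileSketch`, its methods, and the `MobReference` type are hypothetical illustrations, not HBase APIs. `appendAndSync` writes the value (KV1), syncs it so it is seekable right away, and returns the path+offset reference (KV0) that would go to the HLog and MemStore:

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical in-memory stand-in for a store's single mob file writer (not an HBase API).
class MobFileSketch {
    /** KV0: the reference written to the HLog and MemStore instead of the value. */
    static final class MobReference {
        final String path;
        final long offset;
        final int length;

        MobReference(String path, long offset, int length) {
            this.path = path;
            this.offset = offset;
            this.length = length;
        }
    }

    private final String path;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream(); // written, not yet synced
    private byte[] synced = new byte[0]; // only these bytes are visible to readers
    private long offset = 0; // next write position in the mob file

    MobFileSketch(String path) {
        this.path = path;
    }

    /**
     * Writes the value (KV1), syncs it so it is seekable right away, and advances the offset.
     * Making all three steps one synchronized method serializes every put on this store.
     */
    synchronized MobReference appendAndSync(byte[] value) {
        long start = offset;
        buffer.write(value, 0, value.length);
        sync();
        offset += value.length;
        return new MobReference(path, start, value.length);
    }

    // Simulates a sync: buffered bytes become part of the readable file.
    private void sync() {
        byte[] pending = buffer.toByteArray();
        byte[] merged = Arrays.copyOf(synced, synced.length + pending.length);
        System.arraycopy(pending, 0, merged, synced.length, pending.length);
        synced = merged;
        buffer.reset();
    }

    /** Reads a value back through its reference; unsynced bytes would not be visible. */
    synchronized byte[] read(MobReference ref) {
        int from = (int) ref.offset;
        return Arrays.copyOfRange(synced, from, from + ref.length);
    }
}
```

Because the write, the sync, and the offset increment sit in one synchronized method, all puts on the store go through a single monitor.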

By design, each store has a single mob file for writing, so increasing the offset of KVs
within that file has to be synchronized. The prePut method therefore needs a synchronized
block containing two operations (syncing the mob data to disk and increasing the offset), and
consequently all puts are serialized there. This is not efficient. We could improve it by
using a different mob file for each thread. Then no synchronization is needed, but the region
server would have too many open files (handler*regionNum). This is a problem.
We also have a solution for that: we could define a SynchronousQueue with a limited size so
that each region has a limited number of open files. All of this happens in prePut, and
prePut still needs a synchronized block in each thread. It's an improvement, but still not
efficient IMO.
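A sketch of the bounded-pool idea, assuming a hypothetical `MobWriterPool` with one pool per region. Note that `java.util.concurrent.SynchronousQueue` has no internal capacity (it only hands elements from one thread to another), so this sketch swaps in an `ArrayBlockingQueue` of idle writers instead. The queue capacity caps the open mob files per region, and a writer is only used by the thread that took it, so its offset needs no lock:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical per-region pool of mob writers (not an HBase API).
class MobWriterPool {
    /** A writer is used by one thread at a time, so its offset needs no lock. */
    static final class PooledWriter {
        final String path;
        long offset = 0;

        PooledWriter(String path) {
            this.path = path;
        }

        long append(byte[] value) {
            long start = offset;
            // Real code would write and sync `value` to the file at `path` here.
            offset += value.length;
            return start;
        }
    }

    // Idle writers; the queue capacity bounds the open mob files for this region.
    private final BlockingQueue<PooledWriter> idle;

    MobWriterPool(String regionDir, int maxOpenFiles) {
        idle = new ArrayBlockingQueue<>(maxOpenFiles);
        for (int i = 0; i < maxOpenFiles; i++) {
            idle.add(new PooledWriter(regionDir + "/mob-" + i));
        }
    }

    /**
     * Borrows an idle writer (blocking while all are busy), appends, and returns
     * the "path@offset" reference. The queue hand-off also publishes the writer's
     * offset safely to the next thread that borrows it.
     */
    String put(byte[] value) throws InterruptedException {
        PooledWriter writer = idle.take();
        try {
            return writer.path + "@" + writer.append(value);
        } finally {
            idle.add(writer); // never full here: this writer was just taken out
        }
    }
}
```

With a pool of, say, 4 writers per region, a region server holds at most 4 * regionNum open mob files instead of handler*regionNum, and a put only waits when every writer of its region is busy.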

Before the MemStore flushes (done in the coprocessor's preFlush), we roll the mob writers and
reset the KV offset to 0 for the new writers. This blocks prePut.
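A sketch of the roll under the single-writer design, using a hypothetical `RollingMobWriter`. `preFlush` takes the same monitor as `prePut`, so an in-flight put finishes first and new puts wait until the new file, starting at offset 0, is in place:

```java
// Hypothetical single-writer mob file roller (not an HBase API).
class RollingMobWriter {
    private final String storeDir;
    private int fileSeq = 0;
    private String currentPath;
    private long offset = 0;

    RollingMobWriter(String storeDir) {
        this.storeDir = storeDir;
        this.currentPath = storeDir + "/mob-0";
    }

    /** prePut: places the value in the current mob file and returns "path@offset". */
    synchronized String prePut(byte[] value) {
        long start = offset;
        // Real code would append and sync `value` to currentPath here.
        offset += value.length;
        return currentPath + "@" + start;
    }

    /**
     * preFlush: closes the current file and opens a new one with offset 0.
     * It holds the same monitor as prePut, so puts block until the roll is done.
     * Returns the closed file, which belongs to the flush being prepared.
     */
    synchronized String preFlush() {
        String closed = currentPath;
        fileSeq++;
        currentPath = storeDir + "/mob-" + fileSeq;
        offset = 0;
        return closed;
    }
}
```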

Customers usually require TTL-based cleanup of expired mob files, and that is very important:
cleaning up mob files by TTL is more efficient than running the sweep tool (mob files are
hardly ever updated but have a fixed lifetime).
We need a way to rename the mob files in the store flusher before the MemStore flushes, and to
organize these mob files by date.
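A sketch of date-organized TTL cleanup, assuming mob files are renamed into one UTC date directory (`yyyyMMdd`) at flush time and that no cell in a file is newer than the end of that day. `MobTtlCleaner` and its methods are hypothetical. Under those assumptions, a whole directory can be dropped without reading its files once the end of its day plus the TTL has passed:

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical TTL cleaner for mob files grouped into UTC date directories (not an HBase API).
class MobTtlCleaner {
    private static final DateTimeFormatter DAY = DateTimeFormatter.BASIC_ISO_DATE; // yyyyMMdd

    /** The date directory a mob file is renamed into when its MemStore is flushed. */
    static String dateDirFor(Instant flushTime) {
        return flushTime.atZone(ZoneOffset.UTC).toLocalDate().format(DAY);
    }

    /** Expired once the end of that day (newest possible cell) plus the TTL is not in the future. */
    static boolean isExpired(String dateDir, Duration ttl, Instant now) {
        Instant endOfDay = LocalDate.parse(dateDir, DAY)
                .plusDays(1)
                .atStartOfDay(ZoneOffset.UTC)
                .toInstant();
        return !endOfDay.plus(ttl).isAfter(now);
    }

    /** Date directories that can be deleted as a whole, without scanning their files. */
    static List<String> expiredDirs(List<String> dateDirs, Duration ttl, Instant now) {
        List<String> expired = new ArrayList<>();
        for (String dir : dateDirs) {
            if (isExpired(dir, ttl, now)) {
                expired.add(dir);
            }
        }
        return expired;
    }
}
```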
With this renaming, the following situation can happen: the MemStore flush fails while the
mob file renaming succeeds. When the WALEdits are replayed, the link between the edits and the
mob files is lost. To avoid this, we need to add a rename-transaction znode to ZooKeeper: each
renaming transaction has a znode containing several child znodes (the mappings from
nameBeforeRename to nameAfterRename). The txn znode is deleted after every successful MemStore
flush, and the txns for each store are mutually exclusive.
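A sketch of the rename-transaction bookkeeping, using an in-memory `TreeMap` as a hypothetical stand-in for the ZooKeeper tree (no real ZooKeeper client calls). The `/hbase/mob-rename/<store>` path and the `MobRenameTxnSketch` names are assumptions for illustration. Child znodes carry `nameBeforeRename->nameAfterRename` as data, since a znode name cannot contain `/`:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of per-store rename transactions. The TreeMap stands in for the
// ZooKeeper tree (znode path -> data); no real ZooKeeper client is used here.
class MobRenameTxnSketch {
    final Map<String, String> znodes = new TreeMap<>();
    private final String txnRoot;

    MobRenameTxnSketch(String storeName) {
        this.txnRoot = "/hbase/mob-rename/" + storeName; // assumed path layout
    }

    /** Records the txn znode and one child per rename BEFORE any mob file is renamed. */
    void begin(Map<String, String> renames) {
        if (znodes.containsKey(txnRoot)) {
            // Txns of one store are exclusive: the previous flush has not committed yet.
            throw new IllegalStateException("rename txn already pending for " + txnRoot);
        }
        znodes.put(txnRoot, "");
        int i = 0;
        for (Map.Entry<String, String> e : renames.entrySet()) {
            // Child names cannot contain '/', so the mapping is kept in the znode's data.
            znodes.put(txnRoot + "/" + i++, e.getKey() + "->" + e.getValue());
        }
        // The mob files would be renamed here, then the MemStore flushed.
    }

    /** After a successful MemStore flush, delete the txn znode and its children. */
    void commit() {
        znodes.keySet().removeIf(p -> p.equals(txnRoot) || p.startsWith(txnRoot + "/"));
    }

    /** During WAL replay, maps a name recorded before the rename to its current name. */
    String resolve(String nameBeforeRename) {
        for (Map.Entry<String, String> e : znodes.entrySet()) {
            if (e.getKey().startsWith(txnRoot + "/")) {
                String[] mapping = e.getValue().split("->", 2);
                if (mapping[0].equals(nameBeforeRename)) {
                    return mapping[1];
                }
            }
        }
        return nameBeforeRename; // no pending mapping for this name
    }
}
```

If the flush fails after the renames, the txn znode survives, so replayed WALEdits that still name `mob-0` can be resolved to the renamed file; a second `begin` for the same store is rejected until `commit`.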

How about this?

> HBase MOB
> ---------
>
>                 Key: HBASE-11339
>                 URL: https://issues.apache.org/jira/browse/HBASE-11339
>             Project: HBase
>          Issue Type: New Feature
>          Components: regionserver, Scanners
>            Reporter: Jingcheng Du
>            Assignee: Jingcheng Du
>         Attachments: HBase LOB Design.pdf
>
>
>   It's quite useful to save massive binary data, like images and documents, into Apache HBase. Unfortunately, directly saving binary LOBs (large objects) in HBase leads to worse performance because of frequent splits and compactions.
>   In this design, the LOB data are stored in a more efficient way, which keeps write/read performance high and guarantees data consistency in Apache HBase.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
