hbase-issues mailing list archives

From "Jonathan Hsieh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-11339) HBase LOB
Date Fri, 13 Jun 2014 06:06:02 GMT

    [ https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030285#comment-14030285 ]

Jonathan Hsieh commented on HBASE-11339:

Nice doc.  I did a quick read and have some design-level questions and concerns:

The core problem we are trying to avoid is write amplification (writing the data to the HLog,
then again on flush, and then over and over again with compactions).

Does the proposed design write out LOBs to the HLog first and then later to LOB files?  As designed,
it must write them to the log so that we preserve the durability and consistency properties of
a row.
+ good: this should just work with replication
- in the best case, the data is written at least twice -- once before the ack is sent to the
client and then again on flush.  Can we limit this to once?
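
To put rough numbers on the write amplification described above, here is a minimal back-of-the-envelope sketch in Python. It is not HBase code: the function names are hypothetical, and it assumes each compaction that touches the cell rewrites it in full exactly once.

```python
MB = 1024 * 1024


def bytes_written(cell_bytes: int, compaction_rewrites: int, wal: bool = True) -> int:
    """Total bytes written to disk for one cell under a simplified model.

    Assumptions (illustrative only):
      - the cell is written once to the log if `wal` is True,
      - once more when the memstore is flushed,
      - and once per compaction that rewrites the file holding it.
    """
    wal_write = cell_bytes if wal else 0
    flush_write = cell_bytes
    compaction_writes = cell_bytes * compaction_rewrites
    return wal_write + flush_write + compaction_writes


def write_amplification(cell_bytes: int, compaction_rewrites: int, wal: bool = True) -> float:
    """Ratio of bytes written to disk to bytes the client actually sent."""
    return bytes_written(cell_bytes, compaction_rewrites, wal) / cell_bytes


cell = 5 * MB
print(write_amplification(cell, compaction_rewrites=0))  # 2.0: log + flush, the best case
print(write_amplification(cell, compaction_rewrites=3))  # 5.0: log + flush + 3 compactions
```

Under this model, even with no compactions at all, a 5MB cell costs 10MB of writes, which is the "at least twice" case; every compaction that rewrites the cell adds another 5MB.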

We could avoid extra writes by just writing to a separate LOB log/file.  Was this considered?

Is there any consideration of locality and performance?

5MB cells are large but aren't really that big.  Maybe this should just be "blobs" (binary
large objects) or "mobs" (medium objects)?  The fact that the objects are immutable is important too.

> HBase LOB
> ---------
>                 Key: HBASE-11339
>                 URL: https://issues.apache.org/jira/browse/HBASE-11339
>             Project: HBase
>          Issue Type: New Feature
>          Components: regionserver, Scanners
>            Reporter: Jingcheng Du
>            Assignee: Jingcheng Du
>         Attachments: HBase LOB Design.pdf
>   It's quite useful to save massive binary data such as images and documents into Apache
> HBase. Unfortunately, directly saving binary LOBs (large objects) to HBase leads to worse
> performance because of frequent splits and compactions.
>   In this design, the LOB data is stored in a more efficient way, which keeps
> write/read performance high and guarantees data consistency in Apache HBase.

This message was sent by Atlassian JIRA
