hbase-issues mailing list archives

From "Jingcheng Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-11339) HBase LOB
Date Wed, 18 Jun 2014 09:02:35 GMT

    [ https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14035448#comment-14035448 ]

Jingcheng Du commented on HBASE-11339:

bq. I'm not convinced. The idea I'm suggesting is having a special lob log file that is written
once at write time that is essentially the lob store files in the doc, and put a reference
to it (file name, and offset) in the normal wal. This allows the lob to only be written once.
I don't see how this would be less efficient than an approach that must write the values out
at least twice.
  With this approach, we save the Lob files as SequenceFiles, and write the file name and offset
back into the put before putting the KV into the MemStore, right?
 1. If so, we don't use the MemStore to hold the Lob data, right? Then how do we read Lob
data that has not been synced yet (and is still in the writer buffer)?
 2. We need to add preSync and preAppend hooks to the HLog so that we can sync the Lob files
before the HLogs are synced.
 3. In order to get the correct offset, we have to synchronize the prePut in the coprocessor,
or could we use a different Lob file for each thread?
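
To make the three questions above concrete, here is a minimal sketch in Java. Everything in it is hypothetical and is not HBase API: LobLogSketch, LobRef, syncWal and the in-memory buffer are stand-ins for a Lob log file, the (file name, offset) reference, and the proposed preSync hook. It only illustrates the ordering and locking concerns, under the assumption of a single shared Lob file.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not HBase code: an in-memory stand-in for one Lob log file.
class LobLogSketch {
    /** Reference that would be stored in the WAL and MemStore instead of the value. */
    static final class LobRef {
        final String fileName;
        final long offset;
        final int length;

        LobRef(String fileName, long offset, int length) {
            this.fileName = fileName;
            this.offset = offset;
            this.length = length;
        }
    }

    private final String fileName;
    // Stand-in for the writer buffer of the Lob file.
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    // Number of bytes treated as durable (synced) so far.
    private long syncedLength = 0;
    // Stand-in for WAL entries: only references, never the Lob bytes.
    private final List<LobRef> wal = new ArrayList<>();

    LobLogSketch(String fileName) {
        this.fileName = fileName;
    }

    // Question 3: the offset is only correct if it is captured atomically
    // with the write, hence the lock (or one Lob file per thread).
    synchronized LobRef append(byte[] value) {
        long offset = buffer.size();
        buffer.write(value, 0, value.length);
        LobRef ref = new LobRef(fileName, offset, value.length);
        wal.add(ref);
        return ref;
    }

    // Question 2: the Lob bytes must be durable before the WAL entry
    // that points at them is synced.
    synchronized void syncWal() {
        syncedLength = buffer.size(); // "preSync" step for the Lob file
        // A real WAL sync would follow here.
    }

    // Question 1: a reader that bypasses the MemStore sees unsynced data
    // only if it is given access to the writer buffer.
    synchronized byte[] read(LobRef ref, boolean allowUnsynced) {
        long end = ref.offset + ref.length;
        if (end > syncedLength && !allowUnsynced) {
            return null; // not visible yet
        }
        return Arrays.copyOfRange(buffer.toByteArray(), (int) ref.offset, (int) end);
    }

    public static void main(String[] args) {
        LobLogSketch log = new LobLogSketch("lob-0001");
        LobRef ref = log.append(new byte[] {1, 2, 3});
        System.out.println(ref.fileName + "@" + ref.offset + "+" + ref.length);
    }
}
```

In this sketch a single lock serializes all appends; using one Lob file per thread would remove that contention at the cost of more files.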

bq. I agree about the hdfs small files problem but I think we need to properly define what
a LOB is and the scope of this effort. (hence my suggestion of Medium Objects – MOBS).

bq. I'm under the impression we are solving the latter case here. Is that correct?
That's right.

> HBase LOB
> ---------
>                 Key: HBASE-11339
>                 URL: https://issues.apache.org/jira/browse/HBASE-11339
>             Project: HBase
>          Issue Type: New Feature
>          Components: regionserver, Scanners
>            Reporter: Jingcheng Du
>            Assignee: Jingcheng Du
>         Attachments: HBase LOB Design.pdf
>   It's quite useful to save massive binary data like images and documents into Apache
HBase. Unfortunately, directly saving binary LOBs (large objects) to HBase leads to worse
performance because of the frequent splits and compactions.
>   In this design, the LOB data are stored in a more efficient way, which keeps high
write/read performance and guarantees data consistency in Apache HBase.

This message was sent by Atlassian JIRA
