hbase-issues mailing list archives

From "Jingcheng Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-11339) HBase LOB
Date Mon, 16 Jun 2014 09:23:04 GMT

    [ https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032254#comment-14032254
] 

Jingcheng Du commented on HBASE-11339:
--------------------------------------

Thanks [~lhofhansl] for the comments.

> Is it better to store small blobs (let's say 1mb or less) in HBase (by value) and larger
> blob directly in files in HDFS with just a reference in HBase? Writing large blobs would be
> a three step process: (1) add the metadata to HBase (2) stream the actual blob into HDFS (3)
> set a "written" column in the HBase row to true.
Good idea. But with this approach, all the actions occur in the client, and each client
writes a new file in HDFS. It is hard to control the file size, which will probably lead to
too many small files in HDFS.
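
To make the trade-off concrete, the three-step write from the quoted suggestion might be
sketched roughly as below. This is only an illustration under stated assumptions: FakeTable
and FakeFileSystem are in-memory stand-ins, not the HBase or HDFS client APIs, and the column
names, the /blobs/ path layout, and the helper names write_blob/read_blob are all hypothetical.

```python
import io

# 1 MB threshold, taken from the quoted suggestion ("1mb or less").
SMALL_BLOB_LIMIT = 1024 * 1024


class FakeTable:
    """In-memory stand-in for an HBase table (NOT the real client API)."""

    def __init__(self):
        self.rows = {}  # row key -> {column name: value}

    def put(self, row, column, value):
        self.rows.setdefault(row, {})[column] = value

    def get(self, row, column):
        return self.rows.get(row, {}).get(column)


class FakeFileSystem:
    """In-memory stand-in for HDFS (NOT the real client API)."""

    def __init__(self):
        self.files = {}  # path -> bytes

    def write_stream(self, path, stream, chunk_size=64 * 1024):
        # Copy the stream chunk by chunk, as a client streaming a blob would.
        buf = bytearray()
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            buf.extend(chunk)
        self.files[path] = bytes(buf)

    def read(self, path):
        return self.files[path]


def write_blob(table, fs, row, data):
    """Store small blobs by value; store large blobs by reference."""
    if len(data) <= SMALL_BLOB_LIMIT:
        table.put(row, "blob:value", data)
        return
    path = "/blobs/" + row  # hypothetical layout: one file per large blob
    # (1) add the metadata to the table, marked as not yet written
    table.put(row, "blob:path", path)
    table.put(row, "blob:written", False)
    # (2) stream the actual blob into the file system
    fs.write_stream(path, io.BytesIO(data))
    # (3) set the "written" column to true
    table.put(row, "blob:written", True)


def read_blob(table, fs, row):
    """Return the blob, or None if it is missing or its write never finished."""
    value = table.get(row, "blob:value")
    if value is not None:
        return value
    if table.get(row, "blob:written") is not True:
        return None
    return fs.read(table.get(row, "blob:path"))
```

In this sketch every large blob becomes its own file, so the number of files grows with the
number of large writes and each file is only as big as its blob. That is the small-files
concern described above: nothing on the client side batches blobs into larger files.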

> HBase LOB
> ---------
>
>                 Key: HBASE-11339
>                 URL: https://issues.apache.org/jira/browse/HBASE-11339
>             Project: HBase
>          Issue Type: New Feature
>          Components: regionserver, Scanners
>            Reporter: Jingcheng Du
>            Assignee: Jingcheng Du
>         Attachments: HBase LOB Design.pdf
>
>
>   It's quite useful to save massive binary data such as images and documents into Apache
> HBase. Unfortunately, directly saving binary LOBs (large objects) to HBase leads to worse
> performance because of the frequent splits and compactions.
>   In this design, the LOB data are stored in a more efficient way, which keeps high
> write/read performance and guarantees data consistency in Apache HBase.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
