hbase-issues mailing list archives

From "Jingcheng Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-11339) HBase LOB
Date Mon, 16 Jun 2014 09:16:04 GMT

    [ https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032246#comment-14032246 ]

Jingcheng Du commented on HBASE-11339:
--------------------------------------

Thanks [~jmhsieh] for the comments.

>Does the proposed design write out LOBs to both the HLog and then later LOB files?
Yes, the Lobs are written to both the HLogs and the Lob files.

>in the best case, the data is written at least twice – once before the ack is sent to
the client and then again on flush. Can we limit this to once?
>We could avoid extra writes by just writing to a separate LOB log/file. Was this considered?
It was considered, but we didn't find a good solution for it.
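
As a rough illustration of the write cost raised in the question above, here is a minimal sketch. It is not the HBase write path. It only assumes what is stated here: each Lob is written once to the HLog before the ack and once more to a Lob file on flush. The function names and the `rewrites` parameter (later rewrites, for example by a sweep pass) are hypothetical, and HDFS replication is ignored.

```python
def lob_bytes_written(lob_size_bytes, num_lobs, rewrites=0):
    """Estimate the bytes written for a batch of Lobs.

    Assumption: every Lob is written once to the HLog (before the client
    ack) and once to a Lob file on MemStore flush. `rewrites` is a
    hypothetical count of later full rewrites of the same data.
    """
    payload = lob_size_bytes * num_lobs
    hlog_bytes = payload                # first write: HLog
    flush_bytes = payload               # second write: Lob file on flush
    rewrite_bytes = payload * rewrites  # optional later rewrites
    return hlog_bytes + flush_bytes + rewrite_bytes


def write_amplification(lob_size_bytes, num_lobs, rewrites=0):
    """Ratio of bytes written to bytes of user payload."""
    payload = lob_size_bytes * num_lobs
    if payload <= 0:
        raise ValueError("payload must be positive")
    return lob_bytes_written(lob_size_bytes, num_lobs, rewrites) / payload
```

Under these assumptions the ratio never drops below 2, which is the "at least twice" in the question. Writing to a separate Lob log/file would aim to bring that baseline down to 1.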

>Is there any consideration of locality and performance?
The locality is retained only after the Lobs are flushed from the MemStore, and it's not guaranteed
after the SweepTool runs (Lob compaction) or after regions move to other regionservers.
The write/read performance of HBase is not expected to be impacted much. I will provide
the details as soon as the performance testing is done.

>5MB cells are large but aren't really that big. Maybe this should just be "blobs" (binary
large objects) or "mobs" (medium objects)?  the objects being immutable is important too
Actually, the Lobs could be mutable. Lobs that are no longer used will be handled by
the SweepTool.

> HBase LOB
> ---------
>
>                 Key: HBASE-11339
>                 URL: https://issues.apache.org/jira/browse/HBASE-11339
>             Project: HBase
>          Issue Type: New Feature
>          Components: regionserver, Scanners
>            Reporter: Jingcheng Du
>            Assignee: Jingcheng Du
>         Attachments: HBase LOB Design.pdf
>
>
>   It's quite useful to save massive binary data such as images and documents into Apache
HBase. Unfortunately, directly saving binary LOBs (large objects) to HBase leads to worse
performance because of the frequent splits and compactions.
>   In this design, the LOB data are stored in a more efficient way, which keeps high
write/read performance and guarantees data consistency in Apache HBase.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
