hbase-issues mailing list archives

From "Jingcheng Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-11339) HBase MOB
Date Wed, 03 Sep 2014 10:47:53 GMT

    [ https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119741#comment-14119741 ]

Jingcheng Du commented on HBASE-11339:

Thanks Lars for the comments. [~lhofhansl]

bq. Going by the comments the use case is only 1-5mb files (definitely less than 64mb), correct?
That changes the discussion, but it looks to me that now the use case is limited to a single
scenario and carefully constructed (200m x 500k files) so that this change might be useful.
I.e. pick a blob size just right, and pick the size distribution of the files just right and
this makes sense.
the client solution could work well too in certain cases of bigger size blobs and we could
try leveraging the current MOB design approach for smaller values of KVs.
In some usage scenarios the value size is almost fixed, for example pictures taken by traffic-bureau cameras, contracts between banks and customers, CT (Computed Tomography) records in hospitals, etc. The use case might be limited, but it's really useful.
As mentioned, the client solution saves records larger than 10MB to HDFS and saves the others directly to HBase. Turning that threshold down leads to inefficient use of HDFS in the client solution, since HDFS handles many small files poorly; in that case the records are better saved directly in HBase. And even for values smaller than 10MB, the MOB implementation shows a big performance improvement over saving those records directly into HBase.

The MOB path has a threshold as well: a cell is stored either as an inline value or as a reference to a MOB file, depending on this threshold. The default value is 100KB for now. Users can change it, and we also have a compactor that handles cells on the wrong side of the threshold (moving MOB data back into HBase, and vice versa).
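As a rough illustration of the threshold decision described above (not the actual patch code; the class and method names here are hypothetical), the inline-vs-reference choice can be sketched as:

```java
// Hypothetical sketch of the MOB threshold decision: values at or below
// the threshold are stored inline in the store file, while larger values
// are written to a MOB file and only a reference is kept inline.
public class MobThresholdPolicy {
    // Default threshold mentioned in the discussion: 100KB.
    public static final long DEFAULT_MOB_THRESHOLD = 100L * 1024L;

    private final long threshold;

    public MobThresholdPolicy(long threshold) {
        this.threshold = threshold;
    }

    public MobThresholdPolicy() {
        this(DEFAULT_MOB_THRESHOLD);
    }

    /** Returns true if a value of the given size should go to a MOB file. */
    public boolean isMob(long valueSize) {
        return valueSize > threshold;
    }
}
```

In the feature as later released, this is configured per column family (e.g. the `IS_MOB` and `MOB_THRESHOLD` attributes in the HBase shell), though at the time of this thread the API was still in development.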

As Jon said, we'll revamp the mob compaction and get rid of the MR dependency.

bq. Yet, all that is possible to do with a client only solution and could be abstracted there.
Implementing snapshots and replication in a client-only solution is harder, and it adds complexity on the client side as well. Keeping HBase and the HDFS files consistent during replication is a problem.
Implementing this on the server side is a little easier: the MOB work includes a snapshot implementation, and it supports replication naturally because the MOB data are saved in the WAL.

bq. (Subjectively) I do not like the complexity of this as seen by the various discussions
here. That part is just my $0.02 of course.
Yes, it's complex, but the features are meaningful and valuable.
The patches provide read/write, compactions, snapshot, and sweep support for MOB files.
Even if HBase decides to implement a streaming feature in the future, the read, compaction, and snapshot parts would probably still be useful.


> HBase MOB
> ---------
>                 Key: HBASE-11339
>                 URL: https://issues.apache.org/jira/browse/HBASE-11339
>             Project: HBase
>          Issue Type: Umbrella
>          Components: regionserver, Scanners
>            Reporter: Jingcheng Du
>            Assignee: Jingcheng Du
>         Attachments: HBase MOB Design-v2.pdf, HBase MOB Design-v3.pdf, HBase MOB Design-v4.pdf,
HBase MOB Design.pdf, MOB user guide.docx, MOB user guide_v2.docx, hbase-11339-in-dev.patch
>   It's quite useful to save medium-sized binary data like images and documents into Apache HBase. Unfortunately, directly saving binary MOBs (medium objects) to HBase leads to worse performance because of the frequent splits and compactions.
>   In this design, the MOB data are stored in a more efficient way, which keeps high write/read performance and guarantees data consistency in Apache HBase.

This message was sent by Atlassian JIRA
