hbase-issues mailing list archives

From "Jingcheng Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-11339) HBase MOB
Date Mon, 23 Jun 2014 06:44:26 GMT

    [ https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040463#comment-14040463 ]

Jingcheng Du commented on HBASE-11339:

Thanks [~jmhsieh] !

bq. 1) compact individual cf's without compacting others. 2)having different compaction selection/promotion
algorithms per cf.
Yes, this could improve compaction, but it doesn't eliminate the double write of the
mob file.

bq. 3) decided to split only based on certain cf's
We could split the region based on a certain cf, but the mob cf would still be split along
with it. Let's assume the metadata for one mob (the data describing the mob, stored in cfs
other than the mob cf) is 1KB and a mob is 5MB. If the region is split based on the metadata
size, the mob data in that region will be very large by the time a split is triggered.
Saving the mobs separately from the HBase data could avoid this.
When scanning, the mob data is included in the scanners' heap if the mobs are saved in HBase
directly, whereas each mob is sought directly in a single file if the mobs are saved into
mob files (we have a mechanism to cache several opened scanners of the mob files). The
latter seems to be more efficient.
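
As a rough illustration of the arithmetic above: the sketch below estimates how much mob data a region holds when a split is triggered by the metadata size alone. The 1KB and 5MB sizes come from the example above; the 1GB split threshold and the class and method names are hypothetical, chosen only for illustration.

```java
public class SplitSizeEstimate {
    // Sizes from the example above: 1 KB of metadata and a 5 MB mob per object.
    static final long METADATA_BYTES_PER_MOB = 1024L;
    static final long MOB_BYTES = 5L * 1024 * 1024;

    /**
     * Estimates the mob bytes held by a region at the moment the metadata
     * alone reaches the given split threshold.
     */
    static long mobBytesAtSplit(long metadataSplitThresholdBytes) {
        long mobCount = metadataSplitThresholdBytes / METADATA_BYTES_PER_MOB;
        return mobCount * MOB_BYTES;
    }

    public static void main(String[] args) {
        // Hypothetical threshold: split once the metadata reaches 1 GB.
        long threshold = 1024L * 1024 * 1024;
        long mobBytes = mobBytesAtSplit(threshold);
        long oneTb = 1024L * 1024 * 1024 * 1024;
        // 1 GB / 1 KB = 1,048,576 mobs; times 5 MB each = 5 TB of mob data.
        System.out.println("mob bytes at split = " + mobBytes
            + " (" + (mobBytes / oneTb) + " TB)");
    }
}
```

Under these assumptions the ratio is fixed at 5120:1, so any threshold expressed in metadata bytes implies a region carrying 5120 times as many mob bytes.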

bq. How many lob files could be generated per flush? If I flush a table, would all regions
the relevant regions on a particular RS go to one lob sequence file as opposed to many hfiles
in the cf case? (e.g. similarly to how all edits on an RS go to one hlog)
The files related to a mob are the reference (path) HFile plus the mob file, so the number
of files is double that of saving the mobs directly into HBase.
Saving the mob files per store rather than per region server makes it more efficient to use
the TTL to clean up expired mobs.

bq. Even with the pdf design, we still end up flushing fairly frequently (potentially a flush
every ~100 objects!) and we'd end up with a lot of hfiles or lob files.
The HFiles for the metadata are expected to be small, so they are not as costly as the mob
files. The mob is usually much larger than the metadata, so the mob files are large enough
when flushed. And because each read goes against a single file, the number of mob files
won't impact read performance.

bq. I don't think the pdf design mentions anything about caching mob values. Would frequently
requested mob always hit hdfs?
We have a MobCacheConfig, which extends CacheConfig, for each mob store. It provides a
cache of several opened mob files (only the opened readers are cached; the capacity is
limited, and LRU is used to evict readers when the capacity is exceeded), and it shares the
same global block cache as the one in the region server. If the mobs are saved in HFiles,
the block cache works with the mob files as well.
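
A minimal sketch of an LRU cache of opened readers like the one described above, assuming a plain `java.util.LinkedHashMap` in access order. The class `OpenedFileLruCache` and its methods are hypothetical and are not the actual MobCacheConfig implementation; they only illustrate the "limited capacity, evict by LRU, close on eviction" behaviour.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU cache of opened file readers, keyed by file path. */
public class OpenedFileLruCache<R extends Closeable> {
    private final LinkedHashMap<String, R> readers;

    public OpenedFileLruCache(final int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        // accessOrder = true orders entries from least to most recently used.
        this.readers = new LinkedHashMap<String, R>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, R> eldest) {
                if (size() > capacity) {
                    // Close the least recently used reader before dropping it.
                    closeQuietly(eldest.getValue());
                    return true;
                }
                return false;
            }
        };
    }

    /** Returns the cached reader and marks it as recently used, or null. */
    public synchronized R get(String path) {
        return readers.get(path);
    }

    /** Caches an opened reader; may evict and close the LRU reader. */
    public synchronized void put(String path, R reader) {
        R previous = readers.put(path, reader);
        if (previous != null && previous != reader) {
            closeQuietly(previous);
        }
    }

    public synchronized int size() {
        return readers.size();
    }

    private static void closeQuietly(Closeable c) {
        try {
            c.close();
        } catch (IOException e) {
            // Ignored in this sketch; real code would at least log it.
        }
    }

    public static void main(String[] args) {
        OpenedFileLruCache<Closeable> cache = new OpenedFileLruCache<Closeable>(2);
        cache.put("mobfile-a", () -> System.out.println("closed mobfile-a"));
        cache.put("mobfile-b", () -> System.out.println("closed mobfile-b"));
        cache.get("mobfile-a");  // mobfile-a becomes the most recently used
        cache.put("mobfile-c", () -> System.out.println("closed mobfile-c"));
        // mobfile-b was the least recently used, so it is evicted and closed.
        System.out.println("cached readers = " + cache.size());
    }
}
```

Caching only the opened readers keeps the memory cost bounded by the capacity, while the block data itself stays in the shared block cache.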

> HBase MOB
> ---------
>                 Key: HBASE-11339
>                 URL: https://issues.apache.org/jira/browse/HBASE-11339
>             Project: HBase
>          Issue Type: New Feature
>          Components: regionserver, Scanners
>            Reporter: Jingcheng Du
>            Assignee: Jingcheng Du
>         Attachments: HBase LOB Design.pdf
>   It's quite useful to save medium-sized binary data such as images and documents into
> Apache HBase. Unfortunately, directly saving the binary MOB (medium object) to HBase leads
> to worse performance because of the frequent splits and compactions.
>   In this design, the MOB data are stored in a more efficient way, which keeps a high
> write/read performance and guarantees the data consistency in Apache HBase.

This message was sent by Atlassian JIRA
