hbase-issues mailing list archives

From "Jingcheng Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-11339) HBase MOB
Date Wed, 25 Jun 2014 02:57:25 GMT

    [ https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042990#comment-14042990
] 

Jingcheng Du commented on HBASE-11339:
--------------------------------------

Resending to correct the formatting.
bq. In the pdf design, is there one MobManager per RS or one MobManager per table or one MobManager
per region? Are the mob hfiles kind of like a shared cf that all regions with mobs eventually
throw their data into?
There is one MobManager per region server; it maintains the mapping from (tableName, cfName)
to the mob cf.
The mob files are saved in the <i>mobRootDir / tableNameAsString / cfName / date / mobFiles</i>.
1. A mob file is generated on each MemStore flush.
2. All the mob files for all regions of a single table on a region server are saved into
the same directory <i>mobRootDir / tableNameAsString / cfName / date</i>.
The main advantage is that TTL cleanup can delete a whole date directory of a cf at once.
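
The layout above can be sketched as a small path builder. This is only an illustration, not code from the design pdf: the function names, the file names, and the YYYYMMDD rendering of the date component are assumptions, since the message does not say how the date directory is named.

```python
from datetime import date
from pathlib import PurePosixPath


def mob_dir(mob_root, table, cf, day):
    """Directory holding the mob files of one (table, cf) for one date."""
    # Assumption: the date component is rendered as YYYYMMDD.
    return PurePosixPath(mob_root) / table / cf / day.strftime("%Y%m%d")


def mob_file_path(mob_root, table, cf, day, file_name):
    """Full path of a mob file; note that no region name appears in it."""
    return mob_dir(mob_root, table, cf, day) / file_name


# Two regions of the same table flush on the same day (hypothetical file names).
day = date(2014, 6, 25)
p1 = mob_file_path("/mobRootDir", "tableA", "cf1", day, "flushFromRegion1")
p2 = mob_file_path("/mobRootDir", "tableA", "cf1", day, "flushFromRegion2")
same_dir = p1.parent == p2.parent
```

Because the region is not part of the path, both flushes end up in the same date directory.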

bq. Can you explain what happens if I have a RS with regions, some belonging to tableA
and some belonging to tableB. Let's say all writes to tableA and tableB have Mobs in them.
The mob files are saved in the <i>mobRootDir / tableNameAsString / cfName / date / mobFiles</i>.
So each mob cf has its own mob files; one new mob file is generated for each cf when
a region flushes.
1. The mob files for tableA and tableB are saved into different directories. The ones for
tableA are saved into <i>mobRootDir / tableAAsString / cfName / date / mobFiles</i>,
and the ones for tableB are saved into <i>mobRootDir / tableBAsString / cfName / date
/ mobFiles</i>.
2. On each flush, a new mob file is generated for each cf: the one for tableA is <i>mobRootDir
/ tableAAsString / cf1 / date / aNewMobFileForTableACf1</i>, the one for tableB is <i>mobRootDir
/ tableBAsString / cf2 / date / aNewMobFileForTableBCf2</i>.
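
As a rough sketch of the two points above, the flush behaviour on one region server could be modelled like this. The class `MobManagerSketch`, its `flush` method, and the `mob-NNNN` file names are hypothetical stand-ins for illustration; they are not the actual MobManager API.

```python
from collections import defaultdict


class MobManagerSketch:
    """Hypothetical per-region-server bookkeeping keyed by (table, cf)."""

    def __init__(self, mob_root):
        self.mob_root = mob_root
        self.files = defaultdict(list)  # (table, cf) -> mob file paths
        self._seq = 0

    def flush(self, table, region, cfs, day):
        """Create one new mob file per cf for a single region flush."""
        created = []
        for cf in cfs:
            self._seq += 1
            # The region is deliberately absent from the path:
            # files are grouped by table/cf/date, not table/region/cf.
            path = f"{self.mob_root}/{table}/{cf}/{day}/mob-{self._seq:04d}"
            self.files[(table, cf)].append(path)
            created.append(path)
        return created


mgr = MobManagerSketch("/mobRootDir")
a1 = mgr.flush("tableA", "regionA1", ["cf1"], "20140625")
b1 = mgr.flush("tableB", "regionB1", ["cf2"], "20140625")
a2 = mgr.flush("tableA", "regionA2", ["cf1"], "20140625")
```

Here tableA and tableB write to different directories, while the two tableA regions add files to the same tableA/cf1 date directory.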

bq. With this it sounds like new mob file per region, and that mobs would still generate the
same number of files as the separate cf's approach.
Can't we (or do we already) have the ttl optimization in our existing cf's since our hfiles
have start and end ts in them?
The mob files are saved by table/cf instead of table/region/cf.
If the mobs were saved into HBase directly, the writes caused by splitting the mob store would
not be avoided, even if we split the regions by certain cfs.
If we get the end ts from the last key in the HFile, we have to read the HFiles themselves to
know whether they have expired. In the pdf design, we check expiry by directory, which needs
fewer reads.
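
A minimal sketch of the directory-based expiry check, assuming date directories are named YYYYMMDD and the TTL is given in whole days (neither is stated in this message; the function name is hypothetical). Only the directory names are inspected, and no mob file is opened.

```python
from datetime import date, datetime, timedelta


def expired_date_dirs(dir_names, ttl_days, today):
    """Return the date directories whose whole contents are past the TTL."""
    cutoff = today - timedelta(days=ttl_days)
    expired = []
    for name in dir_names:
        day = datetime.strptime(name, "%Y%m%d").date()
        # Everything in the directory was written on `day`, so the whole
        # directory is expired once `day` is strictly before the cutoff.
        if day < cutoff:
            expired.append(name)
    return expired


dirs = ["20140610", "20140617", "20140618", "20140625"]
to_delete = expired_date_dirs(dirs, ttl_days=7, today=date(2014, 6, 25))
```

With a 7-day TTL on 25 Jun 2014, the cutoff is 18 Jun, so only the first two directories are selected for deletion.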

> HBase MOB
> ---------
>
>                 Key: HBASE-11339
>                 URL: https://issues.apache.org/jira/browse/HBASE-11339
>             Project: HBase
>          Issue Type: New Feature
>          Components: regionserver, Scanners
>            Reporter: Jingcheng Du
>            Assignee: Jingcheng Du
>         Attachments: HBase LOB Design.pdf
>
>
>   It's quite useful to save medium-sized binary data such as images and documents into Apache
HBase. Unfortunately, directly saving the binary MOB (medium object) to HBase leads to worse
performance because of the frequent splits and compactions.
>   In this design, the MOB data are stored in a more efficient way, which keeps high
write/read performance and guarantees data consistency in Apache HBase.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
