hbase-issues mailing list archives

From "Jonathan Hsieh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-11339) HBase MOB
Date Fri, 20 Jun 2014 19:05:26 GMT

    [ https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039195#comment-14039195
] 

Jonathan Hsieh commented on HBASE-11339:
----------------------------------------

bq. Does it mean the mob files are not feasible?

I'm trying to be convinced that we need a special mechanism to handle MOBs.  We can put the
loblog idea to rest for the time being because of the read-recently written issues.

Let's see if improving the cf flushes/compactions could achieve the same goal as the pdf.

bq. You mean directly saving the mob into HBase and using different compaction policy for
the mob cf? The compaction on the mob cf in HBase is costly, will probably delay the flushing
and block the updates. And a large mob store leads to frequent region split. All of these
impact the HBase potentially.

Yes roughly. 

With the algorithms today, sure.  However, I was thinking of a few things that we could use to
avoid excessive write amplification:
1) compacting individual cf's without compacting others.
2) having different compaction selection/promotion algorithms per cf.
3) deciding to split based only on certain cf's.
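
To make the write-amplification argument concrete, here is a toy model. It is only a sketch and not HBase's actual compaction selection algorithm. It assumes a "coupled" policy in which one cf becoming due forces every cf to compact, and compares it with an "independent" policy in which each cf has its own threshold. The function name, the cf names, the per-flush sizes and the thresholds are all made up for illustration.

```python
def compaction_kb_rewritten(flushes, flush_kb, thresholds, coupled):
    """Return the total KB rewritten by compactions in a toy store model.

    flush_kb:   hypothetical KB written per flush, keyed by cf name
    thresholds: hypothetical file count at which a cf becomes due
    coupled:    if True, any cf being due drags every other cf along
    """
    files = {cf: [] for cf in flush_kb}
    rewritten = 0
    for _ in range(flushes):
        # Each flush writes one new file per cf.
        for cf, kb in flush_kb.items():
            files[cf].append(kb)
        due = [cf for cf in files if len(files[cf]) >= thresholds[cf]]
        if coupled and due:
            due = list(files)
        # Compacting a cf rewrites all of its files into a single file.
        for cf in due:
            merged = sum(files[cf])
            rewritten += merged
            files[cf] = [merged]
    return rewritten


# Assumed workload: a small metadata cf next to a much larger mob cf.
flush_kb = {"meta": 10, "mob": 1000}
thresholds = {"meta": 3, "mob": 10}

independent = compaction_kb_rewritten(20, flush_kb, thresholds, coupled=False)
together = compaction_kb_rewritten(20, flush_kb, thresholds, coupled=True)
print("independent:", independent, "KB rewritten")
print("coupled:    ", together, "KB rewritten")
```

In this toy setup the coupled policy rewrites the large mob files every time the small cf becomes due, which is the cost that per-cf compaction and per-cf selection policies would aim to avoid.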

Even with the pdf design, we still end up flushing fairly frequently (potentially a flush
every ~100 objects!) and we'd end up with a lot of hfiles or lob files.  
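
A back-of-the-envelope sketch of where a number like "~100 objects per flush" can come from. The 128 MB flush size and the 1 MB object size are assumptions chosen for illustration, not figures from the pdf, and per-cell overhead is ignored.

```python
MB = 1024 * 1024


def objects_per_flush(flush_size_bytes, object_size_bytes):
    """How many objects fit in the memstore before a flush is triggered."""
    return flush_size_bytes // object_size_bytes


def flushes_needed(total_objects, per_flush):
    """Number of flushes (and so flushed files) to ingest total_objects."""
    return -(-total_objects // per_flush)  # ceiling division


per_flush = objects_per_flush(128 * MB, 1 * MB)  # assumed sizes
print(per_flush)                                 # 128
print(flushes_needed(1_000_000, per_flush))      # 7813
```

Under these assumptions, ingesting a million objects produces several thousand flushed files before any compaction, whichever file format they land in.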

How many lob files could be generated per flush?  If I flush a table, would all the
relevant regions on a particular RS go to one lob sequence file as opposed to many hfiles
in the cf case?   (e.g. similar to how all edits on an RS go to one hlog)

I don't think the pdf design mentions anything about caching mob values.  Would a frequently
requested mob always hit hdfs?

bq. In the current design (introduced in the pdf), if users are concerned for the write performance
rather than the consistency and replication, how about to disable the WAL directly? If users
want to enable the WAL and don't want the twice writing, they could write the mob in the client
side ( the way like Lars's suggestion). The scanner and sweep tool could work as well with
this if the locator(reference) column follows the specific format.

Interesting point, but the obvious problem is that we lose durability guarantees, and it isn't something
we can really recommend for normal use.  (With the lob log idea, it seems pretty obvious that we'd
be able to maintain durability guarantees.)


> HBase MOB
> ---------
>
>                 Key: HBASE-11339
>                 URL: https://issues.apache.org/jira/browse/HBASE-11339
>             Project: HBase
>          Issue Type: New Feature
>          Components: regionserver, Scanners
>            Reporter: Jingcheng Du
>            Assignee: Jingcheng Du
>         Attachments: HBase LOB Design.pdf
>
>
>   It's quite useful to save medium-sized binary data such as images and documents in Apache
HBase. Unfortunately, directly saving a binary MOB (medium object) to HBase leads to worse
performance because of the frequent splits and compactions.
>   In this design, the MOB data are stored in a more efficient way, which keeps high
write/read performance and guarantees data consistency in Apache HBase.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
