Date: Fri, 13 Jun 2014 06:06:02 +0000 (UTC)
From: "Jonathan Hsieh (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-11339) HBase LOB

    [ https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030285#comment-14030285 ]

Jonathan Hsieh commented on HBASE-11339:
----------------------------------------

Nice doc. I did a quick read and have some design level questions and concerns:

The core problem we are trying to avoid is write amplification (writing the data in the hlog, then in flush and then over and over again with compactions).

Does the proposed design write out LOBs to both the HLog and then later LOB files? As designed, it must write them to the log so that we preserve durability and consistency properties of a row.
+ good that this should just work with replication
- in the best case, the data is written at least twice -- once before the ack is sent to the client and then again on flush. Can we limit this to once?

We could avoid extra writes by just writing to a separate LOB log/file. Was this considered?

Is there any consideration of locality and performance?

5MB cells are large but aren't really that big. Maybe this should just be "blobs" (binary large objects) or "mobs" (medium objects)? The objects being immutable is important too.

> HBase LOB
> ---------
>
>                 Key: HBASE-11339
>                 URL: https://issues.apache.org/jira/browse/HBASE-11339
>             Project: HBase
>          Issue Type: New Feature
>          Components: regionserver, Scanners
>            Reporter: Jingcheng Du
>            Assignee: Jingcheng Du
>         Attachments: HBase LOB Design.pdf
>
>
>   It's quite useful to save massive binary data such as images and documents in Apache HBase. Unfortunately, directly saving binary LOBs (large objects) to HBase leads to worse performance because of frequent splits and compactions.
>   In this design, the LOB data are stored in a more efficient way, which keeps high write/read performance and guarantees data consistency in Apache HBase.

--
This message was sent by Atlassian JIRA
(v6.2#6252)