From: Jianshi Huang
Date: Thu, 13 Nov 2014 17:27:43 +0800
Subject: Re: Storing JSON in HBase value cell, which serialization format is most compact?
To: user@hbase.apache.org

Thanks Ram,

How about Prefix Tree based encoding then? HBASE-4676 says it's also possible
to do suffix tries? Then it could be a nice fit for JSON String (or any long
value where changes are small).

Maybe I should just flatten JSON to columns, hmm... what's the overhead for a
column?

Jianshi

On Thu, Nov 13, 2014 at 4:49 PM, ramkrishna vasudevan <
ramkrishna.s.vasudevan@gmail.com> wrote:

> >>So is it possible to specify FASTDIFF for rowkey/column and DIFF for value
> cell?
> No, that is not possible now. All the encoding is per KV only.
> But what you say is definitely worth trying.
>
> >>So would you recommend storing JSON flattened as many columns?
> Maybe yes. But I have practically not used JSON formats, so I may not be
> the best person to comment on this.
>
> Regards
> Ram
>
> On Thu, Nov 13, 2014 at 2:01 PM, Jianshi Huang wrote:
>
> > Thanks Ram,
> >
> > So is it possible to specify FASTDIFF for rowkey/column and DIFF for value
> > cell?
> >
> > So would you recommend storing JSON flattened as many columns?
> >
> > Jianshi
> >
> > On Thu, Nov 13, 2014 at 2:08 PM, ramkrishna vasudevan <
> > ramkrishna.s.vasudevan@gmail.com> wrote:
> >
> > > Hi
> > >
> > > >> Since I'm storing
> > > historical data (snapshot data) and changes between adjacent value cells
> > > are relatively small.
> > >
> > > If the values are changing, even if the change is small, FASTDIFF will
> > > rewrite the value part. Only if there are exact matches would it skip the
> > > value part. JFYI.
> > >
> > > Regards
> > > Ram
> > >
> > > On Thu, Nov 13, 2014 at 11:23 AM, Jianshi Huang <jianshi.huang@gmail.com>
> > > wrote:
> > >
> > > > I thought FASTDIFF was only for rowkey and columns, great if it also
> > > > works in value cell.
> > > >
> > > > And thanks for the bjson link!
> > > >
> > > > Jianshi
> > > >
> > > > On Thu, Nov 13, 2014 at 1:18 PM, Ted Yu wrote:
> > > >
> > > > > There is FASTDIFF data block encoding.
> > > > >
> > > > > See also http://bjson.org/
> > > > >
> > > > > Cheers
> > > > >
> > > > > On Nov 12, 2014, at 9:08 PM, Jianshi Huang <jianshi.huang@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I'm currently saving JSON in pure String format in the value cell
> > > > > > and depend on HBase's block compression to reduce the overhead of
> > > > > > JSON.
> > > > > >
> > > > > > I'm wondering if there's a more space-efficient way to store JSON?
> > > > > > (There are lots of 0s and 1s; JSON String actually is an OK format.)
> > > > > >
> > > > > > I want to keep the value as a Map since the schema of source data
> > > > > > might change over time.
> > > > > >
> > > > > > Also, is there a DIFF-based encoding for values? Since I'm storing
> > > > > > historical data (snapshot data) and changes between adjacent value
> > > > > > cells are relatively small.
> > > > > >
> > > > > > Thanks,
> > > > > > --
> > > > > > Jianshi Huang
> > > > > >
> > > > > > LinkedIn: jianshi
> > > > > > Twitter: @jshuang
> > > > > > Github & Blog: http://huangjs.github.com/

--
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
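
For reference, here is a minimal sketch of the "flatten JSON to columns" idea raised in this thread. It is plain Python with no HBase client involved. The names `flatten` and `to_cells`, the `.` path separator, and the sample document are all assumptions made for illustration; nobody in the thread proposed this exact layout. Each leaf of the JSON document becomes one (qualifier, value) pair, which is roughly what would be written as one column per leaf.

```python
import json


def flatten(obj, prefix="", sep="."):
    """Flatten nested dicts/lists into {path: scalar} pairs.

    Hypothetical helper: paths are joined with `sep`, list items use their
    index as the key. Empty dicts/lists produce no entries, and keys that
    already contain `sep` can collide, so a real schema would need escaping.
    """
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = ((str(i), v) for i, v in enumerate(obj))
    else:
        # Scalar leaf: emit it under the path built so far.
        return {prefix: obj}

    out = {}
    for key, value in items:
        path = f"{prefix}{sep}{key}" if prefix else str(key)
        out.update(flatten(value, path, sep))
    return out


def to_cells(doc):
    """Encode each leaf as (qualifier bytes, value bytes).

    Values are kept as compact JSON scalars so that types (string vs number)
    survive a round trip. This is an illustrative choice, not an HBase API.
    """
    return {
        path.encode("utf-8"): json.dumps(leaf, separators=(",", ":")).encode("utf-8")
        for path, leaf in flatten(doc).items()
    }


# Made-up sample document standing in for one snapshot row.
snapshot = json.loads('{"user": {"id": 7, "active": 1}, "flags": [0, 1]}')
cells = to_cells(snapshot)
for qualifier, value in sorted(cells.items()):
    print(qualifier.decode(), "=", value.decode())
```

On the overhead question: each HBase cell is stored with its full key (row key, column family, qualifier, timestamp, type), so many small columns repeat that key material once per leaf. Whether data block encoding recovers enough of that repetition to beat a single JSON string value is something to measure on real data rather than assume.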