Date: Tue, 2 Oct 2012 14:24:32 -0400
Subject: Re: compressing values returned to scanner
From: Keith Turner
To: user@accumulo.apache.org

On Mon, Oct 1, 2012 at 3:03 PM, ameet kini wrote:
>
> My understanding of compression in Accumulo 1.4.1 is that it is on by
> default and that data is decompressed by the tablet server, so data on the
> wire between server/client is decompressed. Is there a way to shift the
> decompression from happening on the server to the client? I have a use case
> where each Value in my table is relatively large (~ 8MB) and I can benefit
> from compression over the wire. I don't have any server side iterators, so
> the values don't need to be decompressed by the tablet server. Also, each
> scan returns a few rows, so client-side decompression can be fast.
>
> The only way I can think of now is to disable compression on that table, and
> handle compression/decompression in the application. But if there is a way
> to do this in Accumulo, I'd prefer that.
>

There are two levels of compression in Accumulo. First, redundant parts
of the key are not stored. If the row in a key is the same as the row in
the previous key, then it's not stored again. The same is done for
columns and timestamps. After this relative encoding is done, a block of
key values is then compressed with gzip.

As data is read from an RFile, when the row of a key is the same as the
row of the previous key, it will just point to the previous key's row.
This is carried forward over the wire: as keys are transferred,
duplicate fields in the key are not transferred.

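To make the first level concrete, here is a minimal sketch of relative key encoding. It is not Accumulo's RFile code: the class name RelativeKeySketch, the String[] {row, column, timestamp} key layout, and the use of null to mean "same as the previous key" are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class RelativeKeySketch {
    // Hypothetical simplification: each key is a String[] of
    // {row, column, timestamp}, and all keys have the same length.

    /** Replaces every field that equals the previous key's field with null. */
    public static List<String[]> encode(List<String[]> sortedKeys) {
        List<String[]> out = new ArrayList<>();
        String[] prev = null;
        for (String[] key : sortedKeys) {
            String[] enc = new String[key.length];
            for (int i = 0; i < key.length; i++) {
                boolean same = prev != null && prev[i].equals(key[i]);
                enc[i] = same ? null : key[i]; // null = "not stored again"
            }
            out.add(enc);
            prev = key;
        }
        return out;
    }

    /** Restores full keys by copying null fields from the previous decoded key. */
    public static List<String[]> decode(List<String[]> encoded) {
        List<String[]> out = new ArrayList<>();
        String[] prev = null;
        for (String[] enc : encoded) {
            String[] key = new String[enc.length];
            for (int i = 0; i < enc.length; i++) {
                // The first entry never contains null when produced by encode().
                key[i] = (enc[i] != null) ? enc[i] : prev[i];
            }
            out.add(key);
            prev = key;
        }
        return out;
    }
}
```

For the keys (row1, cf:a, 5) followed by (row1, cf:b, 5), the second encoded entry carries only the column; the row and timestamp are left out because they repeat. In this sketch the gzip step would then apply to a whole block of such encoded entries.
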
As for decompressing on the client side versus the server side, the
server at least needs to decompress keys. On the server side you usually
need to read from multiple sorted files and merge them into one ordered
result, so the keys must be decompressed there in order to compare them.
Server-side iterators also need the keys and values decompressed.

> Thanks,
> Ameet
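The server-side merge mentioned above can be sketched as follows. This is only an illustration of why keys have to be readable on the server, not Accumulo's actual implementation: the class name SortedMergeSketch, the Cursor helper, and the use of plain String keys in place of real keys are assumptions.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class SortedMergeSketch {
    /** Pairs the current head key of one sorted source with the rest of it. */
    private static final class Cursor {
        final String head;
        final Iterator<String> rest;

        Cursor(String head, Iterator<String> rest) {
            this.head = head;
            this.rest = rest;
        }
    }

    /** Merges several individually sorted key lists into one sorted list. */
    public static List<String> merge(List<List<String>> sortedSources) {
        // The heap orders sources by their current key, so every step
        // compares keys: they must be decompressed before this point.
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>((a, b) -> a.head.compareTo(b.head));
        for (List<String> source : sortedSources) {
            Iterator<String> it = source.iterator();
            if (it.hasNext()) {
                heap.add(new Cursor(it.next(), it));
            }
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor smallest = heap.poll();
            merged.add(smallest.head);
            if (smallest.rest.hasNext()) {
                heap.add(new Cursor(smallest.rest.next(), smallest.rest));
            }
        }
        return merged;
    }
}
```

Only the keys take part in the comparisons here; the values would simply travel along with them, which is why the question of where to decompress mainly concerns values.
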