Date: Mon, 7 Jun 2010 17:50:18 -0400 (EDT)
From: "Luke Lu (JIRA)"
To: common-issues@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-5793) High speed compression algorithm like BMDiff

[ https://issues.apache.org/jira/browse/HADOOP-5793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876436#action_12876436 ]

Luke Lu commented on HADOOP-5793:
---------------------------------

It's written from scratch based on the original BM
paper to scratch an itch, and it turned out to be stable and fast enough. However, Zvents owned the copyright for this particular version, and I don't know the status of the copyright transfer to Hypertable Inc. OTOH, a shorter and faster version could be written from scratch again if my current employer (Yahoo) is feeling even more generous than she already is :)

> High speed compression algorithm like BMDiff
> --------------------------------------------
>
>                 Key: HADOOP-5793
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5793
>             Project: Hadoop Common
>          Issue Type: New Feature
>            Reporter: elhoim gibor
>            Assignee: Michele Catasta
>            Priority: Minor
>
> Add a high speed compression algorithm like BMDiff.
> It gives speeds of ~100 MB/s for writes and ~1000 MB/s for reads, compressing 2.1 billion web pages from 45.1 TB to 4.2 TB.
> Reference:
> http://norfolk.cs.washington.edu/htbin-post/unrestricted/colloq/details.cgi?id=437
> 2005 Jeff Dean talk about Google's architecture, around 46:00.
> http://feedblog.org/2008/10/12/google-bigtable-compression-zippy-and-bmdiff/
> http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=755678
> A reference implementation exists in Hypertable.
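For readers unfamiliar with the "BM" approach mentioned in the comment: the Bentley-McIlroy scheme fingerprints fixed-size, block-aligned chunks of the text already seen. While it scans forward with a rolling Karp-Rabin hash, it replaces any later repeat of a stored chunk with a back-reference. The Python below is a minimal sketch of that idea under stated assumptions. It is not the Zvents/Hypertable code, nor Google's BMDiff. The function names `bm_compress`/`bm_decompress`, the `(offset, length)` token format, the default block size and the hash constants are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple, Union

# A token is either a run of literal bytes or an (offset, length) back-reference
# into the already-decoded output. This token format is hypothetical.
Token = Union[bytes, Tuple[int, int]]

BASE = 257           # hypothetical Karp-Rabin base
MOD = (1 << 61) - 1  # hypothetical large prime modulus


def bm_compress(data: bytes, block: int = 32) -> List[Token]:
    """Sketch of Bentley-McIlroy style long-repeat compression (not BMDiff itself)."""
    n = len(data)
    if n < block:
        return [data] if data else []

    def fingerprint(start: int) -> int:
        # Karp-Rabin hash of data[start:start + block].
        h = 0
        for b in data[start:start + block]:
            h = (h * BASE + b) % MOD
        return h

    top = pow(BASE, block - 1, MOD)  # weight of the byte leaving the window
    table: Dict[int, int] = {}       # fingerprint -> offset of an aligned block
    out: List[Token] = []
    next_aligned = 0                 # next block-aligned offset to index
    lit_start = 0                    # start of the pending literal run
    i = 0
    h = fingerprint(0)
    while i + block <= n:
        # Index every aligned block that lies entirely before position i.
        while next_aligned + block <= i:
            table.setdefault(fingerprint(next_aligned), next_aligned)
            next_aligned += block
        cand = table.get(h)
        # Verify the bytes, since different windows can share a fingerprint.
        if cand is not None and data[cand:cand + block] == data[i:i + block]:
            length = block
            # Extend the match forward; the source may overlap the target.
            while i + length < n and data[cand + length] == data[i + length]:
                length += 1
            if lit_start < i:
                out.append(data[lit_start:i])
            out.append((cand, length))
            i += length
            lit_start = i
            if i + block <= n:
                h = fingerprint(i)
            continue
        # No match: roll the window forward by one byte.
        if i + block < n:
            h = ((h - data[i] * top) * BASE + data[i + block]) % MOD
        i += 1
    if lit_start < n:
        out.append(data[lit_start:])
    return out


def bm_decompress(tokens: List[Token]) -> bytes:
    buf = bytearray()
    for tok in tokens:
        if isinstance(tok, tuple):
            off, length = tok
            # Copy byte by byte so overlapping back-references work.
            for k in range(length):
                buf.append(buf[off + k])
        else:
            buf.extend(tok)
    return bytes(buf)


data = b"hello world, " * 20 + b"unique tail"
tokens = bm_compress(data, block=8)
print(tokens)  # prints [b'hello world, ', (0, 247), b'unique tail']
assert bm_decompress(tokens) == data
```

The sketch emits a token list rather than a byte stream. A real codec would serialize the tokens and would be written in a compiled language to get anywhere near the throughput figures quoted in the issue. Smaller block sizes catch shorter repeats, at the cost of more table entries and more fingerprint lookups.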