From: Ryan Rawson
To: hbase-user@hadoop.apache.org
Date: Thu, 6 May 2010 23:02:00 -0700
Subject: Re: How is column timestamp useful?

Hey guys,

You can't just turn off versioning. It's not an optional feature; it's a
core part of how the storage architecture works. I can suggest both the
Bigtable paper and this blog entry:

http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html

to get a sense of what versions are for and why you can't "turn them off".

-ryan

On Thu, May 6, 2010 at 10:47 PM, Kevin Apte wrote:
> If compression is used, the overhead of versioning is not significant. Many
> people want versioning of data for many reasons, including auditing and
> compliance. In some database systems, analyzing data is effective only if
> performed on the same version.
>
> I agree that if there is no need, versioning should be turned off.
>
> Kevin
>
> On Fri, May 7, 2010 at 10:49 AM, Takayuki Tsunakawa <
> tsunakawa.takay@jp.fujitsu.com> wrote:
>
>> Hello, Kevin-san
>>
>> Yes, Hadoop DFS maintains three copies of the same data (version) at
>> the file system level. What I'm wondering about is the necessity of
>> different versions of cells by HBase at the database level.
>> Amazon SimpleDB, Microsoft Azure Table, and Google App Engine
>> Datastore do not provide versioning. So I felt that many people do not
>> have to use versioning and the default maximum versions of HBase had
>> better be
>>
>> Regards
>> Takayuki
>>
>> ----- Original Message -----
>> From: "Kevin Apte"
>> To:
>> Sent: Friday, May 07, 2010 1:51 PM
>> Subject: Re: How is column timestamp useful?
>>
>> > Hadoop philosophy is to deploy on low-cost disks and keep 3 copies
>> > of data for redundancy. This ensures that the costs are very low,
>> > perhaps 5 to 10 times lower than what large enterprises are paying
>> > for expensive SAN configurations.
>> >
>> > This does not mean one needs to waste storage. If you store files
>> > compressed using gzip, multiple versions of a row may compress very
>> > well.
>> >
>> > Kevin
>> >
>> > On Fri, May 7, 2010 at 10:14 AM, tsuna wrote:
>> >
>> >> In addition to what Ryan said, the fact that the default maximum
>> >> number of versions for a cell is 3 doesn't mean that you end up
>> >> wasting space. If you only ever write one version, that's what you
>> >> end up paying for.
>> >>
>> >> --
>> >> Benoit "tsuna" Sigoure
>> >> Software Engineer @ www.StumbleUpon.com
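
A small illustration of tsuna's point that the maximum number of versions is a cap, not a fixed cost: the sketch below is a toy in-memory model in Python, not HBase code and not HBase's API. The class name VersionedStore and its methods are made up for this example, and trimming old versions eagerly on every write is a simplification chosen to keep the sketch short.

```python
class VersionedStore:
    """Toy model of timestamped cells (hypothetical; not HBase's implementation)."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        # Maps (row, column) -> {timestamp: value}
        self.cells = {}

    def put(self, row, column, value, timestamp):
        versions = self.cells.setdefault((row, column), {})
        versions[timestamp] = value
        # Drop everything older than the newest max_versions timestamps.
        for ts in sorted(versions, reverse=True)[self.max_versions:]:
            del versions[ts]

    def get(self, row, column, versions=1):
        """Return up to `versions` (timestamp, value) pairs, newest first."""
        stored = self.cells.get((row, column), {})
        return sorted(stored.items(), reverse=True)[:versions]

    def stored_count(self, row, column):
        """Number of versions actually held for this cell."""
        return len(self.cells.get((row, column), {}))


store = VersionedStore(max_versions=3)

# A cell written once holds one version, even though the cap is 3.
store.put("row1", "cf:a", "only", timestamp=1)

# A cell written four times is trimmed down to the newest three.
for ts in (1, 2, 3, 4):
    store.put("row1", "cf:b", "v%d" % ts, timestamp=ts)

print(store.stored_count("row1", "cf:a"))
print(store.stored_count("row1", "cf:b"))
print(store.get("row1", "cf:b"))
```

In this model a cell that is written once stores a single version, while a cell that is overwritten repeatedly never holds more than max_versions entries, and a plain read returns the newest one.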