Mailing-List: contact hbase-user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hbase-user@hadoop.apache.org
Date: Thu, 6 May 2010 23:02:00 -0700
Message-ID: <x2q78568af11005062302q22f5e8e2lf30b3cb0c7cce3d1@mail.gmail.com>
Subject: Re: How is column timestamp useful?
From: Ryan Rawson <ryanobjc@gmail.com>
To: hbase-user@hadoop.apache.org

Hey guys,

You can't just turn off versioning. It's not an optional feature; it's
a core part of how the storage architecture works.  To get a sense of
what versions are for and why you can't "turn them off", I suggest both
the Bigtable paper and this blog entry:
http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
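
As a rough illustration of why the timestamp is part of the storage model
rather than an add-on, here is a toy Python sketch. It is not HBase code:
the class ToyVersionedStore and its methods are hypothetical names made up
for this example. It assumes only what the thread states or what is
generally true of HBase: each cell version is keyed by its timestamp, a
write with a new timestamp adds a version instead of changing the old one,
and the maximum-versions setting is a cap applied when old versions are
cleaned up, not space reserved up front.

```python
from collections import defaultdict


class ToyVersionedStore:
    """Toy in-memory model of a versioned cell store (not the HBase API)."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        # (row, column) -> {timestamp: value}; the timestamp is part of the key.
        self._cells = defaultdict(dict)

    def put(self, row, column, timestamp, value):
        # A write with a new timestamp adds a version; older versions stay put.
        self._cells[(row, column)][timestamp] = value

    def get(self, row, column, versions=1):
        # Return up to `versions` (timestamp, value) pairs, newest first.
        cell = self._cells.get((row, column), {})
        newest_first = sorted(cell.items(), reverse=True)
        return newest_first[:min(versions, self.max_versions)]

    def compact(self):
        # Drop versions beyond the cap, keeping the newest ones.
        for cell in self._cells.values():
            for ts in sorted(cell, reverse=True)[self.max_versions:]:
                del cell[ts]

    def stored_versions(self, row, column):
        # Number of versions physically held for this cell.
        return len(self._cells.get((row, column), {}))


store = ToyVersionedStore(max_versions=3)

# One write means one stored version, even though the cap is 3.
store.put("row1", "cf:a", 1, "only value")

# Five writes to another cell; cleanup keeps only the newest 3.
for ts in range(1, 6):
    store.put("row2", "cf:a", ts, "v%d" % ts)
store.compact()
```

In this model a cell written once holds exactly one version, so a cap of 3
costs nothing extra for it, and a plain read simply returns the version
with the highest timestamp.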

-ryan

On Thu, May 6, 2010 at 10:47 PM, Kevin Apte
<technicalarchitect2007@gmail.com> wrote:
> If compression is used, the overhead of versioning is not significant. Many
> people want versioning of data for many reasons, including auditing and
> compliance. In some database systems, analyzing data is effective only if
> performed on the same version.
>
> I agree that if there is no need, versioning should be turned off.
>
> Kevin
>
>
>
> On Fri, May 7, 2010 at 10:49 AM, Takayuki Tsunakawa <
> tsunakawa.takay@jp.fujitsu.com> wrote:
>
>> Hello, Kevin-san
>>
>> Yes, Hadoop DFS maintains three copies of the same data (version) at
>> the file system level. What I'm wondering about is the necessity of
>> different versions of cells by HBase at the database level.
>> Amazon SimpleDB, Microsoft Azure Table, and Google App Engine
>> Datastore do not provide versioning. So I felt that many people do not
>> have to use versioning and the default maximum versions of HBase had
>> better be
>>
>> Regards
>> Takayuki
>>
>>
>> ----- Original Message -----
>> From: "Kevin Apte" <technicalarchitect2007@gmail.com>
>> To: <hbase-user@hadoop.apache.org>
>> Sent: Friday, May 07, 2010 1:51 PM
>> Subject: Re: How is column timestamp useful?
>>
>>
>> > The Hadoop philosophy is to deploy on low-cost disks and keep 3 copies of
>> > data for redundancy. This ensures that the costs are very low, perhaps
>> > 5 to 10 times lower than what large enterprises are paying for expensive
>> > SAN configurations.
>> >
>> > This does not mean one needs to waste storage. If you store files
>> > compressed using gzip, multiple versions of a row may compress very well.
>> >
>> > Kevin
>> >
>> >
>> >
>> > On Fri, May 7, 2010 at 10:14 AM, tsuna <tsunanet@gmail.com> wrote:
>> >
>> >> In addition to what Ryan said, the fact that the default maximum number
>> >> of versions for a cell is 3 doesn't mean that you end up wasting space.
>> >> If you only ever write one version, that's what you end up paying for.
>> >>
>> >> --
>> >> Benoit "tsuna" Sigoure
>> >> Software Engineer @ www.StumbleUpon.com
>> >>
>> >
>>
>>
>>
>