hbase-user mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: How is column timestamp useful?
Date Fri, 07 May 2010 06:02:00 GMT
Hey guys,

You can't just turn off versioning. It's not an optional feature; it's
a core part of how the storage architecture works. To get a sense of
what versions are for, and why you can't "turn them off", I suggest both
the Bigtable paper and this blog entry:
http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
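
As a rough illustration of why versions can't simply be switched off, here is a toy model (not HBase's API; the class and method names are made up for this sketch) in which the timestamp is part of every cell's key. A write never updates in place: it adds a new (row, column, timestamp) entry, reads return the newest entries first, and older entries beyond the maximum are only dropped in a separate cleanup pass, loosely analogous to a compaction.

```python
from collections import defaultdict


class ToyVersionedStore:
    """Toy model only: each cell is stored under (row, column, timestamp)."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self.cells = {}  # (row, column, timestamp) -> value

    def put(self, row, column, value, timestamp):
        # A write adds a new key; earlier timestamps are left untouched.
        self.cells[(row, column, timestamp)] = value

    def get(self, row, column, versions=1):
        # Collect every version of the cell, newest timestamp first.
        matches = sorted(
            ((ts, v) for (r, c, ts), v in self.cells.items()
             if r == row and c == column),
            reverse=True,
        )
        return matches[:min(versions, self.max_versions)]

    def compact(self):
        # Drop versions beyond max_versions for each (row, column).
        stamps_by_cell = defaultdict(list)
        for (row, column, ts) in self.cells:
            stamps_by_cell[(row, column)].append(ts)
        for (row, column), stamps in stamps_by_cell.items():
            for ts in sorted(stamps, reverse=True)[self.max_versions:]:
                del self.cells[(row, column, ts)]


store = ToyVersionedStore(max_versions=3)
store.put("row1", "cf:qual", "a", timestamp=1)
store.put("row1", "cf:qual", "b", timestamp=2)
latest = store.get("row1", "cf:qual")  # newest version only
```

In this model a cell that is written once occupies exactly one entry, whatever the maximum is set to; the limit only bounds how many old entries survive cleanup.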

-ryan

On Thu, May 6, 2010 at 10:47 PM, Kevin Apte
<technicalarchitect2007@gmail.com> wrote:
> If compression is used, the overhead of versioning is not significant. Many
> people want versioning of data for many reasons, including auditing and
> compliance. In some database systems, analyzing data is effective only if
> performed on the same version.
>
> I agree that if there is no need,  versioning should be turned off.
>
> Kevin
>
>
>
> On Fri, May 7, 2010 at 10:49 AM, Takayuki Tsunakawa <
> tsunakawa.takay@jp.fujitsu.com> wrote:
>
>> Hello, Kevin-san
>>
>> Yes, Hadoop DFS maintains three copies of the same data (version) at
>> the file system level. What I'm wondering about is the necessity of
>> different versions of cells by HBase at the database level.
>> Amazon SimpleDB, Microsoft Azure Table, and Google App Engine
>> Datastore do not provide versioning. So I felt that many people do not
>> have to use versioning and the default maximum versions of HBase had
>> better be
>>
>> Regards
>> Takayuki
>>
>>
>> ----- Original Message -----
>> From: "Kevin Apte" <technicalarchitect2007@gmail.com>
>> To: <hbase-user@hadoop.apache.org>
>> Sent: Friday, May 07, 2010 1:51 PM
>> Subject: Re: How is column timestamp useful?
>>
>>
>> > Hadoop philosophy is to deploy on low cost disks and keep 3 copies of data
>> > for redundancy. This ensures that the costs are very low, perhaps 5 to 10
>> > times lower than what large enterprises are paying for expensive SAN
>> > configurations.
>> >
>> > This does not mean one needs to waste storage. If you store files
>> > compressed using gzip, multiple versions of a row may compress very well.
>> >
>> > Kevin
>> >
>> >
>> >
>> > On Fri, May 7, 2010 at 10:14 AM, tsuna <tsunanet@gmail.com> wrote:
>> >
>> >> In addition to what Ryan said, the fact that the default maximum number
>> >> of versions for a cell is 3 doesn't mean that you end up wasting space.
>> >> If you only ever write one version, that's what you end up paying for.
>> >>
>> >> --
>> >> Benoit "tsuna" Sigoure
>> >> Software Engineer @ www.StumbleUpon.com
>> >>
>> >
>>
>>
>>
>
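
Kevin's point in the quoted thread, that several versions of a row may compress well, can be checked with a quick sketch. It uses Python's zlib (the same DEFLATE algorithm gzip uses) on a made-up row whose versions differ in one field. The row contents and the helper name are assumptions for illustration, and HBase compresses blocks of its store files rather than a bare byte string like this, so treat the result as indicative only.

```python
import zlib


def make_row(version):
    # Hypothetical row contents; only the last field changes per version.
    return (
        "user=alice;email=alice@example.com;city=Tokyo;"
        f"plan=standard;last_login=2010-05-0{version}"
    ).encode("utf-8")


one_version = make_row(1)
three_versions = b"".join(make_row(v) for v in (1, 2, 3))

one_size = len(zlib.compress(one_version))
three_size = len(zlib.compress(three_versions))

# The raw data triples, but the compressed size grows far less,
# because the second and third versions mostly repeat the first.
print(len(one_version), len(three_versions), one_size, three_size)
```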
