hadoop-common-user mailing list archives

From "Stu Hood" <stuh...@webmail.us>
Subject RE: HBase num_versions
Date Wed, 07 Nov 2007 19:45:39 GMT
Currently in the shell, num_versions = 0 is equivalent to 'all versions'.

I don't think that needs to be changed, unless someone can imagine a clause on a query that
wouldn't require a version of a row to operate correctly.
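
To illustrate why the default matters, here is a toy Python model of the lookup Jim describes further down the thread. It is a sketch under stated assumptions, not HBase code: the store layout, the `get_versions` helper, and the `ALL_VERSIONS` name are all hypothetical, and only the "0 means all versions" convention comes from the shell behavior mentioned above.

```python
from typing import Dict, List, Tuple

# Hypothetical toy model, NOT HBase code: each store maps a row key to a list
# of (timestamp, value) cells. stores[0] stands in for the memcache; the rest
# stand in for HStore files, assumed to be ordered newest first.
Store = Dict[str, List[Tuple[int, str]]]

ALL_VERSIONS = 0  # mirrors the shell convention: num_versions = 0 means "all"


def get_versions(stores: List[Store], row: str,
                 num_versions: int = ALL_VERSIONS):
    """Return (cells, stores_consulted) for up to num_versions versions of row."""
    found: List[Tuple[int, str]] = []
    consulted = 0
    for store in stores:
        consulted += 1
        found.extend(store.get(row, []))
        # Stop early only when a positive limit has been satisfied.
        if num_versions != ALL_VERSIONS and len(found) >= num_versions:
            return found[:num_versions], consulted
    # Either all versions were requested or we ran out of places to look.
    return found, consulted


memcache: Store = {"3980000": [(3, "3cbae0")]}
older_files: List[Store] = [{} for _ in range(4)]  # no cells for this row
stores = [memcache] + older_files

print(get_versions(stores, "3980000", num_versions=1))  # 1 store consulted
print(get_versions(stores, "3980000"))                  # all 5 stores consulted
```

With a limit of 1 the model stops at the first store that holds the row; with the "all versions" default it has to visit every store even though only one version exists, which is the extra work the timings in the original report suggest.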

Thanks,
Stu


-----Original Message-----
From: Jim Kellerman <jim@powerset.com>
Sent: Wednesday, November 7, 2007 1:03pm
To: hadoop-user@lucene.apache.org <hadoop-user@lucene.apache.org>
Subject: RE: HBase num_versions

num_versions=all ?

---
Jim Kellerman, Senior Engineer; Powerset
jim@powerset.com


> -----Original Message-----
> From: Michael Stack [mailto:stack@duboce.net]
> Sent: Wednesday, November 07, 2007 9:59 AM
> To: hadoop-user@lucene.apache.org
> Cc: stuhood@webmail.us
> Subject: Re: HBase num_versions
>
> In the absence of a num_versions qualifier, the shell presumes
> that you want ALL versions.  Changing the default
> to be 1 would mean that we would have to add some other means
> of specifying all versions ("num_versions=-1" or some such
> oddity).  What ye think?
> St.Ack
>
>
> Jim Kellerman wrote:
> > Yes, for num_versions > 1, HBase has to dig through the
> > memcache, and multiple HStore files until it has found the
> > requested number of versions or runs out of places to look.
> > This is especially apparent if there is only 1 version. It
> > has to do a lot of work for nothing.
> >
> > Please enter a Jira for the HBase shell to default the
> > number of versions to 1.
> >
> > ---
> > Jim Kellerman, Senior Engineer; Powerset jim@powerset.com
> >
> >
> >
> >> -----Original Message-----
> >> From: Stu Hood [mailto:stuhood@webmail.us]
> >> Sent: Tuesday, November 06, 2007 11:23 PM
> >> To: hadoop-user@lucene.apache.org
> >> Subject: HBase num_versions
> >>
> >> Hey guys,
> >>
> >> Just noticed some surprising behavior for select statements
> >> in HBase 0.15: a select command without a num_versions = 1
> >> clause takes 2 orders of magnitude longer to run than the
> >> same select with that clause.
> >>
> >> Is this inconsistent implementation, or is it taking extra
> >> time to scan for additional versions? If this isn't a bug,
> >> then perhaps the default for num_versions should be 1 to keep
> >> things snappy by default.
> >>
> >> ============================================================
> >>
> >> Hbase> describe test;
> >> +-----------------------------------------------------------------------------+
> >> | Column Family Descriptor                                                    |
> >> +-----------------------------------------------------------------------------+
> >> | name: hex, max versions: 3, compression: NONE, in memory: false, max length:|
> >> |  2147483647, bloom filter: none                                             |
> >> +-----------------------------------------------------------------------------+
> >> 1 columnfamily(s) in set (0.310 sec)
> >> Hbase> select hex: from test where row = '3980000' num_versions = 1;
> >> 3cbae0
> >> 1 row(s) in set (0.016 sec)
> >> Hbase> select hex: from test where row = '3980000';
> >> 3cbae0
> >> 1 row(s) in set (1.882 sec)
> >>
> >> ============================================================
> >>
> >>
> >> Thanks,
> >>
> >>
> >> Stu Hood
> >> Webmail.us
> >> "You manage your business. We'll manage your email."(r)
> >>
> >>
> >>
>
>


