hbase-user mailing list archives

From Jonathan Gray <jg...@facebook.com>
Subject RE: Sorting columns
Date Sat, 19 Jun 2010 16:21:37 GMT
So there is no confusion, everything is sorted in HBase.  All columns in each family are sorted,
always.

There are optimizations for Get queries (present in 0.20 but gone in trunk) that
mean what gets returned to the client is not completely sorted, though it is
mostly sorted.  Are you returning millions of columns at once?  If not, it
shouldn't be too expensive to do the sorted() call in the client.
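
As a rough illustration of why server-side data comes back in order: the
memstore and each store file are sorted on their own, so a scan only has to
merge sorted sources. The sketch below is not HBase code. It uses Python's
heapq.merge and made-up qualifier values (the "1 3 5 from stores and 4 from
memstore" case from the quoted question) purely to show the idea; the names
merged_scan, store_file and memstore are hypothetical.

```python
import heapq


def merged_scan(*sources):
    """Lazily merge several already-sorted sources into one sorted stream."""
    return heapq.merge(*sources)


# Hypothetical qualifiers: three already flushed to a store file,
# one still sitting in the memstore.
store_file = [b"1", b"3", b"5"]
memstore = [b"4"]

columns = list(merged_scan(store_file, memstore))
print(columns)  # [b'1', b'3', b'4', b'5']
```

The merge never re-sorts anything; it only compares the head element of each
source, which is why it stays cheap even for wide rows.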

> -----Original Message-----
> From: Andrey Stepachev [mailto:octo47@gmail.com]
> Sent: Saturday, June 19, 2010 5:45 AM
> To: user@hbase.apache.org
> Subject: Re: Sorting columns
> 
> 2010/6/19 Stack <stack@duboce.net>
> 
> > On Thu, Jun 17, 2010 at 12:18 PM, Andrey Stepachev <octo47@gmail.com>
> > wrote:
> > > As far as I can see in the sources, there is no place where KVs
> > > are sorted (except the client's Result.sorted() method). So we can
> > > get KeyValues from the stores and from the memstore (in which case
> > > we can get 1 3 5 from the stores and 4 from the memstore) in
> > > incorrect order.
> > >
> > > Or am I missing something?
> > >
> >
> > Data is sorted in HBase.  When scanning, we run a scanner against
> > each data store element -- the memstore and one for each store file
> > -- and pop off the elements in order.  That's the general story.
> > There may once have been a legitimate reason for the client-side
> > sort -- perhaps it was needed when our Get and Scan code paths
> > differed -- but as to whether it is still required, I'm not sure.
> > I'd have to dig.  Anyone else?
> >
> 
> It would be very interesting to know whether HBase guarantees the
> ordering of columns, because very wide rows are not very useful
> without sorting (and of course anyone using them should be aware of
> the partitioning problem for wide rows). Suppose we want to work with
> time data: we can use dates as qualifiers and expect the data in
> sorted order. We can't sort it somewhere else, because we would lose
> most of HBase's advantage.
> 
> 
> 
> >
> > >
> > >> > The rest of the data needs to be accessed occasionally. We want
> > >> > to avoid getting it shipped to the client as it makes our map
> > >> > reduce job go out of memory.
> > >> >
> > >>
> > >> You are not using incremental get on a row?  You should be able
> > >> to get your big rows piecemeal.
> > >>
> > > These scanner API changes (the intra-row scanner) were not
> > > included in 0.20.4 :(
> > >
> >
> > Oh.
> >
> > Sorry about that Andrey.  Somehow we missed your backport of
> > HBASE-1537.  I just applied it.  It'll appear in the 0.20.5RC4 I'm
> > rolling now.  Please excuse our bungling.
> >
> 
> Not a problem. I'll wait for 0.20.5. But I should warn that with this
> patch 0.20.5 will not be wire compatible with 0.20.4 (because the
> patch adds an additional field to Scan, which makes Scan binary
> incompatible).
> 
> Personally, I'm not using the intra-row scanner now because of the
> unknown ordering; I use compound keys instead.
> Moreover, intra-row scanning should use a separate API (giving Result
> the ability to fetch additional KVs for a given row) to be more
> usable and easier to use.