hbase-user mailing list archives

From "Jonathan Gray" <jl...@streamy.com>
Subject RE: Few questions
Date Thu, 05 Feb 2009 19:57:56 GMT
The more map files in a region, the slower your scanning will be because you
are actually scanning each one.

Recent row updates will not hurt you too badly because you always have a
scanner open in Memcache (and results in memory are obviously the fastest to
retrieve).  But you'll always pay a search cost for each Mapfile that makes
up the region you're scanning.

Each region is defined by [startKey,endKey).  Each region is made up of an
in-memory map (Memcache) and 0->N HDFS files (Mapfiles).  Each of these is
individually lexicographically sorted.  Scanning the table involves scanning
every file in the region.  Major compactions combine all files into one.
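
If it helps to see the idea, here is a conceptual sketch in plain Java of
merging one Memcache with N Mapfiles, each sorted on its own.  This is just
an illustration of the merge, not HBase's actual scanner code, and the row
keys and values are made up:

import java.util.*;

// Conceptual sketch of a region scan: one sorted in-memory source
// ("Memcache") merged with N sorted on-disk sources ("Mapfiles").
// Every source has to be consulted at every step, which is why scans
// slow down as Mapfiles accumulate and speed up after a major
// compaction collapses them into one.
public class MergedScanSketch {

  // One cursor per sorted source.
  static class Cursor {
    final Iterator<Map.Entry<String, String>> it;
    Map.Entry<String, String> current;
    Cursor(SortedMap<String, String> source) {
      it = source.entrySet().iterator();
      advance();
    }
    void advance() { current = it.hasNext() ? it.next() : null; }
  }

  public static void main(String[] args) {
    SortedMap<String, String> memcache = new TreeMap<String, String>();
    memcache.put("row3", "row3 (latest version, from Memcache)");

    SortedMap<String, String> mapfile1 = new TreeMap<String, String>();
    mapfile1.put("row1", "row1 (from Mapfile 1)");
    mapfile1.put("row4", "row4 (from Mapfile 1)");

    SortedMap<String, String> mapfile2 = new TreeMap<String, String>();
    mapfile2.put("row2", "row2 (from Mapfile 2)");
    mapfile2.put("row3", "row3 (older version, from Mapfile 2)");

    List<Cursor> cursors = new ArrayList<Cursor>();
    for (SortedMap<String, String> s :
        Arrays.asList(memcache, mapfile1, mapfile2)) {
      cursors.add(new Cursor(s));
    }

    // Repeatedly emit the smallest current key across all cursors.
    // Memcache is listed first, so on a tie its (newest) value wins.
    while (true) {
      Cursor lowest = null;
      for (Cursor c : cursors) {
        if (c.current == null) continue;
        if (lowest == null
            || c.current.getKey().compareTo(lowest.current.getKey()) < 0) {
          lowest = c;
        }
      }
      if (lowest == null) break;  // every source is exhausted
      String key = lowest.current.getKey();
      System.out.println(key + " -> " + lowest.current.getValue());
      // Skip this key in every source so older versions are not re-emitted.
      for (Cursor c : cursors) {
        while (c.current != null && c.current.getKey().equals(key)) {
          c.advance();
        }
      }
    }
  }
}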

Is that clear?

JG

> -----Original Message-----
> From: Slava Gorelik [mailto:slava.gorelik@gmail.com]
> Sent: Thursday, February 05, 2009 11:33 AM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Few questions
> 
> Thank you for the quick response.  So, you wrote:
> 
> HBase is efficient at retrieving rows in a range because rows are sorted
> in lexicographical order.
> 
> My question: is it still efficient when the rows are within the range but
> in different map files (as in the case of a row update)?
> And another question: is each map file lexicographically sorted?  There is
> no sorting of data across map files in the same region, is that correct?
> 
> 
> Best Regards.
> Slava.
> 
> 
> On Thu, Feb 5, 2009 at 8:20 PM, Jonathan Gray <jlist@streamy.com>
> wrote:
> 
> > Answers inline.
> >
> > > -----Original Message-----
> > > From: Slava Gorelik [mailto:slava.gorelik@gmail.com]
> > > Sent: Thursday, February 05, 2009 9:21 AM
> > > To: hbase-user@hadoop.apache.org
> > > Subject: Few questions
> > >
> > > Hi to All.
> > >
> > > I have a few questions to ask:
> > >
> > > 1) Is it possible to bring back specific columns from the same row
> > > within one round trip (some method that takes a list of column names
> > > and returns a RowResult)?
> >
> >
> >
> > http://hadoop.apache.org/hbase/docs/r0.19.0/api/org/apache/hadoop/hbase/client/HTable.html#getRow(byte[],%20byte[][])
> >
> > HTable.getRow(byte [] row, byte [][] columns)
> >
> > Ex: byte [][] columns = {"family:column1".getBytes(),
> > "family:column2".getBytes()};
> >
> >
> > > 2) Does key size have any implications for HBase performance?
> >
> > There are some implications, but as far as I know nothing that significant.
> > Most users have keys on the order of tens or hundreds of bytes, and I've
> > never seen a large difference between them.  Of course, the smaller the
> > key, the smaller the payload to store and transfer.
> >
> >
> > > 3) Somewhere, I don't remember where, I read that HBase is very fast
> > > and efficient at retrieving rows in the range between 2 given keys,
> > > is that correct?
> > >    If yes, how is it implemented?  I assume that the data in a map file
> > > is sorted by key (as I inserted the rows), but what happens when I
> > > update a specific row?  I guess that because everything in HBase is an
> > > insert, the updated row will (probably) be stored in a different map
> > > file than the original row, is that correct?  If yes, how can efficient
> > > and fast retrieval of rows in the range between 2 keys be promised,
> > > when in this case it could mean retrieving rows from different map
> > > files?
> >
> >
> > HBase is efficient at retrieving rows in a range because rows are sorted
> > in lexicographical order.
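> >
> > For example, a range scan between two keys looks roughly like this.  This
> > is a sketch from memory of the 0.19 client API (getScanner with start and
> > stop rows), with made-up table, column, and row names, so verify it
> > against the javadoc:
> >
> > import org.apache.hadoop.hbase.HBaseConfiguration;
> > import org.apache.hadoop.hbase.client.HTable;
> > import org.apache.hadoop.hbase.client.Scanner;
> > import org.apache.hadoop.hbase.io.RowResult;
> >
> > public class RangeScanExample {
> >   public static void main(String[] args) throws Exception {
> >     HTable table = new HTable(new HBaseConfiguration(), "mytable");
> >     byte [][] columns = { "family:column1".getBytes() };
> >     // Rows come back in lexicographic key order, from the start row
> >     // (inclusive) up to the stop row (exclusive), merged across the
> >     // Memcache and every Mapfile in each region the range touches.
> >     Scanner scanner = table.getScanner(columns,
> >         "row100".getBytes(), "row200".getBytes());
> >     try {
> >       for (RowResult result : scanner) {
> >         System.out.println(new String(result.getRow()));
> >       }
> >     } finally {
> >       scanner.close();
> >     }
> >   }
> > }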
> >
> > Check out the HBase architecture wiki page section on HRegionServer
> > (http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture#hregion).
> >
> > Writes in HBase are first stored into an in-memory structure called
> > Memcache.  This is periodically flushed to an HDFS Mapfile.  A single
> > region
> > in HBase is made up of one Memcache and 0 to N mapfiles.
> >
> > So a scanner in HBase is really the merge of a number of scanners: one
> > open to the Memcache (recent writes), and one open to each flushed-out
> > Mapfile.
> >
> >
> > Hope that helps.
> >
> > JG
> >
> >

