hbase-user mailing list archives

From Jonathan Gray <jl...@streamy.com>
Subject Re: Sorting in HBase
Date Wed, 22 Jul 2009 20:41:40 GMT
Read the HBase Architecture page on the wiki, as well as the BigTable 
paper.  You will find links to both all over the wiki, and also if you 
search the mailing list archives.

HBase has a write cache, previously called Memcache and now called 
MemStore.  All writes go there first and are periodically flushed to 
HDFS.  Each column family of each region is made up of its MemStore and 
0 to N files on HDFS.
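To make that concrete, here is a toy model (a minimal sketch, not HBase code) of one column family in one region. Writes land in an in-memory buffer standing in for the MemStore, and once it reaches an arbitrary threshold it is flushed as a new immutable file sorted by row key. The class name `ToyStore` and the `flush_threshold` parameter are hypothetical. The real MemStore keeps its contents sorted in memory as writes arrive; this sketch simply sorts at flush time. The point is that a row whose key falls between two existing rows never causes already-written files to be reshuffled: reads merge the buffer with every file instead.

```python
import heapq
from typing import Dict, Iterator, List, Optional, Tuple


class ToyStore:
    """Hypothetical model of one column family in one region: an in-memory
    write buffer (standing in for the MemStore) plus 0..N immutable files,
    each sorted by row key. Illustrative only, not HBase code."""

    def __init__(self, flush_threshold: int = 3) -> None:
        self.flush_threshold = flush_threshold
        self.memstore: Dict[int, str] = {}
        # Flushed files, oldest first; each is a list of (row, value) sorted by row.
        self.files: List[List[Tuple[int, str]]] = []

    def put(self, row: int, value: str) -> None:
        # Writes only touch the buffer; files already written are never modified.
        self.memstore[row] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # Sort the buffered rows and write them out as a new immutable file.
        if self.memstore:
            self.files.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, row: int) -> Optional[str]:
        # Newest data wins: check the buffer, then files from newest to oldest.
        if row in self.memstore:
            return self.memstore[row]
        for f in reversed(self.files):
            for key, value in f:  # a linear scan is fine for a toy
                if key == row:
                    return value
        return None

    def scan(self) -> Iterator[Tuple[int, str]]:
        # Merge the sorted buffer and all sorted files into one sorted stream.
        # Sources are listed newest first; heapq.merge keeps input order for
        # equal keys, so the first copy of a row seen is its newest version.
        sources = [sorted(self.memstore.items())] + list(reversed(self.files))
        last = None
        for row, value in heapq.merge(*sources, key=lambda kv: kv[0]):
            if row != last:
                yield row, value
                last = row


store = ToyStore(flush_threshold=3)
for doc_id in [42, 7, 99, 15, 63, 3]:  # ids arrive in random order
    store.put(doc_id, f"doc-{doc_id}")
store.put(20, "doc-20")  # falls between 15 and 42; goes only into the buffer
print(len(store.files))              # → 2
print([k for k, _ in store.scan()])  # → [3, 7, 15, 20, 42, 63, 99]
```

Each flushed file is sorted on its own, and the sorted view of the whole store comes from merging at read time, so inserting row 20 leaves the two existing files exactly as they were.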

akhil1988 wrote:
> Hi,
> I would like some clarification on the following question about how
> HBase works:
> I am using a BulkImport program to store millions of documents (text) in
> an HTable. Each TaskTracker reads some portion of a big file (assigned to
> it according to the input split calculations), extracts the individual
> documents from its split and stores them in the HTable with the document
> id as the row key.
> Now, HBase claims that it stores rows in sorted order. My question is: how
> does it sort the row keys when the TaskTrackers emit random integers as
> row keys? That is, when a new row id comes in, how does the HBase client
> know which region to store the row in? Suppose a row id to be stored lies
> between two rows already stored in the HTable. Where will this row be
> stored? Does HBase reshuffle the existing rows?
> Any explanation of how HBase works, or any reference, would be helpful.
> Thanks,
> Akhil
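The "which region?" part of the question can be sketched the same way. Each region holds a contiguous range of row keys, from an inclusive start key to an exclusive end key, and the client finds the region for a key by searching the sorted start keys (in HBase the client gets region boundaries from the .META. catalog table and caches them). The `Region` type, the `row_key` and `locate_region` helpers and the sample boundaries below are all hypothetical. A new key that falls between two stored rows simply belongs to whichever region's range covers it; nothing already stored has to move.

```python
import bisect
import struct
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    name: str
    start_key: bytes  # inclusive; b"" means "from the start of the table"
    end_key: bytes    # exclusive; b"" means "to the end of the table"


def row_key(doc_id: int) -> bytes:
    # Row keys compare as raw bytes. A fixed-width big-endian encoding makes
    # byte order match numeric order for non-negative ids (hypothetical scheme).
    return struct.pack(">I", doc_id)


def locate_region(regions: List[Region], key: bytes) -> Region:
    # Assumes regions are sorted by start_key and together cover all keys.
    starts = [r.start_key for r in regions]
    # Pick the rightmost region whose start_key <= key.
    i = bisect.bisect_right(starts, key) - 1
    return regions[i]


regions = [
    Region("r1", b"", row_key(1000)),
    Region("r2", row_key(1000), row_key(5000)),
    Region("r3", row_key(5000), b""),
]

for doc_id in [4242, 17, 999999, 1000]:
    print(doc_id, "->", locate_region(regions, row_key(doc_id)).name)
```

This prints r2, r1, r3 and r2 for the four ids. Because keys are compared byte by byte, integer ids stored as decimal strings would not sort numerically: b"17" sorts after b"1000".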
