hbase-user mailing list archives

From Doug Meil <doug.m...@explorysmedical.com>
Subject Re: Regarding data storage in HBase
Date Thu, 19 Jan 2012 18:12:30 GMT

Hi there-

Re #1:

See http://hbase.apache.org/book.html#regions.arch

Also see http://hbase.apache.org/book.html#trouble.namenode.hbase.objects
for what the directory structure looks like in HDFS.

Re #2:

Flushes are written as StoreFiles in HDFS.

See http://hbase.apache.org/book.html#regions.arch

Also see the section on "Region-RegionServer Locality"
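
As a rough sketch of the arithmetic behind question #2 (this is not HBase
code; the function name and the sizes are illustrative assumptions): a
StoreFile is an ordinary HDFS file, so HDFS cuts it into blocks of the
configured block size and decides where each block's replicas are placed.

```python
MB = 1024 * 1024

def hdfs_block_count(file_size_bytes, block_size_bytes=64 * MB):
    """Number of HDFS blocks a file of the given size occupies (hypothetical helper)."""
    if file_size_bytes <= 0:
        return 0
    # Integer ceiling division: the last block may be only partially filled.
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes

# Assumed example: one 200 MB StoreFile with a 64 MB HDFS block size.
store_file_size = 200 * MB
blocks = hdfs_block_count(store_file_size)
last_block_mb = (store_file_size - (blocks - 1) * 64 * MB) // MB
print(f"{blocks} blocks, last block holds {last_block_mb} MB")
```

Under these assumptions the file occupies 4 blocks, the last one holding
8 MB. Which DataNodes hold those blocks is up to HDFS, which is why the
locality section is worth reading.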

Re #3:

The flushed files: it is the total size of the StoreFiles per region.

See http://hbase.apache.org/book.html#regions.arch

Re #4:

Not entirely sure what you are asking, but see the WAL section
in the Regions section.

On 1/19/12 6:34 AM, "Praveen Sripati" <praveensripati@gmail.com> wrote:

>According to the `Hadoop - The Definitive Guide`
>Writes arriving at a regionserver are first appended to a commit log and
>then are added to an in-memory memstore. When a memstore fills, its
>content is flushed to the filesystem.
>The commit log is hosted on HDFS, so it remains available through a
>regionserver crash.
>A couple of questions:
>1. When the memstore fills, is it flushed to HDFS or local file system?
>2. If the region size (hbase.hregion.max.filesize) is set to 200MB and the
>HDFS Block Size is set to 64MB, will the region be split across 4 data
>nodes? I know that it doesn't make sense to split a single region's data
>across nodes in HDFS, but how is it handled in HBase?
>3. Is region size (hbase.hregion.max.filesize) the size of commit log or
>the size of the file that has been flushed?
>4. The commit log might become big over time. Is there a similar concept of
>a checkpoint in HBase for the commit logs?
>I am familiar with HDFS and trying to map it to HBase.
