hbase-user mailing list archives

From Praveen Sripati <praveensrip...@gmail.com>
Subject Re: Regarding data storage in HBase
Date Fri, 20 Jan 2012 05:18:27 GMT
Thanks for the response.

> 4. The commit log might become big over time, is there similar concept of
> checkpoint in HBase for the commit logs?
>
> WALs are rolled at configurable size -- usually 64MB. WALs that have edits
> that have been all flushed to hfiles are let go/deleted.
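
The lifecycle described in the quoted reply (append to the WAL, roll it at a size threshold, delete rolled WALs once all of their edits have been flushed) can be sketched with a toy model. This is only an illustration, not HBase code: the class names, the single-region setup, the sequence-id bookkeeping, and the unit-less sizes are all assumptions made for the example. A real regionserver shares one WAL across many regions and archives old logs rather than simply dropping them from a list.

```python
from dataclasses import dataclass, field


@dataclass
class WalFile:
    """Hypothetical stand-in for one WAL file; tracks the edits it holds."""
    first_seq: int
    last_seq: int = -1
    size: int = 0


@dataclass
class ToyRegionServer:
    """Toy single-region model of the write path (not the HBase API)."""
    wal_roll_size: int = 64          # roll the WAL once it reaches this size
    memstore_flush_size: int = 100   # flush the memstore once it reaches this size
    next_seq: int = 0
    flushed_up_to: int = -1          # highest sequence id already in a store file
    memstore: list = field(default_factory=list)
    memstore_size: int = 0
    store_files: list = field(default_factory=list)
    wals: list = field(default_factory=list)

    def put(self, key, value, size):
        # 1. Append the edit to the current WAL, rolling to a new file if needed.
        if not self.wals or self.wals[-1].size >= self.wal_roll_size:
            self.wals.append(WalFile(first_seq=self.next_seq))
        wal = self.wals[-1]
        wal.last_seq = self.next_seq
        wal.size += size
        # 2. Only then add the edit to the in-memory memstore.
        self.memstore.append((self.next_seq, key, value))
        self.memstore_size += size
        self.next_seq += 1
        # 3. A full memstore is flushed to a new store file.
        if self.memstore_size >= self.memstore_flush_size:
            self.flush()

    def flush(self):
        if not self.memstore:
            return
        # Store files are written sorted by key.
        self.store_files.append(sorted(self.memstore, key=lambda e: e[1]))
        self.flushed_up_to = self.memstore[-1][0]
        self.memstore = []
        self.memstore_size = 0
        self._drop_flushed_wals()

    def _drop_flushed_wals(self):
        # Keep the current WAL; drop rolled WALs whose edits are all flushed.
        current = self.wals[-1]
        kept = [w for w in self.wals[:-1] if w.last_seq > self.flushed_up_to]
        self.wals = kept + [current]
```

In this model a memstore flush, not a regionserver crash, is what makes an old WAL eligible for deletion: the WAL is only replayed after a crash to recover edits that had not yet been flushed. Creating a `ToyRegionServer()` and calling `put` ten times with `size=10` triggers one flush, after which only the current WAL remains.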

1) Are WALs flushed to HFiles periodically, or only in the case of a
regionserver crash? The WALs may grow over time, which is why I am
asking. In HDFS, the equivalent flush (a checkpoint) is done when the
edit log size reaches 'dfs.namenode.checkpoint.size' or after every
'dfs.namenode.checkpoint.period' seconds.

2) I went through the 'HBase Architecture 101 - Storage' blog entry (1),
written 2 years ago, which was very useful. Is it still relevant?

Praveen

(1) - http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html

On Thu, Jan 19, 2012 at 11:42 PM, Doug Meil
<doug.meil@explorysmedical.com> wrote:

>
> Hi there-
>
> re: #1
>
> HDFS
>
> See http://hbase.apache.org/book.html#regions.arch
>
> Also see http://hbase.apache.org/book.html#trouble.namenode.hbase.objects
> for what the directory structure looks like in HDFS.
>
>
> Re #2:
>
> Flushes are written as StoreFiles in HDFS.
>
> See http://hbase.apache.org/book.html#regions.arch
>
> Also see the section on "Region-RegionServer Locality"
>
> re: #3
>
> Flushed files, the total size of StoreFiles per region.
>
>
> See http://hbase.apache.org/book.html#regions.arch
>
> #4.  Not entirely sure about what you are asking, but see the WAL section
> in the Regions section.
>
>
>
>
> On 1/19/12 6:34 AM, "Praveen Sripati" <praveensripati@gmail.com> wrote:
>
> >Hi,
> >
> >According to the `Hadoop - The Definitive Guide`
> >
> >Writes arriving at a regionserver are first appended to a commit log and
> >then are added to an in-memory memstore. When a memstore fills, its
> >content
> >is flushed to the filesystem.
> >The commit log is hosted on HDFS, so it remains available through a
> >regionserver crash.
> >
> >Couple of questions
> >
> >1. When the memstore fills, is it flushed to HDFS or local file system?
> >
> >2. If the region size (hbase.hregion.max.filesize) is set to 200MB and the
> >HDFS Block Size is set to 64MB, will the region be split across 4 data
> >nodes? I know that it doesn't make sense to split a single region's data
> >across nodes in HDFS, but how is it handled in HBase?
> >
> >3. Is region size (hbase.hregion.max.filesize) the size of commit log or
> >the size of the file that has been flushed?
> >
> >4. The commit log might become big over time, is there similar concept of
> >checkpoint in HBase for the commit logs?
> >
> >I am familiar with HDFS and trying to map it to HBase.
> >
> >Regards,
> >Praveen
>
>
>
