hbase-user mailing list archives

From Krishna Rao <krishnanj...@gmail.com>
Subject Re: HBase checksum vs HDFS checksum
Date Tue, 29 Apr 2014 08:54:13 GMT
Thank you for your reply Anoop.

However, the confusion is, unfortunately, still there because of the
following passage (from
http://hbase.apache.org/book.html#perf.hdfs.configs.localread):

"For optimal performance when short-circuit reads are enabled, it is
recommended that HDFS checksums are disabled. To maintain data integrity
with HDFS checksums disabled, HBase can be configured to write its own
checksums into its datablocks and verify against these"

To me it implies that the HDFS checksum needs to be disabled, meaning that
HDFS wouldn't write checksums into its datablocks, but HBase would be fine
because it writes its own checksums.


On 29 April 2014 09:32, Anoop John <anoop.hbase@gmail.com> wrote:

> HBase using its own checksum handling doesn't directly affect HDFS. HDFS will
> still maintain checksum info. The difference is at read time: HBase will
> open the reader with checksum validation set to false and do checksum
> validation on its own. So using HBase-handled checksums in a cluster
> should not affect other data. Does that resolve your doubt?
>
> -Anoop-
>
> On Tue, Apr 29, 2014 at 1:58 PM, Krishna Rao <krishnanjrao@gmail.com>
> wrote:
>
> > Hi Ted,
> >
> > I had read those, but I'm confused about how this will affect non-HBase
> > HDFS data. With HDFS checksumming off, won't data integrity be affected?
> >
> > Krishna
> >
> >
> > On 24 April 2014 15:54, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> > > Please take a look at the following:
> > >
> > > http://hbase.apache.org/book.html#perf.hdfs.configs.localread
> > > http://hbase.apache.org/book.html#hbase.regionserver.checksum.verify
> > >
> > >
> > > On Thu, Apr 24, 2014 at 5:55 AM, Krishna Rao <krishnanjrao@gmail.com>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > I understand that there is a significant performance gain when turning on
> > > > short-circuit reads, and additionally by setting HBase to do checksums
> > > > rather than HDFS.
> > > >
> > > > However, I'm a little confused by this: do I need to turn off checksums
> > > > within HDFS for the entire file system? We don't just use HBase on our
> > > > cluster, so this would seem to be a bad idea, right?
> > > >
> > > >  Cheers,
> > > >
> > > > Krishna
> > > >
> > >
> >
>
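
The distinction Anoop draws above (HDFS keeps writing its checksums; only the HBase reader skips HDFS-level verification and checks its own inline checksum instead) can be illustrated with a small toy model. This is only a sketch under stated assumptions, not HBase or HDFS code: `ToyHdfsFile`, `hbase_write_block`, and `hbase_read_block` are hypothetical names invented for illustration, and CRC32 via Python's `zlib` stands in for whatever checksum the real systems use.

```python
import zlib


class ToyHdfsFile:
    """Hypothetical stand-in for a file stored in HDFS (not a real API)."""

    def __init__(self, data: bytes):
        self.data = bytearray(data)
        # The file-system-level checksum is always stored on write,
        # no matter which client reads the file later.
        self.stored_crc = zlib.crc32(data)

    def read(self, verify_checksum: bool = True) -> bytes:
        data = bytes(self.data)
        # Verification is a per-reader choice, made at read time.
        if verify_checksum and zlib.crc32(data) != self.stored_crc:
            raise IOError("file-system-level checksum mismatch")
        return data


def hbase_write_block(payload: bytes) -> ToyHdfsFile:
    """Toy HBase-style write: append a checksum inline with the block."""
    inline_crc = zlib.crc32(payload).to_bytes(4, "big")
    return ToyHdfsFile(payload + inline_crc)


def hbase_read_block(f: ToyHdfsFile) -> bytes:
    """Toy HBase-style read: skip the file-system check, verify inline."""
    raw = f.read(verify_checksum=False)
    payload, inline_crc = raw[:-4], raw[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") != inline_crc:
        raise IOError("inline (HBase-level) checksum mismatch")
    return payload
```

In this model a non-HBase file is still protected, because any other reader calls `read()` with verification left on, while the HBase-style reader catches corruption through its inline checksum.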
