hbase-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: HBase Checksum
Date Fri, 01 Feb 2013 21:09:30 GMT
Thanks for the clarification Lars.

Is there any UI or specific startup log we can check to validate that
it's activated? If not, would it be nice to have something like that?
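
In the meantime, the only thing that seems easy to check is the configuration itself. Here is a minimal sketch, assuming the property names are hbase.regionserver.checksum.verify and dfs.client.read.shortcircuit (please verify them against your version). The helper names and the sample XML are made up for illustration. It only shows what the site file asks for, not that the feature is actually in use at runtime.

```python
import xml.etree.ElementTree as ET

# Property names are assumptions for this sketch; confirm them for your
# HBase/HDFS version before relying on the result.
KEYS = ("hbase.regionserver.checksum.verify", "dfs.client.read.shortcircuit")


def read_site_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style site file."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def enabled_flags(xml_text, keys=KEYS):
    """Map each key to True only if the site file sets it to 'true'."""
    props = read_site_properties(xml_text)
    return {key: props.get(key, "").lower() == "true" for key in keys}


# Hypothetical hbase-site.xml content, used only to exercise the helpers.
SAMPLE = """<configuration>
  <property><name>hbase.regionserver.checksum.verify</name><value>true</value></property>
  <property><name>dfs.client.read.shortcircuit</name><value>true</value></property>
</configuration>"""

if __name__ == "__main__":
    for key, on in enabled_flags(SAMPLE).items():
        print(key, "->", "enabled" if on else "not set to true")
```

To use it on a real cluster, pass the text of the region server's hbase-site.xml instead of SAMPLE.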

2013/2/1, lars hofhansl <larsh@apache.org>:
> Doing HBase level checksums (as opposed to HDFS level) will mostly yield
> results for random gets.
> Scans (like rowcounting and similar) will probably see a negligible
> improvement.
>
> In HDFS a block and its checksum are stored in different local files on each
> datanode. So loading a block requires 2 IOs.
> With the checksum handled by HBase only one IO is needed per block.
>
>
>
> ________________________________
>  From: Robert Dyer <rdyer@iastate.edu>
> To: Hbase-User <user@hbase.apache.org>
> Sent: Friday, February 1, 2013 11:37 AM
> Subject: Re: HBase Checksum
>
> Yes that log is a debug level log, as I saw in the source.  But I too
> enabled DEBUG and still never saw that log message.
>
> But I, unlike you, see absolutely no change in performance.
>
> One test I did however that makes me think it is actually enabled: if I
> submit from another user I start getting security warnings about that user
> not having permission for shortcircuit.  So perhaps it is working, but I
> have no clue why that log fails to show anywhere.
>
> Regarding enabling checksums that is an interesting question.  Do I have to
> do a major compaction after enabling so HBase writes the checksum?  Or will
> it detect the setting change and do that automatically?  What if I disable,
> will it remove the checksums?
>
>
> On Fri, Feb 1, 2013 at 6:30 AM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:
>
>> Hi Robert,
>>
>> That's perfectly fine, it was my next question ;)
>>
>>
>> Anoop, I saw a 5% performance increase by activating HBase Checksum.
>> Can I disable it again to retry the baseline and see the difference?
>> Or now that it's there, is it too late?
>>
>> Also, regarding BlockReaderLocal, I don't find that in my logs, but
>> after I have activated the shortcircuit, I saw a 41% performance
>> increase, so I'm almost sure it's working, but I don't know either how
>> to check that.
>>
>> What's the best way to see that in the logs? It's not displayed when
>> HBase is starting, and not even when I'm doing major
>> compactions.
>>
>> I turned org.apache.hadoop.hdfs.BlockReaderLocal loglevel to debug and
>> still can't see anything. Not in the region server, and not in the
>> datanode.
>>
>> Also, to check with HDFS level logs whether the checksum meta file is
>> getting read to the DFS client, I'm not really sure how to achieve
>> that.
>>
>> JM
>>
>> 2013/2/1, Robert Dyer <rdyer@iastate.edu>:
>> > Ok, grepping the RS logs I see nothing with 'local' in any of them.
>> > Thanks for that hint.
>> >
>> > For the test I was using, I know it is data local.  Every map task
>> > launched data local, and no regions were moving recently.
>> >
>> > I think I've hijacked this thread enough, I'll move my issues to
>> > another.
>> > ;-)
>> >
>> >
>> > On Thu, Jan 31, 2013 at 11:51 PM, Anoop Sam John <anoopsj@huawei.com>
>> > wrote:
>> >
>> >> Hi Robert
>> >> When HDFS is doing the local short-circuit read, it will use the
>> >> BlockReaderLocal class for reading.  There should be some logs on the
>> >> DFS client side (RS) that tell about creating a new BlockReaderLocal.
>> >> If you can see this, then the local read is surely happening.
>> >>
>> >> Also check the DN log.  If the local read is happening, then you will
>> >> not see read-request-related logs for the HFile on the DN side.
>> >> You can check the number of your HFiles and their names when checking
>> >> the logs.
>> >>
>> >> Are you sure that when you tested, you had data locality? Region
>> >> movements across RSs can break full data locality.
>> >>
>> >> -Anoop-
>> >> ________________________________________
>> >> From: Robert Dyer [psybers@gmail.com]
>> >> Sent: Friday, February 01, 2013 11:10 AM
>> >> To: Hbase-User
>> >> Subject: Re: HBase Checksum
>> >>
>> >> Not trying to hijack your thread here...
>> >>
>> >> But can you verify via logs that the shortcircuit is working?  Because I
>> >> enabled shortcircuit but I sure didn't see any performance increase.
>> >>
>> >> I haven't tried enabling hbase checksum yet but I'd like to be able to
>> >> verify that works too.
>> >>
>> >>
>> >> On Thu, Jan 31, 2013 at 9:55 PM, Anoop Sam John <anoopsj@huawei.com>
>> >> wrote:
>> >>
>> >> > You can check with the HDFS-level logs whether the checksum meta file
>> >> > is being read by the DFS client. With HBase-handled checksums, this
>> >> > should not happen.
>> >> > Have you noticed any perf gain when you configure the HBase handled
>> >> > checksum option?
>> >> >
>> >> > -Anoop-
>> >> > ________________________________________
>> >> > From: Jean-Marc Spaggiari [jean-marc@spaggiari.org]
>> >> > Sent: Friday, February 01, 2013 4:16 AM
>> >> > To: user
>> >> > Subject: HBase Checksum
>> >> >
>> >> > Hi,
>> >> >
>> >> > I have activated shortcircuit and checksum and I would like to get a
>> >> > confirmation that it's working fine.
>> >> >
>> >> > So I have activated short circuit first and saw a 40% improvement of
>> >> > the MR rowcount job. So I guess it's working fine.
>> >> >
>> >> > Now, I'm configuring the checksum option, and I'm wondering how I can
>> >> > validate whether it's taken into account and used, or not.
>> >> > Is there a way to see that?
>> >> >
>> >> > Thanks,
>> >> >
>> >> > JM
>> >> >
>> >>
>> >
>> >
>> >
>> > --
>> >
>> > Robert Dyer
>> > rdyer@iastate.edu
>> >
>>
>
>
>
> --
>
> Robert Dyer
> rdyer@iastate.edu
