hadoop-user mailing list archives

From "Liu, Raymond" <raymond....@intel.com>
Subject RE: why my test result on dfs short circuit read is slower?
Date Sat, 16 Feb 2013 07:54:04 GMT
It seems to me that, with short circuit read enabled, BlockReaderLocal reads data in units of
512 or 4096 bytes (with checksum checking enabled or skipped, respectively).

Whereas when the read goes through the datanode, BlockSender.sendChunks reads and sends data
in 64 KB units?

Is that true? And if so, wouldn't that explain why reading through the datanode is faster,
since it reads data in larger units?
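
The unit sizes mentioned above imply very different numbers of read operations per block. A minimal sketch of that arithmetic, assuming the 134217728-byte block size from the log output in this thread and counting each unit as one read operation. That one-unit-per-operation model is a simplifying assumption for illustration, `read_calls` is a hypothetical helper name, and the numbers alone do not show that unit size is the cause of the slowdown.

```shell
#!/bin/sh
# Hypothetical helper: number of read operations needed to cover
# total_bytes when each operation transfers at most unit_bytes.
read_calls() {
    total_bytes=$1
    unit_bytes=$2
    # Integer ceiling division.
    echo $(( (total_bytes + unit_bytes - 1) / unit_bytes ))
}

block_size=134217728   # block size taken from the log line in this thread

# Assumed model: one read operation per unit (512 B, 4 KB, 64 KB).
for unit in 512 4096 65536; do
    echo "unit=${unit} reads=$(read_calls "$block_size" "$unit")"
done
```

For one 128 MB block this gives 262144, 32768 and 2048 operations respectively, so the 512-byte path issues 128 times as many operations as the 64 KB path under this model.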

Best Regards,
Raymond Liu


> -----Original Message-----
> From: Liu, Raymond [mailto:raymond.liu@intel.com]
> Sent: Saturday, February 16, 2013 2:23 PM
> To: user@hadoop.apache.org
> Subject: RE: why my test result on dfs short circuit read is slower?
> 
> Hi Arpit Gupta
> 
> Yes, this also confirms that short circuit read is enabled on my cluster.
> 
> 13/02/16 14:07:34 DEBUG hdfs.DFSClient: Short circuit read is true
> 
> 13/02/16 14:07:34 DEBUG hdfs.DFSClient: New BlockReaderLocal for file
> /mnt/DP_disk4/raymond/hdfs/data/current/subdir63/blk_-2736548898990727638
> of size 134217728 startOffset 0 length 134217728 short circuit checksum false
> 
> So, is it possible that some other setting causes short circuit read to
> perform worse than reading through the datanode?
> 
> Raymond
> 
> 
> 
> >Another way to check if short circuit read is configured correctly.
> 
> >As the user who is configured for short circuit read, issue the following
> command on a node where you expect the data to be read locally.
> 
> >export HADOOP_ROOT_LOGGER=debug,console; hadoop dfs -cat
> >/path/to/file_on_hdfs
> 
> >On the console you should see something like "hdfs.DFSClient: New
> BlockReaderLocal for file...."
> 
> >This would confirm that short circuit read is happening.
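
The check described above can also be scripted. A rough sketch, assuming the client's debug output has been captured to a file. `uses_short_circuit` and the file name `client-debug.log` are hypothetical, and the capture command in the comment assumes the console logger writes to stderr, which may not hold for every log4j configuration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the captured client debug log
# contains the BlockReaderLocal marker described above.
uses_short_circuit() {
    grep -q "New BlockReaderLocal for file" "$1"
}

# Capturing the log (not run here; assumes console logging goes to stderr):
#   HADOOP_ROOT_LOGGER=debug,console hadoop dfs -cat /path/to/file_on_hdfs \
#       >/dev/null 2>client-debug.log

# Demo on a one-line sample log so the sketch runs without a cluster.
log=$(mktemp)
printf '%s\n' "DEBUG hdfs.DFSClient: New BlockReaderLocal for file /path/to/block" > "$log"

if uses_short_circuit "$log"; then
    echo "short circuit read observed"
else
    echo "no BlockReaderLocal line found"
fi
rm -f "$log"
```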
> 
> --
> >Arpit Gupta
> >Hortonworks Inc.
> >http://hortonworks.com/
> 
> On Feb 15, 2013, at 9:53 PM, "Liu, Raymond" <raymond.liu@intel.com> wrote:
> 
> 
> Hi Harsh
> 
> Yes, I did set both of these, though in hdfs-site.xml rather than hbase-site.xml.
> 
> And I have double-checked that local reads are performed: there are no
> errors in the datanode logs, and I also watched the network I/O on the lo interface.
> 
> 
> 
> If you want HBase to leverage the shortcircuit, the DN config
> "dfs.block.local-path-access.user" should be set to the user running HBase (i.e.
> hbase, for example), and the hbase-site.xml should have
> "dfs.client.read.shortcircuit" defined in all its RegionServers. Doing this wrong
> could result in a performance penalty and some warn-logging, as local reads will
> be attempted but will begin to fail.
> 
> On Sat, Feb 16, 2013 at 8:40 AM, Liu, Raymond <raymond.liu@intel.com>
> wrote:
> 
> Hi
> 
>        I tried to use short circuit read to improve the MR scan performance
> of my HBase cluster.
> 
> 
>        I have the following settings in hdfs-site.xml:
> 
>        dfs.client.read.shortcircuit set to true
>        dfs.block.local-path-access.user set to the user that runs the MR jobs.
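
For reference, the two settings listed above would look roughly like this in Hadoop's property format. This is only a sketch: `emit_property` is a hypothetical helper that prints one property element, and `mruser` is a placeholder, since the actual user name is not given in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print one Hadoop-style <property> element.
emit_property() {
    printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

# The two settings from the message; "mruser" is a placeholder user name.
emit_property dfs.client.read.shortcircuit true
emit_property dfs.block.local-path-access.user mruser
```

The printed elements belong inside the `<configuration>` element of hdfs-site.xml.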
> 
>        The cluster has 1+4 nodes, and each data node has 16 CPUs and 4 HDDs.
> All HBase tables have been major compacted, so all data is local.
> 
>        I had hoped that short circuit read would improve
> performance.
> 
> 
>        However, the test result is that with short circuit read enabled,
> performance actually dropped by 10-15%. For example, scanning a 50 GB table
> takes around 100s instead of 90s.
> 
> 
>        My Hadoop version is 1.1.1. Any ideas on this? Thanks!
> 
> Best Regards,
> Raymond Liu
> 
> 
> 
> 
> --
> Harsh J

