Date: Sat, 10 Nov 2012 09:27:12 +0000 (UTC)
From: "LiuLei (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-3429) DataNode reads checksums even if client does not need them

[ https://issues.apache.org/jira/browse/HDFS-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494594#comment-13494594 ]

LiuLei commented on HDFS-3429:
------------------------------

Here is my understanding of this problem. There are two reasons the DN needs to read checksums from the meta file:
1. The server itself needs to verify checksums, for example in the block scanner.
2. The DFSClient needs to verify checksums. In this case the DN reads the checksums but does not verify them; instead, it sends them to the DFSClient, which verifies them.

So we need two parameters to distinguish these two purposes:
1. The BlockSender constructor already has a verifyChecksum parameter, which indicates whether the server verifies checksums.
2. The FileSystem.setVerifyChecksum(boolean verifyChecksum) method indicates whether the DFSClient verifies checksums, so we need to send that value to the DN and add an isClientVerifyChecksum parameter to the BlockSender constructor.

If both verifyChecksum and isClientVerifyChecksum are false, the DN does not need to read checksums and only needs to send data to the client. In that case we only need to create a DataChecksum instance of type DataChecksum.CHECKSUM_NULL, which guarantees that the DN does not read checksums from the meta file (because the checksumSize of a CHECKSUM_NULL instance is 0).

The patch I submitted contains these changes.

> DataNode reads checksums even if client does not need them
> ----------------------------------------------------------
>
> Key: HDFS-3429
> URL: https://issues.apache.org/jira/browse/HDFS-3429
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: data-node, performance
> Affects Versions: 2.0.0-alpha
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Attachments: hdfs-3429-0.20.2.patch, hdfs-3429.txt, hdfs-3429.txt
>
>
> Currently, even if the client does not want to verify checksums, the datanode reads them anyway and sends them over the wire. This means that performance improvements like HBase's application-level checksums don't have much benefit when reading through the datanode, since the DN is still causing seeks into the checksum file.
> (Credit goes to Dhruba for discovering this - filing on his behalf)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
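The two-flag decision LiuLei describes can be sketched in Java as follows. This is a minimal, self-contained model under stated assumptions, not the patch itself: ChecksumKind, chooseChecksum and metaBytesToRead are hypothetical names and are not Hadoop's DataChecksum or BlockSender API. The 4-byte CRC32 size and the 512-byte chunk size are illustrative values. The point it shows is that when both verifyChecksum and isClientVerifyChecksum are false, the sender switches to a checksum type whose size is 0, so the number of meta-file bytes to read drops to zero.

```java
// Hypothetical, simplified model of the decision described in the comment.
// None of these names are Hadoop's real API.
class ChecksumReadSketch {

    // Stand-in for checksum types: NULL carries 0 checksum bytes per chunk,
    // CRC32 carries 4 bytes per chunk (illustrative sizes).
    enum ChecksumKind {
        NULL(0), CRC32(4);

        final int checksumSize;

        ChecksumKind(int checksumSize) {
            this.checksumSize = checksumSize;
        }
    }

    // Pick the checksum the sender will use. If neither the server
    // (verifyChecksum) nor the client (isClientVerifyChecksum) needs
    // checksums, fall back to the zero-size NULL type.
    static ChecksumKind chooseChecksum(boolean verifyChecksum,
                                       boolean isClientVerifyChecksum,
                                       ChecksumKind onDisk) {
        if (!verifyChecksum && !isClientVerifyChecksum) {
            return ChecksumKind.NULL;
        }
        return onDisk;
    }

    // Bytes that must be read from the meta file to send dataLen bytes of
    // block data, where each checksum covers bytesPerChecksum data bytes.
    static long metaBytesToRead(long dataLen, int bytesPerChecksum, ChecksumKind kind) {
        long chunks = (dataLen + bytesPerChecksum - 1) / bytesPerChecksum; // round up
        return chunks * kind.checksumSize;
    }

    public static void main(String[] args) {
        int bytesPerChecksum = 512;
        long packetLen = 64 * 1024;

        // Neither side verifies: no meta-file reads are needed.
        ChecksumKind none = chooseChecksum(false, false, ChecksumKind.CRC32);
        System.out.println(none + " " + metaBytesToRead(packetLen, bytesPerChecksum, none));

        // Client verifies: checksums are read and shipped to the client.
        ChecksumKind client = chooseChecksum(false, true, ChecksumKind.CRC32);
        System.out.println(client + " " + metaBytesToRead(packetLen, bytesPerChecksum, client));
    }
}
```

Running main prints `NULL 0` and then `CRC32 512` (128 chunks of 512 bytes, 4 checksum bytes each). Modeling "skip the reads" as "use a size-0 checksum type" mirrors the comment's idea: the sending path stays the same, and the zero checksum size alone is what avoids touching the meta file.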