Date: Fri, 24 Jun 2011 07:30:47 +0000 (UTC)
From: "Aaron T. Myers (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Message-ID: <1876482817.35923.1308900647575.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1271780704.23333.1308632387413.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HDFS-2092) Create a light inner conf class in DFSClient

    [ https://issues.apache.org/jira/browse/HDFS-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054285#comment-13054285 ]

Aaron T. Myers commented on HDFS-2092:
--------------------------------------

bq. We are not concerned about the task attempt. The problem here is for Task Tracker's availability.

Have you actually experienced TTs crashing because conf objects were too large? Or cases where conf objects were taking up a substantial portion of the available heap space?

bq. The way conf was designed has its own benefits. At the same time it comes with some disadvantages. What if a task attempt can run for a day or more? This is not uncommon in our clusters.

I would conjecture that such a task attempt is likely using many MBs or GBs of memory for the actual work it's doing. Is this patch, which saves a few hundred KBs at the extreme end, really going to move the needle?

bq. 1. With UGI, conf will be created per user in TT. (Security folks?)

But presumably only for every user who is concurrently running a task attempt on that TT, so not that many, right? Unless I'm missing something, which is certainly possible.

bq. 2. PIG or any other job can store arbitrary data. Hadoop framework should be able to deal with it as far as it can.

No disagreement there.

bq. 3. Last but not least, API should not hold on to client's data.

I see no principled reason the DFSClient "should not hold on to client's data" in the form of the conf object. If this is actually negatively impacting performance or availability, then we should certainly fix that, but you haven't demonstrated that yet.

bq. As every job is different, so workloads can be different. So one can't see or hear all the problems.

Certainly, but we can validate this issue with some testing. Can you please describe what you did to gather these measurements? What exactly are they actually measuring?

My issue here is that this change is being done purely as an optimization, but it's unclear to me that negative issues exist without this patch, or that this patch necessarily addresses those issues. If you can demonstrate those, I'll shut up immediately. :)

> Create a light inner conf class in DFSClient
> --------------------------------------------
>
>                 Key: HDFS-2092
>                 URL: https://issues.apache.org/jira/browse/HDFS-2092
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs client
>    Affects Versions: 0.23.0
>            Reporter: Bharath Mundlapudi
>            Assignee: Bharath Mundlapudi
>             Fix For: 0.23.0
>
>         Attachments: HDFS-2092-1.patch, HDFS-2092-2.patch
>
>
> At present, DFSClient stores a reference to the configuration object. Since these configuration objects can be pretty big at times, they can bloat processes which have multiple DFSClient objects, as in the TaskTracker. This is an attempt to remove the reference to the conf object in DFSClient.
> This patch creates a light inner conf class and copies the required keys from the Configuration object.
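
For readers following the thread, the idea in the issue description (copy only the required keys into a light inner class instead of keeping a reference to the whole conf) might look roughly like the sketch below. This is not the attached patch. The class names `LightConfSketch`, `Conf`, and `Client` are hypothetical, a plain `Map<String, String>` stands in for Hadoop's `Configuration`, and the key names and default values are illustrative assumptions only.

```java
import java.util.HashMap;
import java.util.Map;

public class LightConfSketch {

    /** Hypothetical light holder: copies out only the values the client reads. */
    public static final class Conf {
        public final int replication;
        public final long blockSize;
        public final int ioBufferSize;

        public Conf(Map<String, String> full) {
            // Key names and defaults are assumptions made for this sketch.
            replication = Integer.parseInt(full.getOrDefault("dfs.replication", "3"));
            blockSize = Long.parseLong(full.getOrDefault("dfs.blocksize", "67108864"));
            ioBufferSize = Integer.parseInt(full.getOrDefault("io.file.buffer.size", "4096"));
        }
    }

    /** Stand-in for a client object: it keeps the light Conf, not the full map. */
    public static final class Client {
        public final Conf conf;

        public Client(Map<String, String> full) {
            // Parsed values are copied; no reference to 'full' is retained,
            // so the caller's map can be garbage collected independently.
            this.conf = new Conf(full);
        }
    }

    public static void main(String[] args) {
        Map<String, String> full = new HashMap<>();
        full.put("dfs.replication", "2");
        // Arbitrary job data that the client never needs.
        full.put("some.job.payload", new String(new char[100_000]));

        Client client = new Client(full);
        System.out.println("replication=" + client.conf.replication
                + " blockSize=" + client.conf.blockSize
                + " ioBufferSize=" + client.conf.ioBufferSize);
    }
}
```

Whether the saving matters in practice is exactly the question raised in the comment above; the sketch only shows the mechanism, not a measurement.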