Date: Mon, 24 Apr 2017 19:13:11 +0000 (UTC)
From: "Anu Engineer (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Issue Comment Deleted] (HDFS-7240) Object store in HDFS

[ https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anu Engineer updated HDFS-7240:
-------------------------------
    Comment: was deleted

(was: [~cheersyang] Thanks for the review comments. This update addresses most of them. Details inline.

bq. getCommand() seems to always return DEFAULT_LIST.

Thanks for catching that, fixed.

bq. Looks like Commands#getCommands() evicts the commands for a datanode by setting this.commands to a new list. Why not let Commands#getCommands() return Commands and modify the code to be something like,

Did exactly that, thanks for the suggestion.

bq. I don't understand the point of maintaining commandsInQueue; if you want to count the number of commands in the queue, why not count the number of commands in commandMap?

That is exactly what we are doing: we have a map with a list of commands per datanode, so each time we add a command we just keep a running counter.

bq. The timeout argument in HadoopExecutors#shutdown() doesn't seem to work as expected.
bq. Imagine passing 5s timeout to shutdown method, and line 117 waits 5s but could not get tasks gracefully shutdown, then in line 121 it will wait for another 5s.

Correct; this is an issue with the Java threading model. When the user says "wait for 5 seconds", they are expressing an intent to wait that long for any running tasks that entered the queue just before the shutdown to finish. That is the graceful-shutdown part. However, the threads are under no obligation to finish executing in that time frame (it depends on how the task was written). So if we hit the timeout, we call shutdownNow(), which means we actively try to stop executing tasks, but even that is nothing more than best-effort; if it fails, we log an error. The other option is to divide the time specified by the user by 2 and wait only that long in each phase. But then the user expresses an intent of 5 seconds and we really only wait 2.5, whereas the task might have finished in 3. Bottom line: the Java threading model is not very user friendly, and Thread.stop() was not even correct.

bq. Can we move these configuration properties to ScmConfigKeys?

Done.

bq. Line 144 seems unnecessary to call toString() as there is a constructor.

You are right. It was a lame attempt on my part to remind the reader of the code that {{getHostAddress}} returns a String and not an InetAddress. Evidently, I have failed. I have removed the toString() and left getHostAddress as is. Just for the record, we still pass a String and a port to the InetAddress.

bq. Is ContainerReplicationManager better to extend AbstractService? Currently it starts the thread in the constructor; it's better to handle this in its own serviceStart and serviceStop, what do you think?

Not sure that buys us anything. This patch does not show it yet, but once all the parts of container replication are in, this will be invoked from the SCM. SCM is already a service; this is just a layer that will be called by the SCM to do a job.
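To make the serviceStart/serviceStop suggestion concrete, here is a minimal plain-Java sketch of that lifecycle pattern. The class name ReplicationMonitor and its internals are illustrative only, not Hadoop's actual AbstractService API or the patch's code; the point is that the worker thread is created in start() and joined in stop(), rather than started in the constructor.

```java
// Hypothetical sketch: lifecycle-managed worker thread (start/stop),
// instead of starting the thread inside the constructor.
public class ReplicationMonitor {
    private Thread worker;
    private volatile boolean running;

    public synchronized void start() {
        running = true;
        worker = new Thread(() -> {
            while (running) {
                // poll container reports, process pool state, etc.
                try {
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    return; // interrupted by stop(); exit cleanly
                }
            }
        }, "replication-monitor");
        worker.start();
    }

    public synchronized void stop() throws InterruptedException {
        running = false;       // signal the loop to exit
        worker.interrupt();    // wake it if it is sleeping
        worker.join();         // wait for the thread to finish
    }

    public boolean isRunning() {
        return worker != null && worker.isAlive();
    }
}
```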
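The two-phase shutdown behavior discussed earlier in the thread (wait gracefully for the timeout, then fall back to a best-effort shutdownNow()) can be sketched in plain java.util.concurrent terms. The helper below is illustrative, not HadoopExecutors' actual code:

```java
import java.util.concurrent.*;

public class GracefulShutdown {
    // Phase 1: shutdown() stops accepting new tasks and lets queued ones run,
    // waiting up to timeoutSec. Phase 2: if that times out, shutdownNow()
    // interrupts running tasks (best-effort only) and we wait again.
    public static boolean shutdownGracefully(ExecutorService pool, long timeoutSec) {
        pool.shutdown();
        try {
            if (!pool.awaitTermination(timeoutSec, TimeUnit.SECONDS)) {
                pool.shutdownNow();
                return pool.awaitTermination(timeoutSec, TimeUnit.SECONDS);
            }
            return true;
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.submit(() -> { /* a short task that finishes well within the timeout */ });
        boolean clean = shutdownGracefully(pool, 5);
        System.out.println(clean ? "terminated cleanly" : "forced shutdown");
    }
}
```

Note that even the second phase can fail: shutdownNow() only delivers interrupts, and a task that ignores interruption will keep running, which is exactly the limitation described above.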
bq. startReconciliation() iterates each datanode in a pool on getNodeState(). getNodeState(), however, would wait and retry for at most 100s; this will make the rest of the datanodes wait. Can we make this parallel?

We could, but I think we should do that only after we profile these code paths. It is quite possible that the bottleneck in this code is not in the start-processing part, but where we handle the container reports. So unless we really see getNodeState() taking too long, I would suggest we assume that getNodeState() is near-instantaneous in most cases.

bq. There seem to be a lot of variables such as nodeProcessed, containerProcessedCount, nodeCount that are not thread safe.

Thanks, fixed.)

> Object store in HDFS
> --------------------
>
>                 Key: HDFS-7240
>                 URL: https://issues.apache.org/jira/browse/HDFS-7240
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Jitendra Nath Pandey
>            Assignee: Jitendra Nath Pandey
>         Attachments: Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, ozone_user_v0.pdf
>
> This jira proposes to add object store capabilities into HDFS.
> As part of the federation work (HDFS-1052) we separated block storage as a generic storage layer. Using the Block Pool abstraction, new kinds of namespaces can be built on top of the storage layer, i.e. datanodes.
> In this jira I will explore building an object store using the datanode storage, but independent of namespace metadata.
> I will soon update with a detailed design document.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org