Subject: Re: HDFS balance
From: AnilKumar B <akumarb2010@gmail.com>
To: user@hadoop.apache.org
Date: Wed, 3 Sep 2014 13:31:33 +0530

It's better to create one client/gateway node (where no DataNode is running) and schedule your cron job from that machine.
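As a sketch, the hourly crontab entry on that gateway node could look like the following (the local path, HDFS target directory, and hadoop binary location are placeholders, not taken from your setup):

    # Runs at the top of every hour; note that % must be escaped as \% in crontab entries.
    0 * * * * /usr/bin/hadoop fs -put /data/export/hourly.dat /ingest/hourly-$(date +\%Y\%m\%d\%H).dat

Since no DataNode runs on the gateway, HDFS no longer pins the first replica to the uploading machine, and the blocks get spread across the cluster.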

Thanks & Regards,
B Anil Kumar.


On Wed, Sep 3, 2014 at 1:25 PM, Georgi Ivanov <ivanov@vesseltracker.com> wrote:
> Hi,
> We have an 11-node cluster.
> Every hour a cron job uploads one file (~1 GB) to Hadoop from node1
> (a plain hadoop fs -put).
>
> Because the first replica is always stored on the node where the
> command is executed, node1 keeps filling up.
> I run a re-balance every day, but that does not seem to be enough.
> The effect of this is:
> host1: 4.7 TB / 5.3 TB used
> host[2-10]: 4.1 TB / 5.3 TB used
>
> So I am always running out of space on host1.
>
> What I could do is spread the job across all the nodes and execute it
> on a random host. I don't really like that solution, as it involves
> NFS mounts, security issues, etc.
>
> Is there a better solution?
>
> Thanks in advance.
> George
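On the daily re-balance mentioned above: the balancer only moves blocks until every DataNode is within a threshold of the cluster-average utilization, so with one heavily skewed writer it can lag behind. As an interim measure you can run it with a tighter threshold (the 5% value below is just an illustration, not something tuned for your cluster):

    # Keep every DataNode within 5 percentage points of the average utilization
    hdfs balancer -threshold 5

That said, the gateway-node approach fixes the cause rather than the symptom.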

