Subject: Re: Is there any way to partially process HDFS edits?
From: Jens Scheidtmann <jens.scheidtmann@gmail.com>
To: user@hadoop.apache.org
Date: Sun, 29 Sep 2013 11:09:12 +0200

Tom,

I would file a JIRA, if I were you and my Hadoop version were recent
enough. Should be pretty easy to reproduce.

Jens

On Thursday, 26 September 2013, Tom Brown wrote:

> They were created and deleted in quick succession. I thought that meant
> the edits for both the create and delete would be logically next to each
> other in the file, allowing it to release the memory almost as soon as it
> had been allocated.
>
> In any case, after finding a VM host that could give me more RAM, I was
> able to get the namenode started. The process used 25GB at its peak.
>
> Thanks for your help!
>
>
> On Thu, Sep 26, 2013 at 11:07 AM, Harsh J <harsh@cloudera.com> wrote:
>
> Tom,
>
> That is valuable info. When we "replay" edits, we would be creating
> and then deleting those files - so memory would grow in between until
> the delete events begin appearing in the edit log segment.
>
> On Thu, Sep 26, 2013 at 10:07 PM, Tom Brown <tombrown52@gmail.com> wrote:
> > A simple estimate puts the total number of blocks somewhere around
> > 500,000. Due to an HBase bug (HBASE-9648), there were approximately
> > 50,000,000 files that were created and quickly deleted (about 10/sec
> > for 6 weeks) in the cluster, and that activity is what is contained
> > in the edits.
> >
> > Since those files don't exist (quickly created and deleted), shouldn't
> > they be inconsequential to the memory requirements of the namenode as
> > it starts up?
> >
> > --Tom
> >
> >
> > On Thu, Sep 26, 2013 at 10:25 AM, Nitin Pawar <nitinpawar432@gmail.com>
> > wrote:
> >>
> >> Can you share how many blocks your cluster has? How many directories?
> >> How many files?
> >>
> >> There is a JIRA, https://issues.apache.org/jira/browse/HADOOP-1687,
> >> which explains how much RAM will be used for your namenode. It's
> >> pretty old by Hadoop version, but it's a good starting point.
> >>
> >> According to Cloudera's blog, "A good rule of thumb is to assume 1GB
> >> of NameNode memory for every 1 million blocks stored in the
> >> distributed file system":
> >> http://blog.cloudera.com/blog/2013/08/how-to-select-the-right-hardware-for-your-new-hadoop-cluster/
> >>
> >>
> >> On Thu, Sep 26, 2013 at 9:26 PM, Tom Brown <tombrown52@gmail.com> wrote:
> >>>
> >>> It ran again for about 15 hours before dying again. I'm seeing what
> >>> extra RAM resources we can throw at this VM (maybe up to 32GB), but
> >>> until then I'm trying to figure out if I'm hitting some strange bug.
> >>>
> >>> When the edits were originally made (over the course of 6 weeks), the
> >>> namenode only had 512MB and was able to contain the filesystem
> >>> completely in memory. I don't understand why it's running out of
> >>> memory. If 512MB was enough while the edits were first made,
> >>> shouldn't it be enough to process them again?
> >>>
> >>> --Tom
> >>>
> >>>
> >>> On Thu, Sep 26, 2013 at 6:05 AM, Harsh J <harsh@cloudera.com> wrote:
> >>>>
> >>>> Hi Tom,
> >>>>
> >>>> The edits are processed sequentially, and aren't all held in memory.
> >>>> Right now there's no mid-way checkpoint while they are loaded that
> >>>> would let it resume with only the remaining work if interrupted.
> >>>> Normally this is not a problem in deployments, given that an SNN or
> >>>> SBN periodically checkpoints the image and keeps the edits
> >>>> collection small.
> >>>>
> >>>> If your NameNode is running out of memory _applying_ the edits, then
> >>>> the cause is not the edits but a growing namespace. You most likely
> >>>> have more files now than before, and that's going to take up
> >>>> permanent memory from the NameNode heap.
> >>>>
> >>>> On Thu, Sep 26, 2013 at 3:00 AM, Tom Brown <tombrown52@gmail.com> wrote:
> >>>> > Unfortunately, I cannot give it that much RAM. The machine has 4GB
> >>>> > total (though it could be expanded somewhat-- it's a VM).
> >>>> >
> >>>> > Though if each edit is processed sequentially (in a

I=A0would file a jira, if I were you a= nd my Hadoop Version was=A0recent enough.=A0=A0Should be pretty easy to rep= roduce.

Jens

Am Donnerstag, 26. September 2013 sc= hrieb Tom Brown :
They were created and delet= ed in quick succession. I thought that meant the edits for both the create = and delete would be logically next to each other in the file allowing it to= release the memory almost as soon as it had been allocated.

In any case, after finding a VM host that could give me more= RAM, I was able to get the namenode started. The process used 25GB at it&#= 39;s peak.

Thanks for your help!


On Thu, Sep 26, 2013 at 11:07 AM, Harsh J &l= t;harsh@cloudera.com> wrote:
Tom,

That is valuable info. When we "replay" edits, we would be creati= ng
and then deleting those files - so memory would grow in between until
the delete events begin appearing in the edit log segment.

On Thu, Sep 26, 2013 at 10:07 PM, Tom Brown <tombrown52@gmail.com= > wrote:
> A simple estimate puts the total number of blocks somewhere around 500= ,000.
> Due to an HBase bug (HBASE-9648), there were approximately 50,000,000 = files
> that were created and quickly deleted (about 10/sec for 6 weeks) in th= e
> cluster, and that activity is what is contained in the edits.
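On the subject line's question of working with the edits directly: the
offline edits viewer that ships with Hadoop (hdfs oev) can dump an edits
segment to XML without starting the NameNode, which makes it possible to
confirm the create/delete churn described above before attempting a
replay. The sketch below is only illustrative: the file names are
placeholders, and the RECORD/OPCODE element names assume the stock XML
output of "hdfs oev -p xml" on a 2.x release.

    # Rough sketch: count operation types in an edits segment dumped with
    # the offline edits viewer, e.g.
    #   hdfs oev -i <edits segment> -o edits.xml -p xml
    # The RECORD/OPCODE element names assume the stock XML output; the
    # path "edits.xml" is a placeholder.
    import xml.etree.ElementTree as ET
    from collections import Counter

    def count_opcodes(xml_path):
        counts = Counter()
        # iterparse keeps memory flat even for very large dumps
        for _, elem in ET.iterparse(xml_path, events=("end",)):
            if elem.tag == "RECORD":
                op = elem.findtext("OPCODE")
                if op:
                    counts[op] += 1
                elem.clear()  # release the subtree once counted
        return counts

    if __name__ == "__main__":
        for op, n in count_opcodes("edits.xml").most_common():
            print(op, n)
        # A create/delete storm like the one above should show roughly
        # matching OP_ADD/OP_CLOSE and OP_DELETE counts.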
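Putting that rule of thumb together with Harsh's description of replay
roughly accounts for the 25GB peak reported above. A back-of-the-envelope
sketch, using only figures quoted in this thread (the 1GB-per-million-objects
ratio is the rule of thumb, not a measured per-object cost):

    # Back-of-the-envelope NameNode heap estimate during edit replay.
    # All inputs are figures quoted in this thread; the GB-per-million
    # ratio is the Cloudera rule of thumb, not a measured object size.
    GB_PER_MILLION_OBJECTS = 1.0

    steady_state_blocks = 500_000           # Tom's block estimate
    churn_rate_per_sec = 10                 # HBASE-9648: ~10 creates/sec
    churn_duration_sec = 6 * 7 * 24 * 3600  # ~6 weeks

    transient_files = churn_rate_per_sec * churn_duration_sec
    print("files created then deleted: ~%.0f million" % (transient_files / 1e6))
    # ~36 million, the same order of magnitude as the ~50 million above

    # During replay, a created file stays on the heap until its delete
    # record is reached, so in the worst case most of them are live at once.
    peak_objects = steady_state_blocks + transient_files
    print("worst-case peak heap: ~%.0f GB"
          % (peak_objects / 1e6 * GB_PER_MILLION_OBJECTS))
    # A few tens of GB, consistent with the ~25GB peak reported above.

If more RAM does get added to the VM, it typically has to be handed to the
NameNode JVM as well, for example via a larger -Xmx in HADOOP_NAMENODE_OPTS
in hadoop-env.sh, not just to the host.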
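Once the NameNode does come up, Harsh's point about checkpointing can also
be acted on manually: forcing a checkpoint folds the accumulated edits into
a fresh fsimage so they never have to be replayed in bulk again. A minimal
sketch of that procedure, assuming the standard hdfs dfsadmin
safemode/saveNamespace subcommands (hadoop dfsadmin on older releases) and
HDFS superuser privileges; it simply wraps the CLI:

    # Minimal sketch: force a checkpoint so accumulated edits are folded
    # into a new fsimage. Assumes the standard hdfs dfsadmin subcommands
    # (safemode enter/leave, saveNamespace); run as the HDFS superuser.
    import subprocess

    def run(*args):
        print("+", " ".join(args))
        subprocess.run(args, check=True)  # raise if the command fails

    def force_checkpoint():
        run("hdfs", "dfsadmin", "-safemode", "enter")  # saveNamespace requires safemode
        try:
            run("hdfs", "dfsadmin", "-saveNamespace")  # write a fresh fsimage
        finally:
            run("hdfs", "dfsadmin", "-safemode", "leave")

    if __name__ == "__main__":
        force_checkpoint()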