From: Ravi Prakash
To: user@hadoop.apache.org
Sent: Friday, October 12, 2012 8:46 PM
Subject: Re: Using a hard drive instead of

Maybe at a slight tangent, but for each write operation on HDFS (e.g. create a file, delete a file, create a directory), the NN waits until the edit has been *flushed* to disk. So I can imagine such a hypothetical(?) disk would tremendously speed up the NN even as it is. Mark, can you please please please send me 5 of these disks? :-P
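
In case it helps picture it: per edit, the NN is essentially doing the classic write-ahead-log dance, append the edit record and then fsync before acknowledging the client. Here's a minimal sketch of that pattern in plain Java NIO; to be clear, this is not the actual FSEditLog code, and the class name, file name, and edit record are all made up for illustration:

// Write-ahead-log sketch of what "flushed to disk" means above.
// Not HDFS code, just the underlying pattern: no edit is
// acknowledged until the bytes are forced to the storage device.
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class EditLogSketch {
    private final FileChannel log;

    public EditLogSketch(Path file) throws IOException {
        log = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }

    // Called for each namespace mutation (create file, delete file, mkdir...).
    public synchronized void logEdit(byte[] edit) throws IOException {
        log.write(ByteBuffer.wrap(edit));
        // The caller blocks here: force(true) returns only once the data
        // and metadata are on the device. This is the flush a RAM-speed
        // durable disk would make nearly free.
        log.force(true);
    }

    public static void main(String[] args) throws IOException {
        EditLogSketch log = new EditLogSketch(Paths.get("edits.log"));
        log.logEdit("mkdir /foo".getBytes()); // hypothetical edit record
    }
}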
To answer your question, you probably want to change BlockManager and FSNamesystem, which together are basically the crux of the HDFS NN. It's going to be a pretty significant undertaking.
Re memory-mapped files: they would lose data in case of failure (unless of course you use special hardware; come to think of it, it's really not so special, so maybe worth trying). Has anyone tried this before?
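
To see why a plain memory-mapped file isn't durable on its own, here's a second toy sketch (again plain NIO, nothing NameNode-specific; the file name is made up): the write is visible immediately, but it sits in the OS page cache until forced out, so a power failure can lose it.

// Toy demo, not NameNode code: mmap writes survive a process crash
// (the page cache outlives the process) but NOT a power failure,
// unless and until force() pushes them to the device.
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MmapSketch {
    public static void main(String[] args) throws IOException {
        try (FileChannel ch = FileChannel.open(Paths.get("namespace.img"),
                StandardOpenOption.CREATE, StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            map.putLong(0, 42L); // visible at once, but only in page cache
            map.force();         // only now is it guaranteed on the device
        }
    }
}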


From: Lance Norskog <goksron@gmail.com>
To: user@hadoop.apache.org
Sent: Friday, October 12, 2012 12:01 AM
Subject: Re: Using a hard drive instead of

This is why memory-mapped files were invented.

On Thu, Oct 11, 2012 at 9:34 PM, Gaurav Sharma
<gaurav.gs.sharma@gmail.com> wrote:
> If you don't mind sharing, what hard drive do you have with these
> properties:
> -"performance of RAM"
> -"can accommodate very many threads"
>
>
> On Oct 11, 2012, at 21:27, Mark Kerzner <mark.kerzner@shmsoft.com> wrote:
>
> Harsh,
>
> I agree with you about many small files; I was giving this only by way
> of example. However, the hard drive I am talking about can be 1-2 TB
> in size, and that's pretty good; you can't easily get that much
> memory. In addition, it would be more resistant to power failures than
> RAM. And yes, it has the performance of RAM, and can accommodate very
> many threads.
>
> Mark
>
> On Thu, Oct 11, 2012 at 11:16 PM, Harsh J <harsh@cloudera.com> wrote:
>>
>> Hi Mark,
>>
>> Note that the NameNode does random memory access to serve back any
>> information or mutation request you send to it, and that there can be
>> a large number of concurrent clients. So do you mean a 'very fast hard
>> drive' that's faster than RAM for random access itself? The NameNode
>> does persist its block information onto disk for various purposes,
>> but making the NameNode use disk storage completely (rather than
>> keeping specific parts of it disk-cached) wouldn't make much sense to
>> me. Performance-wise, that'd feel like talking to a process that's
>> swapping.
>>
>> The too-many-files issue gets blown up to sound like a NameNode
>> problem, but in reality it isn't. HDFS lets you process lots of files
>> really fast, aside from helping store them for long periods; a lot of
>> tiny files only slows such operations down with the overhead of
>> opening and closing every file as you read them all in one pass.
>> With a single file or a few large files, all you do is block (data)
>> reads and very few NameNode communications, so you end up going much
>> faster. The same holds for local filesystems too, but not many think
>> of that.
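
To put Harsh's point in code: with the standard FileSystem API, each small file costs its own open() round-trip to the NameNode, while one large file of the same total size is a single open() followed by plain block reads from the DataNodes. A rough illustration, not a benchmark; the paths are made up:

// Rough illustration of the small-files overhead: reading N small
// files costs roughly N open() round-trips to the NameNode, while one
// large file of the same total size costs a single open().
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class SmallFilesOverhead {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Many small files: one NameNode round-trip per open(), plus
        // open/close overhead on every single file.
        for (FileStatus stat : fs.listStatus(new Path("/data/small-files"))) {
            try (FSDataInputStream in = fs.open(stat.getPath())) {
                IOUtils.copyBytes(in, System.out, conf, false);
            }
        }

        // One large file: a single open(), then mostly block (data)
        // reads straight from the DataNodes.
        try (FSDataInputStream in = fs.open(new Path("/data/one-large-file"))) {
            IOUtils.copyBytes(in, System.out, conf, false);
        }
    }
}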
>>
>> On Fri, Oct 12, 2012 at 9:29 AM, Mark Kerzner <mark.kerzner@shmsoft.com>
>> wrote:
>> > Hi,
>> >
>> > Imagine I have a very fast hard drive that I want to use for the
>> > NameNode. That is, I want the NameNode to store its block
>> > information on this hard drive instead of in memory.
>> >
>> > Why would I do it? Scalability (no federation needed), many files
>> > are not a problem, and warm fail-over is automatic. What would I
>> > need to change in the NameNode to tell it to use the hard drive?
>> >
>> > Thank you,
>> > Mark
>>
>>
>>
>> --
>> Harsh J
>
>



--
Lance Norskog
goksron@gmail.com

