Mailing-List: contact common-dev-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: common-dev@hadoop.apache.org
Message-ID: <4AC2AA2F.9020207@yahoo-inc.com>
Date: Tue, 29 Sep 2009 17:45:35 -0700
From: Jakob Homan <jhoman@yahoo-inc.com>
User-Agent: Thunderbird 2.0.0.12 (Macintosh/20080213)
MIME-Version: 1.0
To: common-dev@hadoop.apache.org
Subject: Re: Developing Hadoop and HDFS
References: <19ebb2640909291739g531cab9ejaf7d81ffc8c8c8a7@mail.gmail.com>
In-Reply-To: <19ebb2640909291739g531cab9ejaf7d81ffc8c8c8a7@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Thanks for your interest, Geoff.  Yes, finding open JIRA issues and 
contributing patches is very helpful.  We also maintain a wishlist of 
projects that one could work on: 
http://wiki.apache.org/hadoop/ProjectSuggestions.  In addition, please 
consider documentation and example work, which is very 
helpful both to new users and to developers starting on the project.

Thanks,
Jakob
Hadoop at Yahoo!

Geoffrey Gallaway wrote:
> Hello,
> 
> Yes, another person looking to contribute to and develop Hadoop. I'm looking
> to start off small, fixing a few bugs before moving into larger stuff.
> 
> First, a bit of background:
> Years ago I had the idea of creating a semi-decentralized distributed file
> system. The idea came when I was working for a small/medium sized company
> that was looking for a simple backup solution for their workstations. PCs
> back then came with 100+ GB hard drives but, on simple workstations,
> employees were using less than half that space. Why not have each
> workstation back up to a few other workstations, duplicating files across
> multiple machines for redundancy? RAID for the network. I started coming up
> with design and architecture specs, protocol examples and even started
> writing a bit of the system (in Java). I tried to find a few interested
> developers but everyone seemed to think the task was much too large to be
> accomplished as a side project (and I didn't think, given the IT industry of
> the time, that anyone would fund it). Later, I realized such a distributed
> system could be much more than a simple file backup solution.
> 
> It looks like Hadoop and HDFS are creating a lot of what I had wanted to
> create; they have already surpassed what I had in mind in most ways.
> 
> So, where should I start? Just start fixing bugs listed in JIRA?
> 
> Geoff
>