hadoop-common-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: some guidance needed
Date Wed, 18 May 2011 23:05:49 GMT
Hi Ioan,

I would encourage you to look at a system like HBase for your mail
backend. HDFS doesn't work well with lots of small files, and it also
doesn't support random updates, so existing formats like Maildir
wouldn't be a good fit.
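
To put a rough number on the small-files concern, here is a back-of-the-envelope sketch in Python. It assumes the commonly quoted rule of thumb of about 150 bytes of NameNode heap per namespace object (one per file, one per block); that figure, the function name, and the example workload are illustrative assumptions, not measurements from this thread.

```python
# Rough NameNode heap estimate for a mail store that keeps one HDFS file per message.
# ASSUMPTION: ~150 bytes of NameNode heap per namespace object (file or block).
# This is a commonly quoted rule of thumb, not a figure taken from this thread.
BYTES_PER_OBJECT = 150


def namenode_bytes(num_files: int, blocks_per_file: int = 1) -> int:
    """Estimate NameNode heap used by num_files files of blocks_per_file blocks each."""
    objects_per_file = 1 + blocks_per_file  # one file entry plus its block entries
    return num_files * objects_per_file * BYTES_PER_OBJECT


if __name__ == "__main__":
    # Hypothetical workload: 10,000 users with 5,000 messages each, one file per message.
    messages = 10_000 * 5_000
    print(f"{namenode_bytes(messages) / 1e9:.1f} GB of NameNode heap")
```

Under these assumptions, a one-file-per-message layout such as Maildir spends NameNode memory on every message regardless of how small it is, which is one reason a store that packs many small records into large files may be a better fit.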


On Wed, May 18, 2011 at 4:02 PM, Ioan Eugen Stan <stan.ieugen@gmail.com> wrote:
> Hello everybody,
> I'm a GSoC student for this year and I will be working on James [1].
> My project is to implement email storage over HDFS. I am quite new to
> Hadoop and its related projects and I am looking for some hints to get
> started on the right track.
> I have installed a single node Hadoop instance on my machine and
> played around with it (ran some examples) but I am interested in
> what you (more experienced people) think is the best way to approach
> my problem.
> I am a little puzzled because I have read that Hadoop is best
> used for large files, and emails aren't that large from what I know.
> Another thing that crossed my mind is that since HDFS is a file
> system, wouldn't it be possible to set it as a back-end for the
> (existing) maildir and mailbox storage formats? (I think this question
> is better suited to the James mailing list, but if you have some ideas
> please speak your mind).
> Also, any development resources to get me started are welcome.
> [1] http://james.apache.org/mailbox/
> [2] https://issues.apache.org/jira/browse/MAILBOX-44
> Regards,
> --
> Ioan Eugen Stan

Todd Lipcon
Software Engineer, Cloudera
