hadoop-hdfs-dev mailing list archives

From xiaofei du <xiaofei.du...@gmail.com>
Subject Re: what kind of improvement for HDFS could possibly be done within 3 months
Date Thu, 30 Sep 2010 15:35:37 GMT
Hi Thomas,

Thank you very much. You are very kind. I will try to find out what these
projects are doing.

Thanks again for your kindness!

On Thu, Sep 30, 2010 at 4:18 PM, Thomas Koch <thomas@koch.ro> wrote:

> xiaofei du:
> > Hi All,
> >
> > I am a graduate student preparing for my diploma project, and I have
> > about 3 months to finish it. I want to do some work on HDFS.
> > However, I have no idea what I could do to improve HDFS. So could you
> > guys please give me some suggestions?
> >
> > I hope the suggested project can be done within 3 months; I cannot afford
> > more time. So the project should not be too hard (at the same time, it
> > should not be too easy, otherwise I cannot meet the graduation
> > requirement :-) )
> >
> > thank you !!!
> Hi Xiaofei,
>
> I've three other suggestions:
>
> - Yesterday I got hit by MAPREDUCE-1283[1]. This issue by itself is of course
> not enough for three months, but my idea is that you could take a general
> look at the developer tools and see what's missing and what needs
> improvement.
>
> - HBasene[2] is a project to store a Lucene index natively on BigTable. It was
> inspired by Lucandra[3]. The HBasene project has stalled, however. Still, it
> would be a very promising project IMHO, especially considering the upcoming
> talk on Google's new index infrastructure Percolator[4], which uses BigTable
> to store the index.
>
> - A backup system on top of HBase. It should try to store similar files near
> each other so that tablet compression can work best. HBase's timestamps
> could be used to hold several versions of a file and let HBase handle the
> expiration of old versions of files.
> As an additional task you could evaluate the feasibility of installing HBase
> as a backup system with office desktop computers as regionservers. This could
> utilize otherwise unused hard drive space.
>
> [1] https://issues.apache.org/jira/browse/MAPREDUCE-1283
> [2] http://github.com/akkumar/hbasene
> [3] http://github.com/tjake/Lucandra
> [4] http://www.theregister.co.uk/2010/09/24/google_percolator/
>
> Hope I could help,
>
> Thomas Koch, http://www.koch.ro
>



-- 
Thanks
Best wishes,
Xiaofei Du(Gregory)
