hama-dev mailing list archives

From Hyunsik Choi <c0d3h...@gmail.com>
Subject Re: Discussion about Hamburg (provisional name) open sourcing
Date Wed, 16 Sep 2009 04:10:32 GMT
Currently, HBase is required as Hama's storage system. However, HBase
has an inherent limitation: it provides only horizontal partitioning
of a matrix. For the best performance, Hama needs both horizontal and
vertical partitioning, plus replication (like a cube). This could be
implemented with a specific file structure on HDFS. Of course, matrix
computation would still use MR.

I think Hama should ultimately go that way. Therefore, it would be
better for Hama to be independent of any specific storage system.
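
To make the partitioning point concrete, here is a minimal sketch of how a
cell (i, j) could be mapped to a block of a 2D grid. The class and method
names (BlockPartitioner, blockId, and so on) are hypothetical and are not
part of Hama or HBase; replication and the on-disk HDFS layout are left out.

```java
/**
 * Hypothetical helper (not a Hama or HBase API): splits a rows x cols matrix
 * into blocks of blockRows x blockCols cells and numbers them in row-major order.
 */
public final class BlockPartitioner {
    private final int rows;
    private final int cols;
    private final int blockRows;
    private final int blockCols;

    public BlockPartitioner(int rows, int cols, int blockRows, int blockCols) {
        if (rows <= 0 || cols <= 0 || blockRows <= 0 || blockCols <= 0) {
            throw new IllegalArgumentException("all sizes must be positive");
        }
        this.rows = rows;
        this.cols = cols;
        this.blockRows = blockRows;
        this.blockCols = blockCols;
    }

    /** Number of blocks along the row axis (ceiling division). */
    public int gridRows() {
        return (rows + blockRows - 1) / blockRows;
    }

    /** Number of blocks along the column axis (ceiling division). */
    public int gridCols() {
        return (cols + blockCols - 1) / blockCols;
    }

    /** Row-major id of the block that contains cell (i, j). */
    public int blockId(int i, int j) {
        if (i < 0 || i >= rows || j < 0 || j >= cols) {
            throw new IndexOutOfBoundsException("cell (" + i + ", " + j + ") is outside the matrix");
        }
        return (i / blockRows) * gridCols() + (j / blockCols);
    }

    public static void main(String[] args) {
        // Horizontal and vertical partitioning: 6x6 matrix in 2x3 blocks.
        BlockPartitioner grid = new BlockPartitioner(6, 6, 2, 3);
        // Horizontal-only partitioning is the special case blockCols == cols.
        BlockPartitioner stripes = new BlockPartitioner(6, 6, 2, 6);
        System.out.println("2D grid block of (5, 5): " + grid.blockId(5, 5));
        System.out.println("row-stripe block of (5, 5): " + stripes.blockId(5, 5));
    }
}
```

With row stripes only, every cell of a row lands in the same partition; the
2D grid also splits each row across column blocks, which is the extra freedom
the message asks for.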

--
Hyunsik Choi
Database & Information Systems Group, Korea University
http://diveintodata.org



On Wed, Sep 16, 2009 at 11:26 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:
> Personally, I would remove HBase from Hama in the future, since it is
> only used as a communication module.
>
> What do you think?
>
> On Wed, Sep 16, 2009 at 11:10 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:
>> Hyunsik,
>>
>> I would suggest that we turn the BSP component into an independent
>> library, so that we can use it both for bulk-synchronous algorithms in
>> the matrix computation package and for the development of a BSP-based
>> graph computing framework. Then, IMO, the top-level packages could
>> roughly be as follows:
>>
>> org.apache.hama.bsp
>> org.apache.hama.matrix
>> org.apache.hama.graph
>> org.apache.hama.examples
>>
>> Let's discuss more details on hama-dev@ and hamburg-dev@.
>>
>> Thanks. ;)
>>
>> On Tue, Sep 15, 2009 at 6:10 PM, Hyunsik Choi <c0d3h4ck@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> As you know, the graph is a very useful data model, and the matrix is
>>> a great tool for storing and processing graph data. Therefore, I think
>>> the two projects will benefit each other. I agree with your opinion.
>>>
>>> If we do so, how can we integrate them? What do you think about that?
>>>
>>> Best regards,
>>> --
>>> Hyunsik Choi
>>> Database & Information Systems Group, Korea University
>>> http://diveintodata.org
>>>
>>>
>>>
>>> On Tue, Sep 15, 2009 at 5:25 PM, Edward J. Yoon <edwardyoon@apache.org> wrote:
>>>>
>>>> Thanks for the interesting information.
>>>>
>>>> With hindsight, IMO, the Hama project is the best place to incubate
>>>> the Hamburg project, and we can reconsider when we prepare to
>>>> graduate from the incubator.
>>>>
>>>> On Mon, Sep 14, 2009 at 1:09 PM, Taylor, Ronald C <ronald.taylor@pnl.gov> wrote:
>>>>> Hello Mr. Yoon,
>>>>>
>>>>> I was delighted to hear of your proposed Hamburg project. I am a new
>>>>> user of Hadoop (and Hbase). It looks like I will be spending a
>>>>> substantial amount of time working in this environment over the next
>>>>> couple years, for both DOE bioinformatics work (my primary field) and
>>>>> for work funded by DoD. I am enthusiastic about using Hadoop, Hive,
>>>>> Hbase. Also am quite interested in the Mahout project.
>>>>>
>>>>> While I cannot offer advice as to where to place your new project within
>>>>> the Apache framework, I did want to offer my support. I believe that it
>>>>> could well be of value in the coming years both to me, for my
>>>>> bioinformatics research, and to other researchers here at PNNL working
>>>>> in the areas of social networks (in our national security directorate)
>>>>> and in a set of projects directed toward making the electrical grid
>>>>> "smarter". I would not be able to contribute any code until I found
>>>>> funding from current or new projects for my time. But if Hamburg moves
>>>>> forward and can demonstrate its usefulness, that might be a real
>>>>> possibility.
>>>>>
>>>>> And in regards to funding for getting you some help: if you can find a
>>>>> collaborator based at a university or non-profit, said collaborator
>>>>> could well apply for a grant from the US National Science Foundation for
>>>>> open source Hadoop-based development of graph computing / mining
>>>>> algorithms. The NSF Computer and Information Science and Engineering
>>>>> Directorate is awarding grants specifically devoted to the area of graph
>>>>> mining (at least this year - hopefully will continue next year - anyway,
>>>>> NSF gives money for algorithm and tool development in general - friendly
>>>>> to that). I can't apply (at least not directly) - NSF does not like to
>>>>> give money to other US government labs. But I would think you could find
>>>>> someone in academia - perhaps someone already working with the Mahout
>>>>> group. It would appear a natural fit. I presume there are a number of
>>>>> people associated with the Apache org who know something about the NSF
>>>>> and could offer further advice in that direction.
>>>>>
>>>>> I look forward to hearing more about Hamburg, as it progresses.
>>>>>
>>>>>  Best,
>>>>>  Ron Taylor
>>>>>
>>>>> ___________________________________________
>>>>> Ronald Taylor, Ph.D.
>>>>> Computational Biology & Bioinformatics Group
>>>>> Pacific Northwest National Laboratory
>>>>> 902 Battelle Boulevard
>>>>> P.O. Box 999, MSIN K7-90
>>>>> Richland, WA  99352 USA
>>>>> Office:  509-372-6568
>>>>> Email: ronald.taylor@pnl.gov
>>>>> www.pnl.gov
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: edward@udanax.org [mailto:edward@udanax.org] On Behalf Of Edward
>>>>> J. Yoon
>>>>> Sent: Sunday, September 13, 2009 7:27 PM
>>>>> To: general@hadoop.apache.org; hama-dev@incubator.apache.org;
>>>>> hamburg-dev@googlegroups.com
>>>>> Cc: Paolo Castagna
>>>>> Subject: Discussion about Hamburg (provisional name) open sourcing
>>>>>
>>>>> Hello communities,
>>>>>
>>>>> I'm one of the sponsors of Hamburg (provisional name), a graph
>>>>> computing framework on Hadoop. We're now working on polishing our
>>>>> prototype project, and we'll propose the Hamburg project soon.
>>>>>
>>>>> - http://wiki.apache.org/hadoop/Hamburg, a wiki page
>>>>> - http://throb.googlecode.com/, a prototype project
>>>>>
>>>>> BTW, before we decide to propose it, we need some time to consider
>>>>> where it belongs.
>>>>>
>>>>> Since it aims to create a "general graph computing framework" on Hadoop,
>>>>> I'd like to propose it as a sub-project of Hadoop. On the other hand,
>>>>> since the matrix and graph are both in the domain of scientific
>>>>> computing and the BSP model could be used for matrix computation areas, I
>>>>> think this project also can be integrated with the Hama project.
>>>>>
>>>>> WDYT? Any advice is welcome.
>>>>>
>>>>> --
>>>>> Best Regards, Edward J. Yoon @ NHN, corp.
>>>>> edwardyoon@apache.org
>>>>> http://blog.udanax.org
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards, Edward J. Yoon @ NHN, corp.
>>>> edwardyoon@apache.org
>>>> http://blog.udanax.org
>>>>
>>>
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon @ NHN, corp.
>> edwardyoon@apache.org
>> http://blog.udanax.org
>>
>
>
>
> --
> Best Regards, Edward J. Yoon @ NHN, corp.
> edwardyoon@apache.org
> http://blog.udanax.org
>
