hama-dev mailing list archives

From "Edward J. Yoon" <edwardy...@apache.org>
Subject Re: runtimePartitioning in GraphJobRunner
Date Mon, 10 Dec 2012 19:52:55 GMT
Again: users can create their own InputFormatter to read records as
<Writable, ArrayWritable> pairs from a text file, a sequence file, or a
NoSQL store.

You can use K, V pairs and a sequence file. Why do you want to use a text
file? Should I always have to write a text file and parse it using
VertexInputReader?
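
To make the text-file route concrete, here is a minimal sketch of the kind of
per-line parsing it implies, assuming one tab-separated line per vertex
("id<TAB>neighbour1<TAB>neighbour2..."). The class and method names are
hypothetical and are not part of Hama; a real VertexInputReader would build
Hama vertex objects rather than a plain map entry.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not a Hama class.
public class AdjacencyLineParser {

    // Splits "id<TAB>n1<TAB>n2..." into (vertex id, list of neighbour ids).
    public static Map.Entry<String, List<String>> parse(String line) {
        String[] fields = line.split("\t");
        // An empty line or a line starting with a tab has no usable vertex id.
        if (fields.length == 0 || fields[0].isEmpty()) {
            throw new IllegalArgumentException("missing vertex id: '" + line + "'");
        }
        // Everything after the first field is treated as a neighbour id.
        List<String> neighbours = Collections.unmodifiableList(
                new ArrayList<>(Arrays.asList(fields).subList(1, fields.length)));
        return new SimpleImmutableEntry<>(fields[0], neighbours);
    }
}
```

With a sequence file of K, V pairs this parsing step would not be needed,
since the records would already arrive typed.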


On Tue, Dec 11, 2012 at 4:48 AM, Thomas Jungblut
<thomas.jungblut@gmail.com> wrote:
>>
>> It's a gap in experience, Thomas.
>
>
> Most probably you should read some good books on data extraction and then
> choose your tools accordingly.
> I don't think that BSP is, or ever will be, a good extraction technique for
> unstructured data.
>
> But these are just my two cents here; there seem to be more political
> problems in this game than problems of using tools appropriately.
>
> 2012/12/10 Thomas Jungblut <thomas.jungblut@gmail.com>
>
>> Yes, if you preprocess your data correctly.
>> I have done the same unstructured extraction with the movie database from
>> IMDB and it worked fine.
>> That's just not a job for BSP, but for MapReduce.
>>
>> 2012/12/10 Edward J. Yoon <edwardyoon@apache.org>
>>
>>> It's a gap in experience, Thomas. Do you think you can extract Twitter
>>> mention graph using parseVertex?
>>>
>>> On Tue, Dec 11, 2012 at 4:34 AM, Thomas Jungblut
>>> <thomas.jungblut@gmail.com> wrote:
>>> > I have trouble understanding you here.
>>> >
>>> > How can I generate a large sample without coding?
>>> >
>>> >
>>> > Do you mean random data generation or real-life data?
>>> > Personally I think it is really convenient to transform unstructured data
>>> > in a text file to vertices.
>>> >
>>> >
>>> > 2012/12/10 Edward <edward@udanax.org>
>>> >
>>> >> I mean, with or without an input reader: how can I generate a large
>>> >> sample without coding?
>>> >>
>>> >> It's an unnecessary feature. As I mentioned before, it's only good for
>>> >> simple and small tests.
>>> >>
>>> >> Sent from my iPhone
>>> >>
>>> >> On Dec 11, 2012, at 3:38 AM, Thomas Jungblut <thomas.jungblut@gmail.com>
>>> >> wrote:
>>> >>
>>> >> >>
>>> >> >> In my case, generating test data is very annoying.
>>> >> >
>>> >> >
>>> >> > Really? What is so difficult about generating tab-separated text data? ;)
>>> >> > I think we shouldn't do this, but there seems to be very little interest
>>> >> > in the community so I will not block your work on it.
>>> >> >
>>> >> > Good luck ;)
>>> >>
>>>
>>>
>>>
>>> --
>>> Best Regards, Edward J. Yoon
>>> @eddieyoon
>>>
>>
>>



-- 
Best Regards, Edward J. Yoon
@eddieyoon
