hama-dev mailing list archives

From Thomas Jungblut <thomas.jungb...@gmail.com>
Subject Re: runtimePartitioning in GraphJobRunner
Date Mon, 10 Dec 2012 19:56:18 GMT
You didn't get the use of the reader.
The reader doesn't care about the input format.
It just takes the input as Writables, so for text input this is LongWritable/Text
pairs. For NoSQL this might be LongWritable/BytesWritable.

It's up to you to code this for your input sequence, not for each format.
It is not hardcoded to text; only the examples use text.
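
A minimal sketch of that idea, assuming nothing about the real Hama or Hadoop classes: SketchVertex, SketchVertexReader, TextLineReader and BytesReader are hypothetical stand-ins, with Long/String playing the role of LongWritable/Text and Long/byte[] the role of LongWritable/BytesWritable. The tab-separated record layout (vertex id, then neighbor ids) is also just an assumption for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a graph vertex; NOT the Hama Vertex class.
final class SketchVertex {
    String id;
    final List<String> neighbors = new ArrayList<>();
}

// Hypothetical stand-in for a vertex reader: it sees one (key, value) record
// and fills in a vertex, whatever input format produced that record.
interface SketchVertexReader<K, V> {
    // Returns true when the record yielded a usable vertex.
    boolean parseVertex(K key, V value, SketchVertex vertex);
}

// Text input: key is the line offset, value is "id<TAB>neighbor<TAB>neighbor...".
final class TextLineReader implements SketchVertexReader<Long, String> {
    @Override
    public boolean parseVertex(Long offset, String line, SketchVertex vertex) {
        String[] fields = line.split("\t");
        if (fields.length == 0 || fields[0].isEmpty()) {
            return false; // skip blank or malformed lines
        }
        vertex.id = fields[0];
        for (int i = 1; i < fields.length; i++) {
            vertex.neighbors.add(fields[i]);
        }
        return true;
    }
}

// Binary input (for example from a NoSQL store): same layout, as UTF-8 bytes.
final class BytesReader implements SketchVertexReader<Long, byte[]> {
    private final TextLineReader textReader = new TextLineReader();

    @Override
    public boolean parseVertex(Long key, byte[] value, SketchVertex vertex) {
        String decoded = new String(value, StandardCharsets.UTF_8);
        return textReader.parseVertex(key, decoded, vertex);
    }
}

final class ReaderSketchDemo {
    public static void main(String[] args) {
        SketchVertex vertex = new SketchVertex();
        if (new TextLineReader().parseVertex(0L, "a\tb\tc", vertex)) {
            System.out.println(vertex.id + " -> " + vertex.neighbors);
        }
    }
}
```

The parsing logic is written once per record layout; only the key/value types change with the source of the records.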

2012/12/10 Edward J. Yoon <edwardyoon@apache.org>

> Again ... Users can create their own InputFormatter to read records as
> <Writable, ArrayWritable> pairs from a text file, a sequence file, or a
> NoSQL store.
>
> You can use K, V pairs and a sequence file. Why do you want to use a text
> file? Should I always write text files and parse them using
> VertexInputReader?
>
>
> On Tue, Dec 11, 2012 at 4:48 AM, Thomas Jungblut
> <thomas.jungblut@gmail.com> wrote:
> >>
> >> It's a gap in experience, Thomas.
> >
> >
> > Most probably you should read some good books on data extraction and then
> > choose your tools accordingly.
> > I don't think that BSP is, or ever will be, a good extraction technique for
> > unstructured data.
> >
> > But these are just my two cents here; there seem to be more political
> > problems in this game than questions of using tools appropriately.
> >
> > 2012/12/10 Thomas Jungblut <thomas.jungblut@gmail.com>
> >
> >> Yes, if you preprocess your data correctly.
> >> I have done the same unstructured extraction with the movie database from
> >> IMDB and it worked fine.
> >> That's just not a job for BSP, but for MapReduce.
> >>
> >> 2012/12/10 Edward J. Yoon <edwardyoon@apache.org>
> >>
> >>> It's a gap in experience, Thomas. Do you think you can extract a Twitter
> >>> mention graph using parseVertex?
> >>>
> >>> On Tue, Dec 11, 2012 at 4:34 AM, Thomas Jungblut
> >>> <thomas.jungblut@gmail.com> wrote:
> >>> > I have trouble understanding you here.
> >>> >
> >>> > How can I generate large sample without coding?
> >>> >
> >>> >
> >>> > Do you mean random data generation or real-life data?
> >>> > Personally I think it is really convenient to transform unstructured data
> >>> > in a text file to vertices.
> >>> >
> >>> >
> >>> > 2012/12/10 Edward <edward@udanax.org>
> >>> >
> >>> >> I mean, with or without an input reader, how can I generate a large
> >>> >> sample without coding?
> >>> >>
> >>> >> It's an unnecessary feature. As I mentioned before, it is only good for
> >>> >> simple and small tests.
> >>> >>
> >>> >> Sent from my iPhone
> >>> >>
> >>> >> On Dec 11, 2012, at 3:38 AM, Thomas Jungblut <thomas.jungblut@gmail.com>
> >>> >> wrote:
> >>> >>
> >>> >> >>
> >>> >> >> In my case, generating test data is very annoying.
> >>> >> >
> >>> >> >
> >>> >> > Really? What is so difficult about generating tab-separated text data? ;)
> >>> >> > I think we shouldn't do this, but there seems to be very little interest in
> >>> >> > the community so I will not block your work on it.
> >>> >> >
> >>> >> > Good luck ;)
> >>> >>
> >>>
> >>>
> >>>
> >>> --
> >>> Best Regards, Edward J. Yoon
> >>> @eddieyoon
> >>>
> >>
> >>
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
>
