hama-dev mailing list archives

From Praveen Sripati <praveensrip...@gmail.com>
Subject Re: InputFormats for Hama
Date Wed, 28 Mar 2012 01:33:10 GMT
Ed,

Once I have finished porting the Hadoop formats to Hama, I can work on it.

I have created a sub-task HAMA-544 for HBase InputFormat.

Praveen

On Wed, Mar 28, 2012 at 4:33 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:

> Nice discussion!
>
> BTW, is anyone interested in contributing HBase table input/output formatters?
>
> On Mon, Mar 26, 2012 at 2:27 AM, Thomas Jungblut
> <thomas.jungblut@googlemail.com> wrote:
> > Thanks for your time.
> > I have tweeted about the graph DB formats. I know some of my followers are
> > working with them, so they might be interested.
> >
> > On 25 March 2012 at 19:25, Praveen Sripati <praveensripati@gmail.com> wrote:
> >
> >> I have created an umbrella JIRA, HAMA-536, for creating the
> >> InputFormats/OutputFormats, with three sub-tasks. For now I have assigned
> >> the tasks to myself; let me know if anyone is interested.
> >>
> >> Praveen
> >>
> >> On Sun, Mar 25, 2012 at 6:40 PM, Thomas Jungblut <
> >> thomas.jungblut@googlemail.com> wrote:
> >>
> >> > >
> >> > > I can open a JIRA. I need input on which InputFormats make sense and
> >> > > their priority. Some we can port from Hadoop.
> >> >
> >> >
> >> > Yep, you're right. I guess a single JIRA would be enough for the formats
> >> > already implemented in Hadoop; for the others we need subclasses.
> >> > Formats that I really wanted to have would be:
> >> >
> >> >   - DBInputFormat[1]
> >> >   - XMLInputFormat
> >> >   - NLineInputFormat
> >> >   - CSVInputFormat (we could use OpenCSV for that in conjunction with
> >> >   TextInputFormat)
> >> >   - JSONInputFormat (for OpenGraph stuff)
> >> >   - The graph DB formats (Neo4j and whatever the others are called)
> >> >
> >> > Anything I missed for "full" coverage?
> >> >
> >> > > Could you please elaborate on this?
> >> >
> >> >
> >> > Sure, DMOZ is some kind of crawled website database. It is used in some
> >> > PageRank examples for testing; I don't know if it was in Mahout. We could
> >> > also use it since we have PageRank as well.
> >> > CommonCrawl is a new, upcoming DMOZ-like database of many crawled sites;
> >> > it is hosted on S3 in the Amazon cloud. We run on EC2 via Whirr, so this
> >> > could be a cool example as well.
> >> >
> >> > [1]
> >> > http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapreduce/lib/db/DBInputFormat.html
> >> >
> >> >
> >> > On 25 March 2012 at 14:56, Praveen Sripati <praveensripati@gmail.com> wrote:
> >> >
> >> > > Thomas et al,
> >> > >
> >> > > > Would someone please open JIRAs for that?
> >> > >
> >> > > I can open a JIRA. I need input on which InputFormats make sense and
> >> > > their priority. Some we can port from Hadoop.
> >> > >
> >> > > > Based on XML we can implement a format that parses DMOZ or
> >> > > > CommonCrawl on Amazon S3.
> >> > >
> >> > > Could you please elaborate on this?
> >> > >
> >> > > Praveen
> >> > >
> >> > >
> >> > > On Sun, Mar 25, 2012 at 5:14 PM, Chia-Hung Lin <clin4j@googlemail.com> wrote:
> >> > >
> >> > > > As I understand it, many iterative applications don't require
> >> > > > key-value input/output and additionally need random access
> >> > > > (read/write) to particular files. An I/O interface, e.g. MPI, may
> >> > > > increase flexibility here.
> >> > > >
> >> > > > https://issues.apache.org/jira/browse/MAPREDUCE-2911
> >> > > >
> >> > > > On 25 March 2012 10:01, Praveen Sripati <praveensripati@gmail.com> wrote:
> >> > > > > Hi,
> >> > > > >
> >> > > > > For Hama there is only a limited set of input formats:
> >> > > > >
> >> > > > > CombineFileInputFormat, FileInputFormat, NullInputFormat,
> >> > > > > SequenceFileInputFormat, TextInputFormat
> >> > > > >
> >> > > > > Does it make sense to have more input formats? I was thinking of
> >> > > > > InputFormats for graph databases.
> >> > > > >
> >> > > > > Any feedback for the different input formats is welcome.
> >> > > > >
> >> > > > > I quickly glanced at Giraph and Hadoop, and they have more
> >> > > > > InputFormats, which makes it easy to plug them into external systems.
> >> > > > >
> >> > > > > Praveen
> >> > > >
> >> > >
> >> >
> >> >
> >> >
> >> > --
> >> > Thomas Jungblut
> >> > Berlin <thomas.jungblut@gmail.com>
> >> >
> >>
> >
> >
> >
> > --
> > Thomas Jungblut
> > Berlin <thomas.jungblut@gmail.com>
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
>
