apex-dev mailing list archives

From Devendra Tagare <devend...@datatorrent.com>
Subject Re: Naming suggestion for HDFS output modules
Date Mon, 28 Mar 2016 17:24:35 GMT
Is the plan to align the tuple writer with the org.apache.hadoop.mapred output
formats?

https://hadoop.apache.org/docs/r2.6.1/api/org/apache/hadoop/mapred/OutputFormat.html

The advantage of this would be that Apex can be used for ETL jobs that write
MapReduce-compatible output files, which downstream jobs
(jobs external to Apex) can then process further.

HDFSTupleWriter/FileWriter seems like a good choice if the writers do not
extend the mapred output formats.

If it's a simple file write, then one of the subclasses of the Hadoop
output format below can still be used:
https://hadoop.apache.org/docs/r2.6.1/api/org/apache/hadoop/mapred/FileOutputFormat.html
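
To make the alignment idea concrete, here is a minimal sketch in plain Java.
It is not Apex or Hadoop code: KeyValueWriter, TabSeparatedWriter and
TupleWriterSketch are hypothetical names made up for illustration, and
KeyValueWriter is only a simplified stand-in for a mapred-style record writer.
The sketch assumes each tuple can be mapped to a key/value pair and written as
a "key<TAB>value" line, which is the layout Hadoop's TextOutputFormat produces
by default, so downstream jobs could read the result.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.function.Function;

/** Hypothetical stand-in for a record writer; NOT the Hadoop interface. */
interface KeyValueWriter<K, V> {
    void write(K key, V value) throws IOException;
    void close() throws IOException;
}

/** Writes each pair as one "key<TAB>value" line. */
final class TabSeparatedWriter<K, V> implements KeyValueWriter<K, V> {
    private final Writer out;

    TabSeparatedWriter(Writer out) {
        this.out = out;
    }

    @Override
    public void write(K key, V value) throws IOException {
        out.write(String.valueOf(key));
        out.write('\t');
        out.write(String.valueOf(value));
        out.write('\n');
    }

    @Override
    public void close() throws IOException {
        out.close();
    }
}

/** Hypothetical tuple writer: maps a tuple to (key, value) and delegates. */
final class TupleWriterSketch<T, K, V> {
    private final KeyValueWriter<K, V> delegate;
    private final Function<T, K> keyOf;
    private final Function<T, V> valueOf;

    TupleWriterSketch(KeyValueWriter<K, V> delegate,
                      Function<T, K> keyOf,
                      Function<T, V> valueOf) {
        this.delegate = delegate;
        this.keyOf = keyOf;
        this.valueOf = valueOf;
    }

    /** Called once per incoming tuple. */
    void process(T tuple) throws IOException {
        delegate.write(keyOf.apply(tuple), valueOf.apply(tuple));
    }

    /** Called when the writer shuts down. */
    void teardown() throws IOException {
        delegate.close();
    }
}

final class TupleWriterDemo {
    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        TupleWriterSketch<String[], String, String> writer =
            new TupleWriterSketch<String[], String, String>(
                new TabSeparatedWriter<String, String>(out),
                t -> t[0],
                t -> t[1]);
        writer.process(new String[] {"user1", "42"});
        writer.teardown();
        System.out.print(out);
    }
}
```

The point of the delegate is that the tuple-facing side stays the same while
the output layout is swapped; whether the real module should wrap an actual
mapred output format this way is exactly the open question above.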

Thanks,
Dev

On Mon, Mar 28, 2016 at 10:00 AM, Chinmay Kolhatkar <chinmay@apache.org>
wrote:

> Sorry for the delay in replying... Still catching up with emails.
>
> I'm not sure whether we should have "Module" in the names.
>
> How about HDFSTupleWriter and HDFSFileWriter?
>
> Thanks,
> Chinmay.
>
> ---
> Sent from mobile.
> On 23 Mar 2016 4:49 p.m., "Yogi Devendra" <yogidevendra@apache.org> wrote:
>
> > Hi,
> >
> > I am currently developing the HDFS output modules.
> > We have two modules for HDFS output:
> > 1. Tuple based
> > 2. File based (used for file copy)
> >
> > Currently, I am calling #1 "HDFS output module", as this is the module
> > that will mostly be used to write tuples to HDFS.
> >
> > I am calling #2 "HDFS file copy module" because it is used only
> > for file copy operations.
> >
> > Any suggestions for alternate names for these modules?
> > The names should stress the following:
> >
> >    - #2 to be used only for file copy operations (block by block copy)
> >    - #1 to be used for tuple by tuple write to HDFS
> >    - Both #1 and #2 are HDFS output modules.
> >
> > Actually, we thought of combining them into a single module. But the problem
> > is that the port signatures of the two modules are different. Thus, combining
> > them would result in different ports depending on the configuration.
> > It would be confusing for app developers to decide which ports to
> > connect to if the ports change based on the configuration.
> >
> > Questions:
> >
> > 1. Name suggestion for #1?
> > a. HDFS output module b. HDFSTuplesWriteModule c. HDFSMsgBasedOutputModule
> > d. other (please specify)
> >
> > 2. Name suggestion for #2?
> > a. HDFS file copy module b. HDFSBlocksWriteModule c. HDFSBlockBasedOutputModule
> > d. HDFSFileCopyOutputModule e. other (please specify)
> >
> > ~ Yogi
> >
>
