apex-dev mailing list archives

From Chandni Singh <chan...@datatorrent.com>
Subject Re: Removing FSFileSplitter from Malhar library
Date Fri, 06 May 2016 23:44:33 GMT
Just saw that there is an *HDFSFileSplitter* in the library as well.
It sets *ignoreFilePatternRegularExp* to ".*._COPYING_" and
*unsupportedChar* to ":".

IMO this class should be removed as well.
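
To illustrate why the two settings overlap, here is a minimal sketch in plain Java, not the Malhar implementation. It assumes the ignore pattern is matched against the whole file name; the class name, the method isIgnored and the combined pattern are hypothetical.

```java
import java.util.regex.Pattern;

public class IgnorePatternSketch {

    // Hypothetical combined pattern: the ".*._COPYING_" value mentioned above,
    // plus an alternative that matches any name containing ":".
    static final Pattern IGNORE = Pattern.compile(".*._COPYING_|.*:.*");

    // Assumption: the pattern must match the entire file name.
    static boolean isIgnored(String fileName) {
        return IGNORE.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        String[] names = {"data.csv", "data.csv._COPYING_", "2016-05-06T23:44.log"};
        for (String name : names) {
            System.out.println(name + " ignored=" + isIgnored(name));
        }
    }
}
```

Under that assumption, one regular expression covers both the in-flight copy files and the names containing ":", so a separate property for the unsupported string adds nothing.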


On Fri, May 6, 2016 at 4:16 PM, Chandni Singh <singh.chandni@gmail.com> wrote:

> Hi,
> Recently FSFileSplitter was added to the Malhar library.
> I have created https://issues.apache.org/jira/browse/APEXMALHAR-2081 to
> remove this operator and add its functionality to the FileSplitterInput.
> The reason is that this extension adds just 3 trivial features, and having
> it as a separate operator makes it difficult for the user to know which
> operator to use. It adds more classes which essentially do the same thing.
> This operator adds 3 properties to FileSplitterInput:
> 1. ignoreFilePatternRegularExp: regular expression that specifies which
> files to ignore.
> This is useful to have in the FileSplitterInput.
> 2. unsupportedChar: first of all, this is a String. Files having this
> String will be ignored.
> IMO this is redundant. #1 can be used to accomplish this.
> I think this should be removed.
> 3. sequentialFileReader: when this property is set, all block metadata of
> the same file has the same hashcode. I think this may have been done so
> that all the block metadata of a particular file goes to the same block
> reader.
> IMO this is a hacky way of accomplishing this. If an application needs
> this, then it should be done using a StreamCodec.
> I think this should be removed.
> Thanks,
> Chandni
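
A rough sketch of the StreamCodec idea from point 3, in plain Java so that it runs without Apex on the classpath. The partition key is derived from the file path in a codec-like object, and the hashcode of the block metadata is left alone. BlockMeta, PartitionCodec, FilePathCodec and readerFor are all hypothetical stand-ins; the modulo routing is a simplification and not how the Apex engine assigns partitions.

```java
public class FilePathPartitionSketch {

    // Hypothetical stand-in for block metadata; hashCode() is deliberately not overridden.
    static final class BlockMeta {
        final String filePath;
        final long blockId;

        BlockMeta(String filePath, long blockId) {
            this.filePath = filePath;
            this.blockId = blockId;
        }
    }

    // Hypothetical stand-in for the partitioning part of a stream codec.
    interface PartitionCodec<T> {
        int getPartition(T tuple);
    }

    // Partition key depends only on the file path, so all blocks of a file share it.
    static final class FilePathCodec implements PartitionCodec<BlockMeta> {
        @Override
        public int getPartition(BlockMeta block) {
            return block.filePath.hashCode();
        }
    }

    // Simplified routing: map the partition key onto one of readerCount readers.
    static int readerFor(BlockMeta block, PartitionCodec<BlockMeta> codec, int readerCount) {
        return Math.floorMod(codec.getPartition(block), readerCount);
    }

    public static void main(String[] args) {
        FilePathCodec codec = new FilePathCodec();
        BlockMeta[] blocks = {
            new BlockMeta("/data/a.log", 0L),
            new BlockMeta("/data/a.log", 1L),
            new BlockMeta("/data/b.log", 0L)
        };
        for (BlockMeta block : blocks) {
            System.out.println(block.filePath + " block " + block.blockId
                + " -> reader " + readerFor(block, codec, 4));
        }
    }
}
```

Keeping the routing decision in the codec means an application that needs per-file ordering can opt in on the stream, while the metadata class keeps its ordinary equality semantics.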
