apex-dev mailing list archives

From Priyanka Gugale <priya...@datatorrent.com>
Subject Re: [malhar-users] Re: How to find EOF - FileSplitter
Date Wed, 14 Oct 2015 16:49:50 GMT

The partition count depends on the following factors:
1. How big your input data is
2. Your cluster size
3. Your desired processing speed
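To make the three factors concrete, here is a rough back-of-the-envelope estimate. It is a hypothetical heuristic, not part of Apex or Malhar: `PartitionEstimate.estimate` and the per-partition throughput it takes as input are assumptions, and that throughput would have to be measured on your own cluster.

```java
// Hypothetical sizing heuristic; NOT an Apex/Malhar API.
// Combines the three factors: input size, cluster size and desired speed.
final class PartitionEstimate {
    private PartitionEstimate() {}

    /**
     * @param inputBytes              total bytes to process (input size)
     * @param availableContainers     containers the cluster can spare (cluster size)
     * @param bytesPerSecPerPartition assumed, measured throughput of one reader partition
     * @param targetSeconds           how quickly the input should be processed (desired speed)
     * @return a partition count between 1 and availableContainers
     */
    static int estimate(long inputBytes, int availableContainers,
                        long bytesPerSecPerPartition, long targetSeconds) {
        if (availableContainers < 1 || bytesPerSecPerPartition < 1 || targetSeconds < 1) {
            throw new IllegalArgumentException("limits must be positive");
        }
        // bytes a single partition can get through within the target time
        long bytesPerPartition = bytesPerSecPerPartition * targetSeconds;
        // ceiling division: enough partitions to finish within targetSeconds
        long needed = (inputBytes + bytesPerPartition - 1) / bytesPerPartition;
        // at least one partition, never more than the cluster can host
        return (int) Math.max(1, Math.min(needed, availableContainers));
    }
}
```

For example, 10 GiB of input, an assumed 50 MiB/s per partition and a 60 second target give 4 partitions; with only 3 spare containers the result is capped at 3.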

If you choose a partition count that is too high and you have only a couple of
files to process, the partitions (i.e. containers) will sit idle after
processing the input files. If it is too low, your processing will be slow.
If you can predict your input traffic in advance, you can decide on the
partition count in advance.
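When the traffic is predictable, the count can be fixed up front with the PARTITIONER attribute, in the same form as the property in Chiru's question below; the number after the colon is the partition count. The small helper here just renders that property for a chosen count. `PartitionerProperty.render` is hypothetical, and `MyApp` / `BlockReader` are placeholder application and operator names, not names from your application.

```java
// Hypothetical helper: renders the PARTITIONER property (same form as the one
// in the question) for a fixed partition count decided in advance.
final class PartitionerProperty {
    private PartitionerProperty() {}

    static String render(String appName, String operatorName, int partitionCount) {
        if (partitionCount < 1) {
            throw new IllegalArgumentException("partitionCount must be >= 1");
        }
        return String.join("\n",
            "<property>",
            "  <name>dt.application." + appName + ".operator." + operatorName
                + ".attr.PARTITIONER</name>",
            // StatelessPartitioner:<N> requests N static partitions
            "  <value>com.datatorrent.common.partitioner.StatelessPartitioner:"
                + partitionCount + "</value>",
            "</property>");
    }
}
```

For instance, `PartitionerProperty.render("MyApp", "BlockReader", 4)` produces a property whose value ends in `StatelessPartitioner:4`.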

And since input keeps varying, we have dynamic partitioning, where partitions
are added or removed based on input volume. Once the input is processed, the
extra partitions are removed and the DAG shrinks until more data is available
for processing.
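The idea behind dynamic partitioning can be sketched as a simple scaling rule: derive the desired number of partitions from the current backlog, clamped between a minimum and a maximum. This is only an illustration under assumed numbers; `DynamicPartitionRule` is hypothetical and is not the Apex partitioning API.

```java
// Hypothetical scaling rule illustrating dynamic partitioning.
// NOT the Apex Partitioner/StatsListener API; it only models the decision.
final class DynamicPartitionRule {
    private final int minPartitions;
    private final int maxPartitions;
    private final long blocksPerPartition; // assumed backlog one partition can absorb

    DynamicPartitionRule(int minPartitions, int maxPartitions, long blocksPerPartition) {
        if (minPartitions < 1 || maxPartitions < minPartitions || blocksPerPartition < 1) {
            throw new IllegalArgumentException("invalid limits");
        }
        this.minPartitions = minPartitions;
        this.maxPartitions = maxPartitions;
        this.blocksPerPartition = blocksPerPartition;
    }

    /** Desired partition count for the current backlog of unread blocks. */
    int desired(long pendingBlocks) {
        long pending = Math.max(0, pendingBlocks);
        // ceiling division: one partition per blocksPerPartition pending blocks
        long wanted = (pending + blocksPerPartition - 1) / blocksPerPartition;
        // grow with the input, shrink back to the minimum when there is no data
        return (int) Math.max(minPartitions, Math.min(wanted, maxPartitions));
    }

    /** Applies the rule to a series of backlog samples, e.g. one per window. */
    int[] trace(long... backlogSamples) {
        int[] counts = new int[backlogSamples.length];
        for (int i = 0; i < backlogSamples.length; i++) {
            counts[i] = desired(backlogSamples[i]);
        }
        return counts;
    }
}
```

With limits of 1 to 8 partitions and 10 blocks per partition, a backlog trace of 0, 35, 500, 12, 0 blocks yields 1, 4, 8, 2, 1 partitions: the DAG grows with the input and shrinks back once it has been processed.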

FileSplitter only works with file metadata (it splits each file into block
descriptions); the BlockReader actually reads the blocks. So in your
application it is the BlockReader that gets partitioned, because it does the
actual work of reading data.


On Wed, Oct 14, 2015 at 6:55 PM, Chiru <chiru.vcj@gmail.com> wrote:

> What parameters do we need to consider when setting the partition count?
> Can we give any random number, or should it be based on cluster size,
> file size or block size?
> Please briefly explain the partition count setting: how does it process
> the file/block? Thanks - Chiru
> <property>
>   <name>dt.application.<appName>.operator.<operatorName>.attr.PARTITIONER</name>
>   <value>com.datatorrent.common.partitioner.StatelessPartitioner:1</value>
> </property>
> On Friday, 9 October 2015 18:33:06 UTC+5:30, Chiru wrote:
>> Hi All,
>> How can I find out that the entire file has been read when using the
>> FileSplitter? I have to wait until EOF and then start processing.
>> Please share sample code if possible.
>> Thanks -Chiru