apex-users mailing list archives

From Raja.Aravapalli <Raja.Aravapa...@target.com>
Subject Re: [EXTERNAL] Re: hdfs file write operator is increasing the latency - resulting entire DAG to fail
Date Thu, 13 Jul 2017 16:07:18 GMT

Thanks for the response Pramod.


- My HDFS operator is running as a single partition, with an input of approx. 1000
msgs per sec. I am not sure how to partition this operator ☹

- I am not really sure how to check the bytes/sec, but I expect it will be large,
because my msg size in Kafka is approx. 2 KB. ===> input 1000 msgs per sec * 2 KB == approx.
2 MB per sec [rough calculation]

- And for your info, right now, using the below property, I have set the memory
for this operator to 20 GB, which I feel is very large.
<property>
    <name>dt.operator.HDFS_operator.attr.MEMORY_MB</name>
    <value>20480</value>
</property>
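
For reference, the rough calculation above can be sketched in Java. This is only an illustration: the class and method names are hypothetical, the 2 KB message size is the approximate figure quoted above, and the 2500 msgs/sec rate is the topic rate mentioned earlier in the thread.

```java
public class WriteRateEstimate {

    // Rough write rate in bytes/second: message rate times average message size.
    static long bytesPerSecond(long messagesPerSecond, long avgMessageBytes) {
        return messagesPerSecond * avgMessageBytes;
    }

    public static void main(String[] args) {
        long avgMessageBytes = 2L * 1024; // approx. 2 KB per Kafka message (assumed average)

        long atOperatorInput = bytesPerSecond(1000, avgMessageBytes); // 1000 msgs/sec into the operator
        long atTopicRate = bytesPerSecond(2500, avgMessageBytes);     // 2500 msgs/sec on the Kafka topic

        // Print both estimates in MB/s (1 MB taken as 1024 * 1024 bytes here).
        System.out.printf("operator input: %.2f MB/s%n", atOperatorInput / (1024.0 * 1024.0));
        System.out.printf("topic rate:     %.2f MB/s%n", atTopicRate / (1024.0 * 1024.0));
    }
}
```

Either way the estimate is a few MB per sec, which gives a concrete number to compare against what a single partition can write.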


Please advise.


Thanks a lot.

Raja.

From: Pramod Immaneni <pramod@datatorrent.com>
Reply-To: "users@apex.apache.org" <users@apex.apache.org>
Date: Thursday, July 13, 2017 at 10:31 AM
To: "users@apex.apache.org" <users@apex.apache.org>
Subject: [EXTERNAL] Re: hdfs file write operator is increasing the latency - resulting entire
DAG to fail

Hi Raja,

How many partitions do you have for the file output operator, and what would you say your
data write rate is, in bytes/second?

Thanks

On Thu, Jul 13, 2017 at 8:13 AM, Raja.Aravapalli <Raja.Aravapalli@target.com>
wrote:
Team,

We have an Apex application that is reading from Kafka and writing to HDFS.

The data flow for the Kafka topic is very high… say 2500 messages per sec!!

The issue we are facing is:

The operator (which extends AbstractFileOutputOperator) that is writing to HDFS is building up
latency over time and eventually failing. Can someone please share your thoughts on how I can
handle this?


Thanks a lot.


Regards,
Raja.
