Hi All,
  Is anybody else facing this issue?
  In our observation, this issue shows up in long runs, once the DataNodes hold a huge number of blocks. Every hour, each DataNode sends its block report to the NameNode. If the number of blocks on a DataNode is huge, it needs a good amount of time to scan all of them. (Our setup: 3 DataNodes with 2 GB RAM, 4 Scribe clients, a Scribe server sending logs at 5000 records/s, block size 64 MB.) This block scan causes a lot of IO operations. If a write request arrives during the scan, it takes a long time to get a free IO channel on the DataNode. Because of this, during the block scan a DataNode may not be able to acknowledge client requests, causing timeouts on the client sockets.
The same can happen during replication. If DN1 sends data to DN2 while DN2 is doing its block scan, DN2 is busy and may not be able to send the ack to DN1 in time, so timeouts can happen here as well.
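
To make the reasoning above concrete, here is a rough toy model in Python. It is only a sketch of the effect we suspect, not how the DataNode is actually implemented: it assumes a single IO channel that the block scan occupies completely, and the function names (ack_delays, count_timeouts) and all the numbers are made up for illustration.

```python
def ack_delays(arrivals, scan_start, scan_end, service_time):
    """Return the ack delay (in seconds) for each write reaching a DataNode.

    Toy assumptions (not taken from the Hadoop code):
    - the DataNode has one IO channel;
    - the block scan holds that channel during [scan_start, scan_end);
    - writes are served FIFO and each takes service_time seconds.
    """
    delays = []
    channel_free_at = 0
    for t in sorted(arrivals):
        # A write starts when it has arrived and the channel is free.
        start = max(t, channel_free_at)
        # If that moment falls inside the scan, the write waits for the scan.
        if scan_start <= start < scan_end:
            start = scan_end
        channel_free_at = start + service_time
        delays.append(channel_free_at - t)
    return delays


def count_timeouts(delays, timeout):
    """Count the writes whose ack arrives later than the socket timeout."""
    return sum(1 for d in delays if d > timeout)


# Hypothetical numbers: writes every 10 s, a scan from t=5 s to t=100 s,
# 1 s per write, and a 60 s timeout on the sender side.
delays = ack_delays([0, 10, 20, 30, 40], scan_start=5, scan_end=100, service_time=1)
print(delays)
print(count_timeouts(delays, timeout=60))
```

In this model only the write that arrives before the scan is acknowledged quickly. The others queue behind the scan and exceed the timeout, whether the sender is a client or an upstream DataNode such as DN1.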
Uma Maheswara Rao .G
Software Engineer
Address: Huawei Industrial Base
Bantian Longgang
Shenzhen 518129, P.R.China