hadoop-common-user mailing list archives

From Marcus Herou <marcus.he...@tailsweep.com>
Subject Limit the number of open files in MultipleTextOutputFormat
Date Fri, 10 Jul 2009 08:16:37 GMT

Locally I ran into Hadoop throwing the "Too many open files" exception, and I
worked around it by raising the open-file limit with ulimit. I set it to 65535,
and since I had only about 8000 keys, that was a no-brainer.

However, I am sure we have more keys than that in our production data, so I
expect Hadoop will throw the "Too many open files" exception again there.
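For anyone hitting the same limit, a sketch of the ulimit workaround I used
(the config-file path and the "hadoop" user name are assumptions and vary by
distro and setup):

```shell
# Check the current per-process open-file limit
ulimit -n

# Raise the soft limit for the current shell session
ulimit -n 65535

# To make it persistent, add lines like these to /etc/security/limits.conf,
# replacing "hadoop" with whatever user runs the Hadoop daemons:
#   hadoop  soft  nofile  65535
#   hadoop  hard  nofile  65535
```

Note that the daemons inherit the limit from the shell or init script that
starts them, so they need restarting after the change.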

Is there any way to keep only X streams open at the same time?
I guess all streams are held open for open/close efficiency, but I think that
behaviour could be made more flexible.

When I wrote this myself (outside Hadoop) I used an LruCache to hold the
output streams and got notified whenever a stream was evicted from the cache,
so I could close it.
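To make the idea concrete, here is a minimal sketch of that eviction-closes-stream
pattern using `java.util.LinkedHashMap` in access order; the class name
`LruStreamCache` and the `maxOpen` parameter are my own illustrative names, not
anything in Hadoop's API:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.LinkedHashMap;
import java.util.Map;

// LRU cache of output streams, keyed by (say) output file name.
// When the cache grows past maxOpen, the least-recently-used stream
// is closed and evicted, bounding the number of open file handles.
public class LruStreamCache extends LinkedHashMap<String, OutputStream> {

    private final int maxOpen;

    public LruStreamCache(int maxOpen) {
        super(16, 0.75f, true); // accessOrder = true gives LRU iteration order
        this.maxOpen = maxOpen;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, OutputStream> eldest) {
        if (size() > maxOpen) {
            try {
                eldest.getValue().close(); // close the stream being evicted
            } catch (IOException e) {
                // in a real job this should be reported, not swallowed
            }
            return true; // tell LinkedHashMap to remove the entry
        }
        return false;
    }

    public static void main(String[] args) {
        LruStreamCache cache = new LruStreamCache(2);
        cache.put("a", new ByteArrayOutputStream());
        cache.put("b", new ByteArrayOutputStream());
        cache.put("c", new ByteArrayOutputStream()); // evicts and closes "a"
        System.out.println(cache.containsKey("a")); // false
        System.out.println(cache.size());           // 2
    }
}
```

A cache miss would open a fresh stream for the key and `put` it; any streams
still cached at the end of the task would need a final pass to close them.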

Input? Perhaps point me in the right direction and I can write this myself and
submit a patch.



Marcus Herou CTO and co-founder Tailsweep AB
