hadoop-common-dev mailing list archives

From "Yoram Arnon" <yar...@yahoo-inc.com>
Subject data loss using hadoop
Date Mon, 20 Mar 2006 21:33:45 GMT
While experimenting with a Hadoop DFS cluster, we've observed data loss.
This may be related to our having stopped and restarted the DFS a couple of
times, possibly without all nodes going offline and coming back online in
a clean sequence. Many of our files, ranging in size from less than 1 GB to
several GB each, have at least one block missing. Blocks are missing from
relatively new files, generated within the last two weeks; the file system
was never more than 25% full, and there's no obvious reason why this
loss should have happened.
In the coming days and weeks we'll be looking into the causes, and into
ways of making the DFS more robust against this kind of loss.

