hadoop-common-user mailing list archives

From "Nathan Fiedler" <nathanfied...@gmail.com>
Subject Re: Using HDFS as native storage
Date Thu, 27 Mar 2008 16:53:45 GMT
I can't offer any insights into other clustering FS solutions, but I
think it's a very safe bet to say that Google relies entirely on GFS
for their long-term storage. Granted, they almost certainly make
offline backups of business-critical data, but I would assume that
everything related to GMail, Google Code, Picasa, Google Docs, etc. is
stored in, and only in, one or more massive GFS clusters. Consider, for
instance, their claim to have lost only one 64MB block in the history
of their modern infrastructure (that is, since 2004).

Look at it another way: how would you back up petabytes of data? When
you've got multiple data centers consisting of thousands of nodes, and
every data block is replicated on at least three machines, what's the
point of backups?

Again, I'm no expert; I'm just basing this on everything I've read and
watched about Google. Hopefully others will have more enlightened

