hadoop-common-user mailing list archives

From Allen Wittenauer ...@yahoo-inc.com>
Subject Re: data in yahoo / facebook hdfs
Date Mon, 15 Jun 2009 19:03:00 GMT

On 6/13/09 9:00 AM, "PORTO aLET" <portoalet@gmail.com> wrote:
> I am just wondering what do facebook/yahoo do with the data in hdfs after
> they finish processing the log files or whatever that are in hdfs?
> Are they simply deleted? or get backed up in tape ?
> whats the typical process?

    The grid ops team here at Yahoo! has a strict retention policy that
dictates the data is deleted after X time period.  We perform no backups of
the data on the grid.  It is also worth mentioning that the data is loaded
from the primary source, so in the case of data corruption (hai hadoop-0.18)
or accidental deletion (where are my snapshots dev people?), we reload the
data from that primary source (dependent, of course, on whether they still
have it).
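
As a rough illustration of the retention rule described above, here is a minimal sketch that picks out paths older than a retention window. It is not Yahoo!'s tooling. The function name `expired_paths`, the `(path, mtime)` input format, and the 30-day window are all assumptions for the example; the message gives the period only as "X".

```python
from datetime import datetime, timedelta, timezone


def expired_paths(entries, retention, now):
    """Return the paths whose modification time falls before the cutoff.

    entries   -- iterable of (path, mtime) pairs; a hypothetical listing format
    retention -- timedelta standing in for the unspecified "X time period"
    now       -- reference time, passed in so the result is reproducible
    """
    cutoff = now - retention
    return [path for path, mtime in entries if mtime < cutoff]


if __name__ == "__main__":
    now = datetime(2009, 6, 15, tzinfo=timezone.utc)
    listing = [
        ("/logs/2009-05-01", now - timedelta(days=45)),  # past the window
        ("/logs/2009-06-10", now - timedelta(days=5)),   # still retained
    ]
    # The 30-day window is an assumed placeholder, not a value from the message.
    for path in expired_paths(listing, timedelta(days=30), now):
        print("would delete:", path)
```

Because there are no backups, anything such a policy removes can only be recovered by reloading from the primary source, which is why the sketch only reports candidates and leaves the deletion itself to a separate step.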

> Also what is the process of adding a new node to the hadoop cluster? simply
> connect a new computer to the network (and setup the hadoop conf)?

