hadoop-common-user mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: Using Hadoop for Record storage
Date Thu, 12 Apr 2007 17:05:52 GMT
Andy Liu wrote:
> I'm exploring the possibility of using the Hadoop records framework to
> store these document records on disk.  Here are my questions:
> 
> 1. Is this a good application of the Hadoop records framework, keeping in
> mind that my goals are speed and scalability?  I'm assuming the answer is
> yes, especially considering Nutch uses the same approach.

For read-only access, performance should be decent.  However, Hadoop's 
file structures do not permit incremental updates.  Rather, they are 
designed primarily for batch operations, such as writing MapReduce 
outputs.  If you need to update your data incrementally, you might look 
at something like BDB or a relational DB, or perhaps experiment with 
HBase.  (HBase is designed to be a much more scalable, incrementally 
updatable DB than BDB or relational DBs, but its implementation is not 
yet complete.)
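
To make the batch-versus-incremental distinction concrete, here is a 
minimal sketch in plain Python.  It does not use any Hadoop API; the 
JSON-lines file layout and the names write_sorted, read_sorted and 
merge_batch are hypothetical, chosen only for illustration.  It assumes 
string keys that are unique within the existing file.  The point is that 
an update is applied by merging the old sorted file with a batch of 
changes into a brand-new file, never by modifying the old file in place.

```python
import heapq
import json


def write_sorted(path, records):
    """Write (key, value) pairs to a new file, sorted by key, one JSON line each."""
    with open(path, "w", encoding="utf-8") as f:
        for key, value in sorted(records):
            f.write(json.dumps([key, value]) + "\n")


def read_sorted(path):
    """Yield (key, value) pairs from a file written by write_sorted."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            key, value = json.loads(line)
            yield key, value


def merge_batch(old_path, updates, new_path):
    """Apply a batch of updates by writing a whole new sorted file.

    The old file is only read, never modified. When a key appears in
    both inputs, the value from `updates` wins.
    """
    # Tag each stream so that, for equal keys, the update (tag 0) comes first.
    old = ((k, 1, v) for k, v in read_sorted(old_path))
    new = ((k, 0, v) for k, v in sorted(updates.items()))
    last_key = None
    with open(new_path, "w", encoding="utf-8") as out:
        # Merge on (key, tag) only, so values are never compared.
        for key, _tag, value in heapq.merge(old, new, key=lambda t: (t[0], t[1])):
            if key == last_key:
                continue  # skip the stale record from the old file
            out.write(json.dumps([key, value]) + "\n")
            last_key = key
```

Changing a single record this way still costs a full pass over the 
existing file, which is why this layout suits batch jobs and why a store 
built for in-place updates is the better fit for frequent small changes.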

Doug
