hadoop-common-user mailing list archives

From Andrzej Bialecki ...@getopt.org>
Subject Re: hbase + nutch
Date Fri, 09 Nov 2007 19:28:30 GMT
Sebastien Rainville wrote:
> Has anybody tried to use both Nutch and HBase together yet?
> Basically I need to store structured information extracted from the web
> pages. Saving that data in a database like MySQL would be a temporary
> option but in the long term, the amount of information will grow fast
> and I'll need a more scalable system. That's where HBase comes into
> play. The next logical move would then be to modify Nutch to save the
> pages in HBase. The system would then be very flexible. Is that what you
> guys have in mind for the future of Nutch?

In short: yes. However, at the moment HBase still seems too unstable to
integrate into Nutch. So basically we (at least Dennis and I) are
playing with it to get a feel for what's possible.

> But for now, Nutch is not integrated with HBase... I can still write
> Nutch extensions that save the structured data that I need into HBase.
> Is there a way to make them interact smoothly? The first obvious problem
> that I have is that they are built on different versions of
> Hadoop. Is there a good way of doing it?

Good news: Nutch trunk has been updated to Hadoop 0.15. The first 
official release of HBase also runs on Hadoop 0.15.

Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
