mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: TU Berlin Winter of Code Project
Date Fri, 06 Nov 2009 19:57:27 GMT
The question that I don't see addressed is whether you will use a fully
streaming approach, as Bixo does, or a document repository approach, as is
more common in search engines.

HBase is reputedly mature enough to serve as a document repository.  Using
such an approach would be very helpful given the incremental nature of web
crawls.
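
As a rough illustration of why a repository helps with incremental crawls,
here is a minimal sketch in Python. It assumes an in-memory dict as a
stand-in for an HBase table keyed by URL; the names `repository`,
`store_if_changed`, and `incremental_crawl` are hypothetical, and no real
HBase API is used.

```python
import hashlib

# Hypothetical in-memory stand-in for a document repository (e.g. an HBase
# table keyed by URL). This is an assumption for illustration only.
repository = {}


def content_hash(body: str) -> str:
    """Fingerprint a page body so unchanged pages can be detected cheaply."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def store_if_changed(url: str, body: str, crawl_id: int) -> bool:
    """Store the page; return True only if it is new or its content changed."""
    digest = content_hash(body)
    previous = repository.get(url)
    if previous is not None and previous["hash"] == digest:
        # Unchanged since the last crawl: only record that it was seen again.
        previous["last_seen"] = crawl_id
        return False
    repository[url] = {"hash": digest, "body": body, "last_seen": crawl_id}
    return True


def incremental_crawl(fetched_pages: dict, crawl_id: int) -> list:
    """Return the URLs that need downstream reprocessing after this crawl."""
    return [
        url
        for url, body in fetched_pages.items()
        if store_if_changed(url, body, crawl_id)
    ]
```

With a fully streaming pipeline there is no stored state to compare against,
so every fetched page flows through all downstream steps on every crawl; with
a repository, only the URLs returned by `incremental_crawl` would need to be
reparsed or reindexed.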

What is the plan in this regard?

On Fri, Nov 6, 2009 at 11:47 AM, Grant Ingersoll <gsingers@apache.org> wrote:

>
> This is obviously only a first draft of what we think would be a suitable
> overall architecture

-- 
Ted Dunning, CTO
DeepDyve
