hadoop-common-dev mailing list archives

From Samuel Guo <guosi...@gmail.com>
Subject Re: Distributed indexing
Date Mon, 28 Apr 2008 15:05:55 GMT
Map/Reduce is a suitable approach for indexing large document 
collections, but I don't know whether it is suitable for retrieval. You 
can look at *Nutch* for distributed searching.

Under the hadoop/contrib directory there is an *Index* package. It may 
be helpful :)
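
To give a rough idea, here is a minimal sketch of an inverted-index job 
against the org.apache.hadoop.mapred API. It is not the contrib *Index* 
code; the class names (InvertedIndex, IndexMapper, IndexReducer) are 
just illustrative, and it assumes each input line looks like 
"docId<TAB>document text":

import java.io.IOException;
import java.util.Iterator;
import java.util.Set;
import java.util.StringTokenizer;
import java.util.TreeSet;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class InvertedIndex {

  // Map: emit (term, docId) for every term in the document.
  public static class IndexMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> output, Reporter reporter)
        throws IOException {
      // Assumption: each line is "docId<TAB>document text".
      String line = value.toString();
      int tab = line.indexOf('\t');
      if (tab < 0) return;
      Text docId = new Text(line.substring(0, tab));
      StringTokenizer tokens = new StringTokenizer(line.substring(tab + 1));
      while (tokens.hasMoreTokens()) {
        output.collect(new Text(tokens.nextToken().toLowerCase()), docId);
      }
    }
  }

  // Reduce: collapse the docIds for each term into a posting list.
  public static class IndexReducer extends MapReduceBase
      implements Reducer<Text, Text, Text, Text> {
    public void reduce(Text term, Iterator<Text> docIds,
                       OutputCollector<Text, Text> output, Reporter reporter)
        throws IOException {
      Set<String> postings = new TreeSet<String>();
      while (docIds.hasNext()) {
        postings.add(docIds.next().toString());
      }
      StringBuilder list = new StringBuilder();
      for (String doc : postings) {
        if (list.length() > 0) list.append(',');
        list.append(doc);
      }
      output.collect(term, new Text(list.toString()));
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(InvertedIndex.class);
    conf.setJobName("inverted-index");
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);
    conf.setMapperClass(IndexMapper.class);
    conf.setReducerClass(IndexReducer.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}

You would run it with something like "bin/hadoop jar invertedindex.jar 
InvertedIndex /docs /index-out". For a real system you would probably 
build Lucene indexes in the reducers (as the contrib package and Nutch 
do) rather than plain text posting lists.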

Matt Wood wrote:
> Hello all,
>
> I was wondering if someone in the know could tell me about the current 
> state of play with building and searching large indices with hadoop?
>
> Some background: I work on the human genome project, and we're 
> currently setting up a new facility based around the next generation 
> of DNA sequencing. We're currently producing around 50TB of data a 
> week, some of which we would like to provide fast access to via an index.
>
> Having read up on hadoop, it appears that it could play a central part 
> in our infrastructure, and that others have tried (and succeeded) in 
> building a distributed indexing and retrieval system with hadoop. I'd 
> be interested if anyone could point me in the right direction to more 
> information or examples of such a system. Yahoo! (with webmap) seems 
> to be close to the sort of thing we would need.
>
> Would map/reduce be a suitable approach for indexing _and_ retrieval, 
> or just indexing? Would Solr/Lucene be a good fit? Any help or 
> pointers to more information would be much appreciated!
>
> If you would like any more details, I'd be more than happy to supply 
> them!
>
> Many thanks,
>
> ~ Matt
>
>
> -------------
>
> Matt Wood
> Sequencing Informatics // Production Software
> www.sanger.ac.uk
>
>
>

