lucene-dev mailing list archives

From Ioan Eugen Stan <stan.ieu...@gmail.com>
Subject storing lucene index in hbase (Hbase Directory implementation) as GSoC
Date Wed, 14 Mar 2012 07:25:59 GMT
Hi,

I know there have been many attempts to make Lucene search distributed,
but apart from the discussion in this article [1], I haven't seen one
that implements a Lucene Directory on top of HBase/Hadoop. I've worked
with HBase and I believe combining the two is a good approach.
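One way such a Directory could lay out index files in HBase is to split each file into fixed-size blocks and store one block per row. The sketch below is a minimal illustration under stated assumptions. `ChunkedFileStore`, the block size and the `file/0000000042` row-key format are all hypothetical. An in-memory `TreeMap` stands in for the HBase table so the code runs on the JDK alone, and Lucene's own `Directory`/`IndexInput`/`IndexOutput` API, which a real implementation would sit behind, is left out.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical storage layer for an HBase-backed Directory: every index file
 * is split into fixed-size blocks, one row per block. A TreeMap stands in for
 * the HBase table because HBase also keeps rows sorted by key.
 * Assumes file names never contain '/' (the separator used in row keys).
 */
class ChunkedFileStore {
    private final int blockSize;
    private final NavigableMap<String, byte[]> table = new TreeMap<>();
    // Logical file lengths; in HBase this could be a separate metadata row.
    private final Map<String, Long> lengths = new HashMap<>();

    ChunkedFileStore(int blockSize) {
        if (blockSize <= 0) throw new IllegalArgumentException("blockSize must be > 0");
        this.blockSize = blockSize;
    }

    // Zero-padding keeps lexicographic key order equal to numeric block order,
    // so a file's blocks are contiguous and can be fetched with one range scan.
    private static String rowKey(String file, long block) {
        return file + "/" + String.format("%010d", block);
    }

    void writeFile(String file, byte[] data) {
        deleteFile(file); // overwrite: drop blocks left over from an older version
        for (int off = 0, block = 0; off < data.length; off += blockSize, block++) {
            int end = Math.min(off + blockSize, data.length);
            table.put(rowKey(file, block), Arrays.copyOfRange(data, off, end));
        }
        lengths.put(file, (long) data.length);
    }

    long fileLength(String file) {
        Long len = lengths.get(file);
        if (len == null) throw new IllegalArgumentException("no such file: " + file);
        return len;
    }

    /** Random-access read that fetches only the blocks covering [offset, offset + len). */
    byte[] read(String file, long offset, int len) {
        if (offset < 0 || len < 0 || offset + len > fileLength(file)) {
            throw new IndexOutOfBoundsException("read past end of " + file);
        }
        byte[] out = new byte[len];
        int copied = 0;
        while (copied < len) {
            long pos = offset + copied;
            byte[] block = table.get(rowKey(file, pos / blockSize));
            int inBlock = (int) (pos % blockSize);
            int n = Math.min(block.length - inBlock, len - copied);
            System.arraycopy(block, inBlock, out, copied, n);
            copied += n;
        }
        return out;
    }

    void deleteFile(String file) {
        lengths.remove(file);
        // Every key "file/..." sorts in ["file/", "file0") because '0' follows '/'.
        table.subMap(file + "/", true, file + "0", false).clear();
    }

    int blockCount(String file) {
        return table.subMap(file + "/", true, file + "0", false).size();
    }
}
```

With a block size of 4, writing the 12 bytes of `"hello lucene"` produces three rows, and `read("_0.cfs", 3, 6)` touches all three to return `"lo luc"`. Keeping a file's blocks adjacent in key order is what would let a real version read a whole file with a single HBase range scan.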

The appeal of this approach is that it makes distributed search easy to
build: run multiple search slaves, have each one search a part of the
index, and then aggregate their results. Going further, the searches
could take advantage of data locality (running each search on the
node/region server that holds that part of the index data), and then
you really are in business.
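The "search the parts, then aggregate" step is a scatter-gather: every slave returns its own top k hits and a coordinator merges them into a global top k with a bounded min-heap. The sketch below assumes a hypothetical `ShardSearcher` interface for the slaves and a hypothetical `Hit` class; it does not use Lucene's classes. It also assumes scores from different shards are directly comparable. That is not automatically true, because Lucene's default scoring uses per-index term statistics, so shards with different term distributions can score the same match differently.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** One hit from one shard: (shard, local doc id) identifies the document globally. */
final class Hit {
    final int shard;
    final int doc;
    final float score;

    Hit(int shard, int doc, float score) {
        this.shard = shard;
        this.doc = doc;
        this.score = score;
    }
}

/** Hypothetical search slave owning one part of the index (ideally on the region server holding it). */
interface ShardSearcher {
    List<Hit> search(String query, int k) throws Exception;
}

final class ScatterGather {
    private static final Comparator<Hit> BY_SCORE = Comparator.comparingDouble((Hit h) -> h.score);

    /** Sends the query to every shard in parallel and keeps the k best hits overall. */
    static List<Hit> search(List<ShardSearcher> shards, String query, int k) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            List<Future<List<Hit>>> pending = new ArrayList<>();
            for (ShardSearcher shard : shards) {
                pending.add(pool.submit(() -> shard.search(query, k))); // scatter
            }
            // Min-heap on score: the root is always the weakest of the current top k.
            PriorityQueue<Hit> best = new PriorityQueue<>(BY_SCORE);
            for (Future<List<Hit>> f : pending) { // gather
                for (Hit h : f.get()) {
                    best.offer(h);
                    if (best.size() > k) best.poll(); // evict the weakest hit
                }
            }
            List<Hit> result = new ArrayList<>(best);
            result.sort(BY_SCORE.reversed()); // best first
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while gathering results", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a shard search failed", e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Each shard only has to return k hits, so the coordinator merges at most k × (number of shards) entries no matter how large the index is. That bounded merge is what makes the slaves independently scalable.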

A hybrid HBase/Hadoop solution is also possible: store some of the data
in HBase and the bigger parts directly in Hadoop (HDFS), inside a file
structure, to overcome the HDFS small-files problem. This could make
HBase queries perform better, but it would complicate the design a bit.
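The core of such a hybrid is a placement rule. Small files go into HBase cells, because every file stored in HDFS costs NameNode memory regardless of its size. Large files go to HDFS. The sketch below routes files purely by size. `HybridPlacement` and the threshold are hypothetical, and any real cut-off would need to be measured against the actual index file sizes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical placement policy for a hybrid HBase/HDFS index layout. */
final class HybridPlacement {
    enum Tier { HBASE, HDFS }

    private final long thresholdBytes; // assumed tunable cut-off, not a known-good value

    HybridPlacement(long thresholdBytes) {
        this.thresholdBytes = thresholdBytes;
    }

    /** Files below the threshold go to HBase cells; everything else goes to HDFS. */
    Tier tierFor(long sizeBytes) {
        return sizeBytes < thresholdBytes ? Tier.HBASE : Tier.HDFS;
    }

    /** Plans placement for a set of index files, preserving the caller's iteration order. */
    Map<String, Tier> plan(Map<String, Long> fileSizes) {
        Map<String, Tier> out = new LinkedHashMap<>();
        fileSizes.forEach((name, size) -> out.put(name, tierFor(size)));
        return out;
    }
}
```

Routing by size alone keeps the rule easy to reason about. The added complexity mentioned above shows up elsewhere: a reader has to check both stores, and a file that grows past the threshold (or is rewritten during a merge) has to move between them.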

I'm interested in hearing your opinions on this, and I would also like
to propose it as a GSoC idea that I'm interested in implementing.

[1] http://www.infoq.com/articles/LuceneHbase

-- 
Ioan Eugen Stan
http://ieugen.blogspot.com/
