hbase-user mailing list archives

From "Menno Luiten" <mlui...@artifix.net>
Subject Hbase inverted index partitioning
Date Fri, 08 May 2009 12:25:07 GMT
Hi everyone,

I'm working on a project in which we need a distributed inverted index, and
we are getting fair results using HBase and Hadoop (Crawlers -> Document
Repository (HBase) --M/R-> Document Index (HBase) --M/R-> Inverted Index).
However, we are also investigating more efficient ways to use this
inverted index. After reading [1], we are wondering whether anyone has
figured out a way to make an HBase cluster do document-based partitioning
instead of term-based partitioning.

Basically, the question boils down to: is there an easy way to distribute
a row's columns over multiple regions and let a client (or HBase itself)
scan those regions to gather the row and its columns? And if not, are
people using HBase for (search system) inverted indexes anyway, and how is
it coping?
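
To make the question concrete, here is a minimal sketch of what I mean by
document-based partitioning, written as plain Python rather than against the
HBase API. A dict stands in for the index table, and NUM_SHARDS, shard_of,
row_key, build_index and lookup are all hypothetical names made up for this
illustration. The idea is to prefix each row key with a shard number derived
from the document ID, so that a term's postings are spread over several rows
(one per shard) and a lookup has to fan out over all shards and merge the
results. Since HBase keeps rows sorted by key, rows sharing a prefix would
sit next to each other, but whether they end up in separate regions depends
on how the table is split, which this sketch does not model.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical number of document partitions


def shard_of(doc_id: str) -> int:
    """Assign a document to a shard using a stable hash of its ID."""
    return zlib.crc32(doc_id.encode("utf-8")) % NUM_SHARDS


def row_key(shard: int, term: str) -> str:
    """Build a row key whose prefix is the shard, followed by the term."""
    return f"{shard:02d}|{term}"


def build_index(docs: dict) -> dict:
    """Build a document-partitioned inverted index.

    Maps row key -> {doc_id: term frequency}. Every posting of a given
    document lands in rows that carry that document's shard prefix.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        shard = shard_of(doc_id)
        for term in text.lower().split():
            postings = index[row_key(shard, term)]
            postings[doc_id] = postings.get(doc_id, 0) + 1
    return dict(index)


def lookup(index: dict, term: str) -> dict:
    """Fan out over all shards and merge the partial posting lists."""
    merged = {}
    for shard in range(NUM_SHARDS):
        merged.update(index.get(row_key(shard, term), {}))
    return merged
```

With term-based partitioning the row key would simply be the term, and a
lookup would touch exactly one row. The sketch trades that single read for
NUM_SHARDS smaller reads that could, in principle, be served in parallel.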


Menno Luiten

[1] B. Cambazoglu, et al. "Effects of Inverted Index Partitioning Schemes on
Performance of Query Processing in Parallel Text Retrieval Systems"
