hbase-user mailing list archives

From Andrey Stepachev <oct...@gmail.com>
Subject Re: Using external indexes in an HBase Map/Reduce job...
Date Tue, 12 Oct 2010 12:54:00 GMT
Hi Michael Segel.

If I understand your question correctly, you're looking for an optimal way
to scan index search results? If not, my answer below is not relevant :).

1. For MR joins or large index result scans, Bloom filters can be used,
as described here (see the first sketch after this list):
http://blog.rapleaf.com/dev/2009/09/25/batch-querying-with-cascading/

2. Another option: denormalize the data into the same or a separate table
(depends on the nature of the object relations).

3. Random gets. For each row returned by Solr, issue a random Get (only for
really small result sets or paging); see the second sketch below.

4. Put compacted data (the latest data, a small subset of the data, etc.) into the Solr index.
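
For option 1, here is a minimal sketch of the Bloom filter pattern from the
Rapleaf post, adapted to a plain HBase TableMapper: serialize a filter built
from the index result keys, ship it to the tasks (e.g. via DistributedCache),
and skip rows that cannot be in the result. The local file name and output
types here are assumptions, not a fixed recipe.

import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.util.bloom.BloomFilter;
import org.apache.hadoop.util.bloom.Key;

// Scans the big table but only emits rows whose keys pass the Bloom
// filter built from the external index results.
public class BloomFilteredMapper
    extends TableMapper<ImmutableBytesWritable, Result> {

  private final BloomFilter filter = new BloomFilter();

  @Override
  protected void setup(Context context) throws IOException {
    // "bloom.filter" is a hypothetical local name for a file shipped
    // with DistributedCache; the filter itself was built and
    // serialized by whatever queried the index.
    DataInputStream in =
        new DataInputStream(new FileInputStream("bloom.filter"));
    try {
      filter.readFields(in);
    } finally {
      in.close();
    }
  }

  @Override
  protected void map(ImmutableBytesWritable row, Result value,
                     Context context)
      throws IOException, InterruptedException {
    // membershipTest() == false means "definitely not in the index
    // results"; true may be a false positive, so downstream code must
    // tolerate a few extra rows.
    if (filter.membershipTest(new Key(row.get()))) {
      context.write(row, value);
    }
  }
}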
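And for option 3, a sketch of the random-get approach using the standard
HBase client API; the table name and what you do with each Result are
placeholders.

import java.io.IOException;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

// For each row key in the Solr result page, do a point Get against
// HBase.  Each Get is a round trip, so this only pays off for small
// result sets or a single page of results.
public class SolrHitFetcher {
  public static void fetchPage(List<String> rowKeys) throws IOException {
    // On older HBase versions use `new HBaseConfiguration()` instead.
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");  // placeholder table name
    for (String key : rowKeys) {
      Result r = table.get(new Get(Bytes.toBytes(key)));
      if (!r.isEmpty()) {
        // hand the row off to whatever renders the page
        System.out.println(Bytes.toString(r.getRow()));
      }
    }
  }
}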


2010/10/12 Michael Segel <michael_segel@hotmail.com>:
>
> Hi,
>
> Now I realize that most everyone is sitting in NY, while some of us can't leave our respective
> cities....
>
> Came across this problem and I was wondering how others solved it.
>
> Suppose you have a really large table with 1 billion rows of data.
> Since HBase really doesn't have any indexes built in (Don't get me started about the
> contrib/transactional stuff...), you're forced to use some sort of external index, or roll
> your own index table.
>
> The net result is that you end up with a list object that contains your result set.
>
> So the question is... what's the best way to feed the list object in?
>
> One option I thought about is writing the object to a file, then using it as the input
> file and controlling the splitters. Not the most efficient, but it would work.
>
> I was trying to find a more 'elegant' solution, and I'm sure that anyone using SOLR or LUCENE
> or whatever... has come across this problem too.
>
> Any suggestions?
>
> Thx
>
>
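
Regarding the file-based idea above: one way to make it concrete is to write
the row keys from the index to HDFS, one per line, and let NLineInputFormat
decide how many keys each mapper gets, which is exactly the "control the
splitters" part. A rough sketch, assuming a Hadoop version that ships the
new-API NLineInputFormat; the class names, the 1000-keys-per-split figure and
the lookup logic are placeholders.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class KeyListJob {

  // Each input record is one row key from the index results; a real
  // mapper would issue an HBase Get here, this one just echoes the key.
  public static class KeyLookupMapper
      extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable offset, Text rowKey, Context ctx)
        throws IOException, InterruptedException {
      ctx.write(rowKey, new Text("TODO: fetch row from HBase"));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "fetch-by-keylist");
    job.setJarByClass(KeyListJob.class);

    // Every 1000 keys become one split/mapper -- this is the knob
    // that controls the splitters.
    job.setInputFormatClass(NLineInputFormat.class);
    NLineInputFormat.setNumLinesPerSplit(job, 1000);

    FileInputFormat.addInputPath(job, new Path(args[0]));   // key file on HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.setMapperClass(KeyLookupMapper.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    job.setNumReduceTasks(0);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}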
