hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject RE: Using external indexes in an HBase Map/Reduce job...
Date Tue, 12 Oct 2010 16:13:37 GMT

Thanks for the reply...

That's not exactly what I'm looking for...

Suppose you have an external system that provides the list of row keys you want,
whatever that system is.

So you have a Java List object, and you want to run an M/R job that takes its input from that List.

What's the best way to do it?
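
One way to picture the problem is as a partitioning step: the List of row keys gets cut into contiguous chunks, and each chunk becomes the input of one map task. The sketch below shows only that step in plain Java. The class name RowKeySplits and the method partition are made up for illustration, and the Hadoop wiring is not shown (presumably a custom InputFormat whose getSplits() returns one split per chunk, with a mapper that issues a get for each key).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: cuts a list of row keys into near-equal contiguous chunks. */
final class RowKeySplits {

    /**
     * Splits rowKeys into at most numSplits contiguous chunks whose sizes
     * differ by at most one. An empty input yields no chunks.
     */
    static List<List<String>> partition(List<String> rowKeys, int numSplits) {
        if (numSplits <= 0) {
            throw new IllegalArgumentException("numSplits must be positive");
        }
        List<List<String>> splits = new ArrayList<>();
        int n = rowKeys.size();
        int chunks = Math.min(numSplits, n); // never emit empty chunks
        int start = 0;
        for (int i = 0; i < chunks; i++) {
            // The first (n % chunks) chunks each take one extra key.
            int size = n / chunks + (i < n % chunks ? 1 : 0);
            splits.add(new ArrayList<>(rowKeys.subList(start, start + size)));
            start += size;
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("row-a", "row-b", "row-c", "row-d", "row-e");
        // Each printed chunk stands for the keys one map task would fetch.
        for (List<String> split : partition(keys, 2)) {
            System.out.println(split);
        }
    }
}
```

If the external index returns its keys in sorted order, contiguous chunks should also keep each task's gets close together in the table, although that depends on how the index orders its results.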


> From: octo47@gmail.com
> Date: Tue, 12 Oct 2010 16:54:00 +0400
> Subject: Re: Using external indexes in an HBase Map/Reduce job...
> To: user@hbase.apache.org
> 
> Hi Michael Segel.
> 
> If I understand your question correctly, you are looking for the optimal way
> to scan index search results? If not, my answer below is not relevant :).
> 
> 1. For MR joins or scans of large index results, Bloom filters can be used,
> as described here:
> http://blog.rapleaf.com/dev/2009/09/25/batch-querying-with-cascading/
> 
> 2. Another option: denormalize the data into the same or a separate table
> (depends on the nature of the object relations).
> 
> 3. Random gets. For each row returned from Solr, issue a random get (for
> really small result sets or paging).
> 
> 4. Put compacted data (latest data, a small subset of the data, etc.) into the Solr index.
> 
> 
> 2010/10/12 Michael Segel <michael_segel@hotmail.com>:
> >
> > Hi,
> >
> > Now I realize that most everyone is sitting in NY, while some of us can't leave
> > our respective cities....
> >
> > Came across this problem and I was wondering how others solved it.
> >
> > Suppose you have a really large table with 1 billion rows of data.
> > Since HBase really doesn't have any indexes built in (Don't get me started about
> > the contrib/transactional stuff...), you're forced to use some sort of external index, or
> > roll your own index table.
> >
> > The net result is that you end up with a list object that contains your result set.
> >
> > So the question is... what's the best way to feed the list object in?
> >
> > One option I thought about is writing the object to a file, using that file as
> > the job input, and then controlling the splits. Not the most efficient, but it would work.
> >
> > Was trying to find a more 'elegant' solution, and I'm sure that anyone using SOLR
> > or LUCENE or whatever... has come across this problem too.
> >
> > Any suggestions?
> >
> > Thx
> >
> >
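
The file-based option mentioned in the original message could look roughly like the sketch below: the row keys are written one per line, so a line-oriented input format can hand them to the mappers. Only the local file handling is shown, in plain Java. The class name RowKeyFile is made up for illustration, the keys are assumed to be strings without embedded newlines, and copying the file to HDFS and configuring the job are left out. Hadoop's NLineInputFormat, which gives each map task a fixed number of input lines, may be one way to control the splits.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: persists row keys as a one-key-per-line text file. */
final class RowKeyFile {

    /** Writes one row key per line; keys are assumed not to contain newlines. */
    static Path write(List<String> rowKeys, Path target) throws IOException {
        return Files.write(target, rowKeys, StandardCharsets.UTF_8);
    }

    /** Reads the keys back in file order, one key per line. */
    static List<String> read(Path source) throws IOException {
        return Files.readAllLines(source, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("rowkeys", ".txt");
        write(Arrays.asList("row-a", "row-b", "row-c"), file);
        System.out.println(read(file)); // the keys, in the order they were written
        Files.delete(file);
    }
}
```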
 		 	   		  