hbase-user mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: IHBase indexes persistence
Date Sat, 20 Mar 2010 22:23:41 GMT
Hey guys,

I hate to ruin it for you, but Google search does not use Bigtable at
query time.  If you would like an example of a good, robust search
and indexing system, have a look at the Lucene library, the Solr
system built on Lucene, and Katta, which is another system building on


On Sat, Mar 20, 2010 at 3:13 PM, TuX RaceR <tuxracer69@gmail.com> wrote:
> Hello Hbase user List!
> The feature provided by IHbase is very appealing. It seems to correspond to
> a use case very common in applications (at least in mine ;) )
> Dan Washusen wrote:
>> Not at the moment.  It currently keeps a copy of each unique indexed
>> value and each row key in memory...
> Is more robust indexing on the roadmap?
> HBase, if I understand correctly, is an open-source implementation of
> Google's Bigtable.
> To me the most striking difference between Hbase and Bigtable is for
> narrowing searches; the example below shows what I mean by narrowing:
> If in Google you search for the word
> hbase:
> (i.e using:
> http://www.google.com/search?q=hbase
> )
> you get a fast answer
> (typically: Results *1* - *10* of about *249,000* for *hbase*. (*0.17*
> seconds))
> Now if you restrict the search to pages from the hadoop.apache.org host name
> (or base URL), that is, with the query:
> hbase +site:hadoop.apache.org
> (i.e using the URL:
> http://www.google.com/search?q=hbase+%2Bsite%3Ahadoop.apache.org
> )
> you also get a pretty fast answer:
> (typically: Results *1* - *10* of about *2,510* from *hadoop.apache.org* for
> *hbase*. (*0.12* seconds) )
> It seems to me that the second search uses a secondary index on a column
> named 'site' to scan the 'hbase' based keys. Obviously Google found a good
> way to implement this (good = fast and scalable).
> Is this Google secondary indexing documented somewhere? Is it implemented
> using something like IHBase, something more like THBase, or something
> else?
> Also, why does IHBase stay in the 'contrib' tree? Is that because the code
> is not at the same level as the main HBase code (not as tested, not as
> robust, etc.)?
> Thanks
> TuX
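
For readers following the thread, here is a minimal sketch of how a Lucene-style inverted index can answer the narrowing query quoted above. It assumes the site restriction is indexed as just another term, so the narrowed search is an intersection of two posting sets rather than a scan over a secondary index. The class name `InvertedIndex`, its methods, and the sample documents are hypothetical and exist only for illustration; this is not a description of Google's system, IHBase, or the Lucene API.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy inverted index: maps (field, term) to the set of matching doc ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, site, text):
        # The site is indexed as a term in its own field, like any other term.
        self.postings[("site", site)].add(doc_id)
        for word in text.lower().split():
            self.postings[("text", word)].add(doc_id)

    def search(self, word, site=None):
        hits = self.postings.get(("text", word.lower()), set())
        if site is not None:
            # Narrowing is a set intersection of two posting sets.
            hits = hits & self.postings.get(("site", site), set())
        return sorted(hits)


# Hypothetical sample documents, for illustration only.
index = InvertedIndex()
index.add(1, "hadoop.apache.org", "HBase is the Hadoop database")
index.add(2, "example.org", "notes on hbase indexing")
index.add(3, "hadoop.apache.org", "HDFS architecture guide")

all_hits = index.search("hbase")
narrowed = index.search("hbase", site="hadoop.apache.org")
```

In this sketch the narrowed query does no more work than looking up two posting sets and intersecting them, which is one plausible reason a site-restricted search can be as fast as the unrestricted one.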
