hbase-user mailing list archives

From Andrew Hu <andrewh...@live.com>
Subject RE: Hbase - Solr Integration
Date Thu, 29 Sep 2011 22:37:10 GMT

Hi David,

I am currently working with HBase with 100 columns. My requirement is to
perform real-time search on HBase using row keys and these columns (all
within one column family in the schema). A typical query can be SQL-like,
with AND, OR and NOT operators over these columns. I have ruled out batch
processing such as Hive. My question is:

- HBase + Solr will probably give you better query speed, but you need to
maintain both clusters, push data from HBase to Solr, and perhaps update
the Solr index quite frequently.
- If you use HBase only and search needs to run against all of these
columns, you need to build a secondary index for each column (if the
master table has 1 million rows, you will end up with 100 million index
rows plus the 1 million rows of the original master table, which will use
quite a lot of space), but I suppose search can then be done pretty fast
as well?
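A per-column secondary index, as in the second option above, can be sketched in a few lines. This is a minimal in-memory simulation, not HBase client code: a sorted Python list stands in for the index table, and the index row-key layout `column|value|rowkey`, the helper names and the sample rows are all hypothetical. Each equality condition becomes a prefix scan returning a set of row keys, and AND, OR and NOT map to set intersection, union and difference.

```python
# Sketch only: a sorted list stands in for an HBase index table.
# The row-key layout "column|value|rowkey" is a hypothetical choice and
# assumes no column name or value contains the "|" separator.
from bisect import bisect_left
from itertools import islice

SEP = "|"

def index_key(column, value, rowkey):
    # One index row per populated cell; sorting on this key groups all
    # master rows that share the same (column, value) pair.
    return f"{column}{SEP}{value}{SEP}{rowkey}"

def build_index(master):
    """Return the index keys in sorted order (HBase keeps row keys sorted)."""
    return sorted(
        index_key(column, value, rowkey)
        for rowkey, cells in master.items()
        for column, value in cells.items()
    )

def lookup(index, column, value):
    """Emulate a prefix scan: row keys whose `column` equals `value`."""
    prefix = f"{column}{SEP}{value}{SEP}"
    rows = set()
    for key in islice(index, bisect_left(index, prefix), None):
        if not key.startswith(prefix):
            break  # keys sharing a prefix are contiguous in sorted order
        rows.add(key[len(prefix):])
    return rows

master = {
    "row1": {"color": "red", "size": "L", "status": "active"},
    "row2": {"color": "red", "size": "M", "status": "inactive"},
    "row3": {"color": "blue", "size": "L", "status": "active"},
}
index = build_index(master)
all_rows = set(master)

# (color = red AND size = L) OR NOT (status = active)
red = lookup(index, "color", "red")
large = lookup(index, "size", "L")
active = lookup(index, "status", "active")
result = (red & large) | (all_rows - active)
```

With every column populated, the index holds rows × columns entries (3 × 3 = 9 here; 1 million × 100 = 100 million in the scenario above). NOT is the costly operator: it needs the complete set of row keys, which in practice means a scan over the master table's keys.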

I'm not sure what the best approach is. Any suggestions?



> From: buttler1@llnl.gov
> To: user@hbase.apache.org
> Date: Thu, 29 Sep 2011 08:38:12 -0700
> Subject: RE: Hbase - Solr Integration
> It sounds like you should investigate the Lily Project. They have already done a lot of work to integrate Solr and HBase into a single solution. I did something similar before they released their project; I like my use of dynamic schemas, but their overall approach is probably more solid. In particular, they have given careful consideration to what to do with large objects and how to integrate them into the system. And most importantly, their project is open.
> There was also some talk earlier of integrating HBase and Solr; you might want to search the list for some of Jason's posts. I think that is still a work in progress.
> Otherwise you will have to roll your own solution. It is actually not too difficult to set up a system to publish HBase contents to Solr. The difficulty is in maintaining a consistent view of the data between the two. I believe Lily uses queues to keep updates in sync. If you can tolerate some delay, you could simply update your indexes on a regular basis, or set up your application to populate HBase and Solr simultaneously. The biggest challenge is resharding. HBase will automatically split regions when they become too large. Solr doesn't have that capability yet, so you will have to manage the shards yourself.
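Rolling your own, as described above, comes down to three pieces: a write path that records which rows changed, an indexer that drains those changes into the search index, and application-side shard routing because the index will not split itself. The sketch below is a single-process simulation under those assumptions: dicts stand in for the HBase table and the Solr shards, a `deque` stands in for a durable queue, and the `SyncedStore` class and its methods are hypothetical. It is not meant to describe how Lily works; the message only says Lily uses queues.

```python
# Sketch only: dicts stand in for the HBase table and the Solr shards, and a
# deque stands in for a durable update queue. All names are hypothetical.
from collections import deque
import zlib

class SyncedStore:
    def __init__(self, num_shards):
        self.table = {}                                # "HBase" rows
        self.queue = deque()                           # pending index updates
        self.shards = [{} for _ in range(num_shards)]  # "Solr" shards

    def put(self, rowkey, doc):
        self.table[rowkey] = dict(doc)
        self.queue.append(rowkey)  # the index is now stale for this row

    def delete(self, rowkey):
        self.table.pop(rowkey, None)
        self.queue.append(rowkey)

    def shard_for(self, rowkey):
        # Application-side routing; crc32 is stable across processes,
        # unlike Python's randomized built-in str hash.
        return zlib.crc32(rowkey.encode("utf-8")) % len(self.shards)

    def drain(self):
        """Apply queued updates to the index; return how many were applied."""
        applied = 0
        while self.queue:
            rowkey = self.queue.popleft()
            shard = self.shards[self.shard_for(rowkey)]
            doc = self.table.get(rowkey)  # re-read the latest version
            if doc is None:
                shard.pop(rowkey, None)   # row was deleted
            else:
                shard[rowkey] = dict(doc)
            applied += 1
        return applied

    def reshard(self, num_shards):
        # With hash-mod-N routing, changing N moves most documents, so the
        # simplest manual reshard is a full rebuild from the table.
        self.shards = [{} for _ in range(num_shards)]
        for rowkey, doc in self.table.items():
            self.shards[self.shard_for(rowkey)][rowkey] = dict(doc)
        self.queue.clear()  # the rebuilt index already reflects the table

    def search(self, field, value):
        # Fan the query out to every shard and merge the hits.
        return {k for shard in self.shards
                for k, d in shard.items() if d.get(field) == value}

store = SyncedStore(num_shards=2)
store.put("row1", {"color": "red"})
store.put("row2", {"color": "blue"})
before = store.search("color", "red")   # set(): not indexed yet
applied = store.drain()                 # 2
after = store.search("color", "red")    # {"row1"}
store.delete("row1")
store.drain()
store.reshard(3)
after_reshard = store.search("color", "blue")  # {"row2"}
```

The indexer re-reads each row from the table instead of carrying the document in the queue, so several queued updates to the same row converge on its latest state. The `reshard` method illustrates why resharding is the hard part: with hash-modulo routing, changing the shard count relocates most documents, so the straightforward option is a full rebuild.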
> Another approach is to look at Elastic Search. That is a Lucene-based system that does do automatic sharding.
> Direct search on HBase requires a clever key encoding (like OpenTSDB's) and/or multiple copies of the data to imitate secondary indexes.
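A "clever key encoding" in the OpenTSDB sense means packing the fields you query on into the row key, so that a query becomes a single range scan over HBase's sorted keys. The sketch loosely follows the row-key layout OpenTSDB documents (a 3-byte metric UID, a 4-byte base timestamp rounded down to the hour, then tag-key/tag-value UID pairs); the within-hour offset, which OpenTSDB keeps in the column qualifier, is left out, and the UID values and helper names are hypothetical. Python `bytes` comparison stands in for HBase's lexicographic row-key ordering.

```python
# Sketch only: a composite row key loosely following OpenTSDB's layout
# (metric UID | hour-aligned base timestamp | tag UID pairs).
# UID values and helper names are made up for illustration.
import struct

UID_WIDTH = 3  # fixed-width big-endian IDs keep byte order == numeric order
HOUR = 3600

def uid_bytes(uid):
    return uid.to_bytes(UID_WIDTH, "big")

def base_time(ts):
    return ts - ts % HOUR  # round down to the start of the hour

def row_key(metric_uid, ts, tags):
    """metric | base time (4 bytes, big-endian) | sorted (tag key, tag value) UIDs."""
    key = uid_bytes(metric_uid) + struct.pack(">I", base_time(ts))
    for tagk, tagv in sorted(tags.items()):
        key += uid_bytes(tagk) + uid_bytes(tagv)
    return key

def scan_range(metric_uid, start_ts, end_ts):
    """Start (inclusive) and stop (exclusive) keys for one metric over [start_ts, end_ts]."""
    start = uid_bytes(metric_uid) + struct.pack(">I", base_time(start_ts))
    stop = uid_bytes(metric_uid) + struct.pack(">I", base_time(end_ts) + HOUR)
    return start, stop

CPU, MEM, HOST, WEB01 = 1, 2, 1, 5  # hypothetical UIDs
k_a = row_key(CPU, 1_000_000, {HOST: WEB01})
k_b = row_key(CPU, 1_003_700, {HOST: WEB01})
k_c = row_key(CPU, 1_020_000, {HOST: WEB01})
k_d = row_key(MEM, 1_000_000, {HOST: WEB01})

# Rows are stored sorted by key bytes, so a metric + time-range query
# is one contiguous scan between `start` and `stop`.
start, stop = scan_range(CPU, 1_000_000, 1_005_000)
hits = [k for k in sorted([k_d, k_c, k_b, k_a]) if start <= k < stop]
```

This only helps for queries that constrain the key's leading fields (here, metric first, then time range). A condition on a field that sits later in the key, or is not in it at all, still needs a filter during the scan or a secondary index.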
> Dave
> -----Original Message-----
> From: Stuti Awasthi [mailto:stutiawasthi@hcl.com] 
> Sent: Thursday, September 29, 2011 2:52 AM
> To: user@hbase.apache.org
> Subject: Hbase - Solr Integration
> Hi Friends,
> I am storing my data in HBase and want to search it using Solr. I can't find much documentation about the integration. Is there any documentation on integrating these two?
> Please suggest.
> Regards,
> Stuti Awasthi