Mailing-List: contact user-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hbase.apache.org
Received-SPF: neutral (athena.apache.org: local policy)
References: 
 <CAKnDEhjNuCs+p=fwriTJB8XWRGfz-gWdVnoxbm1pOj69Nwdrrw@mail.gmail.com>
 <CALte62wMFsvEcQwRbse7HiR4AUGFtFJ3ynvH2z7gpT=oD1py5A@mail.gmail.com>
 <CAHau4ysxA7rb9zPZnEAFa8=rdcu=E1G5aNNovnAPusQgH0nq6w@mail.gmail.com>
 <CAKnDEhjhcLi5MKsMgLYMdMcOig-1+Q6P7CCSm6i6GQe=kQCqPg@mail.gmail.com>
 <CAO83RbUbZQ0iooMUD1rj92+urnWHFkEOPU_1-NeUqECYJKZOFA@mail.gmail.com>
 <2CD9179D-41C8-4FAC-897D-B94E20D6AEE9@salesforce.com>
 <1330024921.56445.YahooMailNeo@web164501.mail.gq1.yahoo.com>
Message-ID: <1330025414.28483.YahooMailNeo@web164506.mail.gq1.yahoo.com>
Date: Thu, 23 Feb 2012 11:30:14 -0800 (PST)
From: Andrew Purtell <apurtell@apache.org>
Reply-To: Andrew Purtell <apurtell@apache.org>
Subject: Re: Solr & HBase - Re: How is Data Indexed in HBase?
To: "user@hbase.apache.org" <user@hbase.apache.org>
Cc: "hbase-user@hadoop.apache.org" <hbase-user@hadoop.apache.org>
In-Reply-To: <1330024921.56445.YahooMailNeo@web164501.mail.gq1.yahoo.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 7bit

To beat on this analogy further:

"But it would be like using assembler instead of Java or Ruby to build
the server side of some website"

... or if you are Facebook and you get really big but have a pile of PHP
for a code base, you make HipHop to convert that code to assembler :-)
(in effect)

In HBase land, someone hasn't had a scale itch for search big enough to
make our "HipHop". Or might that some day be Solbase?

Best regards,

    - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)


----- Original Message -----
> From: Andrew Purtell <apurtell@apache.org>
> To: "user@hbase.apache.org" <user@hbase.apache.org>
> Cc: "hbase-user@hadoop.apache.org" <hbase-user@hadoop.apache.org>
> Sent: Thursday, February 23, 2012 11:22 AM
> Subject: Re: Solr & HBase - Re: How is Data Indexed in HBase?
>
> I'd also make a comment on this:
>
>> On Feb 22, 2012, at 12:12 PM, Jacques wrote:
>>
>> The key to keyword retrieval is the construction of the data. Among other
>> things, this is one of the key things that Solr is very good at: creating a
>> very efficient organization of the data so that you can retrieve quickly.
>> At their core, Solr, ElasticSearch, Lily and Katta all use Lucene to
>> construct this data. HBase is bad at this.
>
> I can build an inverted index on top of HBase for some form of full text
> search. But it would be like using assembler instead of Java or Ruby to
> build the server side of some website. Unless scale forces
> hyper-optimization for the use case, ES or Solr is a better choice because
> then one doesn't have to do all of the heavy lifting.
>
> Also, it doesn't have to be an either-or choice. Projects like Lily and
> Solbase are interesting hybrids.
>
> Best regards,
>
>     - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>
>
> ----- Original Message -----
>> From: Ian Varley <ivarley@salesforce.com>
>> To: "user@hbase.apache.org" <user@hbase.apache.org>
>> Cc: "hbase-user@hadoop.apache.org" <hbase-user@hadoop.apache.org>
>> Sent: Wednesday, February 22, 2012 10:18 AM
>> Subject: Re: Solr & HBase - Re: How is Data Indexed in HBase?
>>
>> One minor clarification:
>>
>> HBase is primarily built for retrieving a single row at a time based on a
>> predetermined and known location (the key).
>>
>> Substitute that with: "HBase is primarily built for retrieving sets of
>> contiguous sorted rows based on a predetermined and known location (the
>> start key)". Scans are fundamentally just as efficient in HBase as gets,
>> because row keys are sorted. In fact, Get is just implemented as a 1-row
>> Scan!
>>
>> This is one of the nice design features that sets HBase (and similar
>> stores) apart from straight key/value stores; you can do range scans of
>> rows.
>>
>> Ian
>>
>> On Feb 22, 2012, at 12:12 PM, Jacques wrote:
>>
>> Solr does not provide a complex enough support to rank.
>> I believe Solr has a bunch of plug-ability to write your own custom ranking
>> approach. If you think you can't do your desired ranking with Solr, you're
>> probably wrong and need to ask for help from the Solr community.
>>
>> retrieving data by keyword is one of them. I think Solr is a proper
>> choice
>> The key to keyword retrieval is the construction of the data. Among other
>> things, this is one of the key things that Solr is very good at: creating a
>> very efficient organization of the data so that you can retrieve quickly.
>> At their core, Solr, ElasticSearch, Lily and Katta all use Lucene to
>> construct this data. HBase is bad at this.
>>
>> how HBase support high performance when it needs to keep consistency in
>> a large scale distributed system
>> HBase is primarily built for retrieving a single row at a time based on a
>> predetermined and known location (the key). It is also very efficient at
>> splitting massive datasets across multiple machines and allowing sequential
>> batch analyses of these datasets. HBase can maintain high performance in
>> this way because consistency only ever exists at the row level. This is
>> what HBase is good at.
>>
>> You need to focus what you're doing and then write it out. Figure out how
>> you think the pieces should work together. Read the documentation. Then,
>> ask specific questions where you feel like the documentation is unclear or
>> you feel confused. Your general questions are very difficult to answer in
>> any kind of really helpful way.
>>
>> thanks,
>> Jacques
>>
>>
>> On Wed, Feb 22, 2012 at 9:51 AM, Bing Li <lblabs@gmail.com> wrote:
>>
>> Mr Gupta,
>>
>> Thanks so much for your reply!
>>
>> In my use cases, retrieving data by keyword is one of them. I think Solr
>> is a proper choice.
>>
>> However, Solr does not provide a complex enough support to rank. And,
>> frequent updating is also not suitable in Solr. So it is difficult to
>> retrieve data randomly based on the values other than keyword frequency in
>> text. In this case, I attempt to use HBase.
>>
>> But I don't know how HBase support high performance when it needs to keep
>> consistency in a large scale distributed system.
>>
>> Now both of them are used in my system.
>>
>> I will check out ElasticSearch.
>>
>> Best regards,
>> Bing
>>
>>
>> On Thu, Feb 23, 2012 at 1:35 AM, T Vinod Gupta <tvinod@readypulse.com>
>> wrote:
>>
>> Bing,
>> Its a classic battle on whether to use solr or hbase or a combination of
>> both. both systems are very different but there is some overlap in the
>> utility. they also differ vastly when it compares to computation power,
>> storage needs, etc. so in the end, it all boils down to your use case. you
>> need to pick the technology that it best suited to your needs.
>> im still not clear on your use case though.
>>
>> btw, if you haven't started using solr yet - then you might want to
>> checkout ElasticSearch. I spent over a week researching between solr and ES
>> and eventually chose ES due to its cool merits.
>>
>> thanks
>>
>>
>> On Wed, Feb 22, 2012 at 9:31 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>>
>> There is no secondary index support in HBase at the moment.
>>
>> It's on our road map.
>>
>> FYI
>>
>> On Wed, Feb 22, 2012 at 9:28 AM, Bing Li <lblabs@gmail.com> wrote:
>>
>> Jacques,
>>
>> Yes. But I still have questions about that.
>>
>> In my system, when users search with a keyword arbitrarily, the query is
>> forwarded to Solr. No any updating operations but appending new indexes
>> exist in Solr managed data.
>>
>> When I need to retrieve data based on ranking values, HBase is used. And,
>> the ranking values need to be updated all the time.
>>
>> Is that correct?
>>
>> My question is that the performance must be low if keeping consistency in
>> a large scale distributed environment. How does HBase handle this issue?
>>
>> Thanks so much!
>>
>> Bing
>>
>>
>> On Thu, Feb 23, 2012 at 1:17 AM, Jacques <whshub@gmail.com> wrote:
>>
>> It is highly unlikely that you could replace Solr with HBase. They're
>> really apples and oranges.
>>
>>
>> On Wed, Feb 22, 2012 at 1:09 AM, Bing Li <lblabs@gmail.com> wrote:
>>
>> Dear all,
>>
>> I wonder how data in HBase is indexed? Now Solr is used in my system
>> because data is managed in inverted index. Such an index is suitable to
>> retrieve unstructured and huge amount of data. How does HBase deal with
>> the issue? May I replaced Solr with HBase?
>>
>> Thanks so much!
>>
>> Best regards,
>> Bing