From: Apache Wiki
To: Apache Wiki
Date: Sat, 06 Aug 2011 18:54:01 -0000
Subject: [Hadoop Wiki] Update of "Hbase/FAQ_General" by DougMeil

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change
notification. The "Hbase/FAQ_General" page has been changed by DougMeil:
http://wiki.apache.org/hadoop/Hbase/FAQ_General?action=diff&rev1=1&rev2=2

FAQ - General Questions

== Questions ==
 1. [[#1|When would I use HBase?]]
 1. [[#2|Can someone give an example of basic API usage against HBase?]]
 1. [[#3|What other HBase-like applications are out there?]]
 1. [[#8|How do I access HBase from my Ruby/Python/Perl/PHP/etc. application?]]
 1. [[#14|Can HBase development be done on Windows?]]
 1. [[#15|Please explain HBase version numbering.]]
 1. [[#16|What version of Hadoop do I need to run HBase?]]
 1. [[#18|Are there any schema design examples?]]

== Answers ==

'''1. <<Anchor(1)>> When would I use HBase?'''

See [[http://blog.rapleaf.com/dev/?p=26|Bryan Duxbury's post]] on this topic.

'''2. <<Anchor(2)>> Can someone give an example of basic API usage against HBase?'''

See the Data Model section in the HBase Book: http://hbase.apache.org/book.html#datamodel

See the [[Hbase|wiki home page]] for sample code accessing HBase from languages other than Java.

'''3. <<Anchor(3)>> What other HBase-like applications are out there?'''

Broadly speaking, there are many. One place to start your search is here: [[http://blog.oskarsson.nu/2009/06/nosql-debrief.html|nosql]].

'''8. <<Anchor(8)>> How do I access HBase from my Ruby/Python/Perl/PHP/etc. application?'''

See the non-Java access section on the [[Hbase|HBase wiki home page]].

'''14. <<Anchor(14)>> Can HBase development be done on Windows?'''

See the Getting Started section in the HBase Book: http://hbase.apache.org/book.html#getting_started

'''15. <<Anchor(15)>> Please explain HBase version numbering.'''

See [[http://wiki.apache.org/hadoop/Hbase/HBaseVersions|HBase Versions since 0.20.x]]. The text below is left in place for the historians.

Originally HBase lived under src/contrib in Hadoop Core.
The HBase version was that of the hosting Hadoop. The last HBase version bundled under contrib was part of Hadoop 0.16.1 (March of 2008).

The first HBase Hadoop subproject release was versioned 0.1.0. Subsequent releases went at least as far as 0.2.1 (September 2008).

In August of 2008, consensus had it that since HBase depends on a particular Hadoop Core version, the HBase major+minor versions would from then on mirror those of the Hadoop Core version HBase depends on. The first HBase release to take on this new versioning regimen was HBase 0.18.0; HBase 0.18.0 depends on Hadoop 0.18.x.

Sorry for any confusion caused.

'''16. <<Anchor(16)>> What version of Hadoop do I need to run HBase?'''

Different versions of HBase require different versions of Hadoop. Consult the table below to find which version of Hadoop you will need:

||'''HBase Release Number'''||'''Hadoop Release Number'''||
||0.1.x||0.16.x||
||0.2.x||0.17.x||
||0.18.x||0.18.x||
||0.19.x||0.19.x||
||0.20.x||0.20.x||

Releases of Hadoop can be found [[http://hadoop.apache.org/core/releases.html|here]]. We recommend using the most recent version of Hadoop possible, as it will contain the most bug fixes.

Note that HBase-0.2.x can be made to work on Hadoop-0.18.x. HBase-0.2.x ships with Hadoop-0.17.x, so to use Hadoop-0.18.x you must recompile against Hadoop-0.18.x, remove the Hadoop-0.17.x jars from HBase, and replace them with the jars from Hadoop-0.18.x.

Also note that after HBase-0.2.x, the HBase release numbering scheme changed to align with the Hadoop release number on which it depends.

'''18. <<Anchor(18)>> Are there any schema design examples?'''

See [[http://www.slideshare.net/hmisty/20090713-hbase-schema-design-case-studies|HBase Schema Design -- Case Studies]] by Evan (Qingyan) Liu, or the following text taken from Jonathan Gray's mailing list posts.
There's a very big difference between storage in relational/row-oriented databases and column-oriented databases. For example, if I have a table of 'users' and I need to store friendships between these users... In a relational database my design is something like:

Table: users (pkey = userid)
Table: friendships (userid, friendid, ...), which contains one (or maybe two, depending on how it's implemented) rows for each friendship.

In order to look up a given user's friends:

SELECT * FROM friendships WHERE userid = 'myid';

The cost of this relational query continues to increase as a user adds more friends. You also begin to hit practical limits. If I have millions of users, each with many thousands of potential friends, these indexes grow very large and things get nasty quickly. Rather than friendships, imagine I'm storing activity logs of actions taken by users.

In a column-oriented database these things scale continuously, with minimal difference between 10 users and 10,000,000 users, 10 friendships and 10,000 friendships.

Rather than a friendships table, you could just have a friendships column family in the users table. Each column in that family would contain the ID of a friend. The value could store anything else you would have stored in the friendships table in the relational model. As column families are stored together/sequentially on a per-row basis, reading a user with 1 friend versus a user with 10,000 friends is virtually the same. The biggest difference is just in the shipping of this information across the network, which is unavoidable. In this system a user could have 10,000,000 friends. In a relational database the size of the friendships table would grow massively and the indexes would be out of control.
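A toy model can make the row layout above concrete. The sketch below is plain Java with no HBase dependency (the class and names are made up for illustration, not HBase API): it models a row as a sorted map of family:qualifier to value, which is roughly how HBase organizes a row, with one column per friend in a `friends` family.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model of the column-family design described above: one "users"
// table whose rows carry a "friends" family, one column per friend id.
// Illustrative only; this is not the HBase client API.
public class FriendColumns {
    // row key -> (column name "family:qualifier" -> value)
    private final SortedMap<String, SortedMap<String, String>> users = new TreeMap<>();

    public void addFriend(String userId, String friendId, String meta) {
        users.computeIfAbsent(userId, k -> new TreeMap<>())
             .put("friends:" + friendId, meta);
    }

    // Reading a user's friends touches a single row's "friends" family,
    // so cost tracks that row's size, not the total number of users.
    public SortedMap<String, String> friendsOf(String userId) {
        SortedMap<String, String> row = users.getOrDefault(userId, new TreeMap<>());
        return row.subMap("friends:", "friends:\uFFFF");
    }

    public static void main(String[] args) {
        FriendColumns t = new FriendColumns();
        t.addFriend("myid", "friend1", "");
        t.addFriend("myid", "friend2", "since 2008");
        System.out.println(t.friendsOf("myid").keySet()); // prints [friends:friend1, friends:friend2]
    }
}
```

The point of the sketch is the shape of the data, not the storage engine: friendships live inside the user's own row, so there is no separate link table to index and no join at read time.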
'''Q: Can you please provide an example of "good de-normalization" in HBase and how it is kept consistent? (In your friends example, in a relational DB there would be a cascading delete.) As I think of the users table: if I delete a user with userid='123', do I have to walk through all of the other users' "friends" column families to guarantee consistency? Is de-normalization in HBase only used to avoid joins? Our webapp doesn't use joins at the moment anyway.'''

You lose any concept of foreign keys. You have a primary key, that's it. No secondary keys/indexes, no foreign keys.

It's the responsibility of your application to handle something like deleting a friend and cascading to the friendships. Again, typical small web apps are far simpler to write using SQL; here you become responsible for some of the things that were once handled for you.

Another example of "good denormalization" would be something like storing a user's "favorite pages". We want to query this data in two ways: for a given user, all of his favorites; or, for a given favorite, all of the users who have it as a favorite. A relational database would probably have tables for users, favorites, and userfavorites. Each link would be stored in one row in the userfavorites table. We would have indexes on both 'userid' and 'favoriteid' and could thus query it in both ways described above. In HBase we'd probably put a column in both the users table and the favorites table; there would be no link table.

That would be a very efficient query in both architectures, with relational performing much better with small datasets but less so with a large dataset.

Now, asking for the favorites of these 10 users starts to get tricky in HBase and will undoubtedly suffer worse from random reading. The flexibility of SQL allows us to just ask the database for the answer to that question.
In a small dataset it will come up with a decent plan and return the results to you in a matter of milliseconds. Now let's make that userfavorites table a few billion rows, and the number of users you're asking for a couple thousand. The query planner will come up with something, but things will fall down and it will end up taking forever. The worst problem will be the index bloat: insertions to this link table will start to take a very long time. HBase will perform virtually the same as it did on the small table, if not better, because of superior region distribution.

'''Q: [Michael Dagaev] How would you design an HBase table for a many-to-many association between two entities, for example Student and Course?'''

I would define two tables:

Student: student id, student data (name, address, ...), courses (use course ids as column qualifiers here)
Course: course id, course data (name, syllabus, ...), students (use student ids as column qualifiers here)

Does it make sense?

A: [Jonathan Gray]
Your design does make sense.

As you said, you'd probably have two column families in each of the Student and Course tables: one for the data, another with a column per course or student.

For example, a student row might look like:

Student:
 id/row/key = 1001
 data:name = Student Name
 data:address = 123 ABC St
 courses:2001 = (if you need more information about this association, for example whether they are on the waiting list)
 courses:2002 = ...

This schema gives you fast access to the queries "show all courses for a student" (student table, courses family) and "show all students for a course" (course table, students family).
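Because HBase has no foreign keys, the two-table design above relies on the application writing both sides of the association itself. A minimal sketch of that responsibility, in plain Java with no HBase client (all names are hypothetical): an enroll operation writes into both Student.courses and Course.students, and a drop cascades by hand.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Models the Student/Course design above: each "table" keeps the
// associated ids as column qualifiers. Both sides are written by the
// application, since there is no foreign-key machinery to do it for us.
public class ManyToMany {
    private final Map<String, Set<String>> studentCourses = new TreeMap<>();
    private final Map<String, Set<String>> courseStudents = new TreeMap<>();

    public void enroll(String studentId, String courseId) {
        studentCourses.computeIfAbsent(studentId, k -> new TreeSet<>()).add(courseId);
        courseStudents.computeIfAbsent(courseId, k -> new TreeSet<>()).add(studentId);
    }

    // Dropping a course cascades by hand: remove the course row, then
    // walk its student list to clean up the other side.
    public void dropCourse(String courseId) {
        Set<String> students = courseStudents.remove(courseId);
        if (students != null) {
            for (String s : students) {
                studentCourses.getOrDefault(s, new TreeSet<>()).remove(courseId);
            }
        }
    }

    public Set<String> coursesFor(String studentId) {
        return studentCourses.getOrDefault(studentId, new TreeSet<>());
    }

    public Set<String> studentsIn(String courseId) {
        return courseStudents.getOrDefault(courseId, new TreeSet<>());
    }

    public static void main(String[] args) {
        ManyToMany m = new ManyToMany();
        m.enroll("1001", "2001");
        m.enroll("1001", "2002");
        m.enroll("1002", "2001");
        m.dropCourse("2001");
        System.out.println(m.coursesFor("1001")); // prints [2002]
    }
}
```

Either direction of the query is a single-row read; the price of the denormalization is that every write (and every delete) must touch both tables, which is exactly the cascading work the question above asks about.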