Date: Mon, 20 Sep 2010 21:32:13 -0700
Subject: Re: Millions of photos into Hbase
From: Jack Levin <magnito@gmail.com>
To: user@hbase.apache.org

Who said anything about debs :). I do use tarballs... Yes, what did
it was copying that jar into hbase/lib and then doing a full restart.
Now here is a funny thing: the master shuddered for about 10 minutes,
spewing these messages:

2010-09-20 21:23:45,826 DEBUG org.apache.hadoop.hbase.master.HMaster:
Event NodeCreated with state SyncConnected with path
/hbase/UNASSIGNED/97999366
2010-09-20 21:23:45,827 DEBUG
org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Got event
NodeCreated with path /hbase/UNASSIGNED/97999366
2010-09-20 21:23:45,827 DEBUG
org.apache.hadoop.hbase.master.ZKUnassignedWatcher: ZK-EVENT-PROCESS:
Got zkEvent NodeCreated state:SyncConnected
path:/hbase/UNASSIGNED/97999366
2010-09-20 21:23:45,827 DEBUG
org.apache.hadoop.hbase.master.RegionManager: Created/updated
UNASSIGNED zNode img15,normal052q.jpg,1285001686282.97999366 in state
M2ZK_REGION_OFFLINE
2010-09-20 21:23:45,828 INFO
org.apache.hadoop.hbase.master.RegionServerOperation:
img13,p1000319tq.jpg,1284952655960.812544765 open on
10.103.2.3,60020,1285042333293
2010-09-20 21:23:45,828 DEBUG
org.apache.hadoop.hbase.master.ZKUnassignedWatcher: Got event type [
M2ZK_REGION_OFFLINE ] for region 97999366
2010-09-20 21:23:45,828 DEBUG org.apache.hadoop.hbase.master.HMaster:
Event NodeChildrenChanged with state SyncConnected with path
/hbase/UNASSIGNED
2010-09-20 21:23:45,828 DEBUG
org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Got event
NodeChildrenChanged with path /hbase/UNASSIGNED
2010-09-20 21:23:45,828 DEBUG
org.apache.hadoop.hbase.master.ZKUnassignedWatcher: ZK-EVENT-PROCESS:
Got zkEvent NodeChildrenChanged state:SyncConnected
path:/hbase/UNASSIGNED
2010-09-20 21:23:45,830 DEBUG
org.apache.hadoop.hbase.master.BaseScanner: Current assignment of
img150,,1284859678248.3116007 is not valid;
serverAddress=10.103.2.1:60020, startCode=1285038205920 unknown.


Does anyone know what they mean? At first it would kill one of my
datanodes. What helped was raising the heap size to 4GB for the
master and 2GB for the datanode that was dying; after 10 minutes I got
into a clean state.

-Jack
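The jar swap described above (copy the cluster's own hadoop-core jar into hbase/lib, then restart) can be sketched in Python. This is a minimal sketch under assumptions: a tarball install, a bundled jar whose name matches `hadoop-core-*.jar` (check your release), and a hypothetical helper name `swap_hadoop_jar`. Pushing the result to every node (e.g. with rsync) and the full restart are not shown.

```python
import shutil
from pathlib import Path


def swap_hadoop_jar(hbase_lib: Path, cluster_jar: Path) -> Path:
    """Replace the hadoop-core jar bundled in hbase_lib with cluster_jar.

    Assumes the bundled jar matches hadoop-core-*.jar (an assumption, not
    a guarantee for every HBase release). Returns the jar left in hbase_lib.
    """
    if not cluster_jar.is_file():
        raise FileNotFoundError(f"cluster jar not found: {cluster_jar}")
    target = hbase_lib / cluster_jar.name
    # Drop every other hadoop-core jar so only one Hadoop version is on the classpath.
    for old in hbase_lib.glob("hadoop-core-*.jar"):
        if old.resolve() != cluster_jar.resolve():
            old.unlink()
    # Copy the cluster's jar in, unless it already lives in hbase_lib.
    if target.resolve() != cluster_jar.resolve():
        shutil.copy2(cluster_jar, target)
    return target
```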


On Mon, Sep 20, 2010 at 9:28 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
> Yes, on every single machine as well, and restart.
>
> Again, not sure how you'd do this in a scalable manner with your
> deb packages... with the source tarball you can just replace it, rsync
> it out and done.
>
> :-)
>
> On Mon, Sep 20, 2010 at 8:56 PM, Jack Levin <magnito@gmail.com> wrote:
>> OK, I found that file. Do I replace hadoop-core.*.jar under /usr/lib/hbase/lib?
>> Then restart, etc.? All regionservers too?
>>
>> -Jack
>>
>> On Mon, Sep 20, 2010 at 8:40 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>>> Well, I don't really run CDH; I disagree with their rpm/deb packaging
>>> policies, and I strongly recommend not using DEBs to install
>>> software...
>>>
>>> So normally installing from tarball, the jar is in
>>> <installpath>/hadoop-0.20.0-320/hadoop-core-0.20.2+320.jar
>>>
>>> On CDH/DEB edition, it's somewhere silly... locate and find will be
>>> your friend. It should be called hadoop-core-0.20.2+320.jar though!
>>>
>>> I'm working on a github publish of SU's production system, which uses
>>> the cloudera maven repo to install the correct JAR in hbase so when
>>> you type 'mvn assembly:assembly' to build your own hbase-*-bin.tar.gz
>>> (the * being whatever version you specified in pom.xml) the cdh3b2 jar
>>> comes pre-packaged.
>>>
>>> Stay tuned :-)
>>>
>>> -ryan
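The "locate and find" step can be approximated with a short Python sketch that searches an install tree for `hadoop-core-*.jar` and checks whether hbase/lib carries a jar with the same file name. The function names are hypothetical, and comparing file names is only a heuristic; comparing jar contents (e.g. a checksum) would be stronger.

```python
from pathlib import Path
from typing import List


def find_hadoop_core_jars(root: Path) -> List[Path]:
    """Recursively list hadoop-core-*.jar files under root, like `find root -name ...`."""
    return sorted(p for p in root.rglob("hadoop-core-*.jar") if p.is_file())


def hbase_uses_cluster_jar(hadoop_home: Path, hbase_lib: Path) -> bool:
    """Heuristic check: exactly one hadoop-core jar in hbase_lib, and its file
    name matches a jar found under the Hadoop install."""
    cluster = {p.name for p in find_hadoop_core_jars(hadoop_home)}
    bundled = {p.name for p in hbase_lib.glob("hadoop-core-*.jar")}
    return len(bundled) == 1 and bundled <= cluster
```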
>>>
>>> On Mon, Sep 20, 2010 at 8:36 PM, Jack Levin <magnito@gmail.com> wrote:
>>>> Ryan, about the hadoop jar: what is the usual path to the file? I just want to be
>>>> sure. And where do I put it?
>>>>
>>>> -Jack
>>>>
>>>> On Mon, Sep 20, 2010 at 8:30 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>>>>> you need 2 more things:
>>>>>
>>>>> - restart hdfs
>>>>> - make sure the hadoop jar from your install replaces the one we ship with
>>>>>
>>>>>
>>>>> On Mon, Sep 20, 2010 at 8:22 PM, Jack Levin <magnito@gmail.com> wrote:
>>>>>> So, I switched to 0.89, and we already had CDH3
>>>>>> (hadoop-0.20-datanode-0.20.2+320-3.noarch). Even though I added
>>>>>> <name>dfs.support.append</name> as true to both hdfs-site.xml and
>>>>>> hbase-site.xml, the master still reports this:
>>>>>>
>>>>>> You are currently running the HMaster without HDFS append support
>>>>>> enabled. This may result in data loss. Please see the HBase wiki for
>>>>>> details.
>>>>>> Master Attributes
>>>>>> Attribute Name        Value                                         Description
>>>>>> HBase Version         0.89.20100726, r979826                        HBase version and svn revision
>>>>>> HBase Compiled        Sat Jul 31 02:01:58 PDT 2010, stack           When HBase version was compiled and by whom
>>>>>> Hadoop Version        0.20.2, r911707                               Hadoop version and svn revision
>>>>>> Hadoop Compiled       Fri Feb 19 08:07:34 UTC 2010, chrisdo         When Hadoop version was compiled and by whom
>>>>>> HBase Root Directory  hdfs://namenode-rd.imageshack.us:9000/hbase   Location of HBase home directory
>>>>>>
>>>>>> Any ideas what's wrong?
>>>>>>
>>>>>> -Jack
>>>>>>
>>>>>>
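One way to double-check the setting is to parse the site files. The Python sketch below, with hypothetical helper names and paths, reads a Hadoop-style `<configuration><property><name>/<value>` file and reports whether `dfs.support.append` is set to `true`. Per Ryan's reply in this thread and the "Hadoop Version 0.20.2, r911707" line in the status page, the flag alone does not appear to be enough: the Hadoop jar HBase loads must itself support append, so this checks only one half of the problem.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Optional


def get_property(xml_text: str, name: str) -> Optional[str]:
    """Return the value of `name` from a Hadoop-style *-site.xml string, or None."""
    root = ET.fromstring(xml_text)
    value = None
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            # Keep the last definition, mirroring "last one wins" loading
            # (properties marked final aside).
            value = (prop.findtext("value") or "").strip()
    return value


def append_enabled(xml_text: str) -> bool:
    """True if dfs.support.append is set to the literal lowercase value 'true'."""
    return get_property(xml_text, "dfs.support.append") == "true"


if __name__ == "__main__":
    for path in ("conf/hdfs-site.xml", "conf/hbase-site.xml"):  # hypothetical paths
        p = Path(path)
        if p.exists():
            print(path, "append enabled:", append_enabled(p.read_text()))
```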
>>>>>> On Mon, Sep 20, 2010 at 5:47 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>>>>>>> Hey,
>>>>>>>
>>>>>>> There is actually only 1 active branch of hbase, that being the 0.89
>>>>>>> release, which is based on 'trunk'. We have snapshotted a series of
>>>>>>> 0.89 "developer releases" in hopes that people would try them out and
>>>>>>> start thinking about the next major version. One of these is what SU
>>>>>>> is running prod on.
>>>>>>>
>>>>>>> At this point tracking 0.89 and which ones are the 'best' patch sets
>>>>>>> to run is a bit of a contact sport, but if you are serious about not
>>>>>>> losing data it is worthwhile. SU is based on the most recent DR with
>>>>>>> a few minor patches of our own concoction brought in. Current
>>>>>>> works, but some Master ops are slow, and there are a few patches on
>>>>>>> top of that. I'll poke about and see if it's possible to publish to a
>>>>>>> github branch or something.
>>>>>>>
>>>>>>> -ryan
>>>>>>>
>>>>>>> On Mon, Sep 20, 2010 at 5:16 PM, Jack Levin <magnito@gmail.com> wrote:
>>>>>>>> Sounds good; the only reason I ask is because of this:
>>>>>>>>
>>>>>>>> There are currently two active branches of HBase:
>>>>>>>>
>>>>>>>>    * 0.20 - the current stable release series, being maintained with
>>>>>>>> patches for bug fixes only. This release series does not support HDFS
>>>>>>>> durability - edits may be lost in the case of node failure.
>>>>>>>>    * 0.89 - a development release series with active feature and
>>>>>>>> stability development, not currently recommended for production use.
>>>>>>>> This release does support HDFS durability - cases in which edits are
>>>>>>>> lost are considered serious bugs.
>>>>>>>>>>>>>>
>>>>>>>>
>>>>>>>> Are we talking about data loss in case of a datanode going down while
>>>>>>>> being written to, or a RegionServer going down?
>>>>>>>>
>>>>>>>> -jack
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Sep 20, 2010 at 4:09 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>>>>>>>>> We run 0.89 in production @ StumbleUpon. We also employ 3 committers...
>>>>>>>>>
>>>>>>>>> As for safety, you have no choice but to run 0.89. If you run a 0.20
>>>>>>>>> release you will lose data. You must be on 0.89 and
>>>>>>>>> CDH3/append-branch to achieve data durability, and there really is no
>>>>>>>>> argument around it. If you are doing your tests with 0.20.6 now, I'd
>>>>>>>>> stop and rebase those tests onto the latest DR announced on the list.
>>>>>>>>>
>>>>>>>>> -ryan
>>>>>>>>>
>>>>>>>>> On Mon, Sep 20, 2010 at 3:17 PM, Jack Levin <magnito@gmail.com> wrote:
>>>>>>>>>> Hi Stack, see inline:
>>>>>>>>>>
>>>>>>>>>> On Mon, Sep 20, 2010 at 2:42 PM, Stack <stack@duboce.net> wrote:
>>>>>>>>>>> Hey Jack:
>>>>>>>>>>>
>>>>>>>>>>> Thanks for writing.
>>>>>>>>>>>
>>>>>>>>>>> See below for some comments.
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Sep 20, 2010 at 11:00 AM, Jack Levin <magnito@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Image-Shack gets close to two million image uploads per day, which are
>>>>>>>>>>>> usually stored on regular servers (we have about 700) as regular
>>>>>>>>>>>> files, and each server has its own host name, such as img55. I've
>>>>>>>>>>>> been researching how to improve our backend design in terms of data
>>>>>>>>>>>> safety and stumbled onto the HBase project.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Any other requirements other than data safety? (latency, etc).
>>>>>>>>>>
>>>>>>>>>> Latency is the second requirement. We have some services that are
>>>>>>>>>> very short tail and can produce a 95% cache hit rate, so I assume this
>>>>>>>>>> would really put the cache to good use. Some other services, however,
>>>>>>>>>> have about a 25% cache hit ratio, in which case the latency should be
>>>>>>>>>> 'adequate', i.e. if it's slightly worse than getting data off raw disk,
>>>>>>>>>> then it's good enough. Safety is supremely important, then
>>>>>>>>>> availability, then speed.
>>>>>>>>>>
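The cache trade-off above can be made concrete with the usual weighted average: mean latency = h * t_cache + (1 - h) * t_disk, where h is the cache hit ratio. The Python sketch below uses made-up latencies (1 ms for a cache hit, 10 ms for a disk read) purely for illustration; they are not measurements from this cluster, and the helper name is hypothetical.

```python
def expected_latency_ms(hit_ratio: float, cache_ms: float, disk_ms: float) -> float:
    """Mean read latency when a fraction `hit_ratio` of reads is served from cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * disk_ms


if __name__ == "__main__":
    CACHE_MS, DISK_MS = 1.0, 10.0  # hypothetical latencies, not measurements
    for h in (0.95, 0.25):
        print(f"hit ratio {h:.0%}: ~{expected_latency_ms(h, CACHE_MS, DISK_MS):.2f} ms")
```

With those hypothetical numbers, a 95% hit ratio averages about 1.45 ms while 25% averages 7.75 ms, so the low-hit services behave mostly like raw disk, which is the 'adequate' target described.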
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>> Now, I think HBase is the most beautiful thing that has happened to the
>>>>>>>>>>>> distributed DB world :). The idea is to store image files (about
>>>>>>>>>>>> 400KB on average) in HBase.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I'd guess some images are much bigger than this. Do you ever limit
>>>>>>>>>>> the size of images folks can upload to your service?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> The setup will include the following configuration:
>>>>>>>>>>>>
>>>>>>>>>>>> 50 servers total (2 datacenters), with 8 GB RAM, dual core CPU, 6 x
>>>>>>>>>>>> 2TB disks each.
>>>>>>>>>>>> 3 to 5 ZooKeepers
>>>>>>>>>>>> 2 Masters (in a datacenter each)
>>>>>>>>>>>> 10 to 20 Stargate REST instances (one per server, hash load balanced)
>>>>>>>>>>>
>>>>>>>>>>> What's your frontend? Why REST? It might be more efficient if you
>>>>>>>>>>> could run with Thrift, given REST base64s its payload IIRC (check the
>>>>>>>>>>> src yourself).
>>>>>>>>>>
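To put a number on the base64 concern: standard padded base64 turns every 3 input bytes into 4 output characters, so a 400KB image grows by about a third on the wire. A minimal Python sketch (the helper name is hypothetical), cross-checked against the standard library:

```python
import base64


def base64_size(n_bytes: int) -> int:
    """Length of standard, padded base64 output for n_bytes of input."""
    return 4 * ((n_bytes + 2) // 3)  # 4 chars per (possibly partial) 3-byte group


if __name__ == "__main__":
    image = bytes(400 * 1024)  # zero-filled stand-in for a 400KB image
    encoded = base64.b64encode(image)
    assert len(encoded) == base64_size(len(image))
    print(len(image), "->", len(encoded), "bytes",
          f"({len(encoded) / len(image) - 1:.0%} larger)")
```

For 409,600 bytes this gives 546,136 bytes. Whether a request actually pays this cost depends on which representation Stargate is asked for; as far as I know, the raw application/octet-stream form for a single cell is not base64-encoded, which is where the Content-Type rewriting in Jack's reply comes in.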
>>>>>>>>>> For insertion we use Haproxy and balance curl PUTs across multiple REST APIs.
>>>>>>>>>> For reading, it's an nginx proxy that rewrites the Content-Type
>>>>>>>>>> from image/jpeg to octet-stream and vice versa;
>>>>>>>>>> it then hits Haproxy again, which hits the balanced REST instances.
>>>>>>>>>> Why REST? It was the simplest thing to run, given that it supports
>>>>>>>>>> HTTP. Potentially we could write something for Thrift, as long as we
>>>>>>>>>> can still use HTTP to send and receive data (has anyone written anything
>>>>>>>>>> like that, say in Python, C or Java?)
>>>>>>>>>>
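The write path above (hash-balanced curl PUTs against several Stargate instances) can be sketched in Python with only the standard library. Assumptions, all hypothetical: the backend URLs, the column name `image:data`, and the helper names; the table/row pair reuses names from the master log at the top (`img15`, `normal052q.jpg`). The URL shape `/<table>/<row>/<family>:<qualifier>` with an `application/octet-stream` body is the single-cell binary form as I understand Stargate's documentation; verify it against your version. The CRC32 hash is computed client-side purely to illustrate key-based balancing, not to reproduce HAProxy's own hashing.

```python
import zlib
from typing import List, Sequence
from urllib.parse import quote
from urllib.request import Request

# Hypothetical Stargate endpoints; replace with your own hosts and ports.
BACKENDS: List[str] = ["http://rest1:8080", "http://rest2:8080", "http://rest3:8080"]


def pick_backend(row_key: str, backends: Sequence[str] = BACKENDS) -> str:
    """Pick a REST instance by hashing the row key.

    CRC32 is used instead of hash() because it is stable across processes.
    """
    return backends[zlib.crc32(row_key.encode("utf-8")) % len(backends)]


def build_put(table: str, row_key: str, column: str, data: bytes,
              backends: Sequence[str] = BACKENDS) -> Request:
    """Build (but do not send) a single-cell PUT carrying the raw image bytes."""
    url = "/".join([
        pick_backend(row_key, backends),
        quote(table, safe=""),
        quote(row_key, safe=""),
        quote(column, safe=":"),  # keep the family:qualifier separator readable
    ])
    return Request(url, data=data, method="PUT",
                   headers={"Content-Type": "application/octet-stream"})
```

Usage would be `req = build_put("img15", "normal052q.jpg", "image:data", jpeg_bytes)` followed by `urllib.request.urlopen(req)` against a live instance; sending is left out so the sketch runs without a cluster.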
>>>>>>>>>>>
>>>>>>>>>>>> 40 to 50 RegionServers (will probably keep masters separate on dedicated boxes).
>>>>>>>>>>>> 2 Namenode servers (one backup, highly available, will also do fsimage and
>>>>>>>>>>>> edits snapshots)
>>>>>>>>>>>>
>>>>>>>>>>>> So far I have about 13 servers running, doing about 20 insertions/
>>>>>>>>>>>> second (file sizes ranging from a few KB to 2-3MB, avg. 400KB) via the
>>>>>>>>>>>> Stargate API. Our frontend servers receive files, and I just
>>>>>>>>>>>> fork-insert them into Stargate via HTTP (curl).
>>>>>>>>>>>> The inserts are humming along nicely, without any noticeable load on
>>>>>>>>>>>> the regionservers; so far we have inserted about 2 TB worth of images.
>>>>>>>>>>>> I have adjusted the region file size to 512MB and the table block size
>>>>>>>>>>>> to about 400KB, trying to match the average access size to limit HDFS
>>>>>>>>>>>> trips.
>>>>>>>>>>>
>>>>>>>>>>> As Todd suggests, I'd go up from 512MB... 1G at least. You'll
>>>>>>>>>>> probably want to up your flush size from 64MB to 128MB or maybe 192MB.
>>>>>>>>>>
>>>>>>>>>> Yep, I will adjust to 1G. I thought flush was controlled by a
>>>>>>>>>> function of memstore heap, something like 40%? Or are you talking
>>>>>>>>>> about HDFS block size?
>>>>>>>>>>
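As far as I know, both readings are partly right in HBase of this era: each region's memstore flushes when it reaches `hbase.hregion.memstore.flush.size` (the 64MB default Stack refers to), while `hbase.regionserver.global.memstore.upperLimit` (0.4 of the heap by default) caps all memstores on a regionserver combined and forces flushes, blocking updates, when crossed. The Python sketch below does that arithmetic for a hypothetical 4GB regionserver heap, and also estimates region counts for a data size under `hbase.hregion.max.filesize`, which is why larger regions are suggested. Helper names are hypothetical; verify property names and defaults against hbase-default.xml for your release.

```python
MB = 1024 * 1024
GB = 1024 * MB
TB = 1024 * GB


def regions_before_global_limit(heap_bytes: int, flush_size_bytes: int,
                                upper_limit: float = 0.4) -> int:
    """How many regions could each hold a full memstore (flush.size) before the
    regionserver-wide memstore cap (upper_limit * heap) is reached."""
    return int(heap_bytes * upper_limit // flush_size_bytes)


def region_count(data_bytes: int, max_filesize_bytes: int) -> int:
    """Lower bound on regions for data_bytes; after splits, regions are
    typically smaller than max_filesize, so the real count is higher."""
    return -(-data_bytes // max_filesize_bytes)  # ceiling division


if __name__ == "__main__":
    heap = 4 * GB  # hypothetical regionserver heap on an 8 GB box
    for flush in (64 * MB, 128 * MB, 192 * MB):
        print(flush // MB, "MB flush ->",
              regions_before_global_limit(heap, flush), "full memstores fit")
    for region in (512 * MB, 1 * GB):
        print(region // MB, "MB regions ->", region_count(64 * TB, region), "regions for 64 TB")
```

With a 4GB heap, only about 12 regions can carry a full 128MB memstore before the global cap kicks in (25 at 64MB), and 64 TB needs at least 65,536 regions at 1GB versus 131,072 at 512MB.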
>>>>>>>>>>>> So far the read performance has been more than adequate, and of
>>>>>>>>>>>> course write performance is nowhere near capacity.
>>>>>>>>>>>> So right now, all newly uploaded images go to HBase. But we do plan
>>>>>>>>>>>> to insert about 170 million images (about 100 days' worth), which is
>>>>>>>>>>>> only about 64 TB, or 10% of the planned cluster size of 600TB.
>>>>>>>>>>>> The end goal is to have a storage system that ensures data safety,
>>>>>>>>>>>> i.e. the system may go down but data cannot be lost. Our front-end
>>>>>>>>>>>> servers will continue to serve images from their own file systems (we
>>>>>>>>>>>> are serving about 16 Gbit/s at peak); however, should we need to bring
>>>>>>>>>>>> any of those down for maintenance, we will redirect all traffic to
>>>>>>>>>>>> HBase (should be no more than a few hundred Mbps) while the front-end
>>>>>>>>>>>> server is repaired (for example, having its disk replaced). After the
>>>>>>>>>>>> repairs, we quickly repopulate it with the missing files, while serving
>>>>>>>>>>>> the remaining missing files off HBase.
>>>>>>>>>>>> All in all it should be a very interesting project, and I am hoping not to
>>>>>>>>>>>> run into any snags; however, should that happen, I am pleased to know
>>>>>>>>>>>> that such a great and vibrant tech group exists that supports and uses
>>>>>>>>>>>> HBase :).
>>>>>>>>>>>>
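A quick check of the capacity figures above, in Python. Assumptions: 1 KB = 1024 bytes and 1 TB = 1024^4 bytes for the image data (which reproduces the "about 64 TB" figure), disks counted at their nominal 2TB, and an HDFS replication factor of 3 (the usual dfs.replication default; the thread does not state the value used). The helper names are hypothetical.

```python
KB = 1024
TB = 1024 ** 4
SECONDS_PER_DAY = 86_400


def dataset_tb(n_images: int, avg_kb: float) -> float:
    """Logical size of n_images files averaging avg_kb kilobytes, in binary TB."""
    return n_images * avg_kb * KB / TB


def raw_capacity_tb(servers: int, disks_per_server: int, disk_tb: float) -> float:
    """Raw disk across the cluster, counting each disk at its nominal size."""
    return servers * disks_per_server * disk_tb


def replicated_fraction(data_tb: float, capacity_tb: float, replication: int = 3) -> float:
    """Share of raw capacity used once HDFS stores `replication` copies (assumed 3)."""
    return data_tb * replication / capacity_tb


def uploads_per_second(uploads_per_day: float) -> float:
    """Average upload rate implied by a daily upload count."""
    return uploads_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    data = dataset_tb(170_000_000, 400)
    raw = raw_capacity_tb(50, 6, 2)
    print(f"dataset ~{data:.1f} TB, raw {raw:.0f} TB")
    print(f"with 3 replicas: {replicated_fraction(data, raw):.0%} of raw")
    print(f"~{uploads_per_second(2_000_000):.0f} uploads/s on average")
```

Under those assumptions, 170 million images at 400KB come to about 63.3 TB, the hardware list gives 50 x 6 x 2TB = 600TB raw, and three replicas would occupy about 190 TB, roughly 32% of raw capacity rather than 10%. Two million uploads a day is also about 23 uploads/second on average, close to the 20 insertions/second already being sustained.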
>>>>>>>>>>>
>>>>>>>>>>> We're definitely interested in how your project progresses. If you are
>>>>>>>>>>> ever up in the city, you should drop by for a chat.
>>>>>>>>>>
>>>>>>>>>> Cool. I'd like that.
>>>>>>>>>>
>>>>>>>>>>> St.Ack
>>>>>>>>>>>
>>>>>>>>>>> P.S. I'm also w/ Todd that you should move to 0.89 and blooms.
>>>>>>>>>>> P.P.S I updated the wiki on stargate REST:
>>>>>>>>>>> http://wiki.apache.org/hadoop/Hbase/Stargate
>>>>>>>>>>
>>>>>>>>>> Cool. I assume that if we move to that, it won't kill existing meta tables
>>>>>>>>>> and data, i.e. it's cross-compatible?
>>>>>>>>>> Is 0.89 ready for a production environment?
>>>>>>>>>>
>>>>>>>>>> -Jack
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>