hbase-user mailing list archives

From Jack Levin <magn...@gmail.com>
Subject Re: Millions of photos into Hbase
Date Tue, 21 Sep 2010 04:32:13 GMT
Who said anything about deb :). I do use tarballs.... Yes, so what did
it was copying that jar under hbase/lib, and then a full restart.
Now here is a funny thing: the master shuddered for about 10 minutes,
spewing these messages:

2010-09-20 21:23:45,826 DEBUG org.apache.hadoop.hbase.master.HMaster:
Event NodeCreated with state SyncConnected with path
2010-09-20 21:23:45,827 DEBUG
org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Got event
NodeCreated with path /hbase/UNASSIGNED/97999366
2010-09-20 21:23:45,827 DEBUG
org.apache.hadoop.hbase.master.ZKUnassignedWatcher: ZK-EVENT-PROCESS:
Got zkEvent NodeCreated state:SyncConnected
2010-09-20 21:23:45,827 DEBUG
org.apache.hadoop.hbase.master.RegionManager: Created/updated
UNASSIGNED zNode img15,normal052q.jpg,1285001686282.97999366 in state
2010-09-20 21:23:45,828 INFO
img13,p1000319tq.jpg,1284952655960.812544765 open on,60020,1285042333293
2010-09-20 21:23:45,828 DEBUG
org.apache.hadoop.hbase.master.ZKUnassignedWatcher: Got event type [
M2ZK_REGION_OFFLINE ] for region 97999366
2010-09-20 21:23:45,828 DEBUG org.apache.hadoop.hbase.master.HMaster:
Event NodeChildrenChanged with state SyncConnected with path
2010-09-20 21:23:45,828 DEBUG
org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Got event
NodeChildrenChanged with path /hbase/UNASSIGNED
2010-09-20 21:23:45,828 DEBUG
org.apache.hadoop.hbase.master.ZKUnassignedWatcher: ZK-EVENT-PROCESS:
Got zkEvent NodeChildrenChanged state:SyncConnected
2010-09-20 21:23:45,830 DEBUG
org.apache.hadoop.hbase.master.BaseScanner: Current assignment of
img150,,1284859678248.3116007 is not valid;
serverAddress=, startCode=1285038205920 unknown.

Does anyone know what they mean?   At first it would kill one of my
datanodes.  But what helped was changing the heap size to 4GB for the
master and 2GB for the datanode that was dying, and after 10 minutes I got
into a clean state.
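
To see which component is producing most of the noise in a burst like the one above, the entries can be tallied per logger class. This is only a minimal sketch: the regular expression is an assumption based on the line format shown in this message (timestamp, level, fully qualified class, colon), and `summarize` is a hypothetical helper name. Entries whose logger name is missing, like the INFO line above, are simply skipped.

```python
import re
from collections import Counter

# Matches the "timestamp LEVEL logger.Class:" prefix of a log entry.
# \s+ lets the match span the line wraps introduced by the mail client.
ENTRY = re.compile(
    r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\s+(DEBUG|INFO|WARN|ERROR)\s+"
    r"(org\.apache\.[\w.]+):"
)

def summarize(log_text):
    """Count log entries per (level, short logger class name)."""
    counts = Counter()
    for level, logger in ENTRY.findall(log_text):
        counts[(level, logger.rsplit(".", 1)[-1])] += 1
    return counts

# A few entries copied from the excerpt above.
sample = """2010-09-20 21:23:45,826 DEBUG org.apache.hadoop.hbase.master.HMaster:
Event NodeCreated with state SyncConnected with path
2010-09-20 21:23:45,827 DEBUG
org.apache.hadoop.hbase.master.ZKUnassignedWatcher: ZK-EVENT-PROCESS:
Got zkEvent NodeCreated state:SyncConnected
2010-09-20 21:23:45,828 DEBUG
org.apache.hadoop.hbase.master.ZKUnassignedWatcher: Got event type [
M2ZK_REGION_OFFLINE ] for region 97999366
"""

for (level, name), n in summarize(sample).most_common():
    print(level, name, n)
```

Run against the full master log, the same tally would show whether the ten minutes were dominated by the ZooKeeper watchers or by the BaseScanner complaints.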


On Mon, Sep 20, 2010 at 9:28 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
> yes, on every single machine as well, and restart.
> again, not sure how you'd do this in a scalable manner with your
> deb packages... on the source tarball you can just replace it, rsync
> it out and done.
> :-)
> On Mon, Sep 20, 2010 at 8:56 PM, Jack Levin <magnito@gmail.com> wrote:
>> ok, I found that file, do I replace hadoop-core.*.jar under /usr/lib/hbase/lib?
>> Then restart, etc?  All regionservers too?
>> -Jack
>> On Mon, Sep 20, 2010 at 8:40 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>>> Well I don't really run CDH, I disagree with their rpm/deb packaging
>>> policies and I have to highly recommend not using DEBs to install
>>> software...
>>> So normally installing from tarball, the jar is in
>>> <installpath>/hadoop-0.20.0-320/hadoop-core-0.20.2+320.jar
>>> On CDH/DEB edition, it's somewhere silly ... locate and find will be
>>> your friend.  It should be called hadoop-core-0.20.2+320.jar though!
>>> I'm working on a github publish of SU's production system, which uses
>>> the cloudera maven repo to install the correct JAR in hbase so when
>>> you type 'mvn assembly:assembly' to build your own hbase-*-bin.tar.gz
>>> (the * being whatever version you specified in pom.xml) the cdh3b2 jar
>>> comes pre-packaged.
>>> Stay tuned :-)
>>> -ryan
>>> On Mon, Sep 20, 2010 at 8:36 PM, Jack Levin <magnito@gmail.com> wrote:
>>>> Ryan, hadoop jar, what is the usual path to the file? I just want to be
>>>> sure, and where do I put it?
>>>> -Jack
>>>> On Mon, Sep 20, 2010 at 8:30 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>>>>> you need 2 more things:
>>>>> - restart hdfs
>>>>> - make sure the hadoop jar from your install replaces the one we ship
>>>>> On Mon, Sep 20, 2010 at 8:22 PM, Jack Levin <magnito@gmail.com> wrote:
>>>>>> So, I switched to 0.89, and we already had CDH3
>>>>>> (hadoop-0.20-datanode-0.20.2+320-3.noarch). Even though I added
>>>>>>  <name>dfs.support.append</name> as true to both hdfs-site.xml and
>>>>>> hbase-site.xml, the master still reports this:
>>>>>>  You are currently running the HMaster without HDFS append support
>>>>>> enabled. This may result in data loss. Please see the HBase wiki for
>>>>>> details.
>>>>>> Master Attributes
>>>>>> Attribute Name  Value   Description
>>>>>> HBase Version   0.89.20100726, r979826  HBase version and svn revision
>>>>>> HBase Compiled  Sat Jul 31 02:01:58 PDT 2010, stack     When HBase
>>>>>> was compiled and by whom
>>>>>> Hadoop Version  0.20.2, r911707 Hadoop version and svn revision
>>>>>> Hadoop Compiled Fri Feb 19 08:07:34 UTC 2010, chrisdo   When Hadoop
>>>>>> version was compiled and by whom
>>>>>> HBase Root Directory    hdfs://namenode-rd.imageshack.us:9000/hbase
>>>>>> Location of HBase home directory
>>>>>> Any ideas what's wrong?
>>>>>> -Jack
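
A quick way to confirm that the property really is set in a given site file is to parse it and read the value back. The following is a minimal sketch, assuming the standard Hadoop `<configuration>/<property>/<name>/<value>` layout; `get_property` is a hypothetical helper and the XML below is made-up sample content, not the actual cluster config. As the replies above point out, the setting alone is not enough: HDFS has to be restarted and the Hadoop jar under hbase/lib has to match the installed one.

```python
import xml.etree.ElementTree as ET

def get_property(site_xml, name):
    """Return the <value> of the named <property> in a *-site.xml string, or None."""
    root = ET.fromstring(site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == name:
            return prop.findtext("value", "").strip()
    return None

# Hypothetical file contents for illustration only.
hdfs_site = """<configuration>
  <property>
    <name>dfs.support.append</name>
    <value>true</value>
  </property>
</configuration>"""

print(get_property(hdfs_site, "dfs.support.append"))
```

Running the same check against both hdfs-site.xml and hbase-site.xml on every node would rule out a typo or a stale copy before looking at the jar.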
>>>>>> On Mon, Sep 20, 2010 at 5:47 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>>>>>>> Hey,
>>>>>>> There is actually only 1 active branch of hbase, that being the
>>>>>>> release, which is based on 'trunk'.  We have snapshotted a series of
>>>>>>> 0.89 "developer releases" in hopes that people would try them out and
>>>>>>> start thinking about the next major version.  One of these is what SU
>>>>>>> is running prod on.
>>>>>>> At this point tracking 0.89 and which ones are the 'best' peach
>>>>>>> to run is a bit of a contact sport, but if you are serious about not
>>>>>>> losing data it is worthwhile.  SU is based on the most recent DR with
>>>>>>> a few minor patches of our own concoction brought in.  If current
>>>>>>> works, but some Master ops are slow, and there are a few patches on
>>>>>>> top of that.  I'll poke about and see if it's possible to publish to a
>>>>>>> github branch or something.
>>>>>>> -ryan
>>>>>>> On Mon, Sep 20, 2010 at 5:16 PM, Jack Levin <magnito@gmail.com> wrote:
>>>>>>>> Sounds good, only reason I ask is because of this:
>>>>>>>> There are currently two active branches of HBase:
>>>>>>>>    * 0.20 - the current stable release series, being maintained with
>>>>>>>> patches for bug fixes only. This release series does not support HDFS
>>>>>>>> durability - edits may be lost in the case of node failure.
>>>>>>>>    * 0.89 - a development release series with active feature and
>>>>>>>> stability development, not currently recommended for production use.
>>>>>>>> This release does support HDFS durability - cases in which edits are
>>>>>>>> lost are considered serious bugs.
>>>>>>>> Are we talking about data loss in case of datanode going down while
>>>>>>>> being written to, or RegionServer going down?
>>>>>>>> -jack
>>>>>>>> On Mon, Sep 20, 2010 at 4:09 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>>>>>>>>> We run 0.89 in production @ Stumbleupon.  We also employ 3 committers...
>>>>>>>>> As for safety, you have no choice but to run 0.89.  If you run a 0.20
>>>>>>>>> release you will lose data.  You must be on 0.89 and
>>>>>>>>> CDH3/append-branch to achieve data durability, and there really is no
>>>>>>>>> argument around it.  If you are doing your tests with 0.20.6 now, I'd
>>>>>>>>> stop and rebase those tests onto the latest DR announced on the list.
>>>>>>>>> -ryan
>>>>>>>>> On Mon, Sep 20, 2010 at 3:17 PM, Jack Levin <magnito@gmail.com> wrote:
>>>>>>>>>> Hi Stack, see inline:
>>>>>>>>>> On Mon, Sep 20, 2010 at 2:42 PM, Stack <stack@duboce.net> wrote:
>>>>>>>>>>> Hey Jack:
>>>>>>>>>>> Thanks for writing.
>>>>>>>>>>> See below for some comments.
>>>>>>>>>>> On Mon, Sep 20, 2010 at 11:00 AM, Jack Levin <magnito@gmail.com> wrote:
>>>>>>>>>>>> Image-Shack gets close to two million image uploads per day, which are
>>>>>>>>>>>> usually stored on regular servers (we have about 700), as regular
>>>>>>>>>>>> files, and each server has its own host name, such as (img55).   I've
>>>>>>>>>>>> been researching how to improve our backend design in terms of data
>>>>>>>>>>>> safety and stumbled onto the Hbase project.
>>>>>>>>>>> Any other requirements other than data safety? (latency, etc).
>>>>>>>>>> Latency is the second requirement.  We have some services that are
>>>>>>>>>> very short tail, and can produce 95% cache hit rate, so I assume this
>>>>>>>>>> would really put cache into good use.  Some other services however,
>>>>>>>>>> have about 25% cache hit ratio, in which case the latency should be
>>>>>>>>>> 'adequate', e.g. if it's slightly worse than getting data off raw disk,
>>>>>>>>>> then it's good enough.   Safety is supremely important, then it's
>>>>>>>>>> availability, then speed.
>>>>>>>>>>>> Now, I think hbase is the most beautiful thing that has happened to the
>>>>>>>>>>>> distributed DB world :).   The idea is to store image files (about
>>>>>>>>>>>> 400KB on average) in HBASE.
>>>>>>>>>>> I'd guess some images are much bigger than this.  Do you ever limit
>>>>>>>>>>> the size of images folks can upload to your service?
>>>>>>>>>>>> The setup will include the following
>>>>>>>>>>>> configuration:
>>>>>>>>>>>> 50 servers total (2 datacenters), with 8 GB RAM, dual core cpu, 6 x
>>>>>>>>>>>> 2TB disks each.
>>>>>>>>>>>> 3 to 5 Zookeepers
>>>>>>>>>>>> 2 Masters (in a datacenter each)
>>>>>>>>>>>> 10 to 20 Stargate REST instances (one per server, hash loadbalanced)
>>>>>>>>>>> What's your frontend?  Why REST?  It might be more efficient if you
>>>>>>>>>>> could run with thrift given REST base64s its payload IIRC (check the
>>>>>>>>>>> src yourself).
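
For a sense of scale on the base64 point, here is a small sketch of the size overhead for an average-sized image. It only measures the encoding itself; whether a given Stargate request path actually base64-encodes the payload is not verified here, and "400KB" is taken as 400,000 bytes, which is an assumption.

```python
import base64

AVG_IMAGE_BYTES = 400 * 1000  # assumption: "400KB" read as decimal kilobytes

# The encoded length depends only on the input length, so zero bytes suffice.
raw = bytes(AVG_IMAGE_BYTES)
encoded = base64.b64encode(raw)

overhead = len(encoded) / len(raw)
print(len(raw), len(encoded), round(overhead, 3))
```

Base64 emits 4 output bytes for every 3 input bytes, so any path that encodes the image this way moves roughly a third more data per request than a raw binary transfer.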
>>>>>>>>>> For insertion we use Haproxy, and balance curl PUTs across multiple REST APIs.
>>>>>>>>>> For reading, it's an nginx proxy that does Content-type
>>>>>>>>>> from image/jpeg to octet-stream, and vice versa,
>>>>>>>>>> it then hits Haproxy again, which hits balanced REST.
>>>>>>>>>> Why REST, it was the simplest thing to run, given that it supports
>>>>>>>>>> HTTP, potentially we could rewrite something for thrift, as long as we
>>>>>>>>>> can use http still to send and receive data (anyone wrote anything
>>>>>>>>>> like that say in python, C or java?)
>>>>>>>>>>>> 40 to 50 RegionServers (will probably keep masters separate on dedicated boxes).
>>>>>>>>>>>> 2 Namenode servers (one backup, highly available, will do fsimage and
>>>>>>>>>>>> edits snapshots also)
>>>>>>>>>>>> So far I got about 13 servers running, and doing about 20 insertions /
>>>>>>>>>>>> second (file size ranging from a few KB to 2-3MB, avg. 400KB) via
>>>>>>>>>>>> Stargate API.  Our frontend servers receive files, and I just
>>>>>>>>>>>> fork-insert them into stargate via http (curl).
>>>>>>>>>>>> The inserts are humming along nicely, without any noticeable load on
>>>>>>>>>>>> regionservers, so far inserted about 2 TB worth of images.
>>>>>>>>>>>> I have adjusted the region file size to be 512MB, and table block size
>>>>>>>>>>>> to about 400KB, trying to match average access block to limit HDFS
>>>>>>>>>>>> trips.
>>>>>>>>>>> As Todd suggests, I'd go up from 512MB... 1G at least.  You'll
>>>>>>>>>>> probably want to up your flush size from 64MB to 128MB or maybe 192MB.
>>>>>>>>>> Yep, I will adjust to 1G.  I thought flush was controlled by a
>>>>>>>>>> function of memstore HEAP, something like 40%?  Or are you talking
>>>>>>>>>> about HDFS block size?
>>>>>>>>>>>> So far the read performance was more than adequate, and of
>>>>>>>>>>>> course write performance is nowhere near
>>>>>>>>>>>> So right now, all newly uploaded images go to HBASE.  But we do plan
>>>>>>>>>>>> to insert about 170 Million images (about 100 days worth), which is
>>>>>>>>>>>> only about 64 TB, or 10% of planned cluster size of 600TB.
>>>>>>>>>>>> The end goal is to have a storage system that creates data safety,
>>>>>>>>>>>> e.g. system may go down but data can not be lost.   Our Front-End
>>>>>>>>>>>> servers will continue to serve images from their own file system (we
>>>>>>>>>>>> are serving about 16 Gbits at peak), however should we need to bring
>>>>>>>>>>>> any of those down for maintenance, we will redirect all traffic to
>>>>>>>>>>>> Hbase (should be no more than a few hundred Mbps), while the front end
>>>>>>>>>>>> server is repaired (for example having its disk replaced), after the
>>>>>>>>>>>> repairs, we quickly repopulate it with missing files, while serving
>>>>>>>>>>>> the missing remaining off Hbase.
>>>>>>>>>>>> All in all should be a very interesting project, and I am hoping not to
>>>>>>>>>>>> run into any snags, however, should that happen, I am pleased to know
>>>>>>>>>>>> that such a great and vibrant tech group exists that supports and uses
>>>>>>>>>>>> HBASE :).
>>>>>>>>>>> We're definitely interested in how your project progresses.  If you are
>>>>>>>>>>> ever up in the city, you should drop by for a
>>>>>>>>>> Cool.  I'd like that.
>>>>>>>>>>> St.Ack
>>>>>>>>>>> P.S. I'm also w/ Todd that you should move to 0.89 and blooms.
>>>>>>>>>>> P.P.S I updated the wiki on stargate REST:
>>>>>>>>>>> http://wiki.apache.org/hadoop/Hbase/Stargate
>>>>>>>>>> Cool, I assume if we move to that it won't kill existing meta tables,
>>>>>>>>>> and data?  e.g. cross compatible?
>>>>>>>>>> Is 0.89 ready for production environment?
>>>>>>>>>> -Jack
