hbase-user mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: Millions of photos into Hbase
Date Tue, 21 Sep 2010 04:57:32 GMT
no no, 20 GB heap per node.  each node with 24-32gb ram, etc.

we can't rely on the linux buffer cache to save us, so we have to cache
in hbase ram.
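For context, a heap that size is set in conf/hbase-env.sh. A minimal sketch, writing to a demo directory rather than a real install (the path and the 20000 MB figure are illustrative, not taken from the cluster in this thread):

```shell
# Demo stand-in for $HBASE_HOME/conf on a regionserver node.
CONF_DIR=/tmp/hbase-conf-demo
mkdir -p "$CONF_DIR"
cat > "$CONF_DIR/hbase-env.sh" <<'EOF'
# Keep the block cache in-process: since the Linux buffer cache alone
# can't be relied on, give HBase itself most of the node's RAM.
export HBASE_HEAPSIZE=20000   # in MB, i.e. a ~20 GB JVM heap
EOF
grep HBASE_HEAPSIZE "$CONF_DIR/hbase-env.sh"
```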

:-)

-ryan

On Mon, Sep 20, 2010 at 9:44 PM, Jack Levin <magnito@gmail.com> wrote:
> 20GB+?, hmmm..... I do plan to run 50 regionserver nodes though, with
> 3 GB Heap likely, this should be plenty to rip through say, 350TB of
> data.
>
> -Jack
>
> On Mon, Sep 20, 2010 at 9:39 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>> yes, that is the new ZK-based coordination.  when i publish the SU code
>> we have a patch which limits that and is faster.  2GB is a little
>> small for regionserver memory... in my ideal world we'll be putting
>> 20GB+ of ram into each regionserver.
>>
>> I just figured you were using the DEB/RPMs because your files were in
>> /usr/local... I usually run everything out of /home/hadoop b/c it
>> allows me to easily rsync as user hadoop.
>>
>> but you are on the right track yes :-)
>>
>> On Mon, Sep 20, 2010 at 9:32 PM, Jack Levin <magnito@gmail.com> wrote:
>>> Who said anything about deb :). I do use tarballs.... Yes, so what did
>>> it was copying that jar to under hbase/lib, and then a full restart.
>>> Now here is a funny thing: the master shuddered for about 10 minutes,
>>> spewing these messages:
>>>
>>> 2010-09-20 21:23:45,826 DEBUG org.apache.hadoop.hbase.master.HMaster: Event NodeCreated with state SyncConnected with path /hbase/UNASSIGNED/97999366
>>> 2010-09-20 21:23:45,827 DEBUG org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Got event NodeCreated with path /hbase/UNASSIGNED/97999366
>>> 2010-09-20 21:23:45,827 DEBUG org.apache.hadoop.hbase.master.ZKUnassignedWatcher: ZK-EVENT-PROCESS: Got zkEvent NodeCreated state:SyncConnected path:/hbase/UNASSIGNED/97999366
>>> 2010-09-20 21:23:45,827 DEBUG org.apache.hadoop.hbase.master.RegionManager: Created/updated UNASSIGNED zNode img15,normal052q.jpg,1285001686282.97999366 in state M2ZK_REGION_OFFLINE
>>> 2010-09-20 21:23:45,828 INFO org.apache.hadoop.hbase.master.RegionServerOperation: img13,p1000319tq.jpg,1284952655960.812544765 open on 10.103.2.3,60020,1285042333293
>>> 2010-09-20 21:23:45,828 DEBUG org.apache.hadoop.hbase.master.ZKUnassignedWatcher: Got event type [ M2ZK_REGION_OFFLINE ] for region 97999366
>>> 2010-09-20 21:23:45,828 DEBUG org.apache.hadoop.hbase.master.HMaster: Event NodeChildrenChanged with state SyncConnected with path /hbase/UNASSIGNED
>>> 2010-09-20 21:23:45,828 DEBUG org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Got event NodeChildrenChanged with path /hbase/UNASSIGNED
>>> 2010-09-20 21:23:45,828 DEBUG org.apache.hadoop.hbase.master.ZKUnassignedWatcher: ZK-EVENT-PROCESS: Got zkEvent NodeChildrenChanged state:SyncConnected path:/hbase/UNASSIGNED
>>> 2010-09-20 21:23:45,830 DEBUG org.apache.hadoop.hbase.master.BaseScanner: Current assignment of img150,,1284859678248.3116007 is not valid; serverAddress=10.103.2.1:60020, startCode=1285038205920 unknown.
>>>
>>>
>>> Does anyone know what they mean?   At first it would kill one of my
>>> datanodes.  But what helped was changing the heap size to 4GB for the
>>> master and 2GB for the datanode that was dying, and after 10 minutes I
>>> got into a clean state.
>>>
>>> -Jack
>>>
>>>
>>> On Mon, Sep 20, 2010 at 9:28 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>>>> yes, on every single machine as well, and restart.
>>>>
>>>> again, not sure how how you'd do this in a scalable manner with your
>>>> deb packages... on the source tarball you can just replace it, rsync
>>>> it out and done.
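A minimal sketch of that replace-and-rsync workflow, using temp directories as stand-ins for the real install paths (the jar names match the thread; the paths and node list here are illustrative only):

```shell
# Demo stand-ins; on a real tarball install these would be e.g.
# /home/hadoop/hadoop-0.20.2+320 and /home/hadoop/hbase/lib.
HADOOP_HOME=/tmp/demo/hadoop-0.20.2+320
HBASE_LIB=/tmp/demo/hbase/lib
mkdir -p "$HADOOP_HOME" "$HBASE_LIB"
touch "$HADOOP_HOME/hadoop-core-0.20.2+320.jar"   # the cluster's real jar
touch "$HBASE_LIB/hadoop-core-0.20.2.jar"         # the jar HBase shipped with

rm -f "$HBASE_LIB"/hadoop-core-*.jar              # drop the shipped jar
cp "$HADOOP_HOME"/hadoop-core-*.jar "$HBASE_LIB"  # install the matching one
ls "$HBASE_LIB"
# Then push the hbase dir to every node and restart, e.g. (hypothetical):
#   for n in $(cat nodes.txt); do rsync -a /home/hadoop/hbase/ "$n:hbase/"; done
```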
>>>>
>>>> :-)
>>>>
>>>> On Mon, Sep 20, 2010 at 8:56 PM, Jack Levin <magnito@gmail.com> wrote:
>>>>> ok, I found that file, do I replace hadoop-core.*.jar under /usr/lib/hbase/lib?
>>>>> Then restart, etc?  All regionservers too?
>>>>>
>>>>> -Jack
>>>>>
>>>>> On Mon, Sep 20, 2010 at 8:40 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>>>>>> Well I don't really run CDH, I disagree with their rpm/deb packaging
>>>>>> policies and I have to highly recommend not using DEBs to install
>>>>>> software...
>>>>>>
>>>>>> So normally installing from tarball, the jar is in
>>>>>> <installpath>/hadoop-0.20.0-320/hadoop-core-0.20.2+320.jar
>>>>>>
>>>>>> On CDH/DEB edition, it's somewhere silly ... locate and find will be
>>>>>> your friend.  It should be called hadoop-core-0.20.2+320.jar though!
>>>>>>
>>>>>> I'm working on a github publish of SU's production system, which uses
>>>>>> the cloudera maven repo to install the correct JAR in hbase, so when
>>>>>> you type 'mvn assembly:assembly' to build your own hbase-*-bin.tar.gz
>>>>>> (the * being whatever version you specified in pom.xml) the cdh3b2 jar
>>>>>> comes pre-packaged.
>>>>>>
>>>>>> Stay tuned :-)
>>>>>>
>>>>>> -ryan
>>>>>>
>>>>>> On Mon, Sep 20, 2010 at 8:36 PM, Jack Levin <magnito@gmail.com> wrote:
>>>>>>> Ryan, the hadoop jar, what is the usual path to the file? I just want
>>>>>>> to be sure, and where do I put it?
>>>>>>>
>>>>>>> -Jack
>>>>>>>
>>>>>>> On Mon, Sep 20, 2010 at 8:30 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>>>>>>>> you need 2 more things:
>>>>>>>>
>>>>>>>> - restart hdfs
>>>>>>>> - make sure the hadoop jar from your install replaces the one we ship with
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Sep 20, 2010 at 8:22 PM, Jack Levin <magnito@gmail.com> wrote:
>>>>>>>>> So, I switched to 0.89, and we already had CDH3
>>>>>>>>> (hadoop-0.20-datanode-0.20.2+320-3.noarch). Even though I added
>>>>>>>>> <name>dfs.support.append</name> as true to both hdfs-site.xml and
>>>>>>>>> hbase-site.xml, the master still reports this:
>>>>>>>>>
>>>>>>>>> You are currently running the HMaster without HDFS append support
>>>>>>>>> enabled. This may result in data loss. Please see the HBase wiki for
>>>>>>>>> details.
>>>>>>>>>
>>>>>>>>> Master Attributes
>>>>>>>>> Attribute Name: Value (Description)
>>>>>>>>> HBase Version: 0.89.20100726, r979826 (HBase version and svn revision)
>>>>>>>>> HBase Compiled: Sat Jul 31 02:01:58 PDT 2010, stack (When HBase version was compiled and by whom)
>>>>>>>>> Hadoop Version: 0.20.2, r911707 (Hadoop version and svn revision)
>>>>>>>>> Hadoop Compiled: Fri Feb 19 08:07:34 UTC 2010, chrisdo (When Hadoop version was compiled and by whom)
>>>>>>>>> HBase Root Directory: hdfs://namenode-rd.imageshack.us:9000/hbase (Location of HBase home directory)
>>>>>>>>>
>>>>>>>>> Any ideas what's wrong?
>>>>>>>>>
>>>>>>>>> -Jack
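For reference, the property being toggled looks like the fragment below. As the replies in this thread note, flipping the flag is not enough on its own: the hadoop jar under hbase/lib must also come from an append-capable build, and HDFS needs a restart afterward. A sketch against a demo path:

```shell
# Write the append flag to a demo copy of hdfs-site.xml (in the thread the
# same property also went into hbase-site.xml).
SITE_DIR=/tmp/demo-conf
mkdir -p "$SITE_DIR"
cat > "$SITE_DIR/hdfs-site.xml" <<'EOF'
<configuration>
  <property>
    <name>dfs.support.append</name>
    <value>true</value>
  </property>
</configuration>
EOF
grep -A1 'dfs.support.append' "$SITE_DIR/hdfs-site.xml"
```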
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Sep 20, 2010 at 5:47 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>>>>>>>>>> Hey,
>>>>>>>>>>
>>>>>>>>>> There is actually only 1 active branch of hbase, that being the 0.89
>>>>>>>>>> release, which is based on 'trunk'.  We have snapshotted a series of
>>>>>>>>>> 0.89 "developer releases" in hopes that people would try them out and
>>>>>>>>>> start thinking about the next major version.  One of these is what SU
>>>>>>>>>> is running prod on.
>>>>>>>>>>
>>>>>>>>>> At this point tracking 0.89 and which ones are the 'best' patch sets
>>>>>>>>>> to run is a bit of a contact sport, but if you are serious about not
>>>>>>>>>> losing data it is worthwhile.  SU is based on the most recent DR with
>>>>>>>>>> a few minor patches of our own concoction brought in.  It currently
>>>>>>>>>> works, but some Master ops are slow, and there are a few patches on
>>>>>>>>>> top of that.  I'll poke about and see if it's possible to publish to a
>>>>>>>>>> github branch or something.
>>>>>>>>>>
>>>>>>>>>> -ryan
>>>>>>>>>>
>>>>>>>>>> On Mon, Sep 20, 2010 at 5:16 PM, Jack Levin <magnito@gmail.com> wrote:
>>>>>>>>>>> Sounds good, only reason I ask is because of this:
>>>>>>>>>>>
>>>>>>>>>>> There are currently two active branches of HBase:
>>>>>>>>>>>
>>>>>>>>>>>    * 0.20 - the current stable release series, being maintained with
>>>>>>>>>>> patches for bug fixes only. This release series does not support HDFS
>>>>>>>>>>> durability - edits may be lost in the case of node failure.
>>>>>>>>>>>    * 0.89 - a development release series with active feature and
>>>>>>>>>>> stability development, not currently recommended for production use.
>>>>>>>>>>> This release does support HDFS durability - cases in which edits are
>>>>>>>>>>> lost are considered serious bugs.
>>>>>>>>>>>
>>>>>>>>>>> Are we talking about data loss in the case of a datanode going down
>>>>>>>>>>> while being written to, or a RegionServer going down?
>>>>>>>>>>>
>>>>>>>>>>> -jack
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Sep 20, 2010 at 4:09 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>>>>>>>>>>>> We run 0.89 in production @ Stumbleupon.  We also employ 3 committers...
>>>>>>>>>>>>
>>>>>>>>>>>> As for safety, you have no choice but to run 0.89.  If you run a 0.20
>>>>>>>>>>>> release you will lose data.  you must be on 0.89 and
>>>>>>>>>>>> CDH3/append-branch to achieve data durability, and there really is no
>>>>>>>>>>>> argument around it.  If you are doing your tests with 0.20.6 now, I'd
>>>>>>>>>>>> stop and rebase those tests onto the latest DR announced on the list.
>>>>>>>>>>>>
>>>>>>>>>>>> -ryan
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Sep 20, 2010 at 3:17 PM, Jack Levin <magnito@gmail.com> wrote:
>>>>>>>>>>>>> Hi Stack, see inline:
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Sep 20, 2010 at 2:42 PM, Stack <stack@duboce.net> wrote:
>>>>>>>>>>>>>> Hey Jack:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for writing.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> See below for some comments.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Sep 20, 2010 at 11:00 AM, Jack Levin <magnito@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Image-Shack gets close to two million image uploads per day, which are
>>>>>>>>>>>>>>> usually stored on regular servers (we have about 700), as regular
>>>>>>>>>>>>>>> files, and each server has its own host name, such as (img55).  I've
>>>>>>>>>>>>>>> been researching how to improve our backend design in terms of data
>>>>>>>>>>>>>>> safety and stumbled onto the HBase project.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Any other requirements other than data safety? (latency, etc).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Latency is the second requirement.  We have some services that are
>>>>>>>>>>>>> very short tail, and can produce a 95% cache hit rate, so I assume
>>>>>>>>>>>>> this would really put the cache to good use.  Some other services,
>>>>>>>>>>>>> however, have about a 25% cache hit ratio, in which case the latency
>>>>>>>>>>>>> should be 'adequate', e.g. if it's only slightly worse than getting
>>>>>>>>>>>>> data off raw disk, then it's good enough.  Safety is supremely
>>>>>>>>>>>>> important, then availability, then speed.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Now, I think HBase is the most beautiful thing that has happened to
>>>>>>>>>>>>>>> the distributed DB world :).   The idea is to store image files
>>>>>>>>>>>>>>> (about 400KB on average) in HBase.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'd guess some images are much bigger
than this.  Do you ever limit
>>>>>>>>>>>>>> the size of images folks can upload
to your service?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The setup will include the following configuration:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 50 servers total (2 datacenters), with 8 GB RAM, dual core cpu, 6 x
>>>>>>>>>>>>>>> 2TB disks each.
>>>>>>>>>>>>>>> 3 to 5 Zookeepers
>>>>>>>>>>>>>>> 2 Masters (in a datacenter each)
>>>>>>>>>>>>>>> 10 to 20 Stargate REST instances (one per server, hash loadbalanced)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What's your frontend?  Why REST?  It might be more efficient if you
>>>>>>>>>>>>>> could run with thrift, given REST base64s its payload IIRC (check the
>>>>>>>>>>>>>> src yourself).
>>>>>>>>>>>>>
>>>>>>>>>>>>> For insertion we use Haproxy, and balance curl PUTs across multiple REST APIs.
>>>>>>>>>>>>> For reading, it's an nginx proxy that does Content-Type modification
>>>>>>>>>>>>> from image/jpeg to octet-stream, and vice versa;
>>>>>>>>>>>>> it then hits Haproxy again, which hits the balanced REST instances.
>>>>>>>>>>>>> Why REST?  It was the simplest thing to run, given that it supports
>>>>>>>>>>>>> HTTP; potentially we could rewrite something for thrift, as long as we
>>>>>>>>>>>>> can still use http to send and receive data (has anyone written
>>>>>>>>>>>>> anything like that, say in python, C or java?)
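Stack's base64 point is easy to quantify: Stargate's XML/JSON representations carry binary cell values base64-encoded, which inflates a 400 KB image by roughly a third on the wire (a cost a binary protocol like Thrift avoids). A quick check with a stand-in file:

```shell
# Build a stand-in 400 KB "image" and measure its base64-encoded size.
head -c 409600 /dev/zero > /tmp/demo-image.bin
raw=$(wc -c < /tmp/demo-image.bin)
wire=$(base64 /tmp/demo-image.bin | tr -d '\n' | wc -c)
echo "raw=$raw base64=$wire"   # base64 output is 4/3 of the raw size, plus padding
```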
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 40 to 50 RegionServers (will probably keep masters separate on dedicated boxes).
>>>>>>>>>>>>>>> 2 Namenode servers (one backup, highly available, will do fsimage and
>>>>>>>>>>>>>>> edits snapshots also)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So far I got about 13 servers running, doing about 20 insertions per
>>>>>>>>>>>>>>> second (file sizes ranging from a few KB to 2-3MB, ave. 400KB) via the
>>>>>>>>>>>>>>> Stargate API.  Our frontend servers receive files, and I just
>>>>>>>>>>>>>>> fork-insert them into stargate via http (curl).
>>>>>>>>>>>>>>> The inserts are humming along nicely, without any noticeable load on
>>>>>>>>>>>>>>> the regionservers; so far we've inserted about 2 TB worth of images.
>>>>>>>>>>>>>>> I have adjusted the region file size to 512MB, and the table block
>>>>>>>>>>>>>>> size to about 400KB, trying to match the average accessed block to
>>>>>>>>>>>>>>> limit HDFS trips.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As Todd suggests, I'd go up from 512MB... 1G at least.  You'll
>>>>>>>>>>>>>> probably want to up your flush size from 64MB to 128MB or maybe 192MB.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yep, I will adjust to 1G.  I thought flush was controlled by a
>>>>>>>>>>>>> function of memstore HEAP, something like 40%?  Or are you talking
>>>>>>>>>>>>> about HDFS block size?
>>>>>>>>>>>>>
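The two knobs in that exchange are hbase.hregion.max.filesize (the region split threshold, being raised from 512MB to 1G) and hbase.hregion.memstore.flush.size (the per-region flush trigger, 64MB by default in this era); the roughly-40%-of-heap figure Jack recalls is a separate regionserver-wide setting (hbase.regionserver.global.memstore.upperLimit). A hedged sketch, written to a demo file rather than a live conf dir:

```shell
# Demo hbase-site.xml fragment with the two values discussed above.
SITE_DIR=/tmp/demo-conf
mkdir -p "$SITE_DIR"
cat > "$SITE_DIR/hbase-site.xml" <<'EOF'
<configuration>
  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>1073741824</value><!-- split regions at 1 GB instead of 512 MB -->
  </property>
  <property>
    <name>hbase.hregion.memstore.flush.size</name>
    <value>134217728</value><!-- flush each memstore at 128 MB -->
  </property>
</configuration>
EOF
grep -c '<property>' "$SITE_DIR/hbase-site.xml"
```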
>>>>>>>>>>>>>>> So far the read performance has been more than adequate, and of
>>>>>>>>>>>>>>> course write performance is nowhere near capacity.
>>>>>>>>>>>>>>> So right now, all newly uploaded images go to HBase.  But we do plan
>>>>>>>>>>>>>>> to insert about 170 million images (about 100 days' worth), which is
>>>>>>>>>>>>>>> only about 64 TB, or 10% of the planned cluster size of 600TB.
>>>>>>>>>>>>>>> The end goal is to have a storage system that provides data safety,
>>>>>>>>>>>>>>> e.g. the system may go down but data cannot be lost.  Our front-end
>>>>>>>>>>>>>>> servers will continue to serve images from their own file systems (we
>>>>>>>>>>>>>>> are serving about 16 Gbits at peak); however, should we need to bring
>>>>>>>>>>>>>>> any of those down for maintenance, we will redirect all traffic to
>>>>>>>>>>>>>>> HBase (should be no more than a few hundred Mbps) while the front-end
>>>>>>>>>>>>>>> server is repaired (for example, having its disk replaced).  After
>>>>>>>>>>>>>>> the repairs, we quickly repopulate it with the missing files, while
>>>>>>>>>>>>>>> serving the remaining missing ones off HBase.
>>>>>>>>>>>>>>> All in all it should be a very interesting project, and I am hoping
>>>>>>>>>>>>>>> not to run into any snags; however, should that happen, I am pleased
>>>>>>>>>>>>>>> to know that such a great and vibrant tech group exists that supports
>>>>>>>>>>>>>>> and uses HBASE :).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We're definitely interested in how your project progresses.  If you
>>>>>>>>>>>>>> are ever up in the city, you should drop by for a chat.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cool.  I'd like that.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> St.Ack
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> P.S. I'm also w/ Todd that you should move to 0.89 and blooms.
>>>>>>>>>>>>>> P.P.S. I updated the wiki on stargate REST:
>>>>>>>>>>>>>> http://wiki.apache.org/hadoop/Hbase/Stargate
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cool, I assume if we move to that it won't kill existing meta tables
>>>>>>>>>>>>> and data?  e.g. is it cross-compatible?
>>>>>>>>>>>>> Is 0.89 ready for a production environment?
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Jack
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
