hbase-user mailing list archives

From Jack Levin <magn...@gmail.com>
Subject Re: Millions of photos into Hbase
Date Tue, 21 Sep 2010 04:44:08 GMT
20GB+? Hmmm... I do plan to run 50 regionserver nodes though, with a
3 GB heap likely; that should be plenty to rip through, say, 350TB of
data.

-Jack

On Mon, Sep 20, 2010 at 9:39 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
> yes, that is the new ZK-based coordination.  When I publish the SU code
> we have a patch which limits that and is faster.  2GB is a little
> small for regionserver memory... in my ideal world we'll be putting
> 20GB+ of RAM into each regionserver.
>
> I just figured you were using the DEB/RPMs because your files were in
> /usr/local... I usually run everything out of /home/hadoop b/c it
> allows me to easily rsync as user hadoop.
>
> but you are on the right track yes :-)
>
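For context, the regionserver heap being discussed is normally set in conf/hbase-env.sh. A minimal sketch, assuming the stock tarball layout; the 3 GB and 20 GB figures are simply the numbers from this exchange, not a recommendation:

    # conf/hbase-env.sh -- heap for HBase daemons started from this install, in MB
    export HBASE_HEAPSIZE=3000      # Jack's planned 3 GB per regionserver
    # export HBASE_HEAPSIZE=20000   # Ryan's "ideal world" 20 GB+ sizing
    # A regionserver-only override is also possible via HBASE_REGIONSERVER_OPTS, e.g.:
    # export HBASE_REGIONSERVER_OPTS="-Xmx3000m"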
> On Mon, Sep 20, 2010 at 9:32 PM, Jack Levin <magnito@gmail.com> wrote:
>> Who said anything about deb :). I do use tarballs.... Yes, so what did
>> it was copying that jar to under hbase/lib, and then a full restart.
>> Now here is a funny thing: the master shuddered for about 10 minutes,
>> spewing these messages:
>>
>> 2010-09-20 21:23:45,826 DEBUG org.apache.hadoop.hbase.master.HMaster:
>> Event NodeCreated with state SyncConnected with path
>> /hbase/UNASSIGNED/97999366
>> 2010-09-20 21:23:45,827 DEBUG
>> org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Got event
>> NodeCreated with path /hbase/UNASSIGNED/97999366
>> 2010-09-20 21:23:45,827 DEBUG
>> org.apache.hadoop.hbase.master.ZKUnassignedWatcher: ZK-EVENT-PROCESS:
>> Got zkEvent NodeCreated state:SyncConnected
>> path:/hbase/UNASSIGNED/97999366
>> 2010-09-20 21:23:45,827 DEBUG
>> org.apache.hadoop.hbase.master.RegionManager: Created/updated
>> UNASSIGNED zNode img15,normal052q.jpg,1285001686282.97999366 in state
>> M2ZK_REGION_OFFLINE
>> 2010-09-20 21:23:45,828 INFO
>> org.apache.hadoop.hbase.master.RegionServerOperation:
>> img13,p1000319tq.jpg,1284952655960.812544765 open on
>> 10.103.2.3,60020,1285042333293
>> 2010-09-20 21:23:45,828 DEBUG
>> org.apache.hadoop.hbase.master.ZKUnassignedWatcher: Got event type [
>> M2ZK_REGION_OFFLINE ] for region 97999366
>> 2010-09-20 21:23:45,828 DEBUG org.apache.hadoop.hbase.master.HMaster:
>> Event NodeChildrenChanged with state SyncConnected with path
>> /hbase/UNASSIGNED
>> 2010-09-20 21:23:45,828 DEBUG
>> org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Got event
>> NodeChildrenChanged with path /hbase/UNASSIGNED
>> 2010-09-20 21:23:45,828 DEBUG
>> org.apache.hadoop.hbase.master.ZKUnassignedWatcher: ZK-EVENT-PROCESS:
>> Got zkEvent NodeChildrenChanged state:SyncConnected
>> path:/hbase/UNASSIGNED
>> 2010-09-20 21:23:45,830 DEBUG
>> org.apache.hadoop.hbase.master.BaseScanner: Current assignment of
>> img150,,1284859678248.3116007 is not valid;
>> serverAddress=10.103.2.1:60020, startCode=1285038205920 unknown.
>>
>>
>> Does anyone know what they mean?  At first it would kill one of my
>> datanodes.  What helped was changing the heap size to 4GB for the
>> master and 2GB for the datanode that was dying; after 10 minutes I got
>> into a clean state.
>>
>> -Jack
>>
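The messages above are the master walking regions through the /hbase/UNASSIGNED znodes while it reassigns them; the backlog of those znodes is what takes the ~10 minutes described above. One way to watch it drain is with the stock ZooKeeper CLI; a sketch only, with the zkCli.sh path and quorum host made up for illustration (the znode path is taken from the log above):

    # List the regions currently in transition.
    echo "ls /hbase/UNASSIGNED" | /usr/lib/zookeeper/bin/zkCli.sh -server zk1:2181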
>>
>> On Mon, Sep 20, 2010 at 9:28 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>>> yes, on every single machine as well, and restart.
>>>
>>> again, not sure how you'd do this in a scalable manner with your
>>> deb packages... on the source tarball you can just replace it, rsync
>>> it out and done.
>>>
>>> :-)
>>>
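A rough sketch of the jar swap and rsync Ryan describes above; every path here is an assumption (the /usr/lib locations and the jar name come from elsewhere in this thread), adjust to wherever your hadoop-core jar and hbase/lib actually live:

    # Replace the hadoop-core jar HBase ships with the one from the running HDFS
    # install, then push hbase/lib to every regionserver and restart HBase everywhere.
    NEW_JAR=/usr/lib/hadoop/hadoop-core-0.20.2+320.jar   # assumed location
    HBASE_LIB=/usr/lib/hbase/lib

    rm -f "$HBASE_LIB"/hadoop-core-*.jar
    cp "$NEW_JAR" "$HBASE_LIB/"

    for host in $(cat /usr/lib/hbase/conf/regionservers); do
        rsync -av "$HBASE_LIB/" "$host:$HBASE_LIB/"
    done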
>>> On Mon, Sep 20, 2010 at 8:56 PM, Jack Levin <magnito@gmail.com> wrote:
>>>> ok, I found that file, do I replace hadoop-core.*.jar under /usr/lib/hbase/lib?
>>>> Then restart, etc?  All regionservers too?
>>>>
>>>> -Jack
>>>>
>>>> On Mon, Sep 20, 2010 at 8:40 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>>>>> Well I don't really run CDH, I disagree with their rpm/deb packaging
>>>>> policies and I have to highly recommend not using DEBs to install
>>>>> software...
>>>>>
>>>>> So normally installing from tarball, the jar is in
>>>>> <installpath>/hadoop-0.20.0-320/hadoop-core-0.20.2+320.jar
>>>>>
>>>>> On CDH/DEB edition, it's somewhere silly ... locate and find will be
>>>>> your friend.  It should be called hadoop-core-0.20.2+320.jar though!
>>>>>
>>>>> I'm working on a github publish of SU's production system, which uses
>>>>> the cloudera maven repo to install the correct JAR in hbase so when
>>>>> you type 'mvn assembly:assembly' to build your own hbase-*-bin.tar.gz
>>>>> (the * being whatever version you specified in pom.xml) the cdh3b2 jar
>>>>> comes pre-packaged.
>>>>>
>>>>> Stay tuned :-)
>>>>>
>>>>> -ryan
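If you do build your own tarball that way, one quick sanity check is to confirm which hadoop-core jar actually got bundled. A sketch; the tarball name below is a guess based on the version string elsewhere in this thread, and the real name is whatever your pom.xml produces under target/:

    # After 'mvn assembly:assembly', verify the bundled Hadoop jar matches your cluster.
    tar tzf target/hbase-0.89.20100726-bin.tar.gz | grep hadoop-core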
>>>>>
>>>>> On Mon, Sep 20, 2010 at 8:36 PM, Jack Levin <magnito@gmail.com> wrote:
>>>>>> Ryan, hadoop jar, what is the usual path to the file? I just want to
>>>>>> be sure, and where do I put it?
>>>>>>
>>>>>> -Jack
>>>>>>
>>>>>>> On Mon, Sep 20, 2010 at 8:30 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>>>>>>> you need 2 more things:
>>>>>>>
>>>>>>> - restart hdfs
>>>>>>> - make sure the hadoop jar from your install replaces the one we ship with
>>>>>>>
>>>>>>>
>>>>>>>> On Mon, Sep 20, 2010 at 8:22 PM, Jack Levin <magnito@gmail.com> wrote:
>>>>>>>> So, I switched to 0.89, and we already had CDH3
>>>>>>>> (hadoop-0.20-datanode-0.20.2+320-3.noarch). Even though I added
>>>>>>>> <name>dfs.support.append</name> as true to both hdfs-site.xml and
>>>>>>>> hbase-site.xml, the master still reports this:
>>>>>>>>
>>>>>>>> You are currently running the HMaster without HDFS append support
>>>>>>>> enabled. This may result in data loss. Please see the HBase wiki for
>>>>>>>> details.
>>>>>>>>
>>>>>>>> Master Attributes
>>>>>>>> Attribute Name        Value                                        Description
>>>>>>>> HBase Version         0.89.20100726, r979826                       HBase version and svn revision
>>>>>>>> HBase Compiled        Sat Jul 31 02:01:58 PDT 2010, stack          When HBase version was compiled and by whom
>>>>>>>> Hadoop Version        0.20.2, r911707                              Hadoop version and svn revision
>>>>>>>> Hadoop Compiled       Fri Feb 19 08:07:34 UTC 2010, chrisdo        When Hadoop version was compiled and by whom
>>>>>>>> HBase Root Directory  hdfs://namenode-rd.imageshack.us:9000/hbase  Location of HBase home directory
>>>>>>>>
>>>>>>>> Any ideas what's wrong?
>>>>>>>>
>>>>>>>> -Jack
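Going by how this thread resolves, two things have to line up before that warning goes away: the dfs.support.append property has to be visible to both HDFS and HBase, and the hadoop-core jar under hbase/lib has to be the append-capable one from the running cluster (the jar swap discussed in the newer messages above). A quick check, with paths guessed from the CDH/RPM layout mentioned here:

    # Confirm the property landed in both configs (paths are assumptions).
    grep -A 1 dfs.support.append /etc/hadoop/conf/hdfs-site.xml
    grep -A 1 dfs.support.append /usr/lib/hbase/conf/hbase-site.xml

    # Confirm HBase is carrying the cluster's hadoop-core jar, not the stock 0.20.2 one.
    ls -l /usr/lib/hbase/lib/hadoop-core-*.jar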
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Sep 20, 2010 at 5:47 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>>>>>>>>> Hey,
>>>>>>>>>
>>>>>>>>> There is actually only 1 active branch of hbase, that being the 0.89
>>>>>>>>> release, which is based on 'trunk'.  We have snapshotted a series of
>>>>>>>>> 0.89 "developer releases" in hopes that people would try them out and
>>>>>>>>> start thinking about the next major version.  One of these is what SU
>>>>>>>>> is running prod on.
>>>>>>>>>
>>>>>>>>> At this point tracking 0.89 and which ones are the 'best' patch sets
>>>>>>>>> to run is a bit of a contact sport, but if you are serious about not
>>>>>>>>> losing data it is worthwhile.  SU is based on the most recent DR with
>>>>>>>>> a few minor patches of our own concoction brought in.  The current DR
>>>>>>>>> works, but some Master ops are slow, and there are a few patches on
>>>>>>>>> top of that.  I'll poke about and see if it's possible to publish to a
>>>>>>>>> github branch or something.
>>>>>>>>>
>>>>>>>>> -ryan
>>>>>>>>>
>>>>>>>>> On Mon, Sep 20, 2010 at 5:16 PM, Jack Levin <magnito@gmail.com> wrote:
>>>>>>>>>> Sounds good, only reason I ask is because of this:
>>>>>>>>>>
>>>>>>>>>> There are currently two active branches of HBase:
>>>>>>>>>>
>>>>>>>>>>    * 0.20 - the current stable release series, being maintained with
>>>>>>>>>> patches for bug fixes only. This release series does not support HDFS
>>>>>>>>>> durability - edits may be lost in the case of node failure.
>>>>>>>>>>    * 0.89 - a development release series with active feature and
>>>>>>>>>> stability development, not currently recommended for production use.
>>>>>>>>>> This release does support HDFS durability - cases in which edits are
>>>>>>>>>> lost are considered serious bugs.
>>>>>>>>>>
>>>>>>>>>> Are we talking about data loss in case of a datanode going down while
>>>>>>>>>> being written to, or a RegionServer going down?
>>>>>>>>>>
>>>>>>>>>> -jack
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Sep 20, 2010 at 4:09 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>>>>>>>>>>> We run 0.89 in production @ Stumbleupon.  We also employ 3 committers...
>>>>>>>>>>>
>>>>>>>>>>> As for safety, you have no choice but to run 0.89.  If you run a 0.20
>>>>>>>>>>> release you will lose data.  You must be on 0.89 and
>>>>>>>>>>> CDH3/append-branch to achieve data durability, and there really is no
>>>>>>>>>>> argument around it.  If you are doing your tests with 0.20.6 now, I'd
>>>>>>>>>>> stop and rebase those tests onto the latest DR announced on the list.
>>>>>>>>>>>
>>>>>>>>>>> -ryan
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Sep 20, 2010 at 3:17 PM, Jack Levin <magnito@gmail.com> wrote:
>>>>>>>>>>>> Hi Stack, see inline:
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Sep 20, 2010 at 2:42 PM, Stack <stack@duboce.net> wrote:
>>>>>>>>>>>>> Hey Jack:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for writing.
>>>>>>>>>>>>>
>>>>>>>>>>>>> See below for some comments.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Sep 20, 2010 at 11:00 AM, Jack Levin <magnito@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Image-Shack gets close to two million image uploads per day, which
>>>>>>>>>>>>>> are usually stored on regular servers (we have about 700), as
>>>>>>>>>>>>>> regular files, and each server has its own host name, such as
>>>>>>>>>>>>>> (img55).  I've been researching how to improve our backend design
>>>>>>>>>>>>>> in terms of data safety and stumbled onto the HBase project.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Any other requirements other than data safety? (latency, etc).
>>>>>>>>>>>>
>>>>>>>>>>>> Latency is the second requirement.  We have some services that are
>>>>>>>>>>>> very short tail, and can produce a 95% cache hit rate, so I assume
>>>>>>>>>>>> this would really put the cache to good use.  Some other services,
>>>>>>>>>>>> however, have about a 25% cache hit ratio, in which case the latency
>>>>>>>>>>>> should be 'adequate', e.g. if it's slightly worse than getting data
>>>>>>>>>>>> off raw disk, then it's good enough.  Safety is supremely important,
>>>>>>>>>>>> then availability, then speed.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>> Now, I think HBase is the most beautiful thing to happen to the
>>>>>>>>>>>>>> distributed DB world :).  The idea is to store image files (about
>>>>>>>>>>>>>> 400KB on average) into HBase.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'd guess some images are much bigger than this.  Do you ever limit
>>>>>>>>>>>>> the size of images folks can upload to your service?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> The setup will include the following configuration:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 50 servers total (2 datacenters), with 8 GB RAM, dual core cpu, 6 x
>>>>>>>>>>>>>> 2TB disks each.
>>>>>>>>>>>>>> 3 to 5 Zookeepers
>>>>>>>>>>>>>> 2 Masters (in a datacenter each)
>>>>>>>>>>>>>> 10 to 20 Stargate REST instances (one per server, hash loadbalanced)
>>>>>>>>>>>>>
>>>>>>>>>>>>> What's your frontend?  Why REST?  It might be more efficient if you
>>>>>>>>>>>>> could run with thrift, given REST base64s its payload IIRC (check
>>>>>>>>>>>>> the src yourself).
>>>>>>>>>>>>
>>>>>>>>>>>> For insertion we use Haproxy, and balance curl PUTs across multiple
>>>>>>>>>>>> REST APIs.  For reading, it's an nginx proxy that does Content-Type
>>>>>>>>>>>> modification from image/jpeg to octet-stream, and vice versa; it
>>>>>>>>>>>> then hits Haproxy again, which hits the balanced REST instances.
>>>>>>>>>>>> Why REST?  It was the simplest thing to run, given that it supports
>>>>>>>>>>>> HTTP.  Potentially we could rewrite something for thrift, as long as
>>>>>>>>>>>> we can still use HTTP to send and receive data (has anyone written
>>>>>>>>>>>> anything like that, say in python, C or java?)
>>>>>>>>>>>>
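For the record, the insert path described above reduces to a plain cell PUT against the Stargate/REST gateway, and reads are the mirror-image GET. A hedged sketch: the host, port, and the image:data column are invented for illustration (the thread never spells out ImageShack's schema), while the img15 table and the jpg-named row keys are taken from the log output earlier in the thread:

    # Store one image as a cell value via Stargate (octet-stream carries raw bytes).
    curl -X PUT \
         -H "Content-Type: application/octet-stream" \
         --data-binary @normal052q.jpg \
         http://stargate-host:8080/img15/normal052q.jpg/image:data

    # Read it back; the nginx layer described above rewrites the Content-Type
    # between application/octet-stream and image/jpeg.
    curl -H "Accept: application/octet-stream" \
         http://stargate-host:8080/img15/normal052q.jpg/image:data > normal052q.jpg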
>>>>>>>>>>>>>
>>>>>>>>>>>>>> 40 to 50 RegionServers (will probably keep masters separate on
>>>>>>>>>>>>>> dedicated boxes).
>>>>>>>>>>>>>> 2 Namenode servers (one backup, highly available, will do fsimage
>>>>>>>>>>>>>> and edits snapshots also)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So far I've got about 13 servers running, doing about 20
>>>>>>>>>>>>>> insertions/second (file sizes ranging from a few KB to 2-3MB, avg.
>>>>>>>>>>>>>> 400KB) via the Stargate API.  Our frontend servers receive files,
>>>>>>>>>>>>>> and I just fork-insert them into Stargate via http (curl).
>>>>>>>>>>>>>> The inserts are humming along nicely, without any noticeable load
>>>>>>>>>>>>>> on the regionservers; so far I've inserted about 2 TB worth of
>>>>>>>>>>>>>> images.  I have adjusted the region file size to be 512MB, and the
>>>>>>>>>>>>>> table block size to about 400KB, trying to match the average
>>>>>>>>>>>>>> access block size to limit HDFS trips.
>>>>>>>>>>>>>
>>>>>>>>>>>>> As Todd suggests, I'd go up from 512MB... 1G at least.  You'll
>>>>>>>>>>>>> probably want to up your flush size from 64MB to 128MB or maybe
>>>>>>>>>>>>> 192MB.
>>>>>>>>>>>>
>>>>>>>>>>>> Yep, I will adjust to 1G.  I thought flush was controlled by a
>>>>>>>>>>>> function of memstore HEAP, something like 40%?  Or are you talking
>>>>>>>>>>>> about the HDFS block size?
>>>>>>>>>>>>
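The knobs in this exchange map roughly onto a handful of settings; a sketch only, with property names from the 0.20/0.89-era defaults as I understand them and values taken straight from the numbers in this thread (the img15 table and image family below are examples, not the real schema):

    # hbase-site.xml properties under discussion (values per this thread):
    #   hbase.hregion.max.filesize                     1073741824  (1G regions, up from 512MB)
    #   hbase.hregion.memstore.flush.size              134217728   (128MB flushes, up from 64MB)
    #   hbase.regionserver.global.memstore.upperLimit  0.4         (the "40% of heap" Jack means)
    #
    # The ~400KB block size is a per-column-family setting, e.g. from the HBase shell:
    echo "disable 'img15'
    alter 'img15', {NAME => 'image', BLOCKSIZE => 409600}
    enable 'img15'" | hbase shell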
>>>>>>>>>>>>>> So far the read performance was more than adequate, and of course
>>>>>>>>>>>>>> write performance is nowhere near capacity.
>>>>>>>>>>>>>> So right now, all newly uploaded images go to HBase.  But we do
>>>>>>>>>>>>>> plan to insert about 170 million images (about 100 days worth),
>>>>>>>>>>>>>> which is only about 64 TB, or 10% of the planned cluster size of
>>>>>>>>>>>>>> 600TB.
>>>>>>>>>>>>>> The end goal is to have a storage system that provides data
>>>>>>>>>>>>>> safety, e.g. the system may go down but data cannot be lost.  Our
>>>>>>>>>>>>>> front-end servers will continue to serve images from their own
>>>>>>>>>>>>>> file systems (we are serving about 16 Gbits at peak); however,
>>>>>>>>>>>>>> should we need to bring any of those down for maintenance, we will
>>>>>>>>>>>>>> redirect its traffic to HBase (should be no more than a few
>>>>>>>>>>>>>> hundred Mbps) while the front-end server is repaired (for example
>>>>>>>>>>>>>> having its disk replaced).  After the repairs, we quickly
>>>>>>>>>>>>>> repopulate it with the missing files, while serving the remaining
>>>>>>>>>>>>>> missing ones off HBase.
>>>>>>>>>>>>>> All in all it should be a very interesting project, and I am
>>>>>>>>>>>>>> hoping not to run into any snags; however, should that happen, I
>>>>>>>>>>>>>> am pleased to know that such a great and vibrant tech group exists
>>>>>>>>>>>>>> that supports and uses HBase :).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> We're definitely interested in how your project progresses.  If
>>>>>>>>>>>>> you are ever up in the city, you should drop by for a chat.
>>>>>>>>>>>>
>>>>>>>>>>>> Cool.  I'd like that.
>>>>>>>>>>>>
>>>>>>>>>>>>> St.Ack
>>>>>>>>>>>>>
>>>>>>>>>>>>> P.S. I'm also w/ Todd that you should move to 0.89 and blooms.
>>>>>>>>>>>>> P.P.S. I updated the wiki on stargate REST:
>>>>>>>>>>>>> http://wiki.apache.org/hadoop/Hbase/Stargate
>>>>>>>>>>>>
>>>>>>>>>>>> Cool, I assume if we move to that it won't kill existing meta
>>>>>>>>>>>> tables and data?  i.e. it's cross-compatible?
>>>>>>>>>>>> Is 0.89 ready for a production environment?
>>>>>>>>>>>>
>>>>>>>>>>>> -Jack
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
