hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: EC2 + Thrift inserts
Date Fri, 30 Apr 2010 23:44:27 GMT
On Fri, Apr 30, 2010 at 4:32 PM, Chris Tarnas <cft@email.com> wrote:
> Thank you, it is nice to get this help.
>
> I definitely understand the overhead of writing the index, although it seems much worse
> than just that overhead would indicate. If I understand you correctly, that is because all
> inserts into an IndexedTable are synchronized on one table? If that were switched to using
> an HTablePool it would no longer be as severe a bottleneck (performance is about an order of
> magnitude better without the indexing)?

They are synchronized per region server, yes, and it _should_ be better
with a pool since then you can do parallel inserts. Patching it
doesn't seem hard, but maybe I'm missing some finer details since I
usually don't work on that code.
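(Not the IndexedTable code itself, just an illustration.) A minimal Python sketch of why the pool matters: a single shared lock stands in for the one synchronized HTable, and `time.sleep` stands in for the cost of one index write:

```python
import threading
import time

WRITE_COST = 0.001  # stand-in for the latency of one index write (an RPC)

def load(n_clients, rows_each, shared_lock=None):
    """Simulate n_clients each inserting rows_each rows; return rows/sec."""
    def worker():
        for _ in range(rows_each):
            if shared_lock:
                # One synchronized HTable: every writer takes the same lock,
                # so index writes happen strictly one at a time.
                with shared_lock:
                    time.sleep(WRITE_COST)
            else:
                # A pool: each writer has its own handle, writes overlap.
                time.sleep(WRITE_COST)
    threads = [threading.Thread(target=worker) for _ in range(n_clients)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return n_clients * rows_each / (time.time() - start)

serialized = load(8, 20, shared_lock=threading.Lock())
pooled = load(8, 20)
```

With the shared lock, throughput is pinned near 1/WRITE_COST rows/sec no matter how many clients you add; without it, the writes overlap and throughput scales with the client count, which is the behavior Chris is missing.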

>
> I'm also using thrift to connect and am wondering if that itself puts an overall limit
> on scaling? It does seem that no matter how many more mappers and servers I add, even without
> indexing, I am capped at about 5k rows/sec total. I'm waiting a bit as the table grows so
> that it is split across more regionservers, hopefully that will help, but as far as I can
> tell I am not hitting any CPU or IO constraint during my tests.

I don't understand the "I'm also using thrift" and "how many more
mappers" part; are you using Thrift inside a map? Anyway, more
clients won't help since there's a single mega-serialization of all
the inserts to the index table per region server. It's normal not to
see any CPU/mem/IO contention since, in this case, it's all about the
speed at which you can process a single row insertion. The rest of the
threads just wait...

>
> -chris
>
> On Apr 30, 2010, at 3:00 PM, Jean-Daniel Cryans wrote:
>
>> The contrib packages don't get as much love as core HBase, so they
>> tend to be less performant and/or reliable and/or maintained and/or
>> etc. In this case the issue doesn't seem that bad since it could just
>> use an HTablePool, but using IndexedTables will definitely be slower
>> than a straight insert since it writes to 2 tables (the main table and
>> the index).
>>
>> J-D
>>
>> On Fri, Apr 30, 2010 at 2:53 PM, Chris Tarnas <cft@email.com> wrote:
>>> It appears that for multiple simultaneous loads, using IndexedTables is probably
>>> not the best choice?
>>>
>>> -chris
>>>
>>> On Apr 30, 2010, at 2:39 PM, Jean-Daniel Cryans wrote:
>>>
>>>> Yeah more handlers won't do it here since there's tons of calls
>>>> waiting on a single synchronized method, I guess the IndexedRegion
>>>> should use a pool of HTables instead of a single one in order to
>>>> improve indexation throughput.
>>>>
>>>> J-D
>>>>
>>>> On Fri, Apr 30, 2010 at 2:26 PM, Chris Tarnas <cft@email.com> wrote:
>>>>> Here is the thread dump:
>>>>>
>>>>> I cranked up the handlers to 300 just in case and ran 40 mappers that
>>>>> loaded data via thrift. Each node runs its own thrift server. I saw an average of
>>>>> 18 rows/sec/mapper with no node using more than 10% CPU and no IO wait. It seems
>>>>> no matter how many mappers I throw at it, the total doesn't go much above
>>>>> 700 rows/second, which seems very, very slow to me.
>>>>>
>>>>> Here is the thread dump from a node:
>>>>>
>>>>> http://pastebin.com/U3eLRdMV
>>>>>
>>>>> I do see quite a bit of waiting and some blocking in there, not sure
>>>>> how exactly to interpret it all though.
>>>>>
>>>>> thanks for any help!
>>>>> -chris
>>>>>
>>>>> On Apr 29, 2010, at 9:14 PM, Ryan Rawson wrote:
>>>>>
>>>>>> One thing to check is at the peak of your load, run jstack on one of
>>>>>> the regionservers, and look at the handler threads - if all of them
>>>>>> are doing something you might be running into handler contention.
>>>>>>
>>>>>> It is ultimately IO bound.
>>>>>>
>>>>>> -ryan
>>>>>>
>>>>>> On Thu, Apr 29, 2010 at 9:12 PM, Chris Tarnas <cft@email.com> wrote:
>>>>>>> They are all at 100, but none of the regionservers are loaded - most are
>>>>>>> less than 20% CPU. Is this all network latency?
>>>>>>>
>>>>>>> -chris
>>>>>>>
>>>>>>> On Apr 29, 2010, at 8:29 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>>>>>>>
>>>>>>>> Every insert on an indexed table would require at the very least an RPC to a
>>>>>>>> different regionserver.  If the regionservers are busy, your request
>>>>>>>> could wait in the queue for a moment.
>>>>>>>>
>>>>>>>> One param to tune would be the handler thread count.  Set it to 100 at
>>>>>>>> least.
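[For reference, the handler count Ryan mentions is the standard `hbase.regionserver.handler.count` property in hbase-site.xml; the value here follows his suggestion:]

```xml
<property>
  <name>hbase.regionserver.handler.count</name>
  <value>100</value>
</property>
```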
>>>>>>>>
>>>>>>>>> On Thu, Apr 29, 2010 at 2:16 AM, Chris Tarnas <cft@email.com> wrote:
>>>>>>>>>
>>>>>>>>> I just finished some testing with JDK 1.6 u17 - so far no performance
>>>>>>>>> improvements with just changing that. Disabling LZO compression did gain a
>>>>>>>>> little bit (up to about 30/sec from 25/sec per thread). Turning off indexes
>>>>>>>>> helped the most - that brought me up to 115/sec @ 2875 total rows a second.
>>>>>>>>> A single perl/thrift process can load at over 350 rows/sec, so it's not
>>>>>>>>> scaling as well as I would have expected, even without the indexes.
>>>>>>>>>
>>>>>>>>> Are the transactional indexes that costly? What is the bottleneck there?
>>>>>>>>> CPU utilization and network packets went up when I disabled the indexes; I
>>>>>>>>> don't think those are the bottlenecks for the indexes. I was even able to
>>>>>>>>> add another 15 insert processes (total of 40) and only lost about 10% on
>>>>>>>>> per-process throughput. I probably could go even higher; none of the nodes are
>>>>>>>>> above 60% CPU utilization and IO wait was at most 3.5%.
>>>>>>>>>
>>>>>>>>> Each rowkey is unique, so there should not be any blocking on the row
>>>>>>>>> locks. I'll do more indexed tests tomorrow.
>>>>>>>>>
>>>>>>>>> thanks,
>>>>>>>>> -chris
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Apr 29, 2010, at 12:18 AM, Todd Lipcon wrote:
>>>>>>>>>
>>>>>>>>>> Definitely smells like JDK 1.6.0_18. Downgrade that back to 16 or 17 and
>>>>>>>>>> you should be good to go. _18 is a botched release if I ever saw one.
>>>>>>>>>>
>>>>>>>>>> -Todd
>>>>>>>>>>
>>>>>>>>>> On Wed, Apr 28, 2010 at 10:54 PM, Chris Tarnas <cft@email.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Stack,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for looking. I checked the ganglia charts; no server was at more
>>>>>>>>>>> than ~20% CPU utilization at any time during the load test and swap was
>>>>>>>>>>> never used. Network traffic was light - just running a count through hbase
>>>>>>>>>>> shell generates much higher use. On the server hosting meta specifically,
>>>>>>>>>>> it was at about 15-20% CPU, and IO wait never went above 3%, and was
>>>>>>>>>>> usually down near 0.
>>>>>>>>>>>
>>>>>>>>>>> The load also died with a thrift timeout on every single node (each node
>>>>>>>>>>> connecting to localhost for its thrift server); it looks like a datanode
>>>>>>>>>>> just died and caused every thrift connection to timeout - I'll have to up
>>>>>>>>>>> that limit to handle a node death.
>>>>>>>>>>>
>>>>>>>>>>> Checking logs, this appears in the log of the region server hosting meta;
>>>>>>>>>>> looks like the dead datanode caused this error:
>>>>>>>>>>>
>>>>>>>>>>> 2010-04-29 01:01:38,948 WARN org.apache.hadoop.hdfs.DFSClient:
>>>>>>>>>>> DFSOutputStream ResponseProcessor exception  for
block
>>>>>>>>>>> blk_508630839844593817_11180java.io.IOException:
Bad response 1 for
>>>>>>>>>>> block
>>>>>>>>>>> blk_508630839844593817_11180 from datanode 10.195.150.255:50010
>>>>>>>>>>>      at
>>>>>>>>>>>
>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2423)
>>>>>>>>>>>
>>>>>>>>>>> The regionserver log on the dead node, 10.195.150.255, has some more
>>>>>>>>>>> errors in it:
>>>>>>>>>>>
>>>>>>>>>>> http://pastebin.com/EFH9jz0w
>>>>>>>>>>>
>>>>>>>>>>> I found this in the .out file on the datanode:
>>>>>>>>>>>
>>>>>>>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (16.0-b13 mixed mode linux-amd64 )
>>>>>>>>>>> # Problematic frame:
>>>>>>>>>>> # V  [libjvm.so+0x62263c]
>>>>>>>>>>> #
>>>>>>>>>>> # An error report file with more information is saved as:
>>>>>>>>>>> # /usr/local/hadoop-0.20.1/hs_err_pid1364.log
>>>>>>>>>>> #
>>>>>>>>>>> # If you would like to submit a bug report, please visit:
>>>>>>>>>>> #   http://java.sun.com/webapps/bugreport/crash.jsp
>>>>>>>>>>> #
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> There is not a single error in the datanode's log though. Also of note:
>>>>>>>>>>> this happened well into the test, so the node dying caused the load to abort
>>>>>>>>>>> but not the prior poor performance. Looking through the mailing list it
>>>>>>>>>>> looks like java 1.6.0_18 has a bad rep, so I'll update the AMI (although I'm
>>>>>>>>>>> using the same JVM on other servers in the office w/o issue, with decent
>>>>>>>>>>> single node performance and never dying...).
>>>>>>>>>>>
>>>>>>>>>>> Thanks for any help!
>>>>>>>>>>> -chris
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Apr 28, 2010, at 10:10 PM, Stack wrote:
>>>>>>>>>>>
>>>>>>>>>>>> What is load on the server hosting meta like?  Higher than others?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Apr 28, 2010, at 8:42 PM, Chris Tarnas <cft@email.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi JG,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Speed is now down to 18 rows/sec/table per process.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is a regionserver log that is serving two of the regions:
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://pastebin.com/Hx5se0hz
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is the GC Log from the same server:
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://pastebin.com/ChrRvxCx
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is the master log:
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://pastebin.com/L1Kn66qU
>>>>>>>>>>>>>
>>>>>>>>>>>>> The thrift server logs have nothing in them in the same time period.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks in advance!
>>>>>>>>>>>>>
>>>>>>>>>>>>> -chris
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Apr 28, 2010, at 7:32 PM, Jonathan Gray wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hey Chris,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> That's a really significant slowdown.  I can't think of anything
>>>>>>>>>>>>>> obvious that would cause that in your setup.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Any chance of some regionserver and master logs from the time it was
>>>>>>>>>>>>>> going slow?  Is there any activity in the logs of the regionservers
>>>>>>>>>>>>>> hosting the regions of the table being written to?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> JG
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>> From: Christopher Tarnas [mailto:cft@tarnas.org] On Behalf Of Chris Tarnas
>>>>>>>>>>>>>>> Sent: Wednesday, April 28, 2010 6:27 PM
>>>>>>>>>>>>>>> To: hbase-user@hadoop.apache.org
>>>>>>>>>>>>>>> Subject: EC2 + Thrift inserts
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> First, thanks to all the HBase developers for producing this; it's
>>>>>>>>>>>>>>> a great project and I'm glad to be able to use it.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm looking for some help and hints here with insert performance.
>>>>>>>>>>>>>>> I'm doing some benchmarking, testing how I can scale up using HBase,
>>>>>>>>>>>>>>> not really looking at raw speed. The testing is happening on EC2, using
>>>>>>>>>>>>>>> Andrew's scripts (thanks - those were very helpful) to set them up, and
>>>>>>>>>>>>>>> with a slightly customized version of the default AMIs (added my
>>>>>>>>>>>>>>> application modules). I'm using HBase 20.3 and Hadoop 20.1. I've looked
>>>>>>>>>>>>>>> at the tips in the Wiki and it looks like Andrew's scripts are already
>>>>>>>>>>>>>>> set up that way.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm inserting into HBase from a hadoop streaming job that runs perl and
>>>>>>>>>>>>>>> uses the thrift gateway. I'm also using the Transactional tables, so
>>>>>>>>>>>>>>> that alone could be the cause, but from what I can tell I don't think
>>>>>>>>>>>>>>> so. LZO compression is also enabled for the column families (much of
>>>>>>>>>>>>>>> the data is highly compressible). My cluster has 7 nodes: 5
>>>>>>>>>>>>>>> regionservers, 1 master and 1 zookeeper. The regionservers and master
>>>>>>>>>>>>>>> are c1.xlarges. Each regionserver has the tasktrackers that run the
>>>>>>>>>>>>>>> hadoop streaming jobs, and each regionserver also runs its own thrift
>>>>>>>>>>>>>>> server. Each mapper that does the load talks to the localhost's thrift
>>>>>>>>>>>>>>> server.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The row keys are a fixed string + an incremental number, and then the
>>>>>>>>>>>>>>> order of the bytes is reversed, so runA123 becomes 321Anur. I thought of
>>>>>>>>>>>>>>> using murmur hash but was worried about collisions.
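[For illustration, the reversal Chris describes is a plain byte reversal of the key; a minimal sketch, where `make_key` is a hypothetical helper, not from the thread:]

```python
def make_key(prefix, n):
    """Build a 'runA123'-style key, then reverse it so the fast-changing
    incremental suffix lands first and rows spread across regions
    instead of all hitting the last region."""
    return (prefix + str(n))[::-1]

# Chris's example: runA123 becomes 321Anur
key = make_key("runA", 123)
```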
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> As I add more insert jobs, each job's throughput goes down. Way down. I
>>>>>>>>>>>>>>> went from about 200 rows/sec/table per job with one job to about 24
>>>>>>>>>>>>>>> rows/sec/table per job with 25 running jobs. The servers are mostly
>>>>>>>>>>>>>>> idle. I'm loading into two tables: one has several indexes and I'm
>>>>>>>>>>>>>>> loading into three column families; the other has no indexes and one
>>>>>>>>>>>>>>> column family. Both tables currently have only two regions each.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The regionserver that serves the indexed table's regions is using the
>>>>>>>>>>>>>>> most CPU but is 87% idle. The other servers are all at ~90% idle. There
>>>>>>>>>>>>>>> is no IO wait. The perl processes are barely ticking over. Java on the
>>>>>>>>>>>>>>> most "loaded" server is using about 50-60% of one CPU.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Normally when I do a load in a pseudo-distributed hbase (my development
>>>>>>>>>>>>>>> platform), perl's speed is the limiting factor and uses about 85% of a
>>>>>>>>>>>>>>> CPU. In this cluster they are using only 5-10% of a CPU as they are all
>>>>>>>>>>>>>>> waiting on thrift (hbase). When I run only 1 process on the cluster,
>>>>>>>>>>>>>>> perl uses much more of a CPU, maybe 70%.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Any tips or help in getting the speed/scalability up would be great.
>>>>>>>>>>>>>>> Please let me know if you need any other info.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> As I send this - it looks like the main table has split again and is
>>>>>>>>>>>>>>> being served by three regionservers. My performance is going up a bit
>>>>>>>>>>>>>>> (now 35 rows/sec/table per process), but it still seems like I'm not
>>>>>>>>>>>>>>> using the full potential of even the limited EC2 system: no IO wait and
>>>>>>>>>>>>>>> lots of idle CPU.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> many thanks
>>>>>>>>>>>>>>> -chris
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Todd Lipcon
>>>>>>>>>> Software Engineer, Cloudera
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>
>>>
>
>
