hbase-user mailing list archives

From Nishanth S <nishanth.2...@gmail.com>
Subject Re: Tool to to execute an benchmark for HBase.
Date Fri, 30 Jan 2015 17:31:25 GMT
You are hitting HBase harder now, which is important for benchmarking. If
there is no data loss, it means your HBase cluster is good enough to handle
the load. You are simply making more use of the cores on the machine where
you launch the ycsb process. Write your own workload, depending on the record
sizes and format, to see what can be achieved in a particular use case.

-Nishanth
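
A rough way to check how hard a single ycsb process is actually pushing the cluster is Little's law: average operations in flight = throughput x average latency. The sketch below applies it to the `[OVERALL]` and `[READ]` lines quoted further down in this thread. The helper names `metric` and `effective_concurrency` are illustrative and not part of YCSB, and the sample text is copied from the quoted results with the `>` quote markers removed.

```python
import re


def metric(ycsb_output: str, section: str, name: str) -> float:
    """Return the value of a '[SECTION], Name, value' line in YCSB output."""
    pattern = re.compile(
        r"\[" + re.escape(section) + r"\],\s*" + re.escape(name) + r",\s*([0-9.]+)"
    )
    match = pattern.search(ycsb_output)
    if match is None:
        raise ValueError(f"no [{section}] {name} line found")
    return float(match.group(1))


def effective_concurrency(ycsb_output: str) -> float:
    """Little's law: operations in flight = throughput * average latency."""
    throughput = metric(ycsb_output, "OVERALL", "Throughput(ops/sec)")
    latency_us = metric(ycsb_output, "READ", "AverageLatency(us)")
    return throughput * latency_us / 1_000_000  # microseconds -> seconds


# Figures copied from the 100% read run quoted later in this thread.
SAMPLE = """\
[OVERALL], RunTime(ms), 42734.0
[OVERALL], Throughput(ops/sec), 2340.0570973931763
[READ], Operations, 100000.0
[READ], AverageLatency(us), 412.5534
[READ], AverageLatency(us,corrected), 581.6249026771276
"""

if __name__ == "__main__":
    print(round(effective_concurrency(SAMPLE), 2))
```

For the quoted run this gives a value just under 1, meaning roughly one read in flight on average. That is consistent with the point above that one ycsb process was not loading the cluster heavily, but the thread does not contain enough information to say why the single process stayed that low.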

On Fri, Jan 30, 2015 at 5:34 AM, Guillermo Ortiz <konstt2000@gmail.com>
wrote:

> I have come back to the benchmark. I executed this command:
> ycsb run hbase -P workflowA -p columnfamily=cf -p
> operationcount=100000 threads=32
>
> And I got a performance of 2000 ops/sec.
> What I did later was to execute ten of those commands in parallel, and
> I got about 18000 ops/sec in total. I don't get 2000 ops/sec for each of
> those executions, but I get about 1800 ops/sec.
>
> I don't know if it's an HBase question, but I don't understand why I
> get more performance if I execute more commands in parallel when I
> already execute 32 threads.
> I took a look at "top" and saw that in the first case (just one
> process) the CPU was working at about 20-60%; when I launch more processes
> the CPU is at about 400-500%.
>
>
>
> 2015-01-29 18:23 GMT+01:00 Guillermo Ortiz <konstt2000@gmail.com>:
> > There's an option when you execute ycsb to say how many client
> > threads you want to use. I tried with 1/8/16/32. Those results are
> > with 16; the improvement from 1 to 8 is pretty high, but not as much
> > from 16 to 32.
> > I only use one ycsb instance; could that be important?
> >
> > -threads : the number of client threads. By default, the YCSB Client
> > uses a single worker thread, but additional threads can be specified.
> > This is often done to increase the amount of load offered against the
> > database.
> >
> > 2015-01-29 17:27 GMT+01:00 Nishanth S <nishanth.2884@gmail.com>:
> >> How many instances of ycsb do you run, and how many threads do you use
> >> per instance? I guess these ops are per instance, and you should get
> >> similar numbers if you run more instances. In short, try running more
> >> workload instances...
> >>
> >> -Nishanth
> >>
> >> On Thu, Jan 29, 2015 at 8:49 AM, Guillermo Ortiz <konstt2000@gmail.com>
> >> wrote:
> >>
> >>> Yes, I'm using 40%. I can't access that data either.
> >>> I don't know how YCSB executes the reads, whether they are random, and
> >>> whether they could take advantage of the cache.
> >>>
> >>> Do you think that it's an acceptable performance?
> >>>
> >>>
> >>> 2015-01-29 16:26 GMT+01:00 Ted Yu <yuzhihong@gmail.com>:
> >>> > What's the value for hfile.block.cache.size?
> >>> >
> >>> > By default it is 40%. You may want to increase its value if you're
> >>> > using the default.
> >>> >
> >>> > Andrew published some ycsb results:
> >>> > http://people.apache.org/~apurtell/results-ycsb-0.98.8/ycsb-0.98.0-vs-0.98.8.pdf
> >>> >
> >>> > However, I couldn't access the above now.
> >>> >
> >>> > Cheers
> >>> >
> >>> > On Thu, Jan 29, 2015 at 7:14 AM, Guillermo Ortiz <konstt2000@gmail.com>
> >>> > wrote:
> >>> >
> >>> >> Are there any results with that benchmark to compare against?
> >>> >> I'm executing the different workloads and, for example, for 100% reads
> >>> >> on a table with 10 million records I only get a performance of
> >>> >> 2000 operations/sec. I hoped for much better performance, but I could
> >>> >> be wrong. I'd like to know if it's normal performance or whether I
> >>> >> could have something badly configured.
> >>> >>
> >>> >>
> >>> >> I have split the table, all the records are balanced, and I used
> >>> >> Snappy.
> >>> >> The cluster has a master and 4 region servers with 256 GB, Cores 2
> >>> >> (32 w/ Hyperthreading), 0.98.6-cdh5.3.0.
> >>> >>
> >>> >> RegionServer is executed with these parameters:
> >>> >>  /usr/java/jdk1.7.0_67-cloudera/bin/java -Dproc_regionserver
> >>> >> -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m
> >>> >> -Djava.net.preferIPv4Stack=true -Xms640679936 -Xmx640679936
> >>> >> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled
> >>> >> -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
> >>> >> -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh
> >>> >> -Dhbase.log.dir=/var/log/hbase
> >>> >> -Dhbase.log.file=hbase-cmf-hbase-REGIONSERVER-cnsalbsrvcl23.lvtc.gsnet.corp.log.out
> >>> >> -Dhbase.home.dir=/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hbase
> >>> >> -Dhbase.id.str= -Dhbase.root.logger=INFO,RFA
> >>> >> -Djava.library.path=/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/lib/native
> >>> >> -Dhbase.security.logger=INFO,RFAS
> >>> >> org.apache.hadoop.hbase.regionserver.HRegionServer start
> >>> >>
> >>> >>
> >>> >> The results for 100% reads are
> >>> >> [OVERALL], RunTime(ms), 42734.0
> >>> >> [OVERALL], Throughput(ops/sec), 2340.0570973931763
> >>> >> [UPDATE], Operations, 1.0
> >>> >> [UPDATE], AverageLatency(us), 103170.0
> >>> >> [UPDATE], MinLatency(us), 103168.0
> >>> >> [UPDATE], MaxLatency(us), 103171.0
> >>> >> [UPDATE], 95thPercentileLatency(ms), 103.0
> >>> >> [UPDATE], 99thPercentileLatency(ms), 103.0
> >>> >> [READ], Operations, 100000.0
> >>> >> [READ], AverageLatency(us), 412.5534
> >>> >> [READ], AverageLatency(us,corrected), 581.6249026771276
> >>> >> [READ], MinLatency(us), 218.0
> >>> >> [READ], MaxLatency(us), 268383.0
> >>> >> [READ], MaxLatency(us,corrected), 268383.0
> >>> >> [READ], 95thPercentileLatency(ms), 0.0
> >>> >> [READ], 95thPercentileLatency(ms,corrected), 0.0
> >>> >> [READ], 99thPercentileLatency(ms), 0.0
> >>> >> [READ], 99thPercentileLatency(ms,corrected), 0.0
> >>> >> [READ], Return=0, 100000
> >>> >> [CLEANUP], Operations, 1.0
> >>> >> [CLEANUP], AverageLatency(us), 103598.0
> >>> >> [CLEANUP], MinLatency(us), 103596.0
> >>> >> [CLEANUP], MaxLatency(us), 103599.0
> >>> >> [CLEANUP], 95thPercentileLatency(ms), 103.0
> >>> >> [CLEANUP], 99thPercentileLatency(ms), 103.0
> >>> >>
> >>> >> hbase(main):030:0> describe 'username'
> >>> >> DESCRIPTION                                                  ENABLED
> >>> >>  'username', {NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE',   true
> >>> >>  BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0',
> >>> >>  VERSIONS => '1', COMPRESSION => 'SNAPPY',
> >>> >>  MIN_VERSIONS => '0', TTL => 'FOREVER',
> >>> >>  KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536',
> >>> >>  IN_MEMORY => 'false', BLOCKCACHE => 'true'}
> >>> >> 1 row(s) in 0.0170 seconds
> >>> >>
> >>> >> 2015-01-29 5:27 GMT+01:00 Ted Yu <yuzhihong@gmail.com>:
> >>> >> > Maybe ask on the Cassandra mailing list for the benchmark tool they
> >>> >> > use?
> >>> >> >
> >>> >> > Cheers
> >>> >> >
> >>> >> > On Wed, Jan 28, 2015 at 1:23 PM, Guillermo Ortiz <konstt2000@gmail.com>
> >>> >> > wrote:
> >>> >> >
> >>> >> >> I was checking that website. Do you know if there's another
> >>> >> >> possibility? The last update for Cassandra was two years ago, and
> >>> >> >> I'd like to compare both of them with the same kind of tool/code.
> >>> >> >>
> >>> >> >> 2015-01-28 22:10 GMT+01:00 Ted Yu <yuzhihong@gmail.com>:
> >>> >> >> > Guillermo:
> >>> >> >> > If you use hbase 0.98.x, please consider Andrew's ycsb repo:
> >>> >> >> >
> >>> >> >> > https://github.com/apurtell/ycsb/tree/new_hbase_client
> >>> >> >> >
> >>> >> >> > Cheers
> >>> >> >> >
> >>> >> >> > On Wed, Jan 28, 2015 at 12:41 PM, Nishanth S <nishanth.2884@gmail.com>
> >>> >> >> > wrote:
> >>> >> >> >
> >>> >> >> >> You can use ycsb for this purpose. See here:
> >>> >> >> >>
> >>> >> >> >> https://github.com/brianfrankcooper/YCSB/wiki/Getting-Started
> >>> >> >> >> -Nishanth
> >>> >> >> >>
> >>> >> >> >> On Wed, Jan 28, 2015 at 1:37 PM, Guillermo Ortiz <konstt2000@gmail.com>
> >>> >> >> >> wrote:
> >>> >> >> >>
> >>> >> >> >> > Hi,
> >>> >> >> >> >
> >>> >> >> >> > I'd like to run some benchmarks for HBase, but I don't know what
> >>> >> >> >> > tool I could use. I started to write some code, but I guess that
> >>> >> >> >> > there are some easier options.
> >>> >> >> >> >
> >>> >> >> >> > I've taken a look at JMeter, but I guess that I'd attack directly
> >>> >> >> >> > from Java. JMeter looks great, but I don't know if it fits well in
> >>> >> >> >> > this scenario. What tool could I use to take some measurements,
> >>> >> >> >> > such as the time to respond to read and write requests, etc.? I'd
> >>> >> >> >> > like to be able to run the same benchmarks against Cassandra.
> >>> >> >> >> >
> >>> >> >> >>
> >>> >> >>
> >>> >>
> >>>
>
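
The RegionServer command line quoted in this thread also allows a back-of-the-envelope estimate of the block cache size that the `hfile.block.cache.size` question is about. The sketch below rests on two assumptions: that of the two `-Xmx` flags on the command line (`-Xmx1000m` and the later `-Xmx640679936`) the later one is in effect, and that the block cache is 40% of that heap, as stated in the thread. The constant and function names are illustrative only.

```python
# Values copied from the RegionServer command line and replies in this thread.
HEAP_BYTES = 640_679_936      # from -Xms640679936 -Xmx640679936 (assumed in effect)
BLOCK_CACHE_FRACTION = 0.40   # hfile.block.cache.size, per the thread
REGION_SERVERS = 4            # "a master and 4 region servers"

MIB = 1024 * 1024


def block_cache_bytes(heap_bytes: int, fraction: float) -> int:
    """Approximate block cache size as a fraction of the JVM heap."""
    return int(heap_bytes * fraction)


def to_mib(num_bytes: int) -> float:
    """Convert bytes to mebibytes."""
    return num_bytes / MIB


if __name__ == "__main__":
    per_server = block_cache_bytes(HEAP_BYTES, BLOCK_CACHE_FRACTION)
    print(f"heap per RegionServer:        {to_mib(HEAP_BYTES):.0f} MiB")
    print(f"block cache per RegionServer: {to_mib(per_server):.0f} MiB")
    print(f"block cache across cluster:   {to_mib(per_server * REGION_SERVERS):.0f} MiB")
```

Under these assumptions each RegionServer has a heap of about 611 MiB and a block cache of about 244 MiB, on machines described as having 256 GB. The record size is not given in the thread, so this does not show how much of the 10 million row table fits in cache. It only suggests that the heap setting is worth checking alongside the cache fraction.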
