accumulo-user mailing list archives

From "Kepner, Jeremy - LLSC - MITLL" <kep...@ll.mit.edu>
Subject Re: Accumulo performance on various hardware configurations
Date Wed, 29 Aug 2018 16:40:04 GMT
50K/sec on any SQL database on a single node would be very good.

On Aug 29, 2018, at 12:37 PM, Jonathan Yom-Tov <jon.yomtov@metismart.com> wrote:

I'm not 100% sure it's slow. Coming from an RDBMS background it seems it might be, but I
wanted the opinion of others since I'm not experienced with Accumulo. From your reply I
assume you think it's reasonable?

On Wed, Aug 29, 2018 at 6:33 PM, Jeremy Kepner <kepner@ll.mit.edu> wrote:
Why do you think 500K/sec is slow?

On Wed, Aug 29, 2018 at 04:39:32PM +0300, guy sharon wrote:
> Well, in one experiment I used a machine with 48 cores and 192GB RAM and the
> results actually came out worse. And in another I had 7 tservers on servers
> with 4 cores. I think I'm not configuring things correctly, because I'd
> expect the improved hardware to improve performance and that doesn't seem
> to be the case.
>
> On Wed, Aug 29, 2018 at 4:00 PM Jeremy Kepner <kepner@ll.mit.edu> wrote:
>
> > Your node is fairly underpowered (2 cores and 8 GB RAM) and has less power
> > than most laptops.  That said,
> >
> > 6M / 12sec = 500K/sec
> >
> > is good for a single node Accumulo instance on this hardware.
> >
> > Splitting might not help since you only have 2 cores, so the added
> > parallelism can't be exploited.
> >
> > Why do you think 500K/sec is slow?
> >
> > To determine slowness one would have to compare with other database
> > technology on the same platform.
> >
> >
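For reference, if splitting is still worth experimenting with, splits can be
added programmatically as well as from the shell. A minimal sketch against the
Accumulo 1.7 Java client API, assuming the "hellotable" table from this thread
(the split row keys are placeholders, not taken from the actual data, and the
Connector is obtained elsewhere):

    import java.util.SortedSet;
    import java.util.TreeSet;

    import org.apache.accumulo.core.client.Connector;
    import org.apache.hadoop.io.Text;

    public class AddSplitsSketch {
      // Pre-split the table so scans can be served by more than one tablet.
      public static void addSplits(Connector conn) throws Exception {
        SortedSet<Text> splits = new TreeSet<>();
        // Placeholder split points; real ones should match the row-key
        // distribution produced by the modified InsertWithBatchWriter.
        splits.add(new Text("row_1500000"));
        splits.add(new Text("row_3000000"));
        splits.add(new Text("row_4500000"));
        conn.tableOperations().addSplits("hellotable", splits);
      }
    }

As noted just above, extra tablets only pay off when there are enough cores and
client-side read threads to serve them in parallel.
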
> > On Wed, Aug 29, 2018 at 03:04:51PM +0300, guy sharon wrote:
> > > hi,
> > >
> > > Continuing my performance benchmarks, I'm still trying to figure out if the
> > > results I'm getting are reasonable and why throwing more hardware at the
> > > problem doesn't help. What I'm doing is a full table scan on a table with
> > > 6M entries. This is Accumulo 1.7.4 with ZooKeeper 3.4.12 and Hadoop 2.8.4.
> > > The table is populated by
> > > org.apache.accumulo.examples.simple.helloworld.InsertWithBatchWriter
> > > modified to write 6M entries instead of 50k. Reads are performed by
> > > "bin/accumulo org.apache.accumulo.examples.simple.helloworld.ReadData -i
> > > muchos -z localhost:2181 -u root -t hellotable -p secret". Here are the
> > > results I got:
> > >
> > > 1. 5-tserver cluster as configured by Muchos
> > > (https://github.com/apache/fluo-muchos), running on m5d.large AWS machines
> > > (2 vCPU, 8GB RAM) running CentOS 7. Master is on a separate server. Scan
> > > took 12 seconds.
> > > 2. As above except with m5d.xlarge (4 vCPU, 16GB RAM). Same results.
> > > 3. Splitting the table into 4 tablets causes the runtime to increase to 16
> > > seconds.
> > > 4. 7-tserver cluster running on m5d.xlarge servers. 12 seconds.
> > > 5. Single-node cluster on m5d.12xlarge (48 cores, 192GB RAM), running
> > > Amazon Linux. Configuration as provided by Uno
> > > (https://github.com/apache/fluo-uno). Total time was 26 seconds.
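
For concreteness, the scan being timed above corresponds to roughly the
following client code; a minimal sketch against the Accumulo 1.7 API (not the
ReadData example's actual source), using the connection parameters quoted
earlier in this message (instance "muchos", ZooKeeper at localhost:2181, user
root, table "hellotable", password "secret"):

    import java.util.Map;

    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.Scanner;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.client.security.tokens.PasswordToken;
    import org.apache.accumulo.core.data.Key;
    import org.apache.accumulo.core.data.Value;
    import org.apache.accumulo.core.security.Authorizations;

    public class FullScanSketch {
      public static void main(String[] args) throws Exception {
        Connector conn = new ZooKeeperInstance("muchos", "localhost:2181")
            .getConnector("root", new PasswordToken("secret"));
        // A Scanner with no range restriction iterates the entire table.
        Scanner scanner = conn.createScanner("hellotable", Authorizations.EMPTY);
        long count = 0;
        long start = System.currentTimeMillis();
        for (Map.Entry<Key, Value> entry : scanner) {
          count++;
        }
        System.out.printf("scanned %d entries in %d ms%n",
            count, System.currentTimeMillis() - start);
      }
    }
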
> > >
> > > Offhand I would say this is very slow. I'm guessing I'm making some sort of
> > > newbie (possibly configuration) mistake but I can't figure out what it is.
> > > Can anyone point me to something that might help me find out what it is?
> > >
> > > thanks,
> > > Guy.
> >
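
Following up on the splitting discussion earlier in the thread: if the read is
driven by a single Scanner, it walks the tablets one after another from a
single client thread, so extra tablets, cores, or tservers buy little. A hedged
sketch of a read path that could exploit them, using a BatchScanner (the thread
count and the single whole-table Range are illustrative choices, not taken from
the examples code):

    import java.util.Collections;
    import java.util.Map;

    import org.apache.accumulo.core.client.BatchScanner;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.client.security.tokens.PasswordToken;
    import org.apache.accumulo.core.data.Key;
    import org.apache.accumulo.core.data.Range;
    import org.apache.accumulo.core.data.Value;
    import org.apache.accumulo.core.security.Authorizations;

    public class ParallelScanSketch {
      public static void main(String[] args) throws Exception {
        Connector conn = new ZooKeeperInstance("muchos", "localhost:2181")
            .getConnector("root", new PasswordToken("secret"));
        // 8 query threads is an arbitrary illustrative value; tune it to the
        // number of tablets and cores available.
        BatchScanner bs = conn.createBatchScanner("hellotable",
            Authorizations.EMPTY, 8);
        // One infinite Range covers the whole table; the BatchScanner bins it
        // by tablet and queries tablets in parallel (results arrive unordered).
        bs.setRanges(Collections.singletonList(new Range()));
        long count = 0;
        for (Map.Entry<Key, Value> entry : bs) {
          count++;
        }
        bs.close();
        System.out.println("scanned " + count + " entries");
      }
    }

Whether this helps in practice depends on where the bottleneck actually is
(the single client process, the network, or the tservers themselves).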


