accumulo-user mailing list archives

From Jeremy Kepner <kep...@ll.mit.edu>
Subject Re: benchmarking
Date Tue, 28 Aug 2018 12:45:00 GMT
FYI, single-node Accumulo instances are our most popular deployment.
We have hundreds of them. Accumulo is so fast that it can replace
what would normally require 20 MySQL servers.

Regards.  -Jeremy

On Tue, Aug 28, 2018 at 07:38:37AM +0000, Sean Busbey wrote:
> Hi Guy,
> 
> Apache Accumulo is designed for horizontally scaling out for large scale workloads that
> need to do random reads and writes. There's a non-trivial amount of overhead that comes with
> a system aimed at doing that on thousands of nodes.
> 
> If your use case works for a single laptop with such a small number of entries and exhaustive
> scans, then Accumulo is probably not the correct tool for the job.
> 
> For example, on my laptop (i7 2 cores, 8GiB memory) with that dataset size you can just
> rely on a file format like Apache Avro:
> 
> busbey$ time java -jar avro-tools-1.7.7.jar random --codec snappy --count 6300000 --schema '{ "type": "record", "name": "entry", "fields": [ { "name": "field0", "type": "string" } ] }' ~/Downloads/6.3m_entries.avro
> Aug 28, 2018 12:31:13 AM org.apache.hadoop.util.NativeCodeLoader <clinit>
> WARNING: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> test.seed=1535441473243
> 
> real	0m5.451s
> user	0m5.922s
> sys	0m0.656s
> busbey$ ls -lah ~/Downloads/6.3m_entries.avro 
> -rwxrwxrwx  1 busbey  staff   186M Aug 28 00:31 /Users/busbey/Downloads/6.3m_entries.avro
> busbey$ time java -jar avro-tools-1.7.7.jar tojson ~/Downloads/6.3m_entries.avro | wc -l
>  6300000
> 
> real	0m4.239s
> user	0m6.026s
> sys	0m0.721s
> 
> I'd recommend that you start at >= 5 nodes if you want to look at rough per-node throughput
> capabilities.
> 
> 
> On 2018/08/28 06:59:38, guy sharon <guy.sharon.1977@gmail.com> wrote: 
> > hi Mike,
> > 
> > Thanks for the links.
> > 
> > My current setup is a 4 node cluster (tserver, master, gc, monitor) running
> > on Alpine Docker containers on a laptop with an i7 processor (8 cores) with
> > 16GB of RAM. As an example I'm running a count of all entries for a table
> > with 6.3M entries with "accumulo shell -u root -p secret  -e "scan -t
> > benchmark_table -np" | wc -l" and it takes 43 seconds. Not sure if this is
> > reasonable or not. Seems a little slow to me. What do you think?
> > 
> > BR,
> > Guy.
> > 
> > 
> > 
> > 
> > On Mon, Aug 27, 2018 at 4:43 PM Michael Wall <mjwall@apache.org> wrote:
> > 
> > > Hi Guy,
> > >
> > > Here are a couple links I found.  Can you tell us more about your setup
> > > and what you are seeing?
> > >
> > > https://accumulo.apache.org/papers/accumulo-benchmarking-2.1.pdf
> > > https://www.youtube.com/watch?v=Ae9THpmpFpM
> > >
> > > Mike
> > >
> > >
> > > On Sat, Aug 25, 2018 at 5:09 PM guy sharon <guy.sharon.1977@gmail.com>
> > > wrote:
> > >
> > >> hi,
> > >>
> > >> I've just started working with Accumulo and I think I'm experiencing slow
> > >> reads/writes. I'm aware of the recommended configuration. Does anyone know
> > >> of any standard benchmarks and benchmarking tools I can use to tell if the
> > >> performance I'm getting is reasonable?
> > >>
> > >>
> > >>
> > 
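
To put the timings quoted in this thread side by side, here is a minimal
sketch that converts each reported wall-clock time into average entries per
second. The entry count and the three timings come from the messages above;
the function name and the labels are made up for illustration. The runs were
on different laptops and measure different things (each includes JVM startup,
and the Accumulo figure includes shell output formatting), so treat the result
as a rough order-of-magnitude comparison rather than a benchmark.

```python
# Rough throughput comparison using only the figures quoted in the thread.
# Assumption: "real" (wall-clock) time is the relevant cost for each run.

ENTRIES = 6_300_000  # entry count reported for both the table and the Avro file


def entries_per_second(entries: int, seconds: float) -> float:
    """Average throughput for a run that took `seconds` of wall-clock time."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return entries / seconds


# Wall-clock seconds as reported in the messages (labels are illustrative).
timings = {
    "accumulo shell scan | wc -l": 43.0,
    "avro-tools tojson | wc -l": 4.239,
    "avro-tools random (write)": 5.451,
}

rates = {label: entries_per_second(ENTRIES, secs) for label, secs in timings.items()}

for label, rate in rates.items():
    print(f"{label:30s} {rate:12,.0f} entries/s")
```

On these numbers the shell scan works out to roughly 146 thousand entries per
second and the Avro read to roughly 1.49 million, about a tenfold gap.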
