From: Yakov Zhdanov
Date: Tue, 29 Nov 2016 15:41:09 +0700
To: user@ignite.apache.org
Subject: Re: Performance question

I think it is a bad idea to set the number of threads greater than the CPU count, given that you do not block on any I/O operations.

However, the benchmark itself looks incorrect to me. The most important issues are:
1. The total time is too low. Try putting the measured operation in a loop body and measuring each iteration. You can also (and I would encourage you to) use JMH for benchmarking local operations.
2. You don't have any warm-up phase.

Please refer to http://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java or any other resource explaining how to write correct benchmarks.
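
A minimal sketch of the kind of JMH benchmark suggested above; the cache name, key/value types, and warm-up/measurement settings are illustrative assumptions rather than code from this thread:

import java.util.UUID;
import java.util.concurrent.TimeUnit;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CachePeekMode;
import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 5)
@Measurement(iterations = 10)
@Fork(1)
public class LocalEntriesBenchmark {

    private Ignite ignite;
    private IgniteCache<UUID, UUID> cache;

    @Setup
    public void setup() {
        // Start a node and preload 1M entries once, outside the measured code.
        ignite = Ignition.start();
        cache = ignite.getOrCreateCache("test.cache");
        for (int i = 0; i < 1_000_000; i++)
            cache.put(UUID.randomUUID(), UUID.randomUUID());
    }

    @Benchmark
    public int scanLocalEntries() {
        // Iterate primary entries stored on this node; return a value so the
        // JIT cannot eliminate the loop as dead code.
        int[] count = {0};
        cache.localEntries(CachePeekMode.PRIMARY).forEach(e -> count[0]++);
        return count[0];
    }

    @TearDown
    public void tearDown() {
        ignite.close();
    }
}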

--Yakov

2016-11-29 15:14 GMT+07:00 Vladislav Pyatkov <vldpyatkov@gmail.com>:
Hi Alisher,

This looks doubtful to me. You parallelized the job, but got a performance decrease.
I recommend using a Java profiler to find the long-running methods.

How did you get the list of local partitions (could it contain extra partition numbers)?
And please check whether the ForkJoinPool has enough parallelism:

-Djava.util.concurrent.ForkJoinPool.common.parallelism=1024
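
An alternative to raising the common pool's parallelism is to run the partition scan in a dedicated pool. This is only a sketch; it relies on the widely used but undocumented behavior that a parallel stream executes in the ForkJoinPool it is invoked from, and the pool size is an assumption, not advice from this thread:

import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;

// Dedicated pool sized to the number of available cores; 'partitions' is the
// array from the code quoted below.
ForkJoinPool scanPool = new ForkJoinPool(Runtime.getRuntime().availableProcessors());

scanPool.submit(() ->
    Arrays.stream(partitions)
          .parallel()
          .forEach(partition -> {
              // per-partition ScanQuery, as in the code quoted below
          })
).join();

scanPool.shutdown();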

On Nov 28, 2016 8:39 PM, "Alisher Alimov" <alimovalisher@gmail.com> wrote:
I found only one way to parallelize reads, via ScanQuery:

int[] partitions = this.ignite.affinity("test.cache").primaryPartitions(this.ignite.cluster().node());

startTime = System.currentTimeMillis();

Arrays.stream(partitions)
        .parallel()
        .forEach(partition -> {
            ScanQuery<Object, Object> qry = new ScanQuery<>(partition);
            qry.setLocal(true);
            qry.setPageSize(5_000);

            QueryCursor<Cache.Entry<Object, Object>> query = cache.query(qry);
            List<Cache.Entry<Object, Object>> all = query.getAll();
        });

System.out.println(String.format("Complete in: %dms", System.currentTimeMillis() - startTime));

But it doesn't help much (the speed actually dropped by 10-20%). Is there another, better way to do this?



With best regards
Alisher Alimov




On 28 Nov 2016, at 19:38, Alexey Goncharuk <alexey.goncharuk@gmail.com> wrote:

Hi Alisher,

As Nicolae suggested, try parallelizing your scan using a per-partition iterator. This should give you almost linear performance growth up to the number of available CPUs.
Also make sure to set the CacheConfiguration#copyOnRead flag to false.
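
For reference, a minimal sketch of that configuration change; the cache name and key/value types mirror the example quoted below, and everything else is an assumption left at defaults:

import java.util.UUID;

import org.apache.ignite.IgniteCache;
import org.apache.ignite.configuration.CacheConfiguration;

// Disable per-read copies so reads of on-heap entries return the stored
// instance instead of a copy on every access. 'ignite' is the started
// Ignite instance from the example quoted below.
CacheConfiguration<UUID, UUID> cacheCfg = new CacheConfiguration<>("test.cache");
cacheCfg.setBackups(0);
cacheCfg.setCopyOnRead(false);

IgniteCache<UUID, UUID> cache = ignite.getOrCreateCache(cacheCfg);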

--AG

2016-11-28 19:31 GMT+03:00 Marasoiu Nicolae <Nicolae.Marasoiu@cegeka.com>:

Regarding CPU load, a single thread of execution exists in the program, so (at most) one core is used. So if you have 8 cores, the program is 8 to 16 times slower than one able to use all the cores and hardware threads of the machine.

In my tests, indeed, one core looks fully utilized. To me, scanning 1M key-values per second is pretty OK, but if LMAX achieved 6M transactions per core per second, it can perhaps go higher; something tells me, though, that this will not be the limiting factor for a typical application.


Met vriendelijke groeten / Meilleures salutations / Best regards

Nicolae Marasoiu
Agile Developer
E  Nicolae.Marasoiu@cegeka.com

CEGEKA 15-17 Ion Mihalache Blvd. Tower Center Building, 4th, 5th, 6th, 8th, 9th fl
RO-011171 Bucharest (RO), Romania
T +40 21 336 20 65
WWW.CEGEKA.COM

From: Alisher Alimov <alimovalisher@gmail.com>
Sent: 28 November 2016 15:27
To: user@ignite.apache.org
Subject: Performance question

Hello!

I wrote and ran a simple performance test to check IgniteCache#localEntries and found that this method is not fast enough.
Ignite ignite = Ignition.start();

CacheConfiguration<UUID, UUID> cacheConfiguration = new CacheConfiguration<>();
cacheConfiguration.setBackups(0);

IgniteCache<UUID, UUID> cache = ignite.getOrCreateCache("test.cache");

for (int i = 0; i < 1_000_000; i++) {
    cache.put(UUID.randomUUID(), UUID.randomUUID());
}

long startTime = System.currentTimeMillis();

cache.localEntries(CachePeekMode.PRIMARY).forEach(entry -> {
});

System.out.println(String.format("Complete in: %dms", System.currentTimeMillis() - startTime));

Reading the local entries takes about 1s (1,000 rows per ms), which seems slow.
The test was run on a server with the configuration below, using default Ignite configs; the load average was about 0 and the CPU was never more than 10% busy:
Intel(R) Xeon(R) CPU E5645 @ 2.40GHz


Maybe I am doing or configuring something wrong, or is this speed normal?


With best regards
Alisher Alimov






