Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 90449200C73 for ; Wed, 10 May 2017 19:03:02 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 8EC88160B9C; Wed, 10 May 2017 17:03:02 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 39269160B99 for ; Wed, 10 May 2017 19:03:01 +0200 (CEST) Received: (qmail 25065 invoked by uid 500); 10 May 2017 17:03:00 -0000 Mailing-List: contact user-help@ignite.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@ignite.apache.org Delivered-To: mailing list user@ignite.apache.org Received: (qmail 25053 invoked by uid 99); 10 May 2017 17:03:00 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd4-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 10 May 2017 17:03:00 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd4-us-west.apache.org (ASF Mail Server at spamd4-us-west.apache.org) with ESMTP id CFA01C031B for ; Wed, 10 May 2017 17:02:59 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd4-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 4.174 X-Spam-Level: **** X-Spam-Status: No, score=4.174 tagged_above=-999 required=6.31 tests=[DKIM_ADSP_CUSTOM_MED=0.001, HTML_MESSAGE=2, NML_ADSP_CUSTOM_MED=1.2, RCVD_IN_DNSWL_NONE=-0.0001, SPF_SOFTFAIL=0.972, URIBL_BLOCKED=0.001] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd4-us-west.apache.org [10.40.0.11]) (amavisd-new, port 10024) with ESMTP id 96V4lOxW2VzO for ; Wed, 10 May 2017 17:02:57 +0000 (UTC) Received: from mwork.nabble.com (mwork.nabble.com [162.253.133.43]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTP id 112DD5F36B for ; Wed, 10 May 2017 17:02:57 +0000 (UTC) Received: from static.162.255.23.37.macminivault.com (unknown [162.255.23.37]) by mwork.nabble.com (Postfix) with ESMTP id BDE29408DAF82 for ; Wed, 10 May 2017 10:02:56 -0700 (MST) Date: Wed, 10 May 2017 10:02:56 -0700 (MST) From: afedotov To: user@ignite.apache.org Message-ID: In-Reply-To: References: Subject: Re: GroupBy with index is really slow. MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_Part_122191_2006772036.1494435776772" archived-at: Wed, 10 May 2017 17:03:02 -0000 ------=_Part_122191_2006772036.1494435776772 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Hi. I think the problem is that age and month have bad selectivity. For example, in the case of month, each month index will select in average 2000000 / 12 ~ 1666667 rows. In this case, it's probably faster scan all the rows in storage/disk order then using the index order. Kind regards, Alex. On Wed, May 10, 2017 at 6:29 PM, Guillermo Ortiz [via Apache Ignite Users] < ml+s70518n12599h3@n6.nabble.com> wrote: > Are indices used always that it'spossible or it depends how many records > data you have to read as Oracle? So, if You use and index but the result > it's to read 25% of the data, usually indices are not used because > sequencial read are faster. As this is in memory I understand that it > doesn't apply but I'm not sure and I didn't read anything about it in the > documentation. > > 2017-05-10 17:27 GMT+02:00 Guillermo Ortiz <[hidden email] > >: > >> If I execute many times, time is similar in both. >> >> 2017-05-10 17:25 GMT+02:00 Guillermo Ortiz <[hidden email] >> >: >> >>> I'm using 2.0.0. >>> Right now in my laptop with 2Mill objects. >>> >>> >>> *WITH INDICES* >>> ******Result SELECT distinct(MONTH) FROM PERSONWITHINDEX >>> September,February,March,August,June,April,July,January,May, >>> November,October, >>> Elapsed time SELECT distinct(MONTH) FROM PERSONWITHINDEX:1990ms >>> [SELECT DISTINCT >>> __Z0.MONTH AS __C0_0 >>> FROM "PersonWithIndex".PERSONWITHINDEX __Z0 >>> /* "PersonWithIndex".PERSONWITHINDEX_MONTH_IDX */] >>> [SELECT DISTINCT >>> __C0_0 AS MONTH >>> FROM PUBLIC.__T0 >>> /* "PersonWithIndex"."merge_scan" */] >>> >>> >>> *WITHOUT INDICES* >>> ******Result SELECT distinct(MONTH) FROM PERSONWITHOUTINDEX >>> September,November,February,March,May,April,July,August,Janu >>> ary,June,October, >>> Elapsed time SELECT distinct(MONTH) FROM PERSONWITHOUTINDEX:1690ms >>> [SELECT DISTINCT >>> __Z0.MONTH AS __C0_0 >>> FROM "PersonWithoutIndex".PERSONWITHOUTINDEX __Z0 >>> /* "PersonWithoutIndex".PERSONWITHOUTINDEX.__SCAN_ */] >>> [SELECT DISTINCT >>> __C0_0 AS MONTH >>> FROM PUBLIC.__T0 >>> /* "PersonWithoutIndex"."merge_scan" */] >>> *************************** >>> >>> public class PersonWithIndex implements Serializable { >>> @QuerySqlField(index = true) >>> public int id; >>> @QuerySqlField(index = true, orderedGroups = {@QuerySqlField.Group(name="index_group_month_age", order=1)}) >>> public int age; >>> @QuerySqlField >>> public String dni; >>> @QuerySqlField(index = true, orderedGroups = {@QuerySqlField.Group(name="index_group_month_age", order=0)}) >>> public String month; >>> >>> >>> 2017-05-10 17:06 GMT+02:00 afedotov <[hidden email] >>> >: >>> >>>> Hi, >>>> >>>> Which Ignite version do you use? >>>> How many Ignite nodes do you run? >>>> What is your cache configuration? >>>> >>>> As well, could you please try the following query and see the >>>> performance? >>>> SELECT DISTINCT age FROM PersonWithindex; >>>> >>>> >>>> Kind regards, >>>> Alex. >>>> >>>> On Wed, May 10, 2017 at 5:57 PM, Guillermo Ortiz [via Apache Ignite >>>> Users] <[hidden email] >>>> > wrote: >>>> >>>>> It's my laptop. 16gb, i7.. The collection size it's 5Mill objects >>>>> (1,2GB) and it takes about 30sec. >>>>> I have tried to execute the query with the same collection with and >>>>> without indices with the same time results. >>>>> >>>>> This is a test before to run in the real cluster with 6 nodes of 512Gb >>>>> and 48cores each one. We tried to execute there with a real dataset(22M >>>>> rows, 7gb) but it takes so long that query is aborted. >>>>> >>>>> >>>>> 2017-05-08 12:12 GMT+02:00 Andrey Mashenkov <[hidden email] >>>>> >: >>>>> >>>>>> Hi, >>>>>> >>>>>> Why do you think query is slow? >>>>>> What its execution time? What is expected time? >>>>>> How many nodes do you use? How much data does cache contains? >>>>>> >>>>>> On Mon, May 8, 2017 at 1:47 AM, Guillermo Ortiz <[hidden email] >>>>>> > wrote: >>>>>> >>>>>>> I have a simple cache where I have Person object (id, age and so >>>>>>> on..) >>>>>>> >>>>>>> I have created indeces by id and age to try some examples but >>>>>>> queries with "group by" go really slow. >>>>>>> >>>>>>> I'm trying this query: >>>>>>> SELECT age >>>>>>> FROM >>>>>>> PersonWithindex >>>>>>> group by age >>>>>>> >>>>>>> >>>>>>> SELECT >>>>>>> AGE AS __C0 >>>>>>> FROM "personCacheWithIndex".PERSONWITHINDEX >>>>>>> /* "personCacheWithIndex"."age_idx" */ >>>>>>> GROUP BY AGE >>>>>>> /* group sorted */ >>>>>>> >>>>>>> SELECT >>>>>>> __C0 AS AGE >>>>>>> FROM PUBLIC.__T0 >>>>>>> /* "personCacheWithIndex"."merge_scan" */ >>>>>>> GROUP BY __C0 >>>>>>> >>>>>>> Although it seems that it uses index, why is it going so slow? I >>>>>>> think that it should be pretty fast with an index. >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Best regards, >>>>>> Andrey V. Mashenkov >>>>>> >>>>> >>>>> >>>>> >>>>> ------------------------------ >>>>> If you reply to this email, your message will be added to the >>>>> discussion below: >>>>> http://apache-ignite-users.70518.x6.nabble.com/GroupBy-with- >>>>> index-is-really-slow-tp12526p12592.html >>>>> To start a new topic under Apache Ignite Users, email [hidden email] >>>>> >>>>> To unsubscribe from Apache Ignite Users, click here. >>>>> NAML >>>>> >>>>> >>>> >>>> >>>> ------------------------------ >>>> View this message in context: Re: GroupBy with index is really slow. >>>> >>>> Sent from the Apache Ignite Users mailing list archive >>>> at Nabble.com. >>>> >>> >>> >> > > > ------------------------------ > If you reply to this email, your message will be added to the discussion > below: > http://apache-ignite-users.70518.x6.nabble.com/GroupBy- > with-index-is-really-slow-tp12526p12599.html > To start a new topic under Apache Ignite Users, email > ml+s70518n1h65@n6.nabble.com > To unsubscribe from Apache Ignite Users, click here > > . > NAML > > -- View this message in context: http://apache-ignite-users.70518.x6.nabble.com/GroupBy-with-index-is-really-slow-tp12526p12602.html Sent from the Apache Ignite Users mailing list archive at Nabble.com. ------=_Part_122191_2006772036.1494435776772 Content-Type: text/html; charset=UTF8 Content-Transfer-Encoding: quoted-printable
Hi.

I think the problem is that age and= month have bad selectivity. For example, in the case of month, each month = index will select in average 2000000 / 12 ~ 1666667 rows.
In this= case, it's probably faster scan all the rows in storage/disk order the= n using the index order.

Kind regards,
Alex.

On Wed, May 10, 2017 at 6:29 PM, Guillermo O= rtiz [via Apache Ignite Users] <[hidden email]> wrote:
=09
Are indices used always that it'spossible or it dep= ends how many records data you have to read as Oracle? So, if You use and i= ndex but the result it's to read 25% of the data, usually indices are n= ot used because sequencial read are faster. As this is in memory I understa= nd that it doesn't apply but I'm not sure and I didn't read any= thing about it in the documentation.=C2=A0

2017-05-10 17:27 GMT= +02:00 Guillermo Ortiz <[hidden email]>:
If I execute many times, time is similar in both.

2017-05-10 17:25 GMT+02:00 Guillermo Ortiz <[hidden e= mail]>:
I'= m using 2.0.0.=C2=A0
Right now in my laptop with 2Mill objects.


WITH INDICES
******R= esult SELECT distinct(MONTH) FROM PERSONWITHINDEX September,February,March,= August,June,April,July,January,May,November,October,
El= apsed time SELECT distinct(MONTH) FROM PERSONWITHINDEX:1990ms
[SE= LECT DISTINCT
=C2=A0 =C2=A0 __Z0.MONTH AS __C0_0
FROM &= quot;PersonWithIndex".PERSONWITHINDEX __Z0
=C2=A0 =C2= =A0 /* "PersonWithIndex".PERSONWITHINDEX_MONTH_IDX */]
=
[SELECT DISTINCT
=C2=A0 =C2=A0 __C0_0 AS MONTH
FRO= M PUBLIC.__T0
=C2=A0 =C2=A0 /* "PersonWithIndex"."= merge_scan" */]


WITHO= UT INDICES
******Result SELECT distinct(MONTH) FROM PERS= ONWITHOUTINDEX September,November,February,March,May,April,July,August= ,January,June,October,
Elapsed time SELECT distinct(MONTH) F= ROM PERSONWITHOUTINDEX:1690ms
[SELECT DISTINCT
=C2=A0 = =C2=A0 __Z0.MONTH AS __C0_0
FROM "PersonWithoutIndex".P= ERSONWITHOUTINDEX __Z0
=C2=A0 =C2=A0 /* "PersonWithoutI= ndex".PERSONWITHOUTINDEX.__SCAN_ */]
[SELECT DISTINCT
=C2=A0 =C2=A0 __C0_0 AS MONTH
FROM PUBLIC.__T0
=C2=A0 =C2=A0 /* "PersonWithoutIndex"."merge_scan"= ; */]
***************************

=
public class PersonWith=
Index implements Serializable {
@QuerySqlField<= /span>(index =3D true)
public in= t id;<= span style=3D"color:rgb(128,128,128);font-style:italic">
@QuerySqlField(index =3D true, orderedGroups =3D {@QuerySqlField.Group(name=3D"index_group_month_age"= ;, order=3D1)})
public int
age;
@QuerySqlField
public String dni; @QuerySqlField(index =3D true, orderedGroups = =3D {@QuerySqlField.Group(name= =3D"index_gro= up_month_age", order=3D0)})
public String month;

2017-05-10 17:06 GMT+02:00 afedotov <[hidden email= ]>:
Hi,

Which Ignite version do you use?How many Ignite nodes do you run?
What is your cache configuration?
As well, could you please try the following query and see the performa= nce?
SELECT DISTINCT age=C2=A0FROM=C2=A0Pe= rsonWithindex;

<= /div>

Kind regards,
Alex.

On Wed, May 10, 2017 at 5:57 PM, Guill= ermo Ortiz [via Apache Ignite Users] <[hidden email]> wrote= :
=09
It's my laptop. 16gb, i7.. The collection size it&#= 39;s 5Mill objects (1,2GB) and it takes about 30sec.=C2=A0
I have tried= to execute the query with the same collection with and without indices wit= h the same time results.=C2=A0

This is a test befo= re to run in the real cluster with 6 nodes of 512Gb and 48cores each one. W= e tried to execute there with a real dataset(22M rows, 7gb) but it takes so= long that query is aborted.=C2=A0

=

2017-= 05-08 12:12 GMT+02:00 Andrey Mashenkov <[hidden email]>:
Hi,

Why do you= think query is slow?=C2=A0
What its execution time? What is expected t= ime?
How many nodes do you use? How much data does cache contains?

On Mon, May 8, 2017 at 1:47 AM, Guillermo Ortiz <<= a href=3D"http:///user/SendEmail.jtp?type=3Dnode&node=3D12592&i=3D1= " rel=3D"nofollow" link=3D"external" target=3D"_blank">[hidden email]&g= t; wrote:
I have a simple cache where I have Person obje= ct (id, age and so on..)

I have created indeces by id an= d age to try some examples but queries with "group by" go really = slow.

I'm trying this query:
SELECT = age
FROM
PersonWithindex
group by age


SELECT
=C2=A0 = =C2=A0 AGE AS __C0
FROM "personCacheWithIndex".PERSONW<= wbr>ITHINDEX
=C2=A0 =C2=A0 /* "personCacheWithIndex".&q= uot;age_idx" */
GROUP BY AGE
/* group sorted = */
=C2=A0
SELECT
=C2=A0 =C2=A0 __C0 AS AGE
FROM PUBLIC.__T0
=C2=A0 =C2=A0 /* "personCacheWithIn= dex"."merge_scan" */
GROUP BY __C0

Although it seems that it uses index, why is it going = so slow? I think that it should be pretty fast with an index.



-= -
= Best regards,
Andrey V. Mashenkov

=09 =09 =09 =09
=09
=09
=09
=09=09
If you reply to this email, your mess= age will be added to the discussion below:
=09=09
http://apache-ignite-users.70518.x6.nabble= .com/GroupBy-with-index-is-really-slow-tp12526p12592.html =09
=09
=09=09To start a new topic under Apache Ignite Users, email [hidden email]
=09=09To unsubscribe from Apache Ignite Users, click here.
=09=09NAML =09

=09 =09 =09

View this message in context: Re: GroupBy with index is real= ly slow.
Sent from the Apache Ignite Users mai= ling list archive at Nabble.com.



=09 =09 =09 =09
=09
=09
=09
=09=09
If you reply to this email, your mess= age will be added to the discussion below:
=09=09
http://apache-ignite-users.70518.x6.n= abble.com/GroupBy-with-index-is-really-slow-tp12526p12599.html =09
=09
=09=09To start a new topic under Apache Ignite Users, email [hidden email]
=09=09To unsubscribe from Apache Ignite Users, click here.
=09=09NAML =09

=09 =09 =09

View this message in context: Re: Grou= pBy with index is really slow.
Sent from the A= pache Ignite Users mailing list archive at Nabble.com.
------=_Part_122191_2006772036.1494435776772--