ignite-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrey Mashenkov <andrey.mashen...@gmail.com>
Subject Re: Group indices and group by query
Date Thu, 11 May 2017 18:23:08 GMT
Have you tried to avoid MAX with GROUP_BY and use technique with self-join?
Smth like that

select young.*, younger.age
from person as young
   left outer join person as younger on younger.gender = young.gender
      and younger.age < young.age


On Thu, May 11, 2017 at 3:14 PM, Guillermo Ortiz <konstt2000@gmail.com>
wrote:

> I tried both to get more variety of values and index should be work better
> to get the max(id) than max(age), but I got the same result.
>
> Elapsed time SELECT MONTH, MAX(ID) FROM PERSONWITHINDEX GROUP BY
> MONTH:2621ms
> [SELECT
>     __Z0.MONTH AS __C0_0,
>     MAX(__Z0.ID) AS __C0_1
> FROM "PersonWithIndex".PERSONWITHINDEX __Z0
>     /* "PersonWithIndex".*INDEX_GROUP_MONTH_ID **/
> GROUP BY __Z0.MONTH]
> [SELECT
>     __C0_0 AS MONTH,
>     MAX(__C0_1) AS __C0_1
> FROM PUBLIC.__T0
>     /* "PersonWithIndex"."merge_scan" */
> GROUP BY __C0_0]
>
> Although it seems that it uses it  when I get times, seems the opposite.
>
> 2017-05-11 13:55 GMT+02:00 Andrey Mashenkov <andrey.mashenkov@gmail.com>:
>
>> Query use PERSONWITHINDEX_MONTH_IDX instead of index_group_month_age.
>>
>> Do you mean query "SELECT MONTH, MAX(*AGE*)"?
>>
>> On Wed, May 10, 2017 at 6:21 PM, Guillermo Ortiz <konstt2000@gmail.com>
>> wrote:
>>
>>> Is there any limitation like databases which if it has to read more than
>>> x% of data it do a full scan?
>>>
>>> I tried this:
>>>
>>> Result *SELECT MONTH, MAX(ID) FROM PERSONWITHINDEX GROUP BY MONTH *
>>> September,1999868,February,1999869,March,1999871,August,1999
>>> 844,June,1999867,April,1999854,July,1999866,January,1999857,
>>> May,1999862,November,1999830,October,1999864,
>>> Elapsed time SELECT MONTH, MAX(ID) FROM PERSONWITHINDEX GROUP BY MONTH:
>>> *2743ms*
>>> [SELECT
>>>     __Z0.MONTH AS __C0_0,
>>>     MAX(__Z0.ID) AS __C0_1
>>> FROM "PersonWithIndex".PERSONWITHINDEX __Z0
>>>     /* "PersonWithIndex".PERSONWITHINDEX_MONTH_IDX */
>>> GROUP BY __Z0.MONTH
>>> /* group sorted */]
>>> [SELECT
>>>     __C0_0 AS MONTH,
>>>     MAX(__C0_1) AS __C0_1
>>> FROM PUBLIC.__T0
>>>     /* "PersonWithIndex"."merge_scan" */
>>> GROUP BY __C0_0]
>>>
>>>
>>> ******Result *SELECT MONTH, MAX(ID) FROM PERSONWITHOUTINDEX GROUP BY
>>> MONTH* September,1999860,November,1999870,February,1999869,March,19
>>> 99866,May,1999864,April,1999865,July,1999859,August,1999871,
>>> January,1999867,June,1999868,October,1999854,
>>> Elapsed time SELECT MONTH, MAX(ID) FROM PERSONWITHOUTINDEX GROUP BY
>>> MONTH:*2418ms*
>>> [SELECT
>>>     __Z0.MONTH AS __C0_0,
>>>     MAX(__Z0.ID) AS __C0_1
>>> FROM "PersonWithoutIndex".PERSONWITHOUTINDEX __Z0
>>>     /* "PersonWithoutIndex".PERSONWITHOUTINDEX.__SCAN_ */
>>> GROUP BY __Z0.MONTH]
>>> [SELECT
>>>     __C0_0 AS MONTH,
>>>     MAX(__C0_1) AS __C0_1
>>> FROM PUBLIC.__T0
>>>     /* "PersonWithoutIndex"."merge_scan" */
>>> GROUP BY __C0_0]
>>>
>>>
>>> I tried to execute many times same query and times are similar all the
>>> times, I guess that indices aren't being useful. Am I doing something bad?
>>>
>>> public class PersonWithIndex implements Serializable {
>>>     @QuerySqlField(index = true)
>>>     public int id;
>>>     @QuerySqlField(index = true, orderedGroups = {@QuerySqlField.Group(name="index_group_month_age",
order=1)})
>>>     public int age;
>>>     @QuerySqlField
>>>     public String dni;
>>>     @QuerySqlField(index = true, orderedGroups = {@QuerySqlField.Group(name="index_group_month_age",
order=0)})
>>>     public String month;
>>>
>>> ....
>>>
>>>
>>>
>>> public class PersonWithoutIndex implements Serializable {
>>>     @QuerySqlField
>>>     public int id;
>>>     @QuerySqlField
>>>     public int age;
>>>     @QuerySqlField
>>>     public String dni;
>>>     @QuerySqlField
>>>     public String month;
>>>
>>>
>>> 2017-05-08 12:32 GMT+02:00 Andrey Mashenkov <andrey.mashenkov@gmail.com>
>>> :
>>>
>>>> Group indices description can be found here [1]. Please, check if you
>>>> read docs for your ignite version, as configuration for ignite 1.x and 2.0
>>>> can differs.
>>>> Group indices in Ignite a similar to composite or multi-column indixes
>>>> in databases.
>>>>
>>>> Yes, it is work that way. Group index for (groupFiled, timestampField)
>>>> should be helpful.
>>>> You can try to use EXPLAIN comand to check if correct index is in use.
>>>>
>>>>
>>>> [1] https://apacheignite.readme.io/docs/indexes#section-group-indexes
>>>>
>>>> On Fri, May 5, 2017 at 8:17 PM, Guillermo Ortiz <konstt2000@gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> If I have two single indices to query:
>>>>>
>>>>> select * table a,b where a=1 and b=2
>>>>>
>>>>> it doesn't work pretty good and I have to create a group index, how is
>>>>> that possible? how does group indices work?
>>>>>
>>>>> Similar to this:
>>>>>
>>>>> select max(timestampField)
>>>>> from myTable
>>>>> group by groupField
>>>>>
>>>>> I have a index by groupField what it means that it should be really
>>>>> fast to make the group and later make a partial map to find in each node
>>>>> the max of the subset of data to merge the final submax and get the final
>>>>> result in a "reduce" task after sending data through network. Is it how
>>>>> does it work?
>>>>> This query it's really slow, I don't know if I created an index by
>>>>> timestampField I would get better performance, but it doesn't too much
>>>>> sense to create a index with a timestampField where there aren't too
row
>>>>> with the same value in a table with 250M rows....
>>>>> Any advice about this query?
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best regards,
>>>> Andrey V. Mashenkov
>>>>
>>>
>>>
>>
>>
>> --
>> Best regards,
>> Andrey V. Mashenkov
>>
>
>


-- 
Best regards,
Andrey V. Mashenkov

Mime
View raw message