hive-user mailing list archives

From Navis류승우 <navis....@nexr.com>
Subject Re: Override COUNT() function
Date Wed, 03 Jul 2013 01:56:33 GMT
As you expected, there is no documentation on it (as with the other optimizers).

The Javadoc of the class might be helpful, but it does not seem detailed enough.
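
For reference, here is a minimal sketch of how the stats-based row count discussed further down in this thread could be inspected by hand. It assumes a hypothetical, unpartitioned table named page_views and that the numRows table property has been populated by a full statistics run. The value is only as fresh as the last ANALYZE, so it is an estimate and not a guaranteed replacement for COUNT(*).

```sql
-- Hypothetical table name (page_views); substitute your own.
-- Gather table-level statistics. Without NOSCAN this reads the data once.
ANALYZE TABLE page_views COMPUTE STATISTICS;

-- Read the stored row count from the table properties instead of scanning.
SHOW TBLPROPERTIES page_views("numRows");

-- Alternatively, list all table parameters, which include numRows when set.
DESCRIBE FORMATTED page_views;
```

This only reads metadata; it does not change how the planner treats SELECT COUNT(*), which is the part that would need an optimizer along the lines of MetadataOnlyOptimizer.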

2013/7/2 Peter Marron <Peter.Marron@trilliumsoftware.com>:
> Thanks Navis,
>
> This is a very interesting class, which I feel pretty sure I would never have found on my own.
> Are there any descriptions, motivations, documentation or examples anywhere?
> I suspect that there's nothing other than the source itself, but I had to ask.
>
> Regards,
>
> Z
> -----Original Message-----
> From: Navis류승우 [mailto:navis.ryu@nexr.com]
> Sent: 02 July 2013 08:50
> To: user@hive.apache.org
> Subject: Re: Override COUNT() function
>
> MetadataOnlyOptimizer changes a GBY (GROUP BY) on partition columns to a simple TableScan with a one-line dummy.
>
> I think similar things can be done with stats.
>
> 2013/6/28 Peter Marron <Peter.Marron@trilliumsoftware.com>:
>> Hi,
>>
>>
>>
>> I feel sure that someone has asked for this before, but here goes…
>>
>>
>>
>> In the case where I have the query
>>
>>
>>
>>                 SELECT COUNT(*) FROM table;
>>
>>
>>
>> There are many cases where I can determine the count immediately.
>>
>> For example, if I have run something like:
>>
>>
>>
>> ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2],
>> ...)] COMPUTE STATISTICS [noscan];
>>
>>
>>
>> then there seems to be a table property “numRows” which holds a count
>> of the number of rows.
>>
>> Now I know that the COUNT function can’t always be determined easily.
>>
>> If the query is more complicated, like
>>
>>
>>
>>                 SELECT COUNT(*) FROM table GROUP BY column;
>>
>>
>>
>> then obviously a simple scalar count is of no real use. But is there
>> some way to intercept the simple case and avoid running a table scan?
>>
>>
>>
>> One problem that I see is that the COUNT function is a UDAF, and I am
>> assuming that the presence of any aggregate function like this is
>> enough to force the query planner to require a Map/Reduce job. Is there
>> any way to make the function look like a simple UDF for some queries, or
>> just for some tables? I guess that I’d be prepared to sacrifice the
>> full generality of the normal COUNT function for one which only
>> functions correctly for the simple query on my tables.
>>
>>
>>
>> So is it possible to have a different COUNT function only on certain tables?
>>
>>
>>
>> Regards,
>>
>>
>>
>> Z
>>
>>
>
