hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From rohan monga <monga.ro...@gmail.com>
Subject Re: problems with indices
Date Thu, 22 Sep 2011 15:58:02 GMT
Hi Prajakta,
I get it now. But are there plans to support groupbys of that format,
or would I be able to write a UDF of sorts to do that?

Also, could you please give me an example of accelerating equi-joins
with indices, I haven't been able to get that to work either :(

Regards,
--
Rohan Monga



On Wed, Sep 21, 2011 at 6:15 PM, Prajakta Kalmegh <pkalmegh@gmail.com> wrote:
> Hi Rohan
> As of now, we support queries with aggregate function count on columns which
> are the same as index key column and are used in group-by construct. For
> example,
> CREATE TABLE tbl(key int, value int);
> CREATE INDEX tbl_key_idx ON TABLE tbl(key) AS
> 'org.apache.hadoop.hive.ql.index.AggregateIndexHandler' WITH DEFERRED
> REBUILD IDXPROPERTIES("AGGREGATES"="count(key)");
> ALTER INDEX tbl_key_idx ON tbl REBUILD;
> It is valid to use the following query:
> select key, count(key) from tbl where key < 1000 group by key;
> But not valid to use:
> select value, count(key) from tbl group by value; (I think this is the same
> as your case)
> As of now, you cannot use the aggregate index in case of join queries.
> For the semantic analysis error, I apologize for the typo in the code. It
> worked for us as it was uniform in all the classes :)
> You forgot to change it in a few more instances and hence the semantic
> error. The aggregate index itself creates the column references with a
> 'count_Of..' in it. We have fixed the issue and should be updated in the
> trunk soon. Thanks for noticing it though. :)
> Regards,
> Prajakta
>
>
>
>
>
>
>
> On Wed, Sep 21, 2011 at 12:28 AM, rohan monga <monga.rohan@gmail.com> wrote:
>>
>> Hi Prajakta,
>> Thanks, I was able to generate a plan which was using indices by
>> following your advise.
>> However, I want to group on one key and count the others
>> "select count(id2) from table_t1 group by id1", would that not be possible
>> ?
>>
>> Also, could you give me an example of using indices to speed up joins?
>>
>> And for the semantic analysis thing, I sent the error in my first
>> email, here are the lines that I modified to get it to work. Basically
>> changed "_count_Of" to "_count_of"
>>
>> Regards,
>> --
>> Rohan Monga
>>
>>
>>
>> On Wed, Sep 21, 2011 at 4:22 AM, Prajakta Kalmegh <pkalmegh@gmail.com>
>> wrote:
>> > Hi Rohan
>> > I run your queries on the same version from trunk and did not get any
>> > errors. Not sure why you are getting a semantic analysis error. It would
>> > be
>> > good if you could send me a snapshot of the error message.
>> > About the second issue of getting the same plan even if you
>> > set hive.optimize.index.groupby to true or false, we do not optimize for
>> > cases where the count (aggregate functions) are applied on keys other
>> > than
>> > the group-by keys. Since you are trying to get a count of id2 and have
>> > the
>> > index key (as well as the group-by key) on id1, the optimization is not
>> > applied. Hence the same plan.
>> > Hope this helps. Please let me know if you have any questions.
>> > Regards,
>> > Prajakta
>> >
>> >
>> >> From: rohan monga <monga.rohan@gmail.com>
>> >> Date: September 19, 2011 11:26:29 PM PDT
>> >> To: <user@hive.apache.org>
>> >> Subject: problems with indices
>> >> Reply-To: <user@hive.apache.org>
>> >>
>> >> Hi,
>> >> I have a table and index that look like
>> >>
>> >> <snip>
>> >> CREATE TABLE table_t1(id1 int,  id2 int)
>> >> CREATE INDEX table_t1_idx ON TABLE table_t1(id1) AS
>> >> 'org.apache.hadoop.hive.ql.index.AggregateIndexHandler' WITH DEFERRED
>> >> REBUILD IDXPROPERTIES("AGGREGATES"="count(id2)");
>> >> ALTER INDEX table_t1_idx ON table_t1 REBUILD;
>> >> </snip>
>> >>
>> >> Now, I am seeing 2 problems with the query 'set
>> >> hive.optimize.index.groupby=false;select count(id2) from table_t1
>> >> group by id1',
>> >> firstly,
>> >>
>> >> <snip>
>> >> FAILED: Error in semantic analysis: Line 1:11 Invalid table alias or
>> >> column reference '`_count_Of_id2`': (possible column names are: id1,
>> >> _bucketname, _offsets, _count_of_id2)
>> >> </snip>
>> >>
>> >> So, I assumed that it was a typo or something, and I change all 'Of'
>> >> to 'of' in the relevant places in the code.
>> >>
>> >> Then I ran the query again, with the option set to true and then to
>> >> false, the run times were almost the same. Also 'explain' on the
>> >> queries show identical query plans.
>> >> I am using hive revision 1172989 from trunk
>> >>
>> >> Could someone help me with this?
>> >>
>> >> Regards,
>> >>
>> >> --
>> >> Rohan Monga
>> >
>> >
>> >
>
>

Mime
View raw message