hive-dev mailing list archives

From Gang Liu <g...@fb.com>
Subject Re: Hive List Bucketing - Feature Review
Date Thu, 14 Jun 2012 17:38:21 GMT
Would you please elaborate on it?

Thanks 

Tim

On 6/14/12 10:30 AM, "Edward Capriolo" <edlinuxguru@gmail.com> wrote:

>We have had a ticket open for quite some time for combine input format
>to work across partitions. Not sure if that can help with what you are
>seeing as well. It could help us a lot.
>
>Edward
>
>On Thu, Jun 14, 2012 at 1:25 PM, Gang Liu <gang@fb.com> wrote:
>> Hey Edward,
>>
>> Thank you very much for providing comments.
>>
>> This feature is designed for the use cases described in the wiki. We do see
>> them in real life, which is why we came up with the feature.
>>
>> In this first release, in order to use the feature:
>> 1. Hive table users need to know the skewed key in advance.
>> 2. Hive table users need to know that the skewed key is the same in each
>> partition.
>> 3. If Hive table users know the skewed key has changed, they can change it
>> via an "alter" statement.
>> 4. If #3 happens, old partitions have the old skewed key and new partitions
>> have the new one. This is expected.
>>
>> We may consider the following in a future release:
>> 1. Hive instruments skewed keys and displays them to the user.
>>
>> Thanks
>>
>> Tim
>>
>>
>> On 6/14/12 9:34 AM, "Edward Capriolo" <edlinuxguru@gmail.com> wrote:
>>
>>>I am of the opinion this feature is too specialized to be generally
>>>helpful.
>>>
>>>-------------------------------
>>>The cardinality of 'x' is in 1000's per partition of T. Moreover,
>>>there is a skew for the values of 'x'. In general, there are ~10
>>>values of 'x' which have a very large skew, and the remaining
>>>values of 'x' have a small cardinality. Also, note that this mapping
>>>(values of 'x' with a high cardinality) can change daily.
>>>--------------------------
>>>
>>>In these cases you should use clustering/bucketing. This will prevent
>>>the skew you are talking about. If you want more efficiency in certain
>>>query types, build an index on top of the original table.
>>>
>>>I understand someone wanting to do this because mysql partition can do
>>>this, but this sounds like a management problem. Who is to say the
>>>skew is the same in each partition?
>>>
>>>-----------------------------------------
>>>hive compiler to do input pruning. The list of skewed keys is stored
>>>at the table level (note that, this list can be initially supplied by
>>>the client periodically, and can be eventually updated when a new
>>>partition is being loaded).
>>>-----------------------------------------
>>>
>>>Imagine you have a table partitioned by hour and two datacenters China
>>>and NY. At some hours the skew will be different. Skews change over
>>>time. Since this property is table level I do not understand how this
>>>would be changed.
>>>
>>>
>>>
>>>On Thu, Jun 14, 2012 at 4:14 AM, Carl Steinbach <carl@cloudera.com>
>>>wrote:
>>>> Hi Tim,
>>>>
>>>> I added some comments to the wiki a couple days ago. I just wanted to make
>>>> sure you saw them since it doesn't look like you're registered as a watcher
>>>> for that page.
>>>>
>>>> Thanks.
>>>>
>>>> Carl
>>>>
>>>> On Mon, Jun 11, 2012 at 12:22 PM, Gang Liu <gang@fb.com> wrote:
>>>>
>>>>> Hi Carl, thanks Tim
>>>>>
>>>>> On 6/11/12 12:14 PM, "Carl Steinbach" <carl@cloudera.com> wrote:
>>>>>
>>>>> >+ hcatalog-dev
>>>>> >
>>>>> >On Mon, Jun 11, 2012 at 12:09 PM, Carl Steinbach <carl@cloudera.com>
>>>>> >wrote:
>>>>> >
>>>>> >> This link may work better for some people:
>>>>> >>
>>>>> >> https://cwiki.apache.org/confluence/display/Hive/ListBucketing
>>>>> >>
>>>>> >> Thanks.
>>>>> >>
>>>>> >> Carl
>>>>> >>
>>>>> >>
>>>>> >> On Mon, Jun 11, 2012 at 12:03 PM, Gang Liu <gang@fb.com> wrote:
>>>>> >>
>>>>> >>> Dear all hive developers,
>>>>> >>>
>>>>> >>> We are making good progress on implementing the list bucketing
>>>>> >>> feature. It should be available soon, within weeks.
>>>>> >>>
>>>>> >>> We'd like to call for a feature review again; please provide your
>>>>> >>> comments.
>>>>> >>>
>>>>> >>> Thanks
>>>>> >>>
>>>>> >>> Tim
>>>>> >>>
>>>>> >>> On 6/1/12 10:13 AM, "Gang Liu" <gang@fb.com> wrote:
>>>>> >>>
>>>>> >>> >Dear all,
>>>>> >>> >
>>>>> >>> >Please review the proposal and provide your comments:
>>>>> >>> >
>>>>> >>> >https://cwiki.apache.org/Hive/listbucketing.html
>>>>> >>> >
>>>>> >>> >
>>>>> >>> >Thanks
>>>>> >>> >
>>>>> >>> >Tim
>>>>> >>> >
>>>>> >>>
>>>>> >>>
>>>>> >>
>>>>>
>>>>>
>>

