hive-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Hive UDF performance issue
Date Fri, 11 Jul 2014 02:28:05 GMT
The "small" table can be any size. You want the smaller table to be
/path/to/table/b here: the mappers are split over table A, so keeping the
larger table as A results in more parallelism. There is a ticket on Hive
theta joins that you might want to look at.
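
To make the quoted mapper idea below concrete, here is a minimal plain-Java sketch of the same pattern, with no Hadoop dependency. The class name CrossJoinSketch, the Supplier standing in for fs.open("/path/to/table/b"), and the tab separator between the two rows are all assumptions made for illustration; they are not part of the original job.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical stand-in for the mapper described in the thread. */
class CrossJoinSketch {

    /**
     * Called once per row of table A. Re-opens table B, streams it line by
     * line, and emits one joined row per line of B.
     */
    static void map(String rowA, Supplier<BufferedReader> openTableB, Consumer<String> out) {
        try (BufferedReader b = openTableB.get()) {
            String rowB;
            while ((rowB = b.readLine()) != null) {
                // Assumption: rows are joined with a tab; the original just concatenates.
                out.accept(rowA + "\t" + rowB);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Drives map() over every row of table A, as the framework would for one split. */
    static List<String> crossJoin(List<String> tableA, Supplier<BufferedReader> openTableB) {
        List<String> pairs = new ArrayList<>();
        for (String rowA : tableA) {
            map(rowA, openTableB, pairs::add);
        }
        return pairs;
    }

    public static void main(String[] args) {
        // In-memory stand-in for table B; a real job would open a file on HDFS here.
        Supplier<BufferedReader> tableB =
                () -> new BufferedReader(new StringReader("b1\nb2\nb3\n"));
        for (String pair : crossJoin(Arrays.asList("a1", "a2"), tableB)) {
            System.out.println(pair);
        }
    }
}
```

Table B is re-read once per row of A, so the total work is still |A| x |B|. What changes is that the work is spread over however many mappers table A is split into, instead of landing on a single reducer.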


On Thu, Jul 10, 2014 at 10:23 PM, Malligarjunan S <malligarjunan@gmail.com>
wrote:

> Hello Edward,
>
> Thank you very much for the update.
> What size do you mean by a small table? In our case the small table will
> have at least 1 million records.
> Can we use this UDTF? How much of a time improvement would there be?
>
> Appreciate your help!
> Thanks and Regards
> SankarS
>
>
> On Thu, Jul 10, 2014 at 11:26 PM, Edward Capriolo <edlinuxguru@gmail.com>
> wrote:
>
>> There is no magic. Hopefully one table is smaller than the other. You
>> could write a UDTF that does something like what this MR job does.
>>
>> Make a mapper that runs over table A:
>> FileInputFormat.addInputPath(job, new Path("/path/to/table/a"));
>>
>> Then inside the mapper
>>
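>> private Configuration conf;
>> protected void setup(Context context) {
>>   this.conf = context.getConfiguration();
>> }
>> public void map(Text key, Text value, Context context)
>>     throws IOException, InterruptedException {
>>   FileSystem fs = FileSystem.get(conf);
>>   // Re-read all of table B for every row of table A.
>>   try (BufferedReader b = new BufferedReader(
>>       new InputStreamReader(fs.open(new Path("/path/to/table/b"))))) {
>>     String line;
>>     while ((line = b.readLine()) != null) {
>>       context.write(key, new Text(value.toString() + line));
>>     }
>>   }
>> }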
>>
>>
>>
>> On Thu, Jul 10, 2014 at 12:56 PM, Malligarjunan S <
>> malligarjunan@gmail.com> wrote:
>>
>>> Hello Edward,
>>>
>>> Thank you very much for helping me.
>>> I am new to Hive. Could you please share the sample MapReduce job?
>>>
>>> Regards,
>>> Sankar S
>>>
>>>
>>>
>>>
>>> On Thu, Jul 10, 2014 at 8:19 AM, Edward Capriolo <edlinuxguru@gmail.com>
>>> wrote:
>>>
>>>> Hive's cross product stinks. I have a MapReduce job that will do it.
>>>>
>>>>
>>>> On Wednesday, July 9, 2014, Navis류승우 <navis.ryu@nexr.com> wrote:
>>>>
>>>>> Yes, 2M x 1M makes 2T pairings in a single reducer.
>>>>>
>>>>> Thanks,
>>>>> Navis
>>>>>
>>>>>
>>>>> 2014-07-10 1:50 GMT+09:00 Malligarjunan S <malligarjunan@gmail.com>:
>>>>>
>>>>>> Hello All,
>>>>>> Is it expected behavior for Hive to take this much time?
>>>>>>
>>>>>>
>>>>>> Thanks and Regards,
>>>>>> Sankar S
>>>>>>
>>>>>>
>>>>>> On Tue, Jul 8, 2014 at 11:23 PM, Malligarjunan S <
>>>>>> malligarjunan@gmail.com> wrote:
>>>>>>
>>>>>>> Hello All,
>>>>>>>
>>>>>>> Can anyone help answer my question posted on Stack Overflow?
>>>>>>>
>>>>>>> http://stackoverflow.com/questions/24416373/hive-udf-performance-too-slow
>>>>>>> It is pretty urgent. Please help me.
>>>>>>>
>>>>>>> Thanks and Regards,
>>>>>>> Sankar S.
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Sorry this was sent from mobile. Will do less grammar and spell check
>>>> than usual.
>>>>
>>>
>>>
>>
>
