hive-user mailing list archives

From Jagat Singh <jagatsi...@gmail.com>
Subject Re: Options for Loading Side Data / small files in UDF
Date Sat, 14 Sep 2013 00:23:46 GMT
Sorry, I missed that.

Have a look at this example, which accesses the file from the API:

https://github.com/edwardcapriolo/hive-geoip/
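
Separately from that project, here is a minimal sketch of the "small properties file that changes often" part of the question. It is not taken from hive-geoip; the class name ReloadingProperties and its parameters are hypothetical. The file source is passed in as a Callable so the caching logic can be run without a cluster. Inside a UDF, one plausible opener (an assumption, not something verified against this setup) is FileSystem.get(new Configuration()).open(new Path(...)), since a Configuration created on a task node normally picks up the cluster settings from the classpath.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.concurrent.Callable;

/**
 * Hypothetical helper: caches a small properties file and re-reads it
 * once the cached copy is at least maxAgeMillis old.
 */
final class ReloadingProperties {
    // How to open the file. Assumed to be an HDFS open() in a real UDF.
    private final Callable<InputStream> opener;
    private final long maxAgeMillis;
    private Properties cached;
    private long loadedAtMillis;

    ReloadingProperties(Callable<InputStream> opener, long maxAgeMillis) {
        this.opener = opener;
        this.maxAgeMillis = maxAgeMillis;
    }

    /** Returns the value for key, reloading the file first if the cache is stale. */
    synchronized String get(String key) {
        long now = System.currentTimeMillis();
        if (cached == null || now - loadedAtMillis >= maxAgeMillis) {
            cached = load();
            loadedAtMillis = now;
        }
        return cached.getProperty(key);
    }

    // Reads the whole file into a fresh Properties object and closes the stream.
    private Properties load() {
        Properties fresh = new Properties();
        try (InputStream in = opener.call()) {
            fresh.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (Exception e) {
            throw new UncheckedIOException(
                new IOException("could not open properties source", e));
        }
        return fresh;
    }
}
```

The UDF would hold one instance and call get() from evaluate(). A maxAgeMillis of a minute or so keeps the number of reads low while still picking up edits to the file.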




On Sat, Sep 14, 2013 at 10:12 AM, Stephen Boesch <javadba@gmail.com> wrote:

> I should have mentioned: we cannot use "add file" here because this
> is running within a framework. We need to use the Java APIs.
>
>
> 2013/9/13 Jagat Singh <jagatsingh@gmail.com>
>
>> Hi
>>
>> You can use the distributed cache and the Hive "add file" command.
>>
>> See here for example syntax
>>
>>
>> http://stackoverflow.com/questions/15429040/add-multiple-files-to-distributed-cache-in-hive
>>
>> Regards,
>>
>> Jagat
>>
>>
>> On Sat, Sep 14, 2013 at 9:57 AM, Stephen Boesch <javadba@gmail.com> wrote:
>>
>>>
>>> We have a UDF that is configured via a small properties file.  What are
>>> the options for distributing the file to the task nodes?  We also want to
>>> be able to update the file frequently.
>>>
>>> We are not running on AWS so S3 is not an option - and we do not have
>>> access to NFS/other shared disk from the Mappers.
>>>
>>> If the Hive classes can access HDFS, that would likely be ideal,
>>> and it seems it should be possible.  I am not clear how to do that,
>>> since the standard HDFS API requires a Configuration to be supplied,
>>> which is not available.
>>>
>>> Pointers appreciated.
>>>
>>> stephenb
>>>
>>
>>
>
