hive-user mailing list archives

From Dmitry Tolpeko <dmtolp...@gmail.com>
Subject Re: Better way to do UDF's for Hive
Date Thu, 01 Oct 2015 14:34:16 GMT
For a single string input, a Java UDF can be easier to write: accept a
string parameter, look it up in a hash map, and return the result. With
Python you have to use the TRANSFORM clause and handle all the columns, so
it will be hard to reuse your Python script, as the code may depend on
column positions.
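
To illustrate the column-position issue, here is a minimal sketch of what a
TRANSFORM script could look like. The file name state_from_city.py, the
lookup entries, and the CITY_COLUMN constant are assumptions made up for
this example; a real lookup would be loaded from a data file.

```python
# Hypothetical script, state_from_city.py. It could be invoked from Hive
# roughly like this (sketch, not tested against a cluster):
#   ADD FILE state_from_city.py;
#   SELECT TRANSFORM (tbl.id, tbl.name, tbl.city)
#     USING 'python state_from_city.py'
#     AS (id, name, city, state)
#   FROM my_db.my_table tbl;

# Made-up lookup data for illustration only.
STATE_BY_CITY = {
    "Campinas": "SP",
    "Curitiba": "PR",
    "Recife": "PE",
}

# Zero-based position of `city` in the TRANSFORM column list. This is the
# coupling to column order that makes the script hard to reuse.
CITY_COLUMN = 2


def add_state(line, city_column=CITY_COLUMN):
    """Append the state to one tab-separated input row.

    Unknown cities get \\N, which Hive's default text format reads as NULL.
    """
    fields = line.rstrip("\n").split("\t")
    city = fields[city_column] if city_column < len(fields) else None
    state = STATE_BY_CITY.get(city, "\\N")
    return "\t".join(fields + [state])


def transform(lines):
    """Yield one output row per input row."""
    for line in lines:
        yield add_state(line)


if __name__ == "__main__":
    # Sample rows stand in for Hive's input here; in the deployed script
    # you would pass sys.stdin to transform() instead.
    sample = ["1\tAna\tCampinas\n", "2\tBruno\tAtlantis\n"]
    for row in transform(sample):
        print(row)
```

Note that the script has to pass id and name through untouched and must know
that city is the third column, so a query with a different column list needs
a different CITY_COLUMN.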

Another option is to move the state_from_city data into a separate lookup
table and use a map-side join.

Dmitry

On Thu, Oct 1, 2015 at 4:11 PM, Daniel Lopes <daniel@bankfacil.com.br>
wrote:

> Hi,
>
> I'd like to know a good way to write a UDF for a single field, like
>
> SELECT
>   tbl.id AS id,
>   tbl.name AS name,
>   tbl.city AS city,
>   state_from_city(tbl.city) AS state
> FROM
>   my_db.my_table tbl;
>
> *Native Java*? *Python* over *Hadoop Streaming*?
>
> I prefer Python, but I don't know a good way to do it.
>
> Thanks,
>
> *Daniel Lopes, B.Eng*
> Data Scientist - BankFacil
> CREA/SP 5069410560
> <http://edital.confea.org.br/ConsultaProfissional/cartao.aspx?rnp=2613651334>
> Mob +55 (18) 99764-2733 <callto:+5518997642733>
> Ph +55 (11) 3522-8009
> http://about.me/dannyeuu
>
> Av. Nova Independência, 956, São Paulo, SP
> Bairro Brooklin Paulista
> CEP 04570-001
> https://www.bankfacil.com.br
>
>
