datafu-dev mailing list archives

From Matthew Hayes <matthew.terence.ha...@gmail.com>
Subject Re: Mapping output of Hourglass jobs to hive tables
Date Wed, 12 Feb 2014 17:20:44 GMT
The jobs have methods getOutputSchemaName() and getOutputSchemaNamespace()
that can be overridden.  By default, these strings are derived from the job
class and its package.  For example, extend PartitionCollapsingIncrementalJob
and override both methods.  I just filed DATAFU-32 to make it easier to
override the defaults.
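
A rough sketch of the override pattern follows. It is not taken from the
DataFu source: StandInIncrementalJob is a hypothetical stand-in for
PartitionCollapsingIncrementalJob so the snippet compiles without the
Hourglass jar, and its defaults only approximate the "derived from the class
and its package" behavior. MemberCountJob, "MemberCount" and
"com.example.stats" are made-up names. In a real job you would extend
PartitionCollapsingIncrementalJob itself and check the method visibility and
constructor against the Javadoc.

```java
// Hypothetical stand-in for PartitionCollapsingIncrementalJob (assumption:
// only the two overridable methods named in this thread are modeled).
abstract class StandInIncrementalJob {
    // Assumed default: schema name derived from the job class name.
    protected String getOutputSchemaName() {
        return getClass().getSimpleName() + "Output";
    }

    // Assumed default: namespace derived from the job class's package.
    protected String getOutputSchemaNamespace() {
        Package pkg = getClass().getPackage();
        return pkg == null ? "" : pkg.getName();
    }

    // Helper for this sketch only: joins namespace and name.
    public String fullOutputSchemaName() {
        String ns = getOutputSchemaNamespace();
        return ns.isEmpty() ? getOutputSchemaName() : ns + "." + getOutputSchemaName();
    }
}

// Keeps the defaults, so the name comes from the class itself.
class DefaultNamedJob extends StandInIncrementalJob {}

// Overrides both methods to choose the output schema name and namespace.
class MemberCountJob extends StandInIncrementalJob {
    @Override
    protected String getOutputSchemaName() {
        return "MemberCount";
    }

    @Override
    protected String getOutputSchemaNamespace() {
        return "com.example.stats";
    }
}

class SchemaNameDemo {
    public static void main(String[] args) {
        System.out.println(new DefaultNamedJob().fullOutputSchemaName());
        System.out.println(new MemberCountJob().fullOutputSchemaName());
    }
}
```

With the overrides in place, the top-level record in the output file carries
the name you chose instead of one derived from the job class.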

Regarding your other question about the key: when you construct the Hive
table, can you not simply ignore the key?


On Wed, Feb 12, 2014 at 2:06 AM, Abhishek Gayakwad <a.gayakwad@gmail.com> wrote:

> Hello,
>
> After running a partition collapsing or preserving job, the generated
> container file has schema as
> PartitionPreservingIncrementalJobOutput/PartitionCollapsingIncrementalJobOutput
> which further has key and value record types in it. When I create hive
> tables using this data, it has two columns for key and value of struct
> type. This takes away readability and is not what I want. I want to store
> only the value object in the output file. Is there any way I can get rid of
> the Partition*JobOutput schema and avoid writing keys as well?
>
> Thanks
> Abhishek
>
