hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Selvamohan Neethiraj <selva.apa...@infotekies.com>
Subject Re: Data transformation query
Date Thu, 10 Oct 2013 12:31:04 GMT
If it is not so much data, you can use Excel's PivotTable to solve this specific requirement:

1. Select the date and plz columns (w/o header) and create pivotTable on a new WorkSheet
2. Drag the column name: 'Date' from the 'Pivot Table' Builder  to the 'Column Label' section.
3. Drag the column name: 'PLZ' from the 'Pivot Table' Builder  to the  'Row Labels'
4. Drag the column name: 'Date' from the 'Pivot Table' Builder  to the  'Values' and ensure
the selected summary function is 'Count'

Hope, this helps,

Thanks,
Selva-



On Oct 10, 2013, at 7:53 AM, Panshul Whisper <ouchwhisper@gmail.com> wrote:

> Hello,
> 
> I have a data manipulation query.
> 
> I have my data in the following format:
> 
> Date   PLZ   Count
> date1   plz1   count1
> date1   plz1   count2
> date1   plz1   count3
> date1   plz2   count4
> date1   plz2   count5
> date1   plz3   count6
> date1   plz3   count7
> 
> date2   plz1   count8
> date2   plz1   count9
> date2   plz3   count10
> date2   plz3   count11
> 
> date3   plz1   count12
> date3   plz1   count13
> date3   plz2   count14
> date3   plz2   count15
> date3   plz2   count16
> 
> date4   plz1   count17
> date4   plz2   count18
> date4   plz3   count19
> 
> With Hive queries I have managed to get the data into the following grouped and aggregated
format:
> 
> select plz, TO_DATE(time), sum(totalcount) from power_pad_part where
> yearfolder='2013' and monthfolder in ('01')
> and eco=0 and TO_DATE(time)>='Date1' and TO_DATE(time)<'Date4'
> and plz in ('plz1','plz2','plz3')
> group by plz, time
> 
> PLZ   Date   TotalCount
> plz1   date1   TC1
> plz1   date2   TC2
> plz1   date3   TC3
> plz1   date4   TC4
> 
> plz2   date1   TC5
> plz2   date3   TC6
> plz2   date4   TC7
> 
> plz3   date1   TC8
> plz3   date2   TC9
> plz3   date4   TC10
> 
> The above data is grouped by plz and date. There is no entry of a plz if it does not
exist for a date.
> I used the following query to generate the above data:
> 
> 
> 
> But I require the data to be in the following format:
> 
> PLZ   Date1   Date2   Date3   Date4 ..... DateN .....
> plz1   TC1       TC2      TC3       TC4   .....  TC N
> plz2   TC5       ------      TC6       TC7   .....  TC N
> plz3   TC8       TC9      ------       TC10  ....  TC N
> 
> 
> This needs to be generated as 1 column for every date of the year.
> 
> Any help to generate this format is welcome. Writing my own mapper and reduce and calling
in Hive as function  is to be my last option.
> 
> Thanking You,
> 
> 
> -- 
> Regards,
> Ouch Whisper
> 010101010101


Mime
View raw message