beam-user mailing list archives

From OrielResearch Eila Arich-Landkof <e...@orielresearch.org>
Subject Re: Returning dataframe from parDo and printing its value - advice?
Date Mon, 18 Jun 2018 20:38:45 GMT
Thanks for the response.
I tried this within the current ParDo, CreateColForSampleFn, but Apache Beam
returns a warning recommending against returning a string.

So, my questions are:
- Is it essential to move this transformation into a separate ParDo?
- Should I ignore that message? When is this message relevant?

Many thanks,
Eila
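
A minimal sketch of the formatting step Lukasz suggests, written in plain
Python so it runs without Beam. It assumes the ParDo holds the record as a
flat list of values; the names format_csv_line, process, and record are
hypothetical and do not come from the thread. Beam iterates over whatever a
DoFn's process method returns, so a bare string would be emitted one
character at a time, which appears to be what the warning guards against.
Yielding the string (or returning it inside a list) emits it as one element.
Stripping whitespace and writing None as an empty cell are choices made for
this example only.

```python
import csv
import io


def format_csv_line(values):
    """Join one record's values into a single CSV line without a newline."""
    buffer = io.StringIO()
    # Empty line terminator: the text sink is assumed to append the newline.
    writer = csv.writer(buffer, lineterminator="")
    # Assumption: None becomes an empty cell and padding whitespace is dropped.
    writer.writerow(["" if v is None else str(v).strip() for v in values])
    return buffer.getvalue()


def process(values):
    """Stand-in for a DoFn.process body: yields ONE string per record."""
    yield format_csv_line(values)


# Hypothetical record shaped like the output quoted in the thread.
record = [None, " 30", " Primary Tissue", None]
print(list(process(record)))

# Returning a bare string instead would be iterated character by character:
print(list("a,b"))
```

In a pipeline, the same function could be applied with
beam.Map(format_csv_line) between CreateColForSampleFn and WriteToText,
assuming CreateColForSampleFn emits one list of values per row. The csv
module is used rather than ','.join so that values containing commas are
quoted and still open correctly in a spreadsheet.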

On Mon, Jun 18, 2018 at 2:52 PM Lukasz Cwik <lcwik@google.com> wrote:

> User is the correct mailing list.
>
> beam.io.WriteToText takes 'strings' which means that you have to format
> the whole line yourself. You'll want to apply another ParDo
> after CreateColForSampleFn which takes the 1x164 record and concatenates
> each value with ',' in between.
>
> On Mon, Jun 18, 2018 at 9:00 AM OrielResearch Eila Arich-Landkof <
> eila@orielresearch.org> wrote:
>
>> Hi,
>>
>> Is anyone listening on the user@ mailing list? Or should I use a
>> different mailing list?
>>
>> I have made some progress:
>> - The ParDo now returns a list.
>> - I added a header to WriteToText.
>>
>> The pipeline now looks like this:
>> ExploreData = (
>>     p
>>     | "Extract the rows from dataframe" >> beam.io.Read(
>>         beam.io.BigQuerySource('archs4.Debug_annotation'))
>>     | "create more columns" >> beam.ParDo(
>>         CreateColForSampleFn(colListSubset, outputPath)))
>>
>> (ExploreData
>>  | 'writing to CSV files' >> beam.io.WriteToText(
>>      'gs://dataExploration.txt',
>>      file_name_suffix='.csv',
>>      num_shards=1,
>>      append_trailing_newlines=True,
>>      header=colListStr))
>>
>>
>> The remaining issue is that the output has a newline after each value:
>>
>> *None
>> None
>> None
>> None
>> None
>>  30
>>  Primary Tissue
>> None
>> None
>> None*
>>
>> Please let me know how to get rid of these newlines. I hope to be able to
>> open the output file with Google Sheets.
>>
>>
>> Thanks,
>>
>> Eila
>>
>>
>>
>> On Fri, Jun 15, 2018 at 2:45 PM, OrielResearch Eila Arich-Landkof <
>> eila@orielresearch.org> wrote:
>>
>>> Hi all,
>>>
>>> I am running a pipeline in which a table from BigQuery is processed row
>>> by row using a ParDo.
>>> CreateColForSampleFn generates a DataFrame with headers and values
>>> (shape: 1x164) that I want to pass to WriteToText.
>>> See the following:
>>>
>>> ExploreData = (
>>>     p
>>>     | "Extract the rows from dataframe" >> beam.io.Read(
>>>         beam.io.BigQuerySource('archs4.Debug_annotation'))
>>>     | "create more columns" >> beam.ParDo(
>>>         CreateColForSampleFn(colListSubset, outputPath)))
>>>
>>> (ExploreData
>>>  | 'writing to CSV files' >> beam.io.WriteToText(
>>>      'gs://dataExploration.txt', num_shards=1))
>>>
>>> My questions are related to the returned DF and WriteToText:
>>> 1. When I pass the DF from CreateColForSampleFn to WriteToText, I get
>>> only the headers:
>>>
>>> Sample_contact_phone
>>> Sample_extract_protocol_ch1
>>> Sample_platform_id
>>> Sick
>>> Sample_title
>>> index
>>> Sample_last_update_date
>>> Sample_contact_country
>>> Sample_channel_count
>>> Sample_library_source
>>> Sample_taxid_ch1
>>>
>>>
>>> 2. When I return the DF in a list, [df], I get the following text for
>>> each row (including the dimensions):
>>>
>>>  Sample_contact_phone                        Sample_extract_protocol_ch1 Sample_platform_id  Sick
>>> 0                       Library construction protocol: Four µg of tota...          GPL11154  None
>>>
>>> [1 rows x 168 columns]
>>>
>>>
>>>
>>> I want to generate a text file that includes:
>>> - One header (if needed, I will add it after the pipeline completes)
>>> - All the values from each row that was processed into a DF
>>> - Full cell values, without ... in the middle
>>>
>>> What am I missing? Any advice?
>>>
>>> Thanks,
>>> --
>>> Eila
>>> www.orielresearch.org
>>> https://www.meetup.com/Deep-Learning-In-Production/
>>>
>>>
>>>
>>
>>
>> --
>> Eila
>> www.orielresearch.org
>> https://www.meetup.com/Deep-Learning-In-Production/
>>
>>
