airflow-dev mailing list archives

From Tobias Feldhaus <Tobias.Feldh...@localsearch.ch>
Subject Re: Possible Bug (?) in BigQueryOperator - Missing data when writing to a partitioned table
Date Wed, 27 Sep 2017 16:41:12 GMT
I’ve created a table containing only the missing value in the exact same partition, and in
that case the write goes through. Could the volume of the data play a role, or maybe the
client libraries?
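
One way to narrow this down further could be to export the rows from the interactive query and from the destination partition, then diff them by a unique key. The sketch below is only an illustration under assumptions: the function name `find_missing_keys` and the key column `event_id` are hypothetical and do not come from the pipeline discussed here, and the rows are assumed to be already loaded as plain dictionaries.

```python
from typing import Any, Iterable, List, Mapping


def find_missing_keys(
    source_rows: Iterable[Mapping[str, Any]],
    destination_rows: Iterable[Mapping[str, Any]],
    key: str = "event_id",  # hypothetical unique key column
) -> List[Any]:
    """Return key values present in source_rows but absent from destination_rows.

    Order follows the first appearance in source_rows; duplicates are reported once.
    """
    destination_keys = {row[key] for row in destination_rows}
    seen = set()
    missing = []
    for row in source_rows:
        value = row[key]
        if value not in destination_keys and value not in seen:
            seen.add(value)
            missing.append(value)
    return missing


if __name__ == "__main__":
    # Rows as returned by the interactive query (illustrative data only).
    interactive = [{"event_id": 1}, {"event_id": 2}, {"event_id": 3}]
    # Rows found in the destination partition (illustrative data only).
    destination = [{"event_id": 1}, {"event_id": 3}]
    print(find_missing_keys(interactive, destination))
```

If the missing keys are stable across reloads, that would point toward something specific to those rows rather than toward data volume.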

On 27.09.2017, 17:46, "Tobias Feldhaus" <Tobias.Feldhaus@localsearch.ch> wrote:

    Hi,
    
    
    I am tracing a bug in one of our data pipelines and have narrowed it down to a small
    number of events missing from a table (using Airflow 1.8.2).
    When I ran the query that Airflow executed myself, interactively, I saw the missing
    entry. When Airflow executed the same query and wrote the results to a partitioned table
    in BQ, the entry was missing from that destination table.
    I’ve tried different scenarios several times now, and the only explanations I can come
    up with are that Airflow _might_ not fully support partitioned tables, or that there is
    some weird bug in the bigquery-python implementation.
    
    When I delete the table, recreate it, and reload the complete data with Airflow, the
    data is still missing. When I reload a single day, it is also missing. I’ve created a
    Python script to execute the exact same query, and it works as expected.
    
    Any advice on how to track this down further? Is this a known issue?
    
    Best,
    Tobias
    
    
    
