airflow-dev mailing list archives

From vardanguptacse@gmail.com <vardangupta...@gmail.com>
Subject Re: Plan to change type of dag_id from String to Number?
Date Thu, 09 Aug 2018 12:04:22 GMT


On 2018/08/09 11:55:11, Ash Berlin-Taylor <ash@apache.org> wrote: 
> Absolutely - there will still need to be a human-readable DAG id, even if we end up with
> an auto-incrementing integer ID column internally for table-join performance reasons.
> 
> -ash
> 
> > On 9 Aug 2018, at 12:35, Юли Волкова <xnuinside@gmail.com> wrote:
> > 
> > How will you understand what your DAG 00002 is doing when you open it? Or each of
> > 100 DAGs, for example? Especially if you are not the developer who created them,
> > but are on a support team that has 120 DAGs.
> > 
> > This is the first time I have wanted to reply on the dev mailing list too: please,
> > don't do it.
> > 
> > I think it will be really bad for everyone who uses dag_id as a meaningful name
> > for the DAG. If I look at 0329313, it does not tell me anything useful, and it will
> > be very, very complicated to identify which process the DAG is for. There could be
> > a separate ID for the DB indexes if this is a real problem for somebody. But, please,
> > do not change dag_id.
> > 
> > On Mon, Aug 6, 2018 at 1:32 AM vardanguptacse@gmail.com <vardanguptacse@gmail.com> wrote:
> > 
> >> Hi Everyone,
> >> 
> >> Do we have any plan to change the type of dag_id from String to Number? This
> >> would make queries on the metadata more performant. One proposal could be to
> >> generate an auto-incrementing value in the dag table and use this ID in all
> >> the other tables.
> >> 
> >> 
> >> Regards,
> >> Vardan Gupta
> >> 
> > 
> > 
> > -- 
> > _________
> > 
> > Best regards, Юлия Волкова
> > Tel.: +7 (911) 116-71-82
> 
> 

Thanks, Ash, for your reply; I agree with what you're saying.

I was not proposing to take away the human-readable dag_id. Instead, I was wondering why we
can't create another field, such as dag_name, to hold this information on all user-facing pages
while dag_id is changed to an integer; this would make joins in the metastore faster.
dag_id is currently indexed, but an index on an int (4 bytes) is much more compact than one on
a varchar(250), whose keys can be up to 250 characters long, so the varchar index occupies more
index blocks and lookups take longer. Also, if changing dag_id to an int is not trivial, let it
stay as it is, and let's introduce another column that is actually an integer and join on this
column across all tables.
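To make the second variant of the proposal concrete (keep dag_id as the readable name, add an integer column and join on it), here is a minimal sketch using Python's built-in sqlite3 module. The table layout and the column names `dag.id` and `task_instance.dag_pk` are hypothetical and heavily simplified; this is not Airflow's actual metadata schema. Note also that SQLite treats VARCHAR(250) only as a type hint and does not enforce the length.

```python
import sqlite3

# Hypothetical, simplified schema (NOT Airflow's real metadata tables).
# dag.id is an integer surrogate key assigned automatically by SQLite;
# dag.dag_id stays the human-readable name that users see and search by.
SCHEMA = """
CREATE TABLE dag (
    id     INTEGER PRIMARY KEY,          -- rowid alias, auto-assigned
    dag_id VARCHAR(250) NOT NULL UNIQUE  -- readable name, unchanged
);
CREATE TABLE task_instance (
    dag_pk  INTEGER NOT NULL REFERENCES dag(id),  -- integer join key
    task_id VARCHAR(250) NOT NULL,
    state   VARCHAR(20)
);
CREATE INDEX ti_dag_pk ON task_instance (dag_pk);
"""


def build_demo_db():
    """Create an in-memory database with two DAGs and a few task rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for name in ("example_etl", "daily_report"):
        conn.execute("INSERT INTO dag (dag_id) VALUES (?)", (name,))
    # Resolve the readable name to its integer key once; child rows
    # store only the integer.
    (etl_pk,) = conn.execute(
        "SELECT id FROM dag WHERE dag_id = ?", ("example_etl",)
    ).fetchone()
    conn.executemany(
        "INSERT INTO task_instance (dag_pk, task_id, state) VALUES (?, ?, ?)",
        [(etl_pk, "extract", "success"), (etl_pk, "load", "failed")],
    )
    return conn


def failed_tasks_by_dag_name(conn, dag_name):
    """Join on the integer key while filtering by the readable name."""
    return conn.execute(
        """
        SELECT d.dag_id, ti.task_id
        FROM task_instance AS ti
        JOIN dag AS d ON d.id = ti.dag_pk
        WHERE d.dag_id = ? AND ti.state = 'failed'
        """,
        (dag_name,),
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(failed_tasks_by_dag_name(conn, "example_etl"))
```

Running the script prints `[('example_etl', 'load')]`. A real migration would also have to add and backfill the integer column in every table that currently stores dag_id, which this sketch does not attempt.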
