airflow-dev mailing list archives

From Vardan Gupta <vardangupta...@gmail.com>
Subject Re: Plan to change type of dag_id from String to Number?
Date Thu, 09 Aug 2018 14:45:41 GMT
Absolutely, I'll work on producing some results. Also, it's not just a
matter of joining tables; even point queries on individual tables like
task_instance, dag_run, fag_failure will be faster with an integer identifier.

On Thu, Aug 9, 2018 at 7:59 PM Ash Berlin-Taylor <ash_apache@firemirror.com>
wrote:

> Since this is a big change that would touch much of the code base, before
> we do this we need to see some hard numbers - timing or benchmarks of
> queries etc.
>
> Also how often do we actually do such a join etc?
>
> -ash
>
> > On 9 Aug 2018, at 13:04, vardanguptacse@gmail.com wrote:
> >
> > Thanks Ash for your reply, I am aligned with what you're saying.
> >
> > I was not proposing to take away the human-readable dag_id. Instead, I was
> > thinking: why can't we create another field like dag_name, which will hold
> > this information on all front-facing sites, while dag_id is changed to
> > integer? This will help make joins faster in the metastore. dag_id is
> > currently indexed, but an index on varchar(250) is going to take more
> > index blocks than one on int (4 bytes), and therefore more lookup time.
> > Also, if dag_id is not trivial to change to int, let it stay as it is and
> > let's introduce another column which is actually integer in type, and let
> > joins happen on this column across all tables.
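
For illustration, the proposal quoted above could be sketched roughly as
follows. This is only a minimal sketch using Python's built-in sqlite3 module.
The table and column names (dag, dag_pk, ti_dag_pk) and the helper functions
are hypothetical and do not reflect Airflow's actual metadata schema. It shows
a surrogate integer key used for joins while dag_name keeps the human-readable
name.

```python
import sqlite3

# Hypothetical schema: names are illustrative assumptions only,
# not Airflow's real metadata tables.
SCHEMA = """
CREATE TABLE dag (
    id       INTEGER PRIMARY KEY,           -- surrogate integer key used for joins
    dag_name VARCHAR(250) NOT NULL UNIQUE   -- human-readable name shown to users
);
CREATE TABLE task_instance (
    dag_pk  INTEGER NOT NULL REFERENCES dag(id),
    task_id VARCHAR(250) NOT NULL,
    state   VARCHAR(20)
);
CREATE INDEX ti_dag_pk ON task_instance (dag_pk);
"""


def build_demo_db():
    """Create an in-memory database with one DAG and two task instances."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO dag (id, dag_name) VALUES (1, 'example_etl')")
    conn.executemany(
        "INSERT INTO task_instance (dag_pk, task_id, state) VALUES (?, ?, ?)",
        [(1, "extract", "success"), (1, "load", "failed")],
    )
    return conn


def task_states(conn, dag_name):
    """Look up task states by readable name; the join itself uses the integer key."""
    rows = conn.execute(
        """
        SELECT ti.task_id, ti.state
        FROM task_instance AS ti
        JOIN dag AS d ON d.id = ti.dag_pk
        WHERE d.dag_name = ?
        ORDER BY ti.task_id
        """,
        (dag_name,),
    )
    return rows.fetchall()
```

Whether a layout like this is measurably faster than joining on the string
column is exactly what the requested benchmarks would need to show. The sketch
only demonstrates the shape of the schema, not any performance result.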
