airflow-dev mailing list archives

From George Leslie-Waksman <waks...@gmail.com>
Subject Re: Plan to change type of dag_id from String to Number?
Date Thu, 16 Aug 2018 07:33:57 GMT
These performance characteristics also depend on the metadata database
backend. If there are benchmarks, I would hope we look at them across
SQLite, MySQL, PostgreSQL, and any other supported backends before we take
action.
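
A minimal sketch of what such a benchmark could look like, shown here for SQLite only because it ships with Python. The table names (dag_str, ti_str, dag_int, ti_int) and the row counts are hypothetical and are not Airflow's real schema; the same queries would need to be rerun through the MySQL and PostgreSQL drivers before drawing any conclusion.

```python
import sqlite3
import time


def build(conn, n_dags=200, tasks_per_dag=50):
    """Create two hypothetical layouts: string-keyed and integer-keyed."""
    cur = conn.cursor()
    cur.executescript(
        """
        CREATE TABLE dag_str (dag_id VARCHAR(250) PRIMARY KEY);
        CREATE TABLE ti_str (dag_id VARCHAR(250), task_id VARCHAR(250));
        CREATE INDEX ti_str_dag ON ti_str (dag_id);

        CREATE TABLE dag_int (id INTEGER PRIMARY KEY, dag_id VARCHAR(250) UNIQUE);
        CREATE TABLE ti_int (dag_pk INTEGER, task_id VARCHAR(250));
        CREATE INDEX ti_int_dag ON ti_int (dag_pk);
        """
    )
    for d in range(n_dags):
        name = f"example_dag_with_a_long_descriptive_name_{d:05d}"
        tasks = [f"task_{t}" for t in range(tasks_per_dag)]
        cur.execute("INSERT INTO dag_str VALUES (?)", (name,))
        cur.execute("INSERT INTO dag_int VALUES (?, ?)", (d, name))
        cur.executemany("INSERT INTO ti_str VALUES (?, ?)", [(name, t) for t in tasks])
        cur.executemany("INSERT INTO ti_int VALUES (?, ?)", [(d, t) for t in tasks])
    conn.commit()


def timed(conn, sql, repeats=20):
    """Return (row_count, best wall-clock seconds) for a COUNT(*) query."""
    best = float("inf")
    count = None
    for _ in range(repeats):
        start = time.perf_counter()
        count = conn.execute(sql).fetchone()[0]
        best = min(best, time.perf_counter() - start)
    return count, best


# Join on the varchar dag_id versus join on the integer surrogate key.
STR_JOIN = "SELECT COUNT(*) FROM dag_str d JOIN ti_str t ON t.dag_id = d.dag_id"
INT_JOIN = "SELECT COUNT(*) FROM dag_int d JOIN ti_int t ON t.dag_pk = d.id"

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build(conn)
    for label, sql in (("varchar join", STR_JOIN), ("integer join", INT_JOIN)):
        rows, seconds = timed(conn, sql)
        print(f"{label}: {rows} rows, best run {seconds * 1e3:.2f} ms")
```

An in-memory SQLite database says little about a production MySQL or PostgreSQL instance with millions of rows, so the numbers from this sketch are only a starting point for comparison.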

On Thu, Aug 9, 2018 at 12:41 PM Maxime Beauchemin <
maximebeauchemin@gmail.com> wrote:

> The perf change for the DAG table would be negligible.
>
> Maybe for task_instances (large table with millions of rows, 3-field
> composite key) it *could* be a decent idea. Though you'd then have two
> indexes to store and maintain, and we may have to change the code to
> actually use and reference that new, more efficient pk in places where it's
> more efficient to use that index (some of which SQLAlchemy would do right
> out of the box).
>
> This mostly affects the index size (btree(id) is much smaller than
> btree(dag_id, task_id, execution_date)), not the key lookup time, which is
> log(n) either way. We'd still have to use the composite btree for range
> scans, which we use frequently to get the set of tasks for a dag or for a
> specific dag task. Since lookups are log(n), and we need to maintain that
> composite btree anyway for range scans, I don't see where that would
> really help. It would be a better index (fewer pages, less memory usage,
> ...) if we didn't need that other composite one, which we do.
>
> Max
>
> On Thu, Aug 9, 2018 at 8:05 AM Vardan Gupta <vardanguptacse@gmail.com>
> wrote:
>
> > Point well taken on backward compatibility; we will have to approach this
> > change very carefully, if implemented.
> >
> > On Thu, Aug 9, 2018 at 7:29 PM Юли Волкова <xnuinside@gmail.com> wrote:
> >
> > > Because what you described says nothing about backward compatibility.
> > > Do you think everyone who uses Airflow needs to change all their DAGs?
> > > It's very strange, because you propose one of the most critical changes
> > > and it will affect everyone. If you want an id, call it dag_metadata_id
> > > and add it. But do not propose a change that isn't backward compatible.
> > > It's too strange.
> > >
> > > On Thu, Aug 9, 2018 at 7:04 AM vardanguptacse@gmail.com <
> > > vardanguptacse@gmail.com> wrote:
> > >
> > > >
> > > >
> > > > On 2018/08/09 11:55:11, Ash Berlin-Taylor <ash@apache.org> wrote:
> > > > > Absolutely - there will still need to be a human-readable DAG id, even
> > > > > if we end up with an auto-incrementing integer ID column internally
> > > > > for table join performance reasons.
> > > > >
> > > > > -ash
> > > > >
> > > > > > On 9 Aug 2018, at 12:35, Юли Волкова <xnuinside@gmail.com> wrote:
> > > > > >
> > > > > > > How will you understand what your DAG 00002 is doing without
> > > > > > > opening it? For each of 100 DAGs, for example? Especially if you
> > > > > > > are not the developer who created it: you are a support team and
> > > > > > > have 120 DAGs.
> > > > > > >
> > > > > > > This is the first time I have wanted to send a reply to the dev
> > > > > > > mailing list. Please, don't do it.
> > > > > > >
> > > > > > > I think it will be really bad for everyone who uses dag_id as a
> > > > > > > descriptive name for the DAG. If I look at 0329313, it tells me
> > > > > > > nothing useful, and it will be very complicated to identify which
> > > > > > > process the DAG is used for. There could be another id for the
> > > > > > > indexes in the DB if that is a real problem for somebody. But,
> > > > > > > please, do not change dag_id.
> > > > > >
> > > > > > On Mon, Aug 6, 2018 at 1:32 AM vardanguptacse@gmail.com <
> > > > > > vardanguptacse@gmail.com> wrote:
> > > > > >
> > > > > >> Hi Everyone,
> > > > > >>
> > > > > > >> Do we have any plan to change the type of dag_id from String to
> > > > > > >> Number? This would make queries on the metadata more performant.
> > > > > > >> The proposal could be to generate an auto-incrementing value in
> > > > > > >> the dag table and use this id in the rest of the other tables.
> > > > > >>
> > > > > >>
> > > > > >> Regards,
> > > > > >> Vardan Gupta
> > > > > >>
> > > > > >
> > > > > >
> > > > > > --
> > > > > > _________
> > > > > >
> > > > > > > Best regards, Юлия Волкова
> > > > > > > Tel.: +7 (911) 116-71-82
> > > > >
> > > > >
> > > >
> > > > Thanks Ash for your reply, I am aligned with what you're saying.
> > > >
> > > > I was not proposing to take away the human-readable dag_id. Instead I
> > > > was thinking: why can't we create another field like dag_name, which
> > > > will hold this information on all front-facing sites, while dag_id is
> > > > changed to an integer? This will help make joins faster in the
> > > > metastore. dag_id is currently indexed, but an index on varchar(250)
> > > > takes more index blocks than one on an int (4 bytes), and therefore
> > > > more lookup time. Also, if dag_id is not trivial to change to int, let
> > > > it remain and let's introduce another column that is actually an
> > > > integer, and let joins happen on this column across all tables.
> > > >
> > >
> > >
> > > --
> > > _________
> > >
> > > Best regards, Юлия Волкова
> > > Tel.: +7 (911) 116-71-82
> >
>

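Max's point in the quoted thread, that an integer surrogate key does not remove the need for the composite btree, can be illustrated with a small sketch. It uses SQLite's EXPLAIN QUERY PLAN on a hypothetical task_instance table (the column set and the index name ti_natural_key are assumptions, not Airflow's actual schema) to show which index each kind of query would use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical table, loosely modelled on the task_instance discussion.
    CREATE TABLE task_instance (
        id INTEGER PRIMARY KEY,            -- proposed surrogate key
        dag_id VARCHAR(250) NOT NULL,
        task_id VARCHAR(250) NOT NULL,
        execution_date TEXT NOT NULL,
        state TEXT
    );
    -- The composite index still has to exist for range scans.
    CREATE UNIQUE INDEX ti_natural_key
        ON task_instance (dag_id, task_id, execution_date);
    """
)


def plan(sql, params=()):
    """Return the detail strings SQLite reports for EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


# Single-row lookup by surrogate key.
point_lookup = plan("SELECT state FROM task_instance WHERE id = ?", (1,))

# Range scan: every execution of one task in one DAG.
range_scan = plan(
    "SELECT state FROM task_instance WHERE dag_id = ? AND task_id = ?",
    ("example_dag", "example_task"),
)

print(point_lookup)
print(range_scan)
```

The first plan should refer to the integer primary key and the second to the composite index, though the exact wording of the plan text varies between SQLite versions. Other backends have their own planners, so this only illustrates the shape of the argument and does not measure anything.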