spark-dev mailing list archives

From Nicholas Chammas
Subject Diffing execution plans to understand an optimizer bug
Date Tue, 08 Nov 2016 21:42:04 GMT
I’m trying to understand what I think is an optimizer bug. To do that, I’d
like to compare the execution plans for a certain query with and without a
certain change, to understand how that change is impacting the plan.

How would I do that in PySpark? I’m working with 2.0.1, but I can use
master if it helps.

is helpful but is limited in two important ways:

   1. It prints to screen and doesn’t offer another way to access the plan
   or capture it.

   2. The printed plan includes auto-generated IDs that make diffing
   impossible, e.g.

    == Physical Plan ==
    *Project [struct(primary_key#722, person#550, dataset_name#671)

Any suggestions on what to do? Any relevant JIRAs I should follow?
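
To make the second limitation concrete, here is a rough sketch of the kind of post-processing I have in mind. It assumes the plan text has already been captured as a plain string somehow, and that the auto-generated IDs always look like `#` followed by digits, as in the snippet above. The helper names `normalize_plan` and `diff_plans` are hypothetical, not part of Spark.

```python
import difflib
import re

# Assumption: auto-generated expression IDs always appear as "#<digits>",
# e.g. "primary_key#722". Anything else in the plan is left untouched.
_EXPR_ID = re.compile(r"#\d+")


def normalize_plan(plan_text):
    """Replace each distinct ID with a placeholder numbered by first appearance.

    Two plans with the same shape but different raw IDs then normalize
    to identical text, so only real structural changes show up in a diff.
    """
    seen = {}

    def _placeholder(match):
        raw = match.group(0)
        if raw not in seen:
            seen[raw] = "#id{}".format(len(seen))
        return seen[raw]

    return _EXPR_ID.sub(_placeholder, plan_text)


def diff_plans(before, after):
    """Return unified-diff lines between two normalized plan strings."""
    a = normalize_plan(before).splitlines()
    b = normalize_plan(after).splitlines()
    return list(difflib.unified_diff(a, b, "before", "after", lineterm=""))
```

An empty list from `diff_plans` would mean the two plans differ only in their generated IDs. This does nothing for the first limitation: getting the plan as a string in the first place might be possible by redirecting stdout around the call that prints it, or through the private `_jdf` handle on the DataFrame, but I have not confirmed either against 2.0.1.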

