spark-dev mailing list archives

From Chang Chen <baibaic...@gmail.com>
Subject Re: Lineage between Datasets
Date Thu, 13 Apr 2017 02:04:32 GMT
Does that mean the physical plans of any two Datasets are independent?

Thanks
Chang

On Thu, Apr 13, 2017 at 12:53 AM, Reynold Xin <rxin@databricks.com> wrote:

> The physical plans are not subtrees, but the analyzed plan (before the
> optimizer runs) is actually similar to "lineage". You can get that by
> calling explain(true) and looking at the analyzed plan.
>
>
> On Wed, Apr 12, 2017 at 3:03 AM Chang Chen <baibaichen@gmail.com> wrote:
>
>> Hi All
>>
>> I believe that there is no lineage between datasets. Consider this case:
>>
>> val people = spark.read.parquet("...").as[Person]
>>
>> val ageGreatThan30 = people.filter("age > 30")
>>
>> Since the second Dataset can push down the filter condition, the two
>> obviously have different logical plans and hence different physical plans.
>>
>> Is my understanding right?
>>
>> Thanks
>> Chang
>>
>
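
The explain(true) suggestion quoted above can be tried with a small sketch like the one below. It is only an illustration under stated assumptions: the fields of Person are not shown in the thread, so the case class here is hypothetical, and an in-memory Seq stands in for the elided parquet path (which means no source-level pushdown will be visible in this run). The queryExecution accessors are Spark's developer-facing handles on the same plans that explain(true) prints.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record type: the thread does not show Person's fields.
case class Person(name: String, age: Int)

object LineageSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")          // assumption: local run for illustration
      .appName("lineage-sketch")
      .getOrCreate()
    import spark.implicits._

    // In-memory stand-in for spark.read.parquet("...").as[Person]
    val people = Seq(Person("Ann", 25), Person("Bob", 35)).toDS()
    val ageGreatThan30 = people.filter("age > 30")

    // Prints the parsed, analyzed, and optimized logical plans,
    // followed by the physical plan.
    ageGreatThan30.explain(true)

    // The same plans can be inspected programmatically.
    println(people.queryExecution.analyzed)          // analyzed plan of the parent
    println(ageGreatThan30.queryExecution.analyzed)  // analyzed plan of the child
    println(ageGreatThan30.queryExecution.executedPlan)

    spark.stop()
  }
}
```

Comparing the two analyzed plans is one way to see the "lineage"-like relationship described above: the analyzed plan of the filtered Dataset should show a Filter node on top of the plan for people, whereas the optimized and physical plans are free to diverge.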
