From: Cheng Lian <lian.cs.zju@gmail.com>
Date: Wed, 18 Mar 2015 22:05:43 +0800
To: Gil Vernik, dev@spark.apache.org
Subject: Re: parquet support - some questions about code

Hey Gil,

ParquetRelation2 is based on the external data sources API, which is a more modular and less intrusive way to add external data sources to Spark SQL. We are planning to replace ParquetRelation with ParquetRelation2 entirely once the latter is mature and stable enough. That's why you currently see two separate sets of Parquet code in the code base; for now they also share part of the code.

In Spark 1.3 the new Parquet data source (ParquetRelation2) is enabled by default, so the projection and filter push-down entry points you are looking for are in newParquet.scala.
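For example, on a 1.3 build, a quick sketch like the following (not taken from the Spark test suite; the path, the "id" column, and local mode are placeholders, so adjust them to your setup) exercises the new code path and prints the plan where the push-down shows up:

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.SQLContext

  // Local mode only for the sketch.
  val sc = new SparkContext(new SparkConf().setAppName("parquet-pushdown-sketch").setMaster("local[*]"))
  val sqlContext = new SQLContext(sc)

  // Set both flags explicitly; useDataSourceApi already defaults to true in 1.3.
  sqlContext.setConf("spark.sql.parquet.useDataSourceApi", "true")
  sqlContext.setConf("spark.sql.parquet.filterPushdown", "true")

  // "/path/to/table.parquet" and the "id" column are placeholders.
  val df = sqlContext.parquetFile("/path/to/table.parquet")

  // The pruned columns and the pushed filter should be visible in the
  // physical plan produced through newParquet.scala.
  df.filter(df("id") > 100).select("id").explain(true)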
Cheng

On 3/18/15 9:46 PM, Gil Vernik wrote:
> Hi,
>
> I am trying to better understand the code for Parquet support.
> In particular I got lost trying to understand ParquetRelation and
> ParquetRelation2. Is ParquetRelation2 the new code that is meant to
> completely replace ParquetRelation? (I think there is a remark in the
> code saying this.)
>
> Assuming I am using
> spark.sql.parquet.filterPushdown = true
> spark.sql.parquet.useDataSourceApi = true
>
> I saw that the buildScan method in newParquet.scala pushes filters down
> into Parquet, but I also saw that there is filtering and projection
> push-down in ParquetOperations inside SparkStrategies.scala.
> However, every time I debug it,
>
> object ParquetOperations extends Strategy {
>   def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
>     ..........
>
> never reaches
>
> case PhysicalOperation(projectList, filters: Seq[Expression], relation: ParquetRelation) =>
>
> In which cases will it match this case?
>
> Also, where is the code for Parquet projection and filter push-down: is
> it inside ParquetOperations in SparkStrategies.scala, or inside buildScan
> in newParquet.scala? Or both? If both, I am not sure how they work
> together...
>
> Thanks,
> Gil.
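Regarding the question above about when that case matches: one quick way to see which path actually plans the scan (sketch only; run it e.g. in the 1.3 spark-shell, where sqlContext is predefined, and substitute your own path and column) is to toggle spark.sql.parquet.useDataSourceApi and compare the plans printed by explain:

  import org.apache.spark.sql.SQLContext

  // Assumes an existing SQLContext named sqlContext (e.g. the one created by
  // spark-shell); the path and the "id" column are placeholders.
  def explainParquetScan(sqlContext: SQLContext, useDataSourceApi: Boolean): Unit = {
    sqlContext.setConf("spark.sql.parquet.useDataSourceApi", useDataSourceApi.toString)
    val df = sqlContext.parquetFile("/path/to/table.parquet")
    // With the flag off, the scan should be planned by the old ParquetOperations
    // strategy; with it on (the 1.3 default), it goes through the new data source
    // path backed by ParquetRelation2 in newParquet.scala.
    df.filter(df("id") > 100).explain(true)
  }

  explainParquetScan(sqlContext, useDataSourceApi = false)
  explainParquetScan(sqlContext, useDataSourceApi = true)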