From: HyukjinKwon
To: reviews@spark.apache.org
Subject: [GitHub] spark issue #18647: [SPARK-21789][PYTHON] Remove obsolete codes for parsing ...
Date: Fri, 1 Sep 2017 04:07:23 +0000 (UTC)

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18647

I double-checked that **`_split_schema_abstract`**, **`_parse_field_abstract`**, **`_parse_schema_abstract`** and **`_infer_schema_type`** are not used in any public API. Under `./python/pyspark`:

**1. `_split_schema_abstract`**:

```
$ grep -r "_split_schema_abstract" .
```
shows

```
./sql/types.py:def _split_schema_abstract(s):
./sql/types.py: >>> _split_schema_abstract("a b c")
./sql/types.py: >>> _split_schema_abstract("a(a b)")
./sql/types.py: >>> _split_schema_abstract("a b[] c{a b}")
./sql/types.py: >>> _split_schema_abstract(" ")
./sql/types.py: parts = _split_schema_abstract(s)
```

Non doctests / tests:

```
./sql/types.py: parts = _split_schema_abstract(s)
```

This is within **3. `_parse_schema_abstract`**:

https://github.com/apache/spark/blob/b56f79cc359d093d757af83171175cfd933162d1/python/pyspark/sql/types.py#L1274

**2. `_parse_field_abstract`**:

```
$ grep -r "_parse_field_abstract" .
```

shows

```
./sql/types.py:def _parse_field_abstract(s):
./sql/types.py: >>> _parse_field_abstract("a")
./sql/types.py: >>> _parse_field_abstract("b(c d)")
./sql/types.py: >>> _parse_field_abstract("a[]")
./sql/types.py: >>> _parse_field_abstract("a{[]}")
./sql/types.py: fields = [_parse_field_abstract(p) for p in parts]
```

Non doctests / tests:

```
fields = [_parse_field_abstract(p) for p in parts]
```

This is within **3. `_parse_schema_abstract`**:

https://github.com/apache/spark/blob/b56f79cc359d093d757af83171175cfd933162d1/python/pyspark/sql/types.py#L1275

**3. `_parse_schema_abstract`**:

```
$ grep -r "_parse_schema_abstract" .
```
shows

```
./sql/tests.py: from pyspark.sql.types import _parse_schema_abstract, _infer_schema_type
./sql/tests.py: schema = _parse_schema_abstract(abstract)
./sql/types.py: return StructField(name, _parse_schema_abstract(s[idx:]), True)
./sql/types.py:def _parse_schema_abstract(s):
./sql/types.py: >>> _parse_schema_abstract("a b c")
./sql/types.py: >>> _parse_schema_abstract("a[b c] b{}")
./sql/types.py: >>> _parse_schema_abstract("c{} d{a b}")
./sql/types.py: >>> _parse_schema_abstract("a b(t)").fields[1]
./sql/types.py: return _parse_schema_abstract(s[1:-1])
./sql/types.py: return ArrayType(_parse_schema_abstract(s[1:-1]), True)
./sql/types.py: return MapType(NullType(), _parse_schema_abstract(s[1:-1]))
./sql/types.py: >>> schema = _parse_schema_abstract("a b c d")
./sql/types.py: >>> schema = _parse_schema_abstract("a[] b{c d}")
```

Non doctests / tests:

```
./sql/types.py: return StructField(name, _parse_schema_abstract(s[idx:]), True)
./sql/types.py: return _parse_schema_abstract(s[1:-1])
./sql/types.py: return ArrayType(_parse_schema_abstract(s[1:-1]), True)
./sql/types.py: return MapType(NullType(), _parse_schema_abstract(s[1:-1]))
```

These four are within **2. `_parse_field_abstract`** and within **3. `_parse_schema_abstract`**:

https://github.com/apache/spark/blob/b56f79cc359d093d757af83171175cfd933162d1/python/pyspark/sql/types.py#L1243
https://github.com/apache/spark/blob/b56f79cc359d093d757af83171175cfd933162d1/python/pyspark/sql/types.py#L1266
https://github.com/apache/spark/blob/b56f79cc359d093d757af83171175cfd933162d1/python/pyspark/sql/types.py#L1269
https://github.com/apache/spark/blob/b56f79cc359d093d757af83171175cfd933162d1/python/pyspark/sql/types.py#L1272

**4. `_infer_schema_type`**:

```
$ grep -r "_infer_schema_type"
```

shows

```
./sql/tests.py: from pyspark.sql.types import _parse_schema_abstract, _infer_schema_type
./sql/tests.py: typedSchema = _infer_schema_type(rdd.first(), schema)
./sql/types.py:def _infer_schema_type(obj, dataType):
./sql/types.py: >>> _infer_schema_type(row, schema)
./sql/types.py: >>> _infer_schema_type(row, schema)
./sql/types.py: eType = _infer_schema_type(obj[0], dataType.elementType)
./sql/types.py: return MapType(_infer_schema_type(k, dataType.keyType),
./sql/types.py: _infer_schema_type(v, dataType.valueType))
./sql/types.py: fields = [StructField(f.name, _infer_schema_type(o, f.dataType), True)
```

Non doctests / tests:

```
./sql/types.py: eType = _infer_schema_type(obj[0], dataType.elementType)
./sql/types.py: return MapType(_infer_schema_type(k, dataType.keyType),
./sql/types.py: _infer_schema_type(v, dataType.valueType))
./sql/types.py: fields = [StructField(f.name, _infer_schema_type(o, f.dataType), True)
```

These four are within **4. `_infer_schema_type`**:

https://github.com/apache/spark/blob/b56f79cc359d093d757af83171175cfd933162d1/python/pyspark/sql/types.py#L1299
https://github.com/apache/spark/blob/b56f79cc359d093d757af83171175cfd933162d1/python/pyspark/sql/types.py#L1304
https://github.com/apache/spark/blob/b56f79cc359d093d757af83171175cfd933162d1/python/pyspark/sql/types.py#L1305
https://github.com/apache/spark/blob/b56f79cc359d093d757af83171175cfd933162d1/python/pyspark/sql/types.py#L1311
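The per-function audit above repeats the same pattern each time: grep the tree for the helper, then discard doctest lines (those containing `>>>`) to isolate the real call sites. A minimal, self-contained sketch of that workflow follows; the sample file, directory, and helper bodies are made up for illustration and are not taken from the Spark tree:

```shell
# Sketch of the audit workflow: grep for a helper, then drop doctest
# lines (">>>") so only definitions and real call sites remain.
# types_sample.py below is a stand-in for python/pyspark/sql/types.py.
mkdir -p /tmp/audit_demo && cd /tmp/audit_demo

cat > types_sample.py <<'EOF'
def _split_schema_abstract(s):
    """
    >>> _split_schema_abstract("a b  c")
    ['a', 'b', 'c']
    """
    return s.split()

def _parse_schema_abstract(s):
    # Internal call site: the only non-doctest usage of the helper.
    parts = _split_schema_abstract(s)
    return parts
EOF

# All occurrences: definition, doctest usage, and the internal call site.
grep -rn "_split_schema_abstract" .

# Non-doctest occurrences only -- the usages that actually matter when
# deciding whether a helper is dead code outside its own doctests.
grep -rn "_split_schema_abstract" . | grep -v ">>>"
```

Filtering with `grep -v ">>>"` is the manual equivalent of the "Non doctests / tests" split done by hand above; if the filtered output shows only the `def` line, the helper is unused outside its doctests.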