arrow-commits mailing list archives

From w...@apache.org
Subject arrow git commit: ARROW-1526: [Python] Add unit test for fix in PARQUET-1100
Date Thu, 05 Oct 2017 12:55:56 GMT
Repository: arrow
Updated Branches:
  refs/heads/master 909a6f68a -> bd73166bd


ARROW-1526: [Python] Add unit test for fix in PARQUET-1100

This generates a table with large list elements to exercise the case where an incomplete
number of repeated values was decoded.

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes #1171 from wesm/ARROW-1526 and squashes the following commits:

bf260b9c [Wes McKinney] Add unit test for fix in PARQUET-1100


Project: http://git-wip-us.apache.org/repos/asf/arrow/repo
Commit: http://git-wip-us.apache.org/repos/asf/arrow/commit/bd73166b
Tree: http://git-wip-us.apache.org/repos/asf/arrow/tree/bd73166b
Diff: http://git-wip-us.apache.org/repos/asf/arrow/diff/bd73166b

Branch: refs/heads/master
Commit: bd73166bde3d015118266a56d7db50eb20562857
Parents: 909a6f6
Author: Wes McKinney <wes.mckinney@twosigma.com>
Authored: Thu Oct 5 08:55:51 2017 -0400
Committer: Wes McKinney <wes.mckinney@twosigma.com>
Committed: Thu Oct 5 08:55:51 2017 -0400

----------------------------------------------------------------------
 python/pyarrow/tests/test_parquet.py | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/arrow/blob/bd73166b/python/pyarrow/tests/test_parquet.py
----------------------------------------------------------------------
diff --git a/python/pyarrow/tests/test_parquet.py b/python/pyarrow/tests/test_parquet.py
index b0593fe..d51b85d 100644
--- a/python/pyarrow/tests/test_parquet.py
+++ b/python/pyarrow/tests/test_parquet.py
@@ -599,6 +599,23 @@ def test_date_time_types():
 
 
 @parquet
+def test_large_list_records():
+    # This was fixed in PARQUET-1100
+
+    list_lengths = np.random.randint(0, 500, size=50)
+    list_lengths[::10] = 0
+
+    list_values = [list(map(int, np.random.randint(0, 100, size=x)))
+                   if i % 8 else None
+                   for i, x in enumerate(list_lengths)]
+
+    a1 = pa.array(list_values)
+
+    table = pa.Table.from_arrays([a1], ['int_lists'])
+    _check_roundtrip(table)
+
+
+@parquet
 def test_sanitized_spark_field_names():
     a0 = pa.array([0, 1, 2, 3, 4])
     name = 'prohib; ,\t{}'
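
The data pattern built in `test_large_list_records` can be sketched without NumPy or pyarrow. The following is a hypothetical, stdlib-only approximation: the helper name `make_list_values` and its parameters are illustrative and are not part of pyarrow or the test suite. List lengths are drawn from [0, 500), every 10th list is forced to be empty, and every entry whose index is a multiple of 8 becomes a null (`None`) list, mirroring the `i % 8` condition in the test.

```python
import random


def make_list_values(n=50, max_len=500, max_val=100, seed=None):
    """Hypothetical helper approximating the test's list_values (stdlib only)."""
    rng = random.Random(seed)
    # Like np.random.randint(0, 500, size=50): lengths in [0, max_len).
    lengths = [rng.randrange(0, max_len) for _ in range(n)]
    # Like list_lengths[::10] = 0: every 10th list is empty.
    for i in range(0, n, 10):
        lengths[i] = 0
    # Entries whose index is a multiple of 8 become None (a null list);
    # all others are lists of ints drawn from [0, max_val).
    return [[rng.randrange(0, max_val) for _ in range(x)] if i % 8 else None
            for i, x in enumerate(lengths)]


values = make_list_values(seed=0)
print(sum(v is None for v in values), "null lists out of", len(values))
```

Passing such a list to `pa.array` should yield the same kind of list-of-integers column that the test round-trips through Parquet; that step is left out here so the sketch runs without pyarrow.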

