conda create -y -q -n pyarrow-dev \
python=3.6 numpy six setuptools cython pandas pytest \
cmake flatbuffers rapidjson boost-cpp thrift-cpp snappy zlib \
- brotli jemalloc lz4-c zstd -c conda-forge
+ gflags brotli jemalloc lz4-c zstd -c conda-forge
source activate pyarrow-dev
On Debian/Ubuntu, you need the following minimal set of dependencies. All other
-dependencies will be automatically built by Arrow' thrid-party toolchain.
+dependencies will be automatically built by Arrow's third-party toolchain.

$ sudo apt-get install libjemalloc-dev libboost-dev \
libboost-filesystem-dev \
libboost-system-dev
On Arch Linux, you can get these dependencies via pacman.
+$ sudo pacman -S jemalloc boost
+
Now, let’s create a Python virtualenv with all Python dependencies in the same folder as the repositories and a target installation folder:
virtualenv pyarrow
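The virtualenv step above can be sketched end to end as follows. This is a hedged sketch: it uses the stdlib venv module as a stand-in for the virtualenv tool (either provides the same isolation), and the environment name pyarrow-test-env is illustrative, not from the original instructions.

```shell
# Create an isolated Python environment. The stdlib venv module is
# used here in place of the virtualenv tool; the name is illustrative.
python3 -m venv pyarrow-test-env

# Activate it; python now resolves inside the environment.
. pyarrow-test-env/bin/activate
python -c "import sys; print(sys.prefix)"
```

From here, the Python build dependencies (numpy, six, cython, pandas, pytest) would be installed into this environment with pip before building pyarrow against it.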
@@ -337,14 +341,11 @@ includes all the dependencies for Arrow and the Apache Parquet C++ libraries.
conda create -n arrow-dev cmake git boost-cpp ^
- flatbuffers snappy zlib brotli thrift-cpp rapidjson
-activate arrow-dev
-
As one git housekeeping item, we must run this command in our Arrow clone:
-cd arrow
-git config core.symlinks true
+conda create -y -q -n pyarrow-dev ^
+ python=3.6 numpy six setuptools cython pandas pytest ^
+ cmake flatbuffers rapidjson boost-cpp thrift-cpp snappy zlib ^
+ gflags brotli lz4-c zstd -c conda-forge
+activate pyarrow-dev
Now, we build and install the Arrow C++ libraries:
@@ -356,7 +357,7 @@ cmake -G "Visual Studio 14 2015 Win64" ^
-DCMAKE_INSTALL_PREFIX=%ARROW_HOME% ^
-DCMAKE_BUILD_TYPE=Release ^
-DARROW_BUILD_TESTS=on ^
- -DARROW_CXXFLAGS="/WX" ^
+ -DARROW_CXXFLAGS="/WX /MP" ^
-DARROW_PYTHON=on ..
cmake --build . --target INSTALL --config Release
cd ..\..
http://git-wip-us.apache.org/repos/asf/arrow-site/blob/35611f84/docs/python/filesystems.html
----------------------------------------------------------------------
diff --git a/docs/python/filesystems.html b/docs/python/filesystems.html
index 97edd06..e3f564a 100644
--- a/docs/python/filesystems.html
+++ b/docs/python/filesystems.html
@@ -26,6 +26,7 @@
+
@@ -107,6 +108,10 @@
+
-hdfs.connect
-
+hdfs.connect
([host, port, user, …])
+Connect to an HDFS cluster.
-HadoopFileSystem.cat
-
+HadoopFileSystem.cat
(path)
+Return contents of file as a bytes object
-HadoopFileSystem.chmod
-
+HadoopFileSystem.chmod
(self, path, mode)
+Change file permissions
-HadoopFileSystem.chown
-
+HadoopFileSystem.chown
(self, path[, owner, …])
+Change file owner and group
-HadoopFileSystem.delete
-
+HadoopFileSystem.delete
(path[, recursive])
+Delete the indicated file or directory
-HadoopFileSystem.df
-
+HadoopFileSystem.df
(self)
+Return free space on disk, like the UNIX df command
-HadoopFileSystem.disk_usage
-
+HadoopFileSystem.disk_usage
(path)
+Compute bytes used by all contents under indicated path in file tree
-HadoopFileSystem.download
+HadoopFileSystem.download
(self, path, stream)
-HadoopFileSystem.exists
+HadoopFileSystem.exists
(path)
-HadoopFileSystem.get_capacity
-
+HadoopFileSystem.get_capacity
(self)
+Get reported total capacity of file system
-HadoopFileSystem.get_space_used
-
+HadoopFileSystem.get_space_used
(self)
+Get space used on file system
-HadoopFileSystem.info
-
+HadoopFileSystem.info
(self, path)
+Return detailed HDFS information for path
-HadoopFileSystem.ls
-
+HadoopFileSystem.ls
(path[, detail])
+Retrieve directory contents and metadata, if requested.
-HadoopFileSystem.mkdir
-
+HadoopFileSystem.mkdir
(path, **kwargs)
+Create directory in HDFS
-HadoopFileSystem.open
-
+HadoopFileSystem.open
(self, path[, mode, …])
+Open HDFS file for reading or writing
-HadoopFileSystem.rename
-
+HadoopFileSystem.rename
(path, new_path)
+Rename file, like UNIX mv command
-HadoopFileSystem.rm
-
+HadoopFileSystem.rm
(path[, recursive])
+Alias for FileSystem.delete
-HadoopFileSystem.upload
-
+HadoopFileSystem.upload
(self, path, stream)
+Upload file-like object to HDFS path
-HdfsFile
+HdfsFile
http://git-wip-us.apache.org/repos/asf/arrow-site/blob/35611f84/docs/python/generated/pyarrow.Array.html
----------------------------------------------------------------------
diff --git a/docs/python/generated/pyarrow.Array.html b/docs/python/generated/pyarrow.Array.html
index ff5b097..7112287 100644
--- a/docs/python/generated/pyarrow.Array.html
+++ b/docs/python/generated/pyarrow.Array.html
@@ -5,7 +5,7 @@
- pyarrow.Array — pyarrow documentation
+ pyarrow.array — pyarrow documentation
@@ -30,8 +30,8 @@
-
-
+
+
@@ -91,7 +91,7 @@
@@ -101,11 +101,11 @@
-
+
-
+
@@ -115,7 +115,7 @@
- Source
@@ -140,175 +140,35 @@
-pyarrow.Array¶
-
--
-class
pyarrow.
Array
¶
-Bases: object
-
--
-
__init__
()¶
-Initialize self. See help(type(self)) for accurate signature.
-
-
-Methods
-
-
-
-
-
-
-equals
(self, Array other)
-
-
-from_pandas
(obj[, mask, timestamps_to_ms])
-Convert pandas.Series to an Arrow Array.
-
-isnull
(self)
-
-
-slice
(self[, offset, length])
-Compute zero-copy slice of this array
-
-to_pandas
(self)
-Convert to an array object suitable for use in pandas
-
-to_pylist
(self)
-Convert to an list of native Python objects.
-
-
-
-Attributes
-
-
-
-
-
-
-null_count
-
-
-type
-
-
-
-
-
--
-
equals
(self, Array other)¶
-
-
-
--
-static
from_pandas
(obj, mask=None, DataType type=None, timestamps_to_ms=False, MemoryPool memory_pool=None)¶
-Convert pandas.Series to an Arrow Array.
-
-
-
-
-Parameters:
-- series (pandas.Series or numpy.ndarray) –
-- mask (pandas.Series or numpy.ndarray, optional) – boolean mask if the object is null (True) or valid (False)
-- type (pyarrow.DataType) – Explicit type to attempt to coerce to
-- timestamps_to_ms (bool, optional) – Convert datetime columns to ms resolution. This is needed for
-compatibility with other functionality like Parquet I/O which
-only supports milliseconds.
-- memory_pool (MemoryPool, optional) – Specific memory pool to use to allocate the resulting Arrow array.
-
-
-
-
-
-Notes
-Localized timestamps will currently be returned as UTC (pandas’s native
-representation). Timezone-naive data will be implicitly interpreted as
-UTC.
-Examples
->>> import pandas as pd
->>> import pyarrow as pa
->>> pa.Array.from_pandas(pd.Series([1, 2]))
-<pyarrow.array.Int64Array object at 0x7f674e4c0e10>
-[
- 1,
- 2
-]
-
-
->>> import numpy as np
->>> pa.Array.from_pandas(pd.Series([1, 2]), np.array([0, 1],
-... dtype=bool))
-<pyarrow.array.Int64Array object at 0x7f9019e11208>
-[
- 1,
- NA
-]
-
-
-
-
-
-
-Returns: pyarrow.array.Array
-
-
-
-
-
-
--
-
isnull
(self)¶
-
-
-
--
-
null_count
¶
-
-
-
--
-
slice
(self, offset=0, length=None)¶
-Compute zero-copy slice of this array
+pyarrow.array¶
+
+-
+
pyarrow.
array
(sequence, DataType type=None, MemoryPool memory_pool=None, size=None)¶
+Create pyarrow.Array instance from a Python sequence
Parameters:
-- offset (int, default 0) – Offset from start of array to slice
-- length (int, default None) – Length of slice (default is until end of Array starting from
-offset)
+- sequence (sequence-like or iterable object of Python objects.) – If both type and size are specified, may be a single-use iterable.
+- type (pyarrow.DataType, optional) – If not passed, will be inferred from the data
+- memory_pool (pyarrow.MemoryPool, optional) – If not passed, will allocate memory from the currently-set default
+memory pool
+- size (int64, optional) – Size of the elements. If the input is larger than size, bail at this
+length. For iterators, if size is larger than the input iterator this
+will be treated as a “max size”, but will involve an initial allocation
+of size followed by a resize to the actual size (so if you know the
+exact size specifying it correctly will give you better performance).
-Returns: sliced (RecordBatch)
+Returns: array (pyarrow.Array)
-
--
-
to_pandas
(self)¶
-Convert to an array object suitable for use in pandas
-
-See also
-Column.to_pandas()
, Table.to_pandas()
, RecordBatch.to_pandas()
-
-
-
-
--
-
to_pylist
(self)¶
-Convert to an list of native Python objects.
-
-
-
--
-
type
¶
-
-
-
-
http://git-wip-us.apache.org/repos/asf/arrow-site/blob/35611f84/docs/python/generated/pyarrow.ArrayValue.html
----------------------------------------------------------------------
diff --git a/docs/python/generated/pyarrow.ArrayValue.html b/docs/python/generated/pyarrow.ArrayValue.html
index 057a41d..79964bb 100644
--- a/docs/python/generated/pyarrow.ArrayValue.html
+++ b/docs/python/generated/pyarrow.ArrayValue.html
@@ -73,6 +73,7 @@
The Plasma In-Memory Object Store
Using PyArrow with pandas
Reading and Writing the Apache Parquet Format
+Building C++ and Cython Extensions using pyarrow
API Reference
Getting Involved
@@ -165,7 +166,7 @@
© Copyright 2016-2017 Apache Software Foundation.
- Created using Sphinx 1.6.4.
+ Created using Sphinx 1.6.5.
http://git-wip-us.apache.org/repos/asf/arrow-site/blob/35611f84/docs/python/generated/pyarrow.BinaryValue.html
----------------------------------------------------------------------
diff --git a/docs/python/generated/pyarrow.BinaryValue.html b/docs/python/generated/pyarrow.BinaryValue.html
index b823a0f..02d6630 100644
--- a/docs/python/generated/pyarrow.BinaryValue.html
+++ b/docs/python/generated/pyarrow.BinaryValue.html
@@ -73,6 +73,7 @@
The Plasma In-Memory Object Store
Using PyArrow with pandas
Reading and Writing the Apache Parquet Format
+Building C++ and Cython Extensions using pyarrow
API Reference
Getting Involved
@@ -182,7 +183,7 @@
© Copyright 2016-2017 Apache Software Foundation.
- Created using Sphinx 1.6.4.
+ Created using Sphinx 1.6.5.
chunks¶
iterchunks(self)¶
equals(self, Column other)
cast(self, target_type[, safe])
equals(self, Column other)
from_array(field_or_name, Array arr)
from_array(*args)
length(self)
length(self)
to_pandas(self[, strings_to_categorical])
to_pandas(self[, strings_to_categorical, …])
to_pylist(self)
to_pylist(self)
cast(self, target_type, safe=True)¶
Cast column values to another data type
+Returns: casted (Column)
data¶
field¶
from_array(field_or_name, Array arr)¶
from_array(*args)¶
to_pandas(self, strings_to_categorical=False)¶
to_pandas(self, strings_to_categorical=False, zero_copy_only=False)¶
Convert the arrow::Column to a pandas.Series