conda create -y -q -n pyarrow-dev \
python=3.6 numpy six setuptools cython pandas pytest \
cmake flatbuffers rapidjson boost-cpp thrift-cpp snappy zlib \
- brotli jemalloc lz4-c zstd -c conda-forge
+ gflags brotli jemalloc lz4-c zstd -c conda-forge
source activate pyarrow-dev
On Debian/Ubuntu, you need the following minimal set of dependencies. All other
-dependencies will be automatically built by Arrow' thrid-party toolchain.
+dependencies will be automatically built by Arrow's third-party toolchain.

$ sudo apt-get install libjemalloc-dev libboost-dev \
libboost-filesystem-dev \
libboost-system-dev
On Arch Linux, you can get these dependencies via pacman.
+$ sudo pacman -S jemalloc boost
+
Now, let’s create a Python virtualenv with all Python dependencies in the same folder as the repositories and a target installation folder:
virtualenv pyarrow
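The virtualenv step above can be sketched end to end as follows. This is a hedged sketch: it uses the stdlib venv module as a stand-in for the virtualenv tool (either provides the same isolation), and the environment name pyarrow-test-env is illustrative, not from the original instructions.

```shell
# Create an isolated Python environment. The stdlib venv module is
# used here in place of the virtualenv tool; the name is illustrative.
python3 -m venv pyarrow-test-env

# Activate it; python now resolves inside the environment.
. pyarrow-test-env/bin/activate
python -c "import sys; print(sys.prefix)"
```

From here, the Python build dependencies (numpy, six, cython, pandas, pytest) would be installed into this environment with pip before building pyarrow against it.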
@@ -337,14 +341,11 @@ includes all the dependencies for Arrow and the Apache Parquet C++ libraries.
conda create -n arrow-dev cmake git boost-cpp ^
- flatbuffers snappy zlib brotli thrift-cpp rapidjson
-activate arrow-dev
-
As one git housekeeping item, we must run this command in our Arrow clone:
-cd arrow
-git config core.symlinks true
+conda create -y -q -n pyarrow-dev ^
+ python=3.6 numpy six setuptools cython pandas pytest ^
+ cmake flatbuffers rapidjson boost-cpp thrift-cpp snappy zlib ^
+ gflags brotli lz4-c zstd -c conda-forge
+activate pyarrow-dev
Now, we build and install the Arrow C++ libraries:
@@ -356,7 +357,7 @@ cmake -G "Visual Studio 14 2015 Win64" ^
-DCMAKE_INSTALL_PREFIX=%ARROW_HOME% ^
-DCMAKE_BUILD_TYPE=Release ^
-DARROW_BUILD_TESTS=on ^
- -DARROW_CXXFLAGS="/WX" ^
+ -DARROW_CXXFLAGS="/WX /MP" ^
-DARROW_PYTHON=on ..
cmake --build . --target INSTALL --config Release
cd ..\..
http://git-wip-us.apache.org/repos/asf/arrow-site/blob/35611f84/docs/python/filesystems.html
----------------------------------------------------------------------
diff --git a/docs/python/filesystems.html b/docs/python/filesystems.html
index 97edd06..e3f564a 100644
--- a/docs/python/filesystems.html
+++ b/docs/python/filesystems.html
@@ -26,6 +26,7 @@
+
@@ -107,6 +108,10 @@
+
-hdfs.connect
-
+hdfs.connect
([host, port, user, …])
+Connect to an HDFS cluster.
-HadoopFileSystem.cat
-
+HadoopFileSystem.cat
(path)
+Return contents of file as a bytes object
-HadoopFileSystem.chmod
-
+HadoopFileSystem.chmod
(self, path, mode)
+Change file permissions
-HadoopFileSystem.chown
-
+HadoopFileSystem.chown
(self, path[, owner, …])
+Change file owner and group
-HadoopFileSystem.delete
-
+HadoopFileSystem.delete
(path[, recursive])
+Delete the indicated file or directory
-HadoopFileSystem.df
-
+HadoopFileSystem.df
(self)
+Return free space on disk, like the UNIX df command
-HadoopFileSystem.disk_usage
-
+HadoopFileSystem.disk_usage
(path)
+Compute bytes used by all contents under indicated path in file tree
-HadoopFileSystem.download
+HadoopFileSystem.download
(self, path, stream)
-HadoopFileSystem.exists
+HadoopFileSystem.exists
(path)
-HadoopFileSystem.get_capacity
-
+HadoopFileSystem.get_capacity
(self)
+Get reported total capacity of file system
-HadoopFileSystem.get_space_used
-
+HadoopFileSystem.get_space_used
(self)
+Get space used on file system
-HadoopFileSystem.info
-
+HadoopFileSystem.info
(self, path)
+Return detailed HDFS information for path
-HadoopFileSystem.ls
-
+HadoopFileSystem.ls
(path[, detail])
+Retrieve directory contents and metadata, if requested.
-HadoopFileSystem.mkdir
-
+HadoopFileSystem.mkdir
(path, **kwargs)
+Create directory in HDFS
-HadoopFileSystem.open
-
+HadoopFileSystem.open
(self, path[, mode, …])
+Open HDFS file for reading or writing
-HadoopFileSystem.rename
-
+HadoopFileSystem.rename
(path, new_path)
+Rename file, like UNIX mv command
-HadoopFileSystem.rm
-
+HadoopFileSystem.rm
(path[, recursive])
+Alias for FileSystem.delete
-HadoopFileSystem.upload
-
+HadoopFileSystem.upload
(self, path, stream)
+Upload file-like object to HDFS path
-HdfsFile
+HdfsFile
http://git-wip-us.apache.org/repos/asf/arrow-site/blob/35611f84/docs/python/generated/pyarrow.Array.html
----------------------------------------------------------------------
diff --git a/docs/python/generated/pyarrow.Array.html b/docs/python/generated/pyarrow.Array.html
index ff5b097..7112287 100644
--- a/docs/python/generated/pyarrow.Array.html
+++ b/docs/python/generated/pyarrow.Array.html
@@ -5,7 +5,7 @@
- pyarrow.Array — pyarrow documentation
+ pyarrow.array — pyarrow documentation
@@ -30,8 +30,8 @@
-
-
+
+
@@ -91,7 +91,7 @@
@@ -101,11 +101,11 @@
-
+
-
+
@@ -115,7 +115,7 @@
- Source
@@ -140,175 +140,35 @@
-pyarrow.Array¶
-
--
-class
pyarrow.
Array
¶
-Bases: object
-
--
-
__init__
()¶
-Initialize self. See help(type(self)) for accurate signature.
-
-
-Methods
-
-
-
-
-
-
-equals
(self, Array other)
-
-
-from_pandas
(obj[, mask, timestamps_to_ms])
-Convert pandas.Series to an Arrow Array.
-
-isnull
(self)
-
-
-slice
(self[, offset, length])
-Compute zero-copy slice of this array
-
-to_pandas
(self)
-Convert to an array object suitable for use in pandas
-
-to_pylist
(self)
-Convert to an list of native Python objects.
-
-
-
-Attributes
-
-
-
-
-
-
-null_count
-
-
-type
-
-
-
-
-
--
-
equals
(self, Array other)¶
-
-
-
--
-static
from_pandas
(obj, mask=None, DataType type=None, timestamps_to_ms=False, MemoryPool memory_pool=None)¶
-Convert pandas.Series to an Arrow Array.
-
-
-
-
-Parameters:
-- series (pandas.Series or numpy.ndarray) –
-- mask (pandas.Series or numpy.ndarray, optional) – boolean mask if the object is null (True) or valid (False)
-- type (pyarrow.DataType) – Explicit type to attempt to coerce to
-- timestamps_to_ms (bool, optional) – Convert datetime columns to ms resolution. This is needed for
-compatibility with other functionality like Parquet I/O which
-only supports milliseconds.
-- memory_pool (MemoryPool, optional) – Specific memory pool to use to allocate the resulting Arrow array.
-
-
-
-
-
-Notes
-Localized timestamps will currently be returned as UTC (pandas’s native
-representation). Timezone-naive data will be implicitly interpreted as
-UTC.
-Examples
->>> import pandas as pd
->>> import pyarrow as pa
->>> pa.Array.from_pandas(pd.Series([1, 2]))
-<pyarrow.array.Int64Array object at 0x7f674e4c0e10>
-[
- 1,
- 2
-]
-
-
->>> import numpy as np
->>> pa.Array.from_pandas(pd.Series([1, 2]), np.array([0, 1],
-... dtype=bool))
-<pyarrow.array.Int64Array object at 0x7f9019e11208>
-[
- 1,
- NA
-]
-
-
-
-
-
-
-Returns: pyarrow.array.Array
-
-
-
-
-
-
--
-
isnull
(self)¶
-
-
-
--
-
null_count
¶
-
-
-
--
-
slice
(self, offset=0, length=None)¶
-Compute zero-copy slice of this array
+pyarrow.array¶
+
+-
+
pyarrow.
array
(sequence, DataType type=None, MemoryPool memory_pool=None, size=None)¶
+Create pyarrow.Array instance from a Python sequence
Parameters:
-- offset (int, default 0) – Offset from start of array to slice
-- length (int, default None) – Length of slice (default is until end of Array starting from
-offset)
+- sequence (sequence-like or iterable object of Python objects.) – If both type and size are specified, may be a single-use iterable.
+- type (pyarrow.DataType, optional) – If not passed, will be inferred from the data
+- memory_pool (pyarrow.MemoryPool, optional) – If not passed, will allocate memory from the currently-set default
+memory pool
+- size (int64, optional) – Size of the elements. If the input is larger than size, bail at this
+length. For iterators, if size is larger than the input iterator this
+will be treated as a “max size”, but will involve an initial allocation
+of size followed by a resize to the actual size (so if you know the
+exact size specifying it correctly will give you better performance).
-Returns: sliced (RecordBatch)
+Returns: array (pyarrow.Array)
-
--
-
to_pandas
(self)¶
-Convert to an array object suitable for use in pandas
-
-See also
-Column.to_pandas()
, Table.to_pandas()
, RecordBatch.to_pandas()
-
-
-
-
--
-
to_pylist
(self)¶
-Convert to an list of native Python objects.
-
-
-
--
-
type
¶
-
-
-
-
http://git-wip-us.apache.org/repos/asf/arrow-site/blob/35611f84/docs/python/generated/pyarrow.ArrayValue.html
----------------------------------------------------------------------
diff --git a/docs/python/generated/pyarrow.ArrayValue.html b/docs/python/generated/pyarrow.ArrayValue.html
index 057a41d..79964bb 100644
--- a/docs/python/generated/pyarrow.ArrayValue.html
+++ b/docs/python/generated/pyarrow.ArrayValue.html
@@ -73,6 +73,7 @@
The Plasma In-Memory Object Store
Using PyArrow with pandas
Reading and Writing the Apache Parquet Format
+Building C++ and Cython Extensions using pyarrow
API Reference
Getting Involved
@@ -165,7 +166,7 @@
© Copyright 2016-2017 Apache Software Foundation.
- Created using Sphinx 1.6.4.
+ Created using Sphinx 1.6.5.
http://git-wip-us.apache.org/repos/asf/arrow-site/blob/35611f84/docs/python/generated/pyarrow.BinaryValue.html
----------------------------------------------------------------------
diff --git a/docs/python/generated/pyarrow.BinaryValue.html b/docs/python/generated/pyarrow.BinaryValue.html
index b823a0f..02d6630 100644
--- a/docs/python/generated/pyarrow.BinaryValue.html
+++ b/docs/python/generated/pyarrow.BinaryValue.html
@@ -73,6 +73,7 @@
The Plasma In-Memory Object Store
Using PyArrow with pandas
Reading and Writing the Apache Parquet Format
+Building C++ and Cython Extensions using pyarrow
API Reference
Getting Involved
@@ -182,7 +183,7 @@
© Copyright 2016-2017 Apache Software Foundation.
- Created using Sphinx 1.6.4.
+ Created using Sphinx 1.6.5.
chunks¶
iterchunks(self)¶
equals(self, Column other)
cast(self, target_type[, safe])
equals(self, Column other)
from_array(field_or_name, Array arr)
from_array(*args)
length(self)
length(self)
to_pandas(self[, strings_to_categorical])
to_pandas(self[, strings_to_categorical, …])
to_pylist(self)
to_pylist(self)
cast(self, target_type, safe=True)¶
Cast column values to another data type
+Returns: casted (Column)
data¶
field¶
from_array(field_or_name, Array arr)¶
from_array(*args)¶
to_pandas(self, strings_to_categorical=False)¶
to_pandas(self, strings_to_categorical=False, zero_copy_only=False)¶
Convert the arrow::Column to a pandas.Series