arrow-commits mailing list archives

From u..@apache.org
Subject arrow git commit: ARROW-923: Changelog generation Python script, add 0.1.0 and 0.2.0 changelog
Date Fri, 05 May 2017 06:18:58 GMT
Repository: arrow
Updated Branches:
  refs/heads/master 928b63f40 -> 2c3e111d4


ARROW-923: Changelog generation Python script, add 0.1.0 and 0.2.0 changelog

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes #640 from wesm/ARROW-923 and squashes the following commits:

289d3cd [Wes McKinney] Add license header
96f55f8 [Wes McKinney] Add option to write Markdown JIRA links (for website)
6c808da [Wes McKinney] Changelog Python script, add 0.1.0 and 0.2.0 changelog


Project: http://git-wip-us.apache.org/repos/asf/arrow/repo
Commit: http://git-wip-us.apache.org/repos/asf/arrow/commit/2c3e111d
Tree: http://git-wip-us.apache.org/repos/asf/arrow/tree/2c3e111d
Diff: http://git-wip-us.apache.org/repos/asf/arrow/diff/2c3e111d

Branch: refs/heads/master
Commit: 2c3e111d45c056d429cef312533c9f3f96b08ae8
Parents: 928b63f
Author: Wes McKinney <wes.mckinney@twosigma.com>
Authored: Fri May 5 08:18:53 2017 +0200
Committer: Uwe L. Korn <uwelk@xhochy.com>
Committed: Fri May 5 08:18:53 2017 +0200

----------------------------------------------------------------------
 CHANGELOG.md          | 403 +++++++++++++++++++++++++++++++++++++++++++++
 dev/make_changelog.py |  85 ++++++++++
 2 files changed, 488 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/arrow/blob/2c3e111d/CHANGELOG.md
----------------------------------------------------------------------
diff --git a/CHANGELOG.md b/CHANGELOG.md
new file mode 100644
index 0000000..3d54838
--- /dev/null
+++ b/CHANGELOG.md
@@ -0,0 +1,403 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Apache Arrow 0.2.0 (15 February 2017)
+
+## Bug
+
+* ARROW-112 - [C++]  Style fix for constants/enums
+* ARROW-202 - [C++] Integrate with appveyor ci for windows support and get arrow building on windows
+* ARROW-220 - [C++] Build conda artifacts in a build environment with better cross-linux ABI compatibility
+* ARROW-224 - [C++] Address static linking of boost dependencies
+* ARROW-230 - Python: Do not name modules like native ones (i.e. rename pyarrow.io)
+* ARROW-239 - [Python] HdfsFile.read called with no arguments should read remainder of file
+* ARROW-261 - [C++] Refactor BinaryArray/StringArray classes to not inherit from ListArray
+* ARROW-275 - Add tests for UnionVector in Arrow File
+* ARROW-294 - [C++] Do not use fopen / fclose / etc. methods for memory mapped file implementation
+* ARROW-322 - [C++] Do not build HDFS IO interface optionally
+* ARROW-323 - [Python] Opt-in to PyArrow parquet build rather than skipping silently on failure
+* ARROW-334 - [Python] OS X rpath issues on some configurations
+* ARROW-337 - UnionListWriter.list() is doing more than it should, this can cause data corruption
+* ARROW-339 - Make merge_arrow_pr script work with Python 3
+* ARROW-340 - [C++] Opening a writeable file on disk that already exists does not truncate to zero
+* ARROW-342 - Set Python version on release
+* ARROW-345 - libhdfs integration doesn't work for Mac
+* ARROW-346 - Python API Documentation
+* ARROW-348 - [Python] CMake build type should be configurable on the command line
+* ARROW-349 - Six is missing as a requirement in the python setup.py
+* ARROW-351 - Time type has no unit
+* ARROW-354 - Connot compare an array of empty strings to another
+* ARROW-357 - Default Parquet chunk_size of 64k is too small
+* ARROW-358 - [C++] libhdfs can be in non-standard locations in some Hadoop distributions
+* ARROW-362 - Python: Calling to_pandas on a table read from Parquet leaks memory
+* ARROW-371 - Python: Table with null timestamp becomes float in pandas
+* ARROW-375 - columns parameter in parquet.read_table() raises KeyError for valid column
+* ARROW-384 - Align Java and C++ RecordBatch data and metadata layout
+* ARROW-386 - [Java] Respect case of struct / map field names
+* ARROW-387 - [C++] arrow::io::BufferReader does not permit shared memory ownership in zero-copy reads
+* ARROW-390 - C++: CMake fails on json-integration-test with ARROW_BUILD_TESTS=OFF
+* ARROW-392 - Fix string/binary integration tests
+* ARROW-393 - [JAVA] JSON file reader fails to set the buffer size on String data vector
+* ARROW-395 - Arrow file format writes record batches in reverse order.
+* ARROW-398 - [Java] Java file format requires bitmaps of all 1's to be written when there are no nulls
+* ARROW-399 - [Java] ListVector.loadFieldBuffers ignores the ArrowFieldNode length metadata
+* ARROW-400 - [Java] ArrowWriter writes length 0 for Struct types
+* ARROW-401 - [Java] Floating point vectors should do an approximate comparison in integration tests
+* ARROW-402 - [Java] "refCnt gone negative" error in integration tests
+* ARROW-403 - [JAVA] UnionVector: Creating a transfer pair doesn't transfer the schema to destination vector
+* ARROW-404 - [Python] Closing an HdfsClient while there are still open file handles results in a crash
+* ARROW-405 - [C++] Be less stringent about finding include/hdfs.h in HADOOP_HOME
+* ARROW-406 - [C++] Large HDFS reads must utilize the set file buffer size when making RPCs
+* ARROW-408 - [C++/Python] Remove defunct conda recipes
+* ARROW-414 - [Java] "Buffer too large to resize to ..." error
+* ARROW-420 - Align Date implementation between Java and C++
+* ARROW-421 - [Python] Zero-copy buffers read by pyarrow::PyBytesReader must retain a reference to the parent PyBytes to avoid premature garbage collection issues
+* ARROW-422 - C++: IPC should depend on rapidjson_ep if RapidJSON is vendored
+* ARROW-429 - git-archive SHA-256 checksums are changing
+* ARROW-433 - [Python] Date conversion is locale-dependent
+* ARROW-434 - Segfaults and encoding issues in Python Parquet reads
+* ARROW-435 - C++: Spelling mistake in if(RAPIDJSON_VENDORED)
+* ARROW-437 - [C++] clang compiler warnings from overridden virtual functions
+* ARROW-445 - C++: arrow_ipc is built before arrow/ipc/Message_generated.h was generated
+* ARROW-447 - Python: Align scalar/pylist string encoding with pandas' one.
+* ARROW-455 - [C++] BufferOutputStream dtor does not call Close()
+* ARROW-469 - C++: Add option so that resize doesn't decrease the capacity
+* ARROW-481 - [Python] Fix Python 2.7 regression in patch for PARQUET-472
+* ARROW-486 - [C++] arrow::io::MemoryMappedFile can't be casted to arrow::io::FileInterface
+* ARROW-487 - Python: ConvertTableToPandas segfaults if ObjectBlock::Write fails
+* ARROW-494 - [C++] When MemoryMappedFile is destructed, memory is unmapped even if buffer referecnes still exist
+* ARROW-499 - Update file serialization to use streaming serialization format
+* ARROW-505 - [C++] Fix compiler warnings in release mode
+* ARROW-511 - [Python] List[T] conversions not implemented for single arrays
+* ARROW-513 - [C++] Fix Appveyor build
+* ARROW-519 - [C++] Missing vtable in libarrow.dylib on Xcode 6.4
+* ARROW-523 - Python: Account for changes in PARQUET-834
+* ARROW-533 - [C++] arrow::TimestampArray / TimeArray has a broken constructor
+* ARROW-535 - [Python] Add type mapping for NPY_LONGLONG
+* ARROW-537 - [C++] StringArray/BinaryArray comparisons may be incorrect when values with non-zero length are null
+* ARROW-540 - [C++] Fix build in aftermath of ARROW-33
+* ARROW-543 - C++: Lazily computed null_counts counts number of non-null entries
+* ARROW-544 - [C++] ArrayLoader::LoadBinary fails for length-0 arrays
+* ARROW-545 - [Python] Ignore files without .parq or .parquet prefix when reading directory of files
+* ARROW-548 - [Python] Add nthreads option to pyarrow.Filesystem.read_parquet
+* ARROW-551 - C++: Construction of Column with nullptr Array segfaults
+* ARROW-556 - [Integration] Can not run Integration tests if different cpp build path
+* ARROW-561 - Update java & python dependencies to improve downstream packaging experience
+
+## Improvement
+
+* ARROW-189 - C++: Use ExternalProject to build thirdparty dependencies
+* ARROW-191 - Python: Provide infrastructure for manylinux1 wheels
+* ARROW-328 - [C++] Return shared_ptr by value instead of const-ref?
+* ARROW-330 - [C++] CMake functions to simplify shared / static library configuration
+* ARROW-333 - Make writers update their internal schema even when no data is written.
+* ARROW-335 - Improve Type apis and toString() by encapsulating flatbuffers better
+* ARROW-336 - Run Apache Rat in Travis builds
+* ARROW-338 - [C++] Refactor IPC vector "loading" and "unloading" to be based on cleaner visitor pattern
+* ARROW-350 - Add Kerberos support to HDFS shim
+* ARROW-355 - Add tests for serialising arrays of empty strings to Parquet
+* ARROW-356 - Add documentation about reading Parquet
+* ARROW-360 - C++: Add method to shrink PoolBuffer using realloc
+* ARROW-361 - Python: Support reading a column-selection from Parquet files
+* ARROW-365 - Python: Provide Array.to_pandas()
+* ARROW-366 - [java] implement Dictionary vector
+* ARROW-374 - Python: clarify unicode vs. binary in API
+* ARROW-379 - Python: Use setuptools_scm/setuptools_scm_git_archive to provide the version number
+* ARROW-380 - [Java] optimize null count when serializing vectors.
+* ARROW-382 - Python: Extend API documentation
+* ARROW-396 - Python: Add pyarrow.schema.Schema.equals
+* ARROW-409 - Python: Change pyarrow.Table.dataframe_from_batches API to create Table instead
+* ARROW-411 - [Java] Move Intergration.compare and Intergration.compareSchemas to a public utils class
+* ARROW-423 - C++: Define BUILD_BYPRODUCTS in external project to support non-make CMake generators
+* ARROW-425 - Python: Expose a C function to convert arrow::Table to pyarrow.Table
+* ARROW-426 - Python: Conversion from pyarrow.Array to a Python list
+* ARROW-430 - Python: Better version handling
+* ARROW-432 - [Python] Avoid unnecessary memory copy in to_pandas conversion by using low-level pandas internals APIs
+* ARROW-450 - Python: Fixes for PARQUET-818
+* ARROW-457 - Python: Better control over memory pool
+* ARROW-458 - Python: Expose jemalloc MemoryPool
+* ARROW-463 - C++: Support jemalloc 4.x
+* ARROW-466 - C++: ExternalProject for jemalloc
+* ARROW-468 - Python: Conversion of nested data in pd.DataFrames to/from Arrow structures
+* ARROW-474 - Create an Arrow streaming file fomat
+* ARROW-479 - Python: Test for expected schema in Pandas conversion
+* ARROW-485 - [Java] Users are required to initialize VariableLengthVectors.offsetVector before calling VariableLengthVectors.mutator.getSafe
+* ARROW-490 - Python: Update manylinux1 build scripts
+* ARROW-524 - [java] provide apis to access nested vectors and buffers
+* ARROW-525 - Python: Add more documentation to the package
+* ARROW-529 - Python: Add jemalloc and Python 3.6 to manylinux1 build
+* ARROW-546 - Python: Account for changes in PARQUET-867
+* ARROW-553 - C++: Faster valid bitmap building
+
+## New Feature
+
+* ARROW-108 - [C++] Add IPC round trip for union types
+* ARROW-221 - Add switch for writing Parquet 1.0 compatible logical types
+* ARROW-227 - [C++/Python] Hook arrow_io generic reader / writer interface into arrow_parquet
+* ARROW-228 - [Python] Create an Arrow-cpp-compatible interface for reading bytes from Python file-like objects
+* ARROW-243 - [C++] Add "driver" option to HdfsClient to choose between libhdfs and libhdfs3 at runtime
+* ARROW-303 - [C++] Also build static libraries for leaf libraries
+* ARROW-312 - [Python] Provide Python API to read/write the Arrow IPC file format
+* ARROW-317 - [C++] Implement zero-copy Slice method on arrow::Buffer that retains reference to parent
+* ARROW-33 - C++: Implement zero-copy array slicing
+* ARROW-332 - [Python] Add helper function to convert RecordBatch to pandas.DataFrame
+* ARROW-363 - Set up Java/C++ integration test harness
+* ARROW-369 - [Python] Add ability to convert multiple record batches at once to pandas
+* ARROW-373 - [C++] Implement C++ version of JSON file format for testing
+* ARROW-377 - Python: Add support for conversion of Pandas.Categorical
+* ARROW-381 - [C++] Simplify primitive array type builders to use a default type singleton
+* ARROW-383 - [C++] Implement C++ version of ARROW-367 integration test validator
+* ARROW-389 - Python: Write Parquet files to pyarrow.io.NativeFile objects
+* ARROW-394 - Add integration tests for boolean, list, struct, and other basic types
+* ARROW-410 - [C++] Add Flush method to arrow::io::OutputStream
+* ARROW-415 - C++: Add Equals implementation to compare Tables
+* ARROW-416 - C++: Add Equals implementation to compare Columns
+* ARROW-417 - C++: Add Equals implementation to compare ChunkedArrays
+* ARROW-418 - [C++] Consolidate array container and builder code, remove arrow/types
+* ARROW-419 - [C++] Promote util/{status.h, buffer.h, memory-pool.h} to top level of arrow/ source directory
+* ARROW-427 - [C++] Implement dictionary-encoded array container
+* ARROW-428 - [Python] Deserialize from Arrow record batches to pandas in parallel using a thread pool
+* ARROW-438 - [Python] Concatenate Table instances with equal schemas
+* ARROW-440 - [C++] Support pkg-config
+* ARROW-441 - [Python] Expose Arrow's file and memory map classes as NativeFile subclasses
+* ARROW-442 - [Python] Add public Python API to inspect Parquet file metadata
+* ARROW-444 - [Python] Avoid unnecessary memory copies from use of PyBytes_* C APIs
+* ARROW-449 - Python: Conversion from pyarrow.{Table,RecordBatch} to a Python dict
+* ARROW-456 - C++: Add jemalloc based MemoryPool
+* ARROW-461 - [Python] Implement conversion between arrow::DictionaryArray and pandas.Categorical
+* ARROW-467 - [Python] Run parquet-cpp unit tests in Travis CI
+* ARROW-470 - [Python] Add "FileSystem" abstraction to access directories of files in a uniform way
+* ARROW-471 - [Python] Enable ParquetFile to pass down separately-obtained file metadata
+* ARROW-472 - [Python] Expose parquet::{SchemaDescriptor, ColumnDescriptor}::Equals
+* ARROW-475 - [Python] High level support for reading directories of Parquet files (as a single Arrow table) from supported file system interfaces
+* ARROW-476 - [Integration] Add integration tests for Binary / Varbytes type
+* ARROW-477 - [Java] Add support for second/microsecond/nanosecond timestamps in-memory and in IPC/JSON layer
+* ARROW-478 - [Python] Accept a PyBytes object in the pyarrow.io.BufferReader ctor
+* ARROW-484 - Add more detail about what of technology can be found in the Arrow implementations to README
+* ARROW-495 - [C++] Add C++ implementation of streaming serialized format
+* ARROW-497 - [Java] Integration test harness for streaming format
+* ARROW-498 - [C++] Integration test harness for streaming format
+* ARROW-503 - [Python] Interface to streaming binary format
+* ARROW-508 - [C++] Make file/memory-mapped file interfaces threadsafe
+* ARROW-509 - [Python] Add support for PARQUET-835 (parallel column reads)
+* ARROW-512 - C++: Add method to check for primitive types
+* ARROW-514 - [Python] Accept pyarrow.io.Buffer as input to StreamReader, FileReader classes
+* ARROW-515 - [Python] Add StreamReader/FileReader methods that read all record batches as a Table
+* ARROW-521 - [C++/Python] Track peak memory use in default MemoryPool
+* ARROW-531 - Python: Document jemalloc, extend Pandas section, add Getting Involved
+* ARROW-538 - [C++] Set up AddressSanitizer (ASAN) builds
+* ARROW-547 - [Python] Expose Array::Slice and RecordBatch::Slice
+* ARROW-81 - [Format] Add a Category logical type (distinct from dictionary-encoding)
+
+## Task
+
+* ARROW-268 - [C++] Flesh out union implementation to have all required methods for IPC
+* ARROW-327 - [Python] Remove conda builds from Travis CI processes
+* ARROW-353 - Arrow release 0.2
+* ARROW-359 - Need to document ARROW_LIBHDFS_DIR
+* ARROW-367 - [java] converter csv/json <=> Arrow file format for Integration tests
+* ARROW-368 - Document use of LD_LIBRARY_PATH when using Python
+* ARROW-372 - Create JSON arrow file format for integration tests
+* ARROW-506 - Implement Arrow Echo server for integration testing
+* ARROW-527 - clean drill-module.conf file
+* ARROW-558 - Add KEYS files
+* ARROW-96 - C++: API documentation using Doxygen
+* ARROW-97 - Python: API documentation via sphinx-apidoc
+
+# Apache Arrow 0.1.0 (7 October 2016)
+
+## Bug
+
+* ARROW-103 - Missing patterns from .gitignore
+* ARROW-104 - Update Layout.md based on discussion on the mailing list
+* ARROW-105 - Unit tests fail if assertions are disabled
+* ARROW-113 - TestValueVector test fails if cannot allocate 2GB of memory
+* ARROW-16 - Building cpp issues on XCode 7.2.1
+* ARROW-17 - Set some vector fields to default access level for Drill compatibility
+* ARROW-18 - Fix bug with decimal precision and scale
+* ARROW-185 - [C++] Make sure alignment and memory padding conform to spec
+* ARROW-188 - Python: Add numpy as install requirement
+* ARROW-193 - For the instruction, typos "int his" should be "in this"
+* ARROW-194 - C++: Allow read-only memory mapped source
+* ARROW-200 - [Python] Convert Values String looks like it has incorrect error handling
+* ARROW-209 - [C++] Broken builds: llvm.org apt repos are unavailable
+* ARROW-210 - [C++] Tidy up the type system a little bit
+* ARROW-211 - Several typos/errors in Layout.md examples
+* ARROW-217 - Fix Travis w.r.t conda 4.1.0 changes
+* ARROW-219 - [C++] Passed CMAKE_CXX_FLAGS are being dropped, fix compiler warnings
+* ARROW-223 - Do not link against libpython
+* ARROW-225 - [C++/Python] master Travis CI build is broken
+* ARROW-244 - [C++] Some global APIs of IPC module should be visible to the outside
+* ARROW-246 - [Java] UnionVector doesn't call allocateNew() when creating it's vectorType
+* ARROW-247 - [C++] Missing explicit destructor in RowBatchReader causes an incomplete type error
+* ARROW-250 - Fix for ARROW-246 may cause memory leaks
+* ARROW-259 - Use flatbuffer fields in java implementation
+* ARROW-265 - Negative decimal values have wrong padding
+* ARROW-266 - [C++] Fix the broken build
+* ARROW-274 - Make the MapVector nullable
+* ARROW-278 - [Format] Struct type name consistency in implementations and metadata
+* ARROW-283 - [C++] Update arrow_parquet to account for API changes in PARQUET-573
+* ARROW-284 - [C++] Triage builds by disabling Arrow-Parquet module
+* ARROW-287 - [java] Make nullable vectors use a BitVecor instead of UInt1Vector for bits
+* ARROW-297 - Fix Arrow pom for release
+* ARROW-304 - NullableMapReaderImpl.isSet() always returns true
+* ARROW-308 - UnionListWriter.setPosition() should not call startList()
+* ARROW-309 - Types.getMinorTypeForArrowType() does not work for Union type
+* ARROW-313 - XCode 8.0 breaks builds
+* ARROW-314 - JSONScalar is unnecessary and unused.
+* ARROW-320 - ComplexCopier.copy(FieldReader, FieldWriter) should not start a list if reader is not set
+* ARROW-321 - Fix Arrow licences
+* ARROW-36 - Remove fixVersions from patch tool (until we have them)
+* ARROW-46 - Port DRILL-4410 to Arrow
+* ARROW-5 - Error when run maven install
+* ARROW-51 - Move ValueVector test from Drill project
+* ARROW-55 - Python: fix legacy Python (2.7) tests and add to Travis CI
+* ARROW-62 - Format: Are the nulls bits 0 or 1 for null values?
+* ARROW-63 - C++: ctest fails if Python 3 is the active Python interpreter
+* ARROW-65 - Python: FindPythonLibsNew does not work in a virtualenv
+* ARROW-69 - Change permissions for assignable users
+* ARROW-72 - FindParquet searches for non-existent header
+* ARROW-75 - C++: Fix handling of empty strings
+* ARROW-77 - C++: conform null bit interpretation to match ARROW-62
+* ARROW-80 - Segmentation fault on len(Array) for empty arrays
+* ARROW-88 - C++: Refactor given PARQUET-572
+* ARROW-93 - XCode 7.3 breaks builds
+* ARROW-94 - Expand list example to clarify null vs empty list
+
+## Improvement
+
+* ARROW-10 - Fix mismatch of javadoc names and method parameters
+* ARROW-15 - Fix a naming typo for memory.AllocationManager.AllocationOutcome
+* ARROW-190 - Python: Provide installable sdist builds
+* ARROW-199 - [C++] Refine third party dependency
+* ARROW-206 - [C++] Expose an equality API for arrays that compares a range of slots on two arrays
+* ARROW-212 - [C++] Clarify the fact that PrimitiveArray is now abstract class
+* ARROW-213 - Exposing static arrow build
+* ARROW-218 - Add option to use GitHub API token via environment variable when merging PRs
+* ARROW-234 - [C++] Build with libhdfs support in arrow_io in conda builds
+* ARROW-238 - C++: InternalMemoryPool::Free() should throw an error when there is insufficient allocated memory
+* ARROW-245 - [Format] Clarify Arrow's relationship with big endian platforms
+* ARROW-252 - Add implementation guidelines to the documentation
+* ARROW-253 - Int types should only have width of 8*2^n (8, 16, 32, 64)
+* ARROW-254 - Remove Bit type as it is redundant with boolean
+* ARROW-255 - Finalize Dictionary representation
+* ARROW-256 - Add versioning to the arrow spec.
+* ARROW-257 - Add a typeids Vector to Union type
+* ARROW-264 - Create an Arrow File format
+* ARROW-270 - [Format] Define more generic Interval logical type
+* ARROW-271 - Update Field structure to be more explicit
+* ARROW-279 - rename vector module to arrow-vector for consistency
+* ARROW-280 - [C++] Consolidate file and shared memory IO interfaces
+* ARROW-285 - Allow for custom flatc compiler
+* ARROW-286 - Build thirdparty dependencies in parallel
+* ARROW-289 - Install test-util.h
+* ARROW-290 - Specialize alloc() in ArrowBuf
+* ARROW-292 - [Java] Upgrade Netty to 4.041
+* ARROW-299 - Use absolute namespace in macros
+* ARROW-305 - Add compression and use_dictionary options to Parquet interface
+* ARROW-306 - Add option to pass cmake arguments via environment variable
+* ARROW-315 - Finalize timestamp type
+* ARROW-319 - Add canonical Arrow Schema json representation
+* ARROW-324 - Update arrow metadata diagram
+* ARROW-325 - make TestArrowFile not dependent on timezone
+* ARROW-50 - C++: Enable library builds for 3rd-party users without having to build thirdparty googletest
+* ARROW-54 - Python: rename package to "pyarrow"
+* ARROW-64 - Add zsh support to C++ build scripts
+* ARROW-66 - Maybe some missing steps in installation guide
+* ARROW-68 - Update setup_build_env and third-party script to be more userfriendly
+* ARROW-71 - C++: Add script to run clang-tidy on codebase
+* ARROW-73 - Support CMake 2.8
+* ARROW-78 - C++: Add constructor for DecimalType
+* ARROW-79 - Python: Add benchmarks
+* ARROW-8 - Set up Travis CI
+* ARROW-85 - C++: memcmp can be avoided in Equal when comparing with the same Buffer
+* ARROW-86 - Python: Implement zero-copy Arrow-to-Pandas conversion
+* ARROW-87 - Implement Decimal schema conversion for all ways supported in Parquet
+* ARROW-89 - Python: Add benchmarks for Arrow<->Pandas conversion
+* ARROW-9 - Rename some unchanged "Drill" to "Arrow"
+* ARROW-91 - C++: First draft of an adapter class for parquet-cpp's ParquetFileReader that produces Arrow table/row batch objects
+
+## New Feature
+
+* ARROW-100 - [C++] Computing RowBatch size
+* ARROW-106 - Add IPC round trip for string types (string, char, varchar, binary)
+* ARROW-107 - [C++] add ipc round trip for struct types
+* ARROW-13 - Add PR merge tool similar to that used in Parquet
+* ARROW-19 - C++: Externalize memory allocations and add a MemoryPool abstract interface to builder classes
+* ARROW-197 - [Python] Add conda dev recipe for pyarrow
+* ARROW-2 - Post Simple Website
+* ARROW-20 - C++: Add null count member to Array containers, remove nullable member
+* ARROW-201 - C++: Initial ParquetWriter implementation
+* ARROW-203 - Python: Basic filename based Parquet read/write
+* ARROW-204 - [Python] Automate uploading conda build artifacts for libarrow and pyarrow
+* ARROW-21 - C++: Add in-memory schema metadata container
+* ARROW-214 - C++: Add String support to Parquet I/O
+* ARROW-215 - C++: Support other integer types in Parquet I/O
+* ARROW-22 - C++: Add schema adapter routines for converting flat Parquet schemas to in-memory Arrow schemas
+* ARROW-222 - [C++] Create prototype file-like interface to HDFS (via libhdfs) and begin defining more general IO interface for Arrow data adapters
+* ARROW-23 - C++: Add logical "Column" container for chunked data
+* ARROW-233 - [C++] Add visibility defines for limiting shared library symbol visibility
+* ARROW-236 - [Python] Enable Parquet read/write to work with HDFS file objects
+* ARROW-237 - [C++] Create Arrow specializations of Parquet allocator and read interfaces
+* ARROW-24 - C++: Add logical "Table" container
+* ARROW-242 - C++/Python: Support Timestamp Data Type
+* ARROW-26 - C++: Add developer instructions for building parquet-cpp integration
+* ARROW-262 - [Format] Add a new format document for metadata and logical types for messaging and IPC / on-wire/file representations
+* ARROW-267 - [C++] C++ implementation of file-like layout for RPC / IPC
+* ARROW-28 - C++: Add google/benchmark to the 3rd-party build toolchain
+* ARROW-293 - [C++] Implementations of IO interfaces for operating system files
+* ARROW-296 - [C++] Remove arrow_parquet C++ module and related parts of build system
+* ARROW-3 - Post Initial Arrow Format Spec
+* ARROW-30 - Python: pandas/NumPy to/from Arrow conversion routines
+* ARROW-301 - [Format] Add some form of user field metadata to IPC schemas
+* ARROW-302 - [Python] Add support to use the Arrow file format with file-like objects
+* ARROW-31 - Python: basic PyList <-> Arrow marshaling code
+* ARROW-318 - [Python] Revise README to reflect current state of project
+* ARROW-37 - C++: Represent boolean array data in bit-packed form
+* ARROW-4 - Initial Arrow CPP Implementation
+* ARROW-42 - Python: Add to Travis CI build
+* ARROW-43 - Python: Add rudimentary console __repr__ for array types
+* ARROW-44 - Python: Implement basic object model for scalar values (i.e. results of arrow_arr[i])
+* ARROW-48 - Python: Add Schema object wrapper
+* ARROW-49 - Python: Add Column and Table wrapper interface
+* ARROW-53 - Python: Fix RPATH and add source installation instructions
+* ARROW-56 - Format: Specify LSB bit ordering in bit arrays
+* ARROW-57 - Format: Draft data headers IDL for data interchange
+* ARROW-58 - Format: Draft type metadata ("schemas") IDL
+* ARROW-59 - Python: Boolean data support for builtin data structures
+* ARROW-60 - C++: Struct type builder API
+* ARROW-67 - C++: Draft type metadata conversion to/from IPC representation
+* ARROW-7 - Add Python library build toolchain
+* ARROW-70 - C++: Add "lite" DCHECK macros used in parquet-cpp
+* ARROW-76 - Revise format document to include null count, defer non-nullable arrays to the domain of metadata
+* ARROW-82 - C++: Implement IPC exchange for List types
+* ARROW-90 - Apache Arrow cpp code does not support power architecture
+* ARROW-92 - C++: Arrow to Parquet Schema conversion
+
+## Task
+
+* ARROW-1 - Import Initial Codebase
+* ARROW-101 - Fix java warnings emitted by java compiler
+* ARROW-102 - travis-ci support for java project
+* ARROW-11 - Mirror JIRA activity to dev@arrow.apache.org
+* ARROW-14 - Add JIRA components
+* ARROW-251 - [C++] Expose APIs for getting code and message of the status
+* ARROW-272 - Arrow release 0.1
+* ARROW-298 - create release scripts
+* ARROW-35 - Add a short call-to-action / how-to-get-involved to the main README.md
+
+## Test
+
+* ARROW-260 - TestValueVector.testFixedVectorReallocation and testVariableVectorReallocation are flaky
+* ARROW-83 - Add basic test infrastructure for DecimalType

http://git-wip-us.apache.org/repos/asf/arrow/blob/2c3e111d/dev/make_changelog.py
----------------------------------------------------------------------
diff --git a/dev/make_changelog.py b/dev/make_changelog.py
new file mode 100644
index 0000000..0ad1607
--- /dev/null
+++ b/dev/make_changelog.py
@@ -0,0 +1,85 @@
+#!/usr/bin/env python
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Utility for generating changelogs for fix versions
+# requirements: pip install jira
+# Set $JIRA_USERNAME, $JIRA_PASSWORD environment variables
+
+from collections import defaultdict
+from io import StringIO
+import os
+import sys
+
+import jira.client
+
+# ASF JIRA username
+JIRA_USERNAME = os.environ.get("JIRA_USERNAME")
+# ASF JIRA password
+JIRA_PASSWORD = os.environ.get("JIRA_PASSWORD")
+
+JIRA_API_BASE = "https://issues.apache.org/jira"
+
+asf_jira = jira.client.JIRA({'server': JIRA_API_BASE},
+                            basic_auth=(JIRA_USERNAME, JIRA_PASSWORD))
+
+
+def get_issues_for_version(version):
+    jql = ("project=ARROW "
+           "AND fixVersion='{0}' "
+           "AND status = Resolved "
+           "AND resolution in (Fixed, Done) "
+           "ORDER BY issuetype DESC").format(version)
+
+    return asf_jira.search_issues(jql, maxResults=9999)
+
+
+LINK_TEMPLATE = '[{0}](https://issues.apache.org/jira/browse/{0})'
+
+
+def format_changelog_markdown(issues, out, links=False):
+    issues_by_type = defaultdict(list)
+    for issue in issues:
+        issues_by_type[issue.fields.issuetype.name].append(issue)
+
+
+    for issue_type, issue_group in sorted(issues_by_type.items()):
+        issue_group.sort(key=lambda x: x.key)
+
+        out.write('## {0}\n\n'.format(issue_type))
+        for issue in issue_group:
+            if links:
+                name = LINK_TEMPLATE.format(issue.key)
+            else:
+                name = issue.key
+            out.write('* {0} - {1}\n'.format(name,
+                                             issue.fields.summary))
+        out.write('\n')
+
+
+if __name__ == '__main__':
+    if len(sys.argv) < 2:
+        sys.exit('Usage: make_changelog.py $FIX_VERSION [$LINKS]')
+
+    buf = StringIO()
+
+    links = len(sys.argv) > 2 and sys.argv[2] == '1'
+
+    issues_for_version = get_issues_for_version(sys.argv[1])
+    format_changelog_markdown(issues_for_version, buf, links=links)
+    print(buf.getvalue())
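
For readers who want to try the grouping and formatting step of dev/make_changelog.py without JIRA credentials, here is a minimal offline sketch. It is not part of the commit. `FakeIssue`, `format_changelog`, `render`, and `SAMPLE` are hypothetical names introduced only for illustration, and the flat `issue_type` / `summary` attributes stand in for the `issue.fields.issuetype.name` and `issue.fields.summary` attributes that the script reads from the JIRA client. The sketch assumes Python 3.

```python
from collections import defaultdict, namedtuple
from io import StringIO

# Hypothetical stand-in for a JIRA issue object; the real script reads
# issue.fields.issuetype.name and issue.fields.summary instead.
FakeIssue = namedtuple('FakeIssue', ['key', 'issue_type', 'summary'])

LINK_TEMPLATE = '[{0}](https://issues.apache.org/jira/browse/{0})'


def format_changelog(issues, out, links=False):
    """Write one '## <type>' section per issue type, as the script does."""
    by_type = defaultdict(list)
    for issue in issues:
        by_type[issue.issue_type].append(issue)

    # Sections are ordered by type name; keys are compared as strings,
    # which is why ARROW-103 sorts before ARROW-5 in CHANGELOG.md.
    for issue_type, group in sorted(by_type.items()):
        group.sort(key=lambda x: x.key)
        out.write('## {0}\n\n'.format(issue_type))
        for issue in group:
            name = LINK_TEMPLATE.format(issue.key) if links else issue.key
            out.write('* {0} - {1}\n'.format(name, issue.summary))
        out.write('\n')


def render(issues, links=False):
    """Convenience wrapper that returns the Markdown as a string."""
    buf = StringIO()
    format_changelog(issues, buf, links=links)
    return buf.getvalue()


# Three entries borrowed from the 0.1.0 changelog above.
SAMPLE = [
    FakeIssue('ARROW-83', 'Test',
              'Add basic test infrastructure for DecimalType'),
    FakeIssue('ARROW-5', 'Bug', 'Error when run maven install'),
    FakeIssue('ARROW-103', 'Bug', 'Missing patterns from .gitignore'),
]

if __name__ == '__main__':
    print(render(SAMPLE))
```

Calling `render(SAMPLE, links=True)` produces the same sections with each key wrapped in a Markdown link to its JIRA page, which corresponds to passing `1` as the second command-line argument of the script.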

