accumulo-commits mailing list archives

From ktur...@apache.org
Subject accumulo git commit: Added sampling to release notes
Date Tue, 06 Sep 2016 15:18:37 GMT
Repository: accumulo
Updated Branches:
  refs/heads/gh-pages be06c7629 -> e70549671


Added sampling to release notes


Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/e7054967
Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/e7054967
Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/e7054967

Branch: refs/heads/gh-pages
Commit: e705496714f39f5bf1383710ba253adb695948d7
Parents: be06c76
Author: Keith Turner <kturner@apache.org>
Authored: Tue Sep 6 11:18:07 2016 -0400
Committer: Keith Turner <kturner@apache.org>
Committed: Tue Sep 6 11:18:07 2016 -0400

----------------------------------------------------------------------
 release_notes/1.8.0.md | 55 +++++++++++++++++++++++++++++++++++----------
 1 file changed, 43 insertions(+), 12 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/accumulo/blob/e7054967/release_notes/1.8.0.md
----------------------------------------------------------------------
diff --git a/release_notes/1.8.0.md b/release_notes/1.8.0.md
index b191dfb..6dfe2ad 100644
--- a/release_notes/1.8.0.md
+++ b/release_notes/1.8.0.md
@@ -47,8 +47,8 @@ default. Root tablet assignment can not be suspended. See [ACCUMULO-4353] for mo
 
 ### Run multiple Tablet Servers on one node
 
-[ACCUMULO-4328] introduces the capability of running multiple tservers on a single node. This intended for nodes with a large
-amount of memory. This feature is disabled by default. There are several related tickets: [ACCUMULO-4072], [ACCUMULO-4331]
+[ACCUMULO-4328] introduces the capability of running multiple tservers on a single node. This is intended for nodes with a large
+amounts of memory and/or disk. This feature is disabled by default. There are several related tickets: [ACCUMULO-4072], [ACCUMULO-4331]
  and [ACCUMULO-4406]. Note that when this is enabled, the names of the log files change. Previous log file names were defined in the
 generic_logger.xml as `${org.apache.accumulo.core.application}_{org.apache.accumulo.core.ip.localhost.hostname}.log`.
 The files will now include the instance id after the application with
@@ -60,11 +60,32 @@ names do not change if this feature is not used.
 
 ### Rate limiting Major Compactions
 
-Major Compactions can significantly increase the amount of load on TabletServers. [ACCUMULO-4187] take a cue from Apache
+Major Compactions can significantly increase the amount of load on TabletServers. [ACCUMULO-4187] takes a cue from Apache
 Cassandra and restricts the rate at which data is read and written when performing major compactions. This has a direct effect
 on the IO load caused by major compactions with a similar effect on the CPU utilization. This behavior is controlled
 by a new property `tserver.compaction.major.throughput` with a defaults of 0B which disables the rate limiting.
 
+### Sampling
+
+Queryable sample data was added by [ACCUMULO-3913].  This allows users to configure a pluggable
+function to generate sample data.  At scan time, the sample data can optionally be scanned.
+Iterators also have access to sample data.  Iterators can access all data and sample data, this
+allows an iterator to use sample data for query optimizations.  The new user level RFile API
+supports writing RFiles with sample data for bulk import.
+
+A simple configurable sampler function is included with Accumulo.  This sampler uses hashing and
+can be configured to use a subset of Key fields.  For example if it was desired to have entire rows
+in the sample, then this sampler would be configured to hash+mod the row.   Then when a row is
+selected for the sample, all of its columns and all of its updates will be in the sample data.
+Another scenario is one in which a document id is in the column qualifier.  In this scenario, one
+would either want all data related to a document in the sample data or none.  To achieve this, the
+sample could be configured to hash+mod on the column qualifier.  See the sample [Readme
+example][sample] and javadocs on the new APIs for more information.
+
+For sampling to work, all tablets scanned must have pre-generated sample data that was generated in
+the same way.  If this is not the case then scans will fail.  For existing tables, samples can be
+generated by configuring sampling on the table and compacting the table.
+
 ### Upgrade to Apache Thrift 0.9.3
 
 Accumulo relies on Apache Thrift to implement remote procedure calls between Accumulo services.
@@ -74,7 +95,7 @@ on the changes to Thrift.
 ### Iterator Test Harness
 
 Users often write iterators without fully understanding its limits and lifetime. Previously, Accumulo did
-not provide any means in which a user could test iterators to catch common issues that only become apparant
+not provide any means in which a user could test iterators to catch common issues that only become apparent
 in multi-node production deployments. [ACCUMULO-626] provides a framework and a collection of initial tests
 which can be used to simulate common issues with Iterators that only appear in production deployments. This test
 harness can be used directly by users as a supplemental tool to unit tests and integration tests with MiniAccumuloCluster.
@@ -93,14 +114,18 @@ defaults out of the ephemeral range, we can guarantee that the Monitor and GC wi
 
 ## Other Notable Changes
 
- * [ACCUMULO-1055][ACCUMULO-1055] Configurable maximum file size for merging minor compactions
- * [ACCUMULO-1124][ACCUMULO-1124] Optimization of RFile index
- * [ACCUMULO-2883][ACCUMULO-2883] API to fetch current tablet assignments
- * [ACCUMULO-3871][ACCUMULO-3871] Support for running integration tests in MapReduce
- * [ACCUMULO-3920][ACCUMULO-3920] Deprecate the MockAccumulo class and remove usage in our tests
- * [ACCUMULO-4339][ACCUMULO-4339] Make hadoop-minicluster optional dependency of acccumulo-minicluster
- * [ACCUMULO-4354][ACCUMULO-4354] Bump dependency versions to include gson, jetty, and sl4j
- * [ACCUMULO-3735][ACCUMULO-3735] Bulk Import status page on the monitor
+ * [ACCUMULO-1055] Configurable maximum file size for merging minor compactions
+ * [ACCUMULO-1124] Optimization of RFile index
+ * [ACCUMULO-2883] API to fetch current tablet assignments
+ * [ACCUMULO-3871] Support for running integration tests in MapReduce
+ * [ACCUMULO-3920] Deprecate the MockAccumulo class and remove usage in our tests
+ * [ACCUMULO-4339] Make hadoop-minicluster optional dependency of acccumulo-minicluster
+ * [ACCUMULO-4318] BatchWriter, ConditionalWriter, and ScannerBase now extend AutoCloseable
+ * [ACCUMULO-4326] Value constructor now accepts Strings (and Charsequences)
+ * [ACCUMULO-4354] Bump dependency versions to include gson, jetty, and sl4j
+ * [ACCUMULO-3735] Bulk Import status page on the monitor
+ * [ACCUMULO-4066] Reduced time to processes conditional mutations.
+ * [ACCUMULO-4164] Reduced seek time for cached data.
 
 ## Testing
 
@@ -127,11 +152,16 @@ HDFS High-Availability instances, forcing NameNode failover.
 [ACCUMULO-3423]: https://issues.apache.org/jira/browse/ACCUMULO-3423
 [ACCUMULO-3735]: https://issues.apache.org/jira/browse/ACCUMULO-3735
 [ACCUMULO-3871]: https://issues.apache.org/jira/browse/ACCUMULO-3871
+[ACCUMULO-3913]: https://issues.apache.org/jira/browse/ACCUMULO-3913
 [ACCUMULO-3920]: https://issues.apache.org/jira/browse/ACCUMULO-3920
 [ACCUMULO-4072]: https://issues.apache.org/jira/browse/ACCUMULO-4072
 [ACCUMULO-4077]: https://issues.apache.org/jira/browse/ACCUMULO-4077
+[ACCUMULO-4066]: https://issues.apache.org/jira/browse/ACCUMULO-4066
+[ACCUMULO-4164]: https://issues.apache.org/jira/browse/ACCUMULO-4164
 [ACCUMULO-4165]: https://issues.apache.org/jira/browse/ACCUMULO-4165
 [ACCUMULO-4187]: https://issues.apache.org/jira/browse/ACCUMULO-4187
+[ACCUMULO-4318]: https://issues.apache.org/jira/browse/ACCUMULO-4318
+[ACCUMULO-4326]: https://issues.apache.org/jira/browse/ACCUMULO-4326
 [ACCUMULO-4328]: https://issues.apache.org/jira/browse/ACCUMULO-4328
 [ACCUMULO-4331]: https://issues.apache.org/jira/browse/ACCUMULO-4331
 [ACCUMULO-4339]: https://issues.apache.org/jira/browse/ACCUMULO-4339
@@ -144,4 +174,5 @@ HDFS High-Availability instances, forcing NameNode failover.
 [THRIFT-0.9.3-RN]: https://github.com/apache/thrift/blob/0.9.3/CHANGES
 [api]: https://github.com/apache/accumulo/blob/1.8/README.md#api
 [semver]: http://semver.org
+[sample]: http://accumulo.apache.org/1.8/examples/sample
 [ITER_TEST]: https://accumulo.apache.org/1.8/accumulo_user_manual.html#_iterator_testing
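
The hash+mod selection described in the new Sampling section can be illustrated with a small, self-contained sketch. This is not Accumulo's sampler or its API: the `Entry` type, the `in_sample` and `build_sample` helpers, the use of SHA-256, and the modulus values are all assumptions made for illustration. The point it shows is that hashing a single key field (the row, or the column qualifier holding a document id) keeps every entry sharing that field together, either all in the sample or all out.

```python
import hashlib
from typing import Callable, List, NamedTuple


class Entry(NamedTuple):
    """Hypothetical stand-in for a key/value entry (not an Accumulo type)."""
    row: str
    qualifier: str
    value: str


def in_sample(field: str, modulus: int) -> bool:
    """Hash one key field and keep it when the hash falls in bucket 0.

    SHA-256 is an arbitrary choice for this sketch; any stable hash works.
    """
    digest = hashlib.sha256(field.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % modulus == 0


def build_sample(entries: List[Entry],
                 key_field: Callable[[Entry], str],
                 modulus: int) -> List[Entry]:
    """Return the entries whose chosen key field is selected by hash+mod."""
    return [e for e in entries if in_sample(key_field(e), modulus)]


entries = [
    Entry("row1", "docA", "v1"),
    Entry("row1", "docB", "v2"),
    Entry("row2", "docA", "v3"),
    Entry("row3", "docC", "v4"),
]

# Sample whole rows: every column of a selected row is kept.
row_sample = build_sample(entries, lambda e: e.row, 2)

# Sample whole documents: every entry for a selected document id is kept.
doc_sample = build_sample(entries, lambda e: e.qualifier, 2)
```

Because selection depends only on the hashed field, two data sets sampled with the same field, hash, and modulus agree on which rows or documents belong to the sample, which is consistent with the note that all tablets scanned must have sample data generated in the same way.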

