Inorder to use sampling, an Accumulo table must be configured with a class that
-implements org.apache.accumulo.core.sample.Sampler
along with options for
-that class. For guidance on implementing a Sampler see that interface’s
-javadoc. Accumulo provides a few implementations out of the box. For
-information on how to use the samplers that ship with Accumulo look in the
-package org.apache.accumulo.core.sample
and consult the javadoc of the
-classes there. See the sampling example for examples of how to
-configure a Sampler on a table.
Once a table is configured with a sampler all writes after that point will
-generate sample data. For data written before sampling was configured sample
+
In order to use sampling, an Accumulo table must be configured with a class that
+implements Sampler along with options for that class. For guidance on
+implementing a Sampler, see the Sampler interface javadoc. Accumulo provides a few
+implementations of Sampler out of the box. For information on how to use the samplers that
+ship with Accumulo, look in the package org.apache.accumulo.core.client.sample
+and consult the javadoc of the classes there. See the sampling example
+for examples of how to configure a Sampler on a table.
+
+Once a table is configured with a Sampler, all writes after that point will
+generate sample data. For data written before sampling was configured, sample
data will not be present. A compaction can be initiated that only compacts the
-files in the table that do not have sample data. The example readme shows how
-to do this.
+files in the table that do not have sample data. The sampling example
+shows how to do this.
If the sampling configuration of a table is changed, then Accumulo will start generating new sample data with the new configuration. However old data will
@@ -358,19 +357,18 @@ compaction can also be issued in this case to regenerate the sample data.
Inorder to scan sample data, use the setSamplerConfiguration(...)
method on
-Scanner
or BatchScanner
. Please consult this methods javadocs for more
+
In order to scan sample data, use setSamplerConfiguration(...)
method of
+Scanner or BatchScanner. Please consult the javadoc of this method for more
information.
Sample data can also be scanned from within an Accumulo SortedKeyValueIterator
.
+
Sample data can also be scanned from within an Accumulo SortedKeyValueIterator.
To see how to do this, look at the example iterator referenced in the sampling example.
-Also, consult the javadoc on org.apache.accumulo.core.iterators.IteratorEnvironment.cloneWithSamplingEnabled()
.
Map reduce jobs using the AccumuloInputFormat
can also read sample data. See
-the javadoc for the setSamplerConfiguration()
method on
-AccumuloInputFormat
.
Map reduce jobs using the AccumuloInputFormat can also read sample data. See
+the javadoc for the setSamplerConfiguration()
method of AccumuloInputFormat.
Scans over sample data will throw a SampleNotPresentException
in the following cases :
Scans over sample data will throw a SampleNotPresentException in the following cases :
When generating rfiles to bulk import into Accumulo, those rfiles can contain
-sample data. To use this feature, look at the javadoc on the
-AccumuloFileOutputFormat.setSampler(...)
method.
setSampler(...)
+method of AccumuloFileOutputFormat.
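
For readers of the sampling text in the hunks above, the following is a minimal plain-Java sketch of the general idea behind row-based sampling: a row is kept when a hash of the row, taken modulo some value, equals zero, so the same row is always in or out of the sample. It is an illustration only and does not use Accumulo. The class `RowSampleSketch`, its methods, and the choice of CRC32 as the hash are assumptions made for this example; the samplers that ship with Accumulo are documented in the org.apache.accumulo.core.client.sample javadoc.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical illustration only: this is NOT Accumulo's Sampler interface.
public class RowSampleSketch {

  // Membership depends only on the row, so repeated calls for the same row agree.
  static boolean inSample(String row, int modulus) {
    CRC32 crc = new CRC32();
    crc.update(row.getBytes(StandardCharsets.UTF_8));
    return crc.getValue() % modulus == 0;
  }

  // Keep only the rows selected by inSample; the result is a subset of the input.
  static List<String> sample(List<String> rows, int modulus) {
    List<String> kept = new ArrayList<>();
    for (String row : rows) {
      if (inSample(row, modulus)) {
        kept.add(row);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<String> rows = new ArrayList<>();
    for (int i = 0; i < 10000; i++) {
      rows.add(String.format("row_%05d", i));
    }
    List<String> kept = sample(rows, 100);
    System.out.println("kept " + kept.size() + " of " + rows.size() + " rows");
  }
}
```

A larger modulus gives a smaller sample. Because the decision is a pure function of the row, data written at different times can be sampled consistently, which is why changing the configuration requires regenerating older sample data.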
http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/2d7cc3d5/docs/unreleased/getting-started/table_configuration.html
----------------------------------------------------------------------
diff --git a/docs/unreleased/getting-started/table_configuration.html b/docs/unreleased/getting-started/table_configuration.html
index 9c8fc2d..8a691ca 100644
--- a/docs/unreleased/getting-started/table_configuration.html
+++ b/docs/unreleased/getting-started/table_configuration.html
@@ -400,19 +400,14 @@ com.test.AnotherConstraint=2
Currently there are no general-purpose constraints provided with the Accumulo distribution. New constraints can be created by writing a Java class that implements
-the following interface:
+the Constraint interface.
- org.apache.accumulo.core.constraints.Constraint
-
-To deploy a new constraint, create a jar file containing the class implementing the
-new constraint and place it in the lib directory of the Accumulo installation. New
+
To deploy a new constraint, create a jar file containing a class implementing Constraint
+and place it in the lib/
directory of the Accumulo installation. New
constraint jars can be added to Accumulo and enabled without restarting but any
change to an existing constraint class requires Accumulo to be restarted.
See the contraints examples
-for example code.
+See the constraints examples for example code.
The bloom filter examples
-contains an extensive example of using Bloom Filters.
+The bloom filter examples contains an extensive example of using Bloom Filters.
org.apache.accumulo.core.iterators.user
package.
+org.apache.accumulo.core.iterators.user package.
In each case, any custom Iterators must be included in Accumulo’s classpath,
typically by including a jar in lib/
or lib/ext/
, although the VFS classloader
allows for classpath manipulation using a variety of schemes including URLs and HDFS URIs.
@@ -445,7 +439,7 @@ allows for classpath manipulation using a variety of schemes including URLs and
Iterators can be configured on a table at scan, minor compaction and/or major
-compaction scopes. If the Iterator implements the OptionDescriber interface, the
+compaction scopes. If the Iterator implements the OptionDescriber interface, the
setiter command can be used which will interactively prompt the user to provide
values for the given necessary options.
@@ -458,7 +452,7 @@ user@myinstance mytable> setiter -t mytable -scan -p 15 -n myiter -class com.
The config command can always be used to manually configure iterators which is useful
-in cases where the Iterator does not implement the OptionDescriber interface.
+in cases where the Iterator does not implement the OptionDescriber interface.
config -t mytable -s table.iterator.scan.myiter=15,com.company.MyIterator
config -t mytable -s table.iterator.minc.myiter=15,com.company.MyIterator
@@ -560,11 +554,10 @@ are removed from disk as part of the regular garbage collection process.
Filters
When scanning over a set of key-value pairs it is possible to apply an arbitrary
-filtering policy through the use of a Filter. Filters are types of iterators that return
+filtering policy through the use of a Filter. Filters are types of iterators that return
only key-value pairs that satisfy the filter logic. Accumulo has a few built-in filters
that can be configured on any table: AgeOff, ColumnAgeOff, Timestamp, NoVis, and RegEx. More can be added
-by writing a Java class that extends the
-org.apache.accumulo.core.iterators.Filter
class.
+by writing a Java class that extends the Filter class.
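
As a rough illustration of the filter logic described in this hunk, here is a plain-Java sketch of an age-off style acceptance test. It does not depend on Accumulo: `AgeOffSketch`, `Entry`, and the method names are hypothetical stand-ins. A real filter would instead extend Accumulo's Filter class and put the equivalent decision in its `accept(Key, Value)` method.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; not Accumulo classes.
public class AgeOffSketch {

  // Minimal stand-in for a key-value pair carrying a timestamp.
  static final class Entry {
    final String row;
    final long timestampMillis;

    Entry(String row, long timestampMillis) {
      this.row = row;
      this.timestampMillis = timestampMillis;
    }
  }

  // Accept an entry only if its age does not exceed the time-to-live.
  static boolean accept(Entry entry, long nowMillis, long ttlMillis) {
    return nowMillis - entry.timestampMillis <= ttlMillis;
  }

  // Return only the entries that satisfy the filter logic.
  static List<Entry> filter(List<Entry> entries, long nowMillis, long ttlMillis) {
    List<Entry> kept = new ArrayList<>();
    for (Entry entry : entries) {
      if (accept(entry, nowMillis, ttlMillis)) {
        kept.add(entry);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis();
    List<Entry> entries = new ArrayList<>();
    entries.add(new Entry("fresh", now - 1_000L));
    entries.add(new Entry("stale", now - 60_000L));
    for (Entry entry : filter(entries, now, 30_000L)) {
      System.out.println(entry.row);
    }
  }
}
```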
The AgeOff filter can be configured to remove data older than a certain date or a fixed
amount of time from the present. The following example sets a table to delete
@@ -671,14 +664,12 @@ foo day:20080103 [] 1
Accumulo includes some useful Combiners out of the box. To find these look in
-the org.apache.accumulo.core.iterators.user
package.
Additional Combiners can be added by creating a Java class that extends
-org.apache.accumulo.core.iterators.Combiner
and adding a jar containing that
-class to Accumulo’s lib/ext directory.
lib/ext
directory.
-See the combiner example
-for example code.
+See the combiner example for example code.
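
For context on the Combiner text in the hunk above, the sketch below shows the kind of reduction a summing combiner performs: all values seen for one key are folded into a single value. It is plain Java with no Accumulo dependency. `SumReduceSketch` and its `reduce` method are hypothetical, using a String key and Long values for readability, whereas a real implementation would extend Accumulo's Combiner class and implement `reduce(Key, Iterator<Value>)`.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical illustration; not an Accumulo Combiner.
public class SumReduceSketch {

  // Fold every value for one key into a single sum.
  static long reduce(String key, Iterator<Long> values) {
    long sum = 0L;
    while (values.hasNext()) {
      sum += values.next();
    }
    return sum;
  }

  public static void main(String[] args) {
    Iterator<Long> values = Arrays.asList(1L, 1L, 1L).iterator();
    System.out.println(reduce("foo day:20080103", values));
  }
}
```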