incubator-accumulo-user mailing list archives

From Adam Fuchs <adam.p.fu...@ugov.gov>
Subject Re: Filter use
Date Mon, 12 Dec 2011 14:36:06 GMT
Message from Aaron below.

Adam

---------- Forwarded message ----------
From: "Aaron Cordova" <aaron.cordova@interllective.com>
Date: Dec 12, 2011 9:29 AM
Subject: Re: Filter use
To: "Adam Fuchs" <adam.p.fuchs@ugov.gov>

Joey,

It is possible to write your own filtering iterators to do what you
described. The idea would be to pass in some options to the iterator at
scan-time, such as the values that you want to filter out, e.g. '10' in
your example, and the filtering iterator would apply your logic as you
perform a full table scan.

To write your own filtering iterator use the following as a guide:

1. Create a Java class that implements
org.apache.accumulo.core.iterators.filter.Filter and that contains your
filtering logic. The 'init' function lets you save options set by the
client scanner. In your example, the 'accept' function would just compare
each value to the values passed in as options from the scanner and skip
those that match (i.e. return 'false').

2. Create a jar containing your class and add it to the $ACCUMULO_HOME/lib
directory on every machine in your cluster.

3. Configure your table to use your filtering iterator as described in this
section of the user manual:
http://incubator.apache.org/accumulo/user_manual_1.3-incubating/Table_Configuration.html,
where you specify the name of your Java class as the first option in the
Shell, rather than 'ageoff' or 'regex'.

4. Write your Accumulo Java clients to specify the values you want to skip
as options to the scanner, for example:

scanner.setScanIteratorOption("myFilter", "valuesToSkip", "10");

But be sure to do any necessary transformations on your values, since you
can only pass strings as options to your filter. You'd have to parse the
string into an Integer to match any serialized integers you may have stored
as values in your table.
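
As a rough sketch of the filtering logic in steps 1 and 4, something like the following could work. It is deliberately written without the Accumulo classes so it compiles on its own: 'init' and 'accept' only mirror the two functions described above, and 'accept' takes the raw value bytes instead of Accumulo's key and value objects. The class name SkipValuesFilterSketch, the 'encode' helper, the comma-separated format of the "valuesToSkip" option, and the 4-byte big-endian integer encoding of stored values are all assumptions made for illustration; adjust them to however your values are actually serialized.

```java
import java.nio.ByteBuffer;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a class implementing the Accumulo Filter interface.
class SkipValuesFilterSketch {

    // Option name taken from the scanner example; the comma-separated
    // list format is an assumption.
    static final String VALUES_TO_SKIP = "valuesToSkip";

    private final Set<Integer> valuesToSkip = new HashSet<>();

    // Mirrors 'init': save the options set by the client scanner.
    // Options arrive as strings, so each one is parsed into an Integer here.
    public void init(Map<String, String> options) {
        valuesToSkip.clear();
        String raw = (options == null) ? null : options.get(VALUES_TO_SKIP);
        if (raw == null) {
            return;
        }
        for (String token : raw.split(",")) {
            String trimmed = token.trim();
            if (!trimmed.isEmpty()) {
                valuesToSkip.add(Integer.parseInt(trimmed));
            }
        }
    }

    // Mirrors 'accept': return false to skip an entry, true to keep it.
    // Assumes values were stored as 4-byte big-endian integers.
    public boolean accept(byte[] valueBytes) {
        if (valueBytes == null || valueBytes.length != Integer.BYTES) {
            return true; // not a serialized integer under this assumption; keep it
        }
        int stored = ByteBuffer.wrap(valueBytes).getInt();
        return !valuesToSkip.contains(stored);
    }

    // Hypothetical helper showing the assumed encoding of stored values.
    static byte[] encode(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }
}
```

With the option "valuesToSkip" set to "10", this sketch would skip an entry whose value is the encoded integer 10 and keep one whose value is 11. In a real filter the same parsing and comparison would live in the 'init' and 'accept' functions of your Filter implementation.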

Aaron
