Mailing-List: contact user-help@accumulo.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@accumulo.apache.org
Received-SPF: pass (nike.apache.org: domain of texpilot@gmail.com designates
 209.85.215.182 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAF1jEfAxP9ZjBwMzwGSPGpwLfbJ4Ko+pbBcBQkiQi-8XZxAwTw@mail.gmail.com>
References: 
 <CAPnhrdszdM44OLKvN79toeXtjVf01JHOwnhOdnPYgLU10C9r+Q@mail.gmail.com>
	<CAMz+Dut_gntdOXzinBEwQphqc7OTNoutSgTVYKfwQ0-p8UGizw@mail.gmail.com>
	<CAPnhrdssnRPB15dxqb17KOuBY++oP-rD2Sw7MqKk4onBdrhRNg@mail.gmail.com>
	<CAF1jEfAiYN0USAvQgCbnh4J=3FP5LqEL5WwG7hSwaPuLTF=oOA@mail.gmail.com>
	<CAPnhrdu+6mW8gGWZaPhHjqoXhCtTL61C6PwkOgdQOAZHu+=BAg@mail.gmail.com>
	<CAF1jEfDwarzKq+6C+9cgBLz151fE-vd_EjEy5waXUytb9fggzg@mail.gmail.com>
	<CAPnhrdsH8WcbqA+FxwbmTts4SD_a-Jk6cocrG_wvSNBs1t9eFw@mail.gmail.com>
	<CAF1jEfAxP9ZjBwMzwGSPGpwLfbJ4Ko+pbBcBQkiQi-8XZxAwTw@mail.gmail.com>
Date: Wed, 6 Nov 2013 20:28:43 -0600
Message-ID: 
 <CAPnhrdvODp_7K=-hEmOrzXW3NT8ozuJdDyQdCKwLuUpVnM3syA@mail.gmail.com>
Subject: Re: How to remove entire row at the server side?
From: "Terry P." <texpilot@gmail.com>
To: "user@accumulo.apache.org" <user@accumulo.apache.org>
Content-Type: multipart/alternative; boundary=089e0160c5444bde1d04ea8d0659

--089e0160c5444bde1d04ea8d0659
Content-Type: text/plain; charset=UTF-8

Hi Billie,
Adding the "implements OptionDescriber" is what was needed to allow the
iterator to be added in the shell with the setiter command.

MANY thanks for your help!  A quick scan test shows it's working as a scan
iterator, though I'll be doing much more thorough testing tomorrow.  Thank
you thank you!


On Wed, Nov 6, 2013 at 6:56 PM, Billie Rinaldi <billie.rinaldi@gmail.com> wrote:

> Making your class "extends RowFilter implements OptionDescriber" should be
> fine.  One reason it might have been complaining about the @Override
> annotations is if the Java compiler is set to 1.5 compatibility rather than
> 1.6.
>
> Regarding getting the same error, did you replace all the jars containing
> your iterator on all the nodes?  If you did, perhaps it's not reloading the
> jars properly.  You could restart accumulo to make sure it's using the
> fresh jar, or you could try renaming your class and dropping it in with a
> different jar name to ensure the new code is being picked up.
>
>
> On Wed, Nov 6, 2013 at 2:50 PM, Terry P. <texpilot@gmail.com> wrote:
>
>> Hi Billie,
>> Many thanks for your help.  I added those two methods, but had to remove
>> the @Override as the RowFilter class I'm extending from doesn't implement
>> them.  Even with these methods in place, I still get the same error trying
>> to add the iterator in the shell.
>>
>> I notice that the RowFilter class extends WrappingIterator, which also
>> doesn't appear to have the describeOptions and validateOptions methods ...
>> should I try extending from just the Filter class?  I didn't understand the
>> benefits William listed of extending from the RowFilter class.  I just know
>> that once I identify a RowKey should be purged based on its expTs ColFam
>> Value, I want to remove all entries for that RowKey.
>>
>>
>> On Wed, Nov 6, 2013 at 3:29 PM, Billie Rinaldi <billie.rinaldi@gmail.com> wrote:
>>
>>> To use setiter in the shell, your iterator must implement
>>> OptionDescriber.  It has two methods, and something like the following
>>> should work for your iterator.  If you implement passing options to the
>>> iterator, you'll want to change the null parameters to the constructor of
>>> IteratorOptions below, and probably also to do some validation in
>>> validateOptions.
>>>
>>>   @Override
>>>   public IteratorOptions describeOptions() {
>>>     return new IteratorOptions("expTs",
>>>         "Removes rows based on the column designated as the expiration timestamp column family",
>>>         null, null);
>>>   }
>>>
>>>   @Override
>>>   public boolean validateOptions(Map<String,String> options) {
>>>     return true;
>>>   }
>>>
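As a rough illustration of the "do some validation in validateOptions" suggestion above, here is a stand-alone sketch. It deliberately avoids the Accumulo classes so it can run on its own, and it only mimics the shape of validateOptions(Map<String,String>). The option names expTsColFam and expTsDateFormat are hypothetical; nothing in this thread defines them.

```java
import java.text.SimpleDateFormat;
import java.util.HashMap;
import java.util.Map;

// Stand-alone sketch: mimics the shape of validateOptions without Accumulo classes.
class ExpTsOptionsSketch {
  // Hypothetical option names, chosen only for this illustration.
  static final String COL_FAM_OPT = "expTsColFam";
  static final String DATE_FORMAT_OPT = "expTsDateFormat";

  static boolean validateOptions(Map<String, String> options) {
    String colFam = options.get(COL_FAM_OPT);
    if (colFam != null && colFam.isEmpty()) {
      return false; // an explicitly empty column family cannot match anything useful
    }
    String pattern = options.get(DATE_FORMAT_OPT);
    if (pattern != null) {
      try {
        new SimpleDateFormat(pattern); // rejects illegal pattern letters
      } catch (IllegalArgumentException e) {
        return false;
      }
    }
    return true; // absent options would fall back to the hardcoded defaults
  }

  public static void main(String[] args) {
    Map<String, String> opts = new HashMap<String, String>();
    opts.put(DATE_FORMAT_OPT, "yyyyMMddHHmmssSSS");
    System.out.println(validateOptions(opts));
    opts.put(DATE_FORMAT_OPT, "bogus");
    System.out.println(validateOptions(opts));
  }
}
```

In a real iterator the same checks would presumably sit in the overridden validateOptions, with the matching option names listed in the map passed to IteratorOptions instead of null.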
>>>
>>>
>>> On Wed, Nov 6, 2013 at 12:49 PM, Terry P. <texpilot@gmail.com> wrote:
>>>
>>>> Eyes of an eagle Billie!  com is correct, but after viewing
>>>> "org.apache.accumulo" so many times, my brain was stuck on org and I goofed
>>>> in my setiter syntax.
>>>>
>>>> With THAT corrected, here is the new error:
>>>>
>>>> root@meta> setiter -class
>>>> com.esa.accumulo.iterators.ExpirationTimestampPurgeFilter -n expTsFilter -p
>>>> 20 -scan -t itertest
>>>> 2013-11-06 14:46:28,280 [shell.Shell] ERROR:
>>>> org.apache.accumulo.core.util.shell.ShellCommandException: Command could
>>>> not be initialized (Unable to load
>>>> com.esa.accumulo.iterators.ExpirationTimestampPurgeFilter as type
>>>> org.apache.accumulo.core.iterators.OptionDescriber; configure with 'config'
>>>> instead)
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Nov 6, 2013 at 2:43 PM, Billie Rinaldi <
>>>> billie.rinaldi@gmail.com> wrote:
>>>>
>>>>> Is there a typo in the package name?  One place says "com" and the
>>>>> other "org".
>>>>>
>>>>>
>>>>> On Wed, Nov 6, 2013 at 12:37 PM, Terry P. <texpilot@gmail.com> wrote:
>>>>>
>>>>>> Hi William, many thanks for the explanation of scan time versus
>>>>>> compaction time. I'll look through the classes again and note where the
>>>>>> remove versus suppress wordings are used and open a ticket.
>>>>>>
>>>>>> As mentioned, I only dabble in Java, but regardless of that fact at
>>>>>> this point I'm the one that has to get this done. I've cobbled together my
>>>>>> first attempt, but I get the following error when I try to add it as a
>>>>>> scan iterator for testing:
>>>>>>
>>>>>> root@meta> setiter -class
>>>>>> org.esa.accumulo.iterators.ExpirationTimestampPurgeFilter -n expTsFilter -p
>>>>>> 20 -scan -t itertest
>>>>>> 2013-11-06 14:06:34,914 [shell.Shell] ERROR:
>>>>>> org.apache.accumulo.core.util.shell.ShellCommandException: Command could
>>>>>> not be initialized (Servers are unable to load
>>>>>> org.esa.accumulo.iterators.ExpirationTimestampPurgeFilter as type
>>>>>> org.apache.accumulo.core.iterators.SortedKeyValueIterator)
>>>>>>
>>>>>> Here's my source.  Note that the value stored in the expTs ColFam is
>>>>>> in the format "yyyyMMddHHmmssS", which I convert to a long for a direct
>>>>>> comparison to System.currentTimeMillis(). I only overrode the init and
>>>>>> acceptRow methods, hoping the others would work as-is from the base class.
>>>>>>
>>>>>> One clarification: turns out expTs is the ColumnFamily, and the
>>>>>> ingest app does not assign a ColumnQualifier for expTs. So to amend my
>>>>>> prior table layout (including the datetime format):
>>>>>>
>>>>>>
>>>>>> Format: Key:CF:CQ:Value
>>>>>> abc:data:title:"My fantastic data"
>>>>>> abc:data:content:<bytedata>
>>>>>> abc:creTs::20130804171412445
>>>>>> abc:*expTs*::20131104171412445
>>>>>> ... 6-8 more columns of data per row ...
>>>>>>
>>>>>> where *expTs* is the ColumnFamily to determine if the entire row
>>>>>> should be removed based on whether its value is <= NOW.  If a row has not
>>>>>> yet been assigned an expiration date, expTs will not be set and the
>>>>>> ColumnFamily will not yet be present.  Seems like an odd choice to use
>>>>>> distinct Column Families, without Column Qualifiers, but that's how the
>>>>>> ingest app was done.
>>>>>>
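To make the timestamp comparison concrete, here is a small stand-alone sketch of turning an expTs value into epoch milliseconds and testing it against "now". Two things in it are assumptions rather than facts from this thread: the sample values have 17 digits, so the sketch uses a three-digit millisecond field (SSS) instead of the single S in the source, and it treats the stored value as UTC.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

class ExpTsParseSketch {
  // Assumptions: "SSS" for the three trailing digits, and UTC as the time zone.
  static long toEpochMillis(String expTs) throws ParseException {
    SimpleDateFormat df = new SimpleDateFormat("yyyyMMddHHmmssSSS");
    df.setLenient(false); // reject out-of-range fields such as month 13
    df.setTimeZone(TimeZone.getTimeZone("UTC"));
    return df.parse(expTs).getTime();
  }

  // A row counts as expired when its expTs value is <= now.
  static boolean isExpired(String expTs, long nowMillis) throws ParseException {
    return toEpochMillis(expTs) <= nowMillis;
  }

  public static void main(String[] args) throws ParseException {
    String expTs = "20131104171412445"; // sample value from the layout above
    System.out.println(toEpochMillis(expTs));
    System.out.println(isExpired(expTs, System.currentTimeMillis()));
  }
}
```

A fresh SimpleDateFormat is created per call here because the class is not thread-safe; an iterator that is only ever used from one thread could keep a single instance instead.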
>>>>>> I greatly appreciate any advice you can provide.
>>>>>>
>>>>>> package com.esa.accumulo.iterators;
>>>>>>
>>>>>> import java.io.IOException;
>>>>>> import java.text.ParseException;
>>>>>> import java.text.SimpleDateFormat;
>>>>>> import java.util.Date;
>>>>>> import java.util.Map;
>>>>>>
>>>>>> import org.apache.accumulo.core.data.Key;
>>>>>> import org.apache.accumulo.core.data.Value;
>>>>>> import org.apache.accumulo.core.iterators.IteratorEnvironment;
>>>>>> import org.apache.accumulo.core.iterators.SortedKeyValueIterator;
>>>>>> import org.apache.accumulo.core.iterators.user.RowFilter;
>>>>>>
>>>>>> /**
>>>>>>  * A filter that removes rows based on the column designated as the
>>>>>>  * "expiration timestamp" column family.
>>>>>>  *
>>>>>>  * It removes the row if the value in the expirationTimestamp column
>>>>>>  * is less than currentTime.
>>>>>>  *
>>>>>>  * TODO: The designation of the expirationTimestamp ColumnFamily and
>>>>>>  * its DateFormat is set in the iterator options when the iterator is
>>>>>>  * applied to the table. (For now it is hardcoded to match the format
>>>>>>  * used in the Solr-Accumulo plugin)
>>>>>>  */
>>>>>> public class ExpirationTimestampPurgeFilter extends RowFilter {
>>>>>>   private long currentTime;
>>>>>>   // TODO: make accumuloDateFormat settable via Iterator Options
>>>>>>   // Date Format for Expiration Timestamp ColumnFamily stored in Accumulo
>>>>>>   private String expTsDateFormat = "yyyyMMddHHmmssS";
>>>>>>   SimpleDateFormat df = new SimpleDateFormat(expTsDateFormat);
>>>>>>
>>>>>>   // TODO: make expTs settable via Iterator Options
>>>>>>   // ColumnFamily containing Expiration Timestamp value (note ingest app
>>>>>>   // did NOT assign a ColumnQualifier, only a ColumnFamily)
>>>>>>   private String expTsColFam = "expTs";
>>>>>>
>>>>>>   @Override
>>>>>>   public boolean acceptRow(SortedKeyValueIterator<Key, Value> rowIterator)
>>>>>>     throws IOException {
>>>>>>
>>>>>>     // Walk every entry in the row: expTs is not the first column family
>>>>>>     // in sort order, so checking only the top key would miss it.
>>>>>>     while (rowIterator.hasTop()) {
>>>>>>       if (rowIterator.getTopKey().getColumnFamily().toString().equals(expTsColFam)) {
>>>>>>         try {
>>>>>>           Date expTsDate = df.parse(rowIterator.getTopValue().toString());
>>>>>>           if (expTsDate.getTime() < currentTime)
>>>>>>             return false;
>>>>>>         } catch (ParseException e) {
>>>>>>           // Unparseable timestamp: keep the row rather than purge it
>>>>>>           e.printStackTrace();
>>>>>>         }
>>>>>>       }
>>>>>>       rowIterator.next();
>>>>>>     }
>>>>>>     return true;
>>>>>>   }
>>>>>>
>>>>>>   @Override
>>>>>>   public void init(SortedKeyValueIterator<Key, Value> source,
>>>>>>       Map<String, String> options, IteratorEnvironment env)
>>>>>>       throws IOException {
>>>>>>     super.init(source, options, env);
>>>>>>     currentTime = System.currentTimeMillis();
>>>>>>   }
>>>>>>
>>>>>> }
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Nov 5, 2013 at 8:48 PM, William Slacum <
>>>>>> wilhelm.von.cloud@accumulo.net> wrote:
>>>>>>
>>>>>>> If an iterator is only set at scan time, then its logic will only be
>>>>>>> applied when a client scans the table. The data will persist through major
>>>>>>> and minor compaction and be visible if you scanned the RFile(s) backing the
>>>>>>> table. "Suppress" is the better word in this case. Would you please open a
>>>>>>> ticket pointing us where to update the documentation?
>>>>>>>
>>>>>>> It looks like you'd want to implement a RowFilter for your use case.
>>>>>>> It has the necessary hooks to avoid reading a whole row into memory and
>>>>>>> handling the logic of determining whether or not to write keys that occur
>>>>>>> before the column you're filtering on (at the cost of reading those keys
>>>>>>> twice).
>>>>>>>
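To illustrate the row-level decision described above, here is a stand-alone simulation that does not use the Accumulo API. It assumes the amended layout in which expTs is a column family, and the Entry class and the nowStamp parameter are invented for the example. Unlike RowFilter, it buffers each row in memory, which is only acceptable for a toy example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RowSuppressSketch {
  // Minimal stand-in for a table entry: row, column family, value.
  static final class Entry {
    final String row, colFam, value;
    Entry(String row, String colFam, String value) {
      this.row = row;
      this.colFam = colFam;
      this.value = value;
    }
  }

  // Timestamps are fixed-width digit strings, so string order matches time order.
  static boolean acceptRow(List<Entry> rowEntries, String nowStamp) {
    for (Entry e : rowEntries) {
      if (e.colFam.equals("expTs") && e.value.compareTo(nowStamp) <= 0) {
        return false; // one expired expTs entry suppresses the whole row
      }
    }
    return true; // rows without an expTs entry are always kept
  }

  static List<Entry> filter(List<Entry> sorted, String nowStamp) {
    // Group entries by row, preserving the incoming order.
    Map<String, List<Entry>> byRow = new LinkedHashMap<String, List<Entry>>();
    for (Entry e : sorted) {
      List<Entry> row = byRow.get(e.row);
      if (row == null) {
        row = new ArrayList<Entry>();
        byRow.put(e.row, row);
      }
      row.add(e);
    }
    List<Entry> kept = new ArrayList<Entry>();
    for (List<Entry> row : byRow.values()) {
      if (acceptRow(row, nowStamp)) {
        kept.addAll(row);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<Entry> table = new ArrayList<Entry>();
    table.add(new Entry("abc", "creTs", "20130804171412445"));
    table.add(new Entry("abc", "data", "My fantastic data"));
    table.add(new Entry("abc", "expTs", "20131104171412445"));
    table.add(new Entry("xyz", "data", "no expiration set yet"));
    for (Entry e : filter(table, "20131105000000000")) {
      System.out.println(e.row + ":" + e.colFam + ":" + e.value);
    }
  }
}
```

The point of the simulation is that the entries sorting before expTs (creTs and data here) are dropped along with it, which is the per-row behaviour a plain per-entry Filter cannot provide.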
>>>>>>>
>>>>>>> On Tue, Nov 5, 2013 at 6:20 PM, Terry P. <texpilot@gmail.com> wrote:
>>>>>>>
>>>>>>>> Greetings everyone,
>>>>>>>> I'm looking at the AgeOffFilter as a base from which to write a
>>>>>>>> server-side filter / iterator to purge rows when they have aged off based
>>>>>>>> on the value of a specific column in the row (expiry datetime <= now). So
>>>>>>>> this differs from the AgeOffFilter in that the criterion for removal is
>>>>>>>> from the same column in every row (not the Accumulo timestamp for an
>>>>>>>> individual entry), and we need to remove the entire row not just individual
>>>>>>>> entries. For example:
>>>>>>>>
>>>>>>>> Format: Key:CF:CQ:Value
>>>>>>>> abc:data:title:"My fantastic data"
>>>>>>>> abc:data:content:<bytedata>
>>>>>>>> abc:data:creTs:2013-08-04T17:14:12Z
>>>>>>>> abc:data:*expTs*:2013-11-04T17:14:12Z
>>>>>>>> ... 6-8 more columns of data per row ...
>>>>>>>>
>>>>>>>> where *expTs* is the column to determine if the entire row should
>>>>>>>> be removed based on whether its value is <= NOW.
>>>>>>>>
>>>>>>>> This task seemed easy enough as a client program (and it is
>>>>>>>> really), but a server-side iterator would be far more efficient than
>>>>>>>> sending millions of rowkeys across the network just to delete them (we'll
>>>>>>>> be deleting more than a million every hour).  But I'm struggling to get
>>>>>>>> there.
>>>>>>>>
>>>>>>>> In looking at AgeOffFilter.java, is the "magic" in the AgeOffFilter
>>>>>>>> class that removes (deletes) an entry from a table the fact that the accept
>>>>>>>> method returns false, combined with the fact that the iterator would be set
>>>>>>>> to run at -majc or -minc time and it is the compaction code that actually
>>>>>>>> deletes the entry?  If set to run only at scan time, would AgeOffFilter
>>>>>>>> simply not return the rows during the scan, but not delete them?  The
>>>>>>>> wording in the iterator classes varies, some saying "remove" and others
>>>>>>>> "suppress", so it's not clear to me.
>>>>>>>>
>>>>>>>> If that's the case, then I think I know where to implement the
>>>>>>>> logic. The question is, how can I remove all the entries for the row once
>>>>>>>> the accept method has determined it meets the criteria?
>>>>>>>>
>>>>>>>> Or as Mike Drob mentioned in a prior post, will basing my class on
>>>>>>>> the RowFilter class instead of just Filter make things easier?  Or the
>>>>>>>> WholeRowIterator?  Just trying to find the simplest solution.
>>>>>>>>
>>>>>>>> Sorry for what may be obvious questions but I'm more of a DB
>>>>>>>> Architect that does some coding, and not a Java programmer by trade. With
>>>>>>>> all of the amazing things Accumulo does, honestly I was surprised when I
>>>>>>>> couldn't find a way to delete rows in the shell by criteria other than the
>>>>>>>> rowkey!  I'm more used to having a shell to 'delete from *table *where
>>>>>>>> *column *<= *value*'.
>>>>>>>>
>>>>>>>> But looking at it now, everyone's criteria for deletion will likely
>>>>>>>> be different given the flexibility of a key=>value store.  If our rowkey
>>>>>>>> had the date/timestamp as a prefix, I know an easy deletemany command in
>>>>>>>> the shell would do the trick -- but the nature of the data is such that
>>>>>>>> initially no expiration timestamp is set, and there is no means to update
>>>>>>>> the key from the client app when expiration timestamp finally gets set (too
>>>>>>>> much rework on that common tool I'm afraid).
>>>>>>>>
>>>>>>>> Thanks in advance.
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

--089e0160c5444bde1d04ea8d0659--