MIME-Version: 1.0
Date: Wed, 6 Nov 2013 20:28:43 -0600
Subject: Re: How to remove entire row at the server side?
From: "Terry P."
To: "user@accumulo.apache.org"
Content-Type: multipart/alternative; boundary=089e0160c5444bde1d04ea8d0659

--089e0160c5444bde1d04ea8d0659
Content-Type: text/plain; charset=UTF-8

Hi Billie,
Adding the "implements OptionDescriber" is what was needed to allow the
iterator to be added in the shell with the setiter command.

MANY thanks for your help! A quick scan test shows it's working as a scan
iterator, though I'll be doing much more thorough testing tomorrow. Thank
you thank you!


On Wed, Nov 6, 2013 at 6:56 PM, Billie Rinaldi wrote:

> Making your class "extends RowFilter implements OptionDescriber" should be
> fine. One reason it might have been complaining about the @Override
> annotations is if the Java compiler is set to 1.5 compatibility rather than
> 1.6.
>
> Regarding getting the same error, did you replace all the jars containing
> your iterator on all the nodes? If you did, perhaps it's not reloading the
> jars properly. You could restart accumulo to make sure it's using the
> fresh jar, or you could try renaming your class and dropping it in with a
> different jar name to ensure the new code is being picked up.
>
>
> On Wed, Nov 6, 2013 at 2:50 PM, Terry P. wrote:
>
>> Hi Billie,
>> Many thanks for your help. I added those two methods, but had to remove
>> the @Override as the RowFilter class I'm extending from doesn't implement
>> them. Even with these methods in place, I still get the same error trying
>> to add the iterator in the shell.
>>
>> I notice that the RowFilter class extends WrappingIterator, which also
>> doesn't appear to have the describeOptions and validateOptions methods ...
>> should I try extending from just the Filter class? I didn't understand the
>> benefits William listed of extending from the RowFilter class. I just know
>> that once I identify a RowKey should be purged based on its expTs ColFam
>> Value, I want to remove all entries for that RowKey.
>>
>>
>> On Wed, Nov 6, 2013 at 3:29 PM, Billie Rinaldi wrote:
>>
>>> To use setiter in the shell, your iterator must implement
>>> OptionDescriber. It has two methods, and something like the following
>>> should work for your iterator. If you implement passing options to the
>>> iterator, you'll want to change the null parameters to the constructor of
>>> IteratorOptions below, and probably also to do some validation in
>>> validateOptions.
>>>
>>>   @Override
>>>   public IteratorOptions describeOptions() {
>>>     return new IteratorOptions("expTs",
>>>         "Removes rows based on the column designated as the "
>>>             + "expiration timestamp column family", null, null);
>>>   }
>>>
>>>   @Override
>>>   public boolean validateOptions(Map<String,String> options) {
>>>     return true;
>>>   }
>>>
>>>
>>> On Wed, Nov 6, 2013 at 12:49 PM, Terry P. wrote:
>>>
>>>> Eyes of an eagle Billie! com is correct, but after viewing
>>>> "org.apache.accumulo" so many times, my brain was stuck on org and I goofed
>>>> in my setiter syntax.
>>>>
>>>> With THAT corrected, here is the new error:
>>>>
>>>> root@meta> setiter -class com.esa.accumulo.iterators.ExpirationTimestampPurgeFilter -n expTsFilter -p 20 -scan -t itertest
>>>> 2013-11-06 14:46:28,280 [shell.Shell] ERROR:
>>>> org.apache.accumulo.core.util.shell.ShellCommandException: Command could
>>>> not be initialized (Unable to load
>>>> com.esa.accumulo.iterators.ExpirationTimestampPurgeFilter as type
>>>> org.apache.accumulo.core.iterators.OptionDescriber; configure with 'config'
>>>> instead)
>>>>
>>>>
>>>> On Wed, Nov 6, 2013 at 2:43 PM, Billie Rinaldi <
>>>> billie.rinaldi@gmail.com> wrote:
>>>>
>>>>> Is there a typo in the package name? One place says "com" and the
>>>>> other "org".
>>>>>
>>>>>
>>>>> On Wed, Nov 6, 2013 at 12:37 PM, Terry P. wrote:
>>>>>
>>>>>> Hi William, many thanks for the explanation of scan time versus
>>>>>> compaction time. I'll look through the classes again and note where the
>>>>>> remove versus suppress wordings are used and open a ticket.
>>>>>>
>>>>>> As mentioned, I only dabble in Java, but regardless of that fact, at
>>>>>> this point I'm the one that has to get this done. I've cobbled together my
>>>>>> first attempt, but I get the following error when I try to add it as a
>>>>>> scan iterator for testing:
>>>>>>
>>>>>> root@meta> setiter -class org.esa.accumulo.iterators.ExpirationTimestampPurgeFilter -n expTsFilter -p 20 -scan -t itertest
>>>>>> 2013-11-06 14:06:34,914 [shell.Shell] ERROR:
>>>>>> org.apache.accumulo.core.util.shell.ShellCommandException: Command could
>>>>>> not be initialized (Servers are unable to load
>>>>>> org.esa.accumulo.iterators.ExpirationTimestampPurgeFilter as type
>>>>>> org.apache.accumulo.core.iterators.SortedKeyValueIterator)
>>>>>>
>>>>>> Here's my source. Note that the value stored in the expTs ColFam is
>>>>>> in the format "yyyyMMddHHmmssS", which I convert to a long for a direct
>>>>>> comparison to System.currentTimeMillis().
>>>>>> I only overrode the init and
>>>>>> acceptRow methods, hoping the others would work as-is from the base class.
>>>>>>
>>>>>> One clarification: turns out expTs is the ColumnFamily, and the
>>>>>> ingest app does not assign a ColumnQualifier for expTs. So to amend my
>>>>>> prior table layout (including the datetime format):
>>>>>>
>>>>>> Format: Key:CF:CQ:Value
>>>>>> abc:data:title:"My fantastic data"
>>>>>> abc:data:content:<bytedata>
>>>>>> abc:creTs::20130804171412445
>>>>>> abc:*expTs*::20131104171412445
>>>>>> ... 6-8 more columns of data per row ...
>>>>>>
>>>>>> where *expTs* is the ColumnFamily to determine if the entire row
>>>>>> should be removed based on whether its value is <= NOW. If a row has not
>>>>>> yet been assigned an expiration date, expTs will not be set and the
>>>>>> ColumnFamily will not yet be present. Seems like an odd choice to use
>>>>>> distinct Column Families, without Column Qualifiers, but that's how the
>>>>>> ingest app was done.
>>>>>>
>>>>>> I greatly appreciate any advice you can provide.
>>>>>>
>>>>>> package com.esa.accumulo.iterators;
>>>>>>
>>>>>> import java.io.IOException;
>>>>>> import java.text.ParseException;
>>>>>> import java.text.SimpleDateFormat;
>>>>>> import java.util.Date;
>>>>>> import java.util.Map;
>>>>>>
>>>>>> import org.apache.accumulo.core.data.Key;
>>>>>> import org.apache.accumulo.core.data.Value;
>>>>>> import org.apache.accumulo.core.iterators.IteratorEnvironment;
>>>>>> import org.apache.accumulo.core.iterators.SortedKeyValueIterator;
>>>>>> import org.apache.accumulo.core.iterators.user.RowFilter;
>>>>>>
>>>>>> /**
>>>>>>  * A filter that removes rows based on the column designated as the
>>>>>>  * "expiration timestamp" column family.
>>>>>>  *
>>>>>>  * It removes the row if the value in the expirationTimestamp column
>>>>>>  * is less than currentTime.
>>>>>>  *
>>>>>>  * TODO: The designation of the expirationTimestamp ColumnFamily and
>>>>>>  * its DateFormat is set in the iterator options when the iterator is
>>>>>>  * applied to the table. (For now it is hardcoded to match the format
>>>>>>  * used in the Solr-Accumulo plugin)
>>>>>>  */
>>>>>> public class ExpirationTimestampPurgeFilter extends RowFilter {
>>>>>>   private long currentTime;
>>>>>>   // TODO: make accumuloDateFormat settable via Iterator Options
>>>>>>   // Date Format for Expiration Timestamp ColumnFamily stored in Accumulo
>>>>>>   private String expTsDateFormat = "yyyyMMddHHmmssS";
>>>>>>   SimpleDateFormat df = new SimpleDateFormat(expTsDateFormat);
>>>>>>
>>>>>>   // TODO: make expTs settable via Iterator Options
>>>>>>   // ColumnFamily containing Expiration Timestamp value (note ingest app
>>>>>>   // did NOT assign a ColumnQualifier, only a ColumnFamily)
>>>>>>   private String expTsColFam = "expTs";
>>>>>>
>>>>>>   @Override
>>>>>>   public boolean acceptRow(SortedKeyValueIterator<Key, Value> rowIterator)
>>>>>>       throws IOException {
>>>>>>
>>>>>>     if (rowIterator.getTopKey().getColumnFamily().toString().equals(expTsColFam)) {
>>>>>>       Date expTsDate = null;
>>>>>>       try {
>>>>>>         expTsDate = df.parse(rowIterator.getTopValue().toString());
>>>>>>         if (expTsDate.getTime() < currentTime)
>>>>>>           return false;
>>>>>>       } catch (ParseException e) {
>>>>>>         // TODO Auto-generated catch block
>>>>>>         e.printStackTrace();
>>>>>>       }
>>>>>>     }
>>>>>>     return true;
>>>>>>   }
>>>>>>
>>>>>>   @Override
>>>>>>   public void init(SortedKeyValueIterator<Key, Value> source,
>>>>>>       Map<String, String> options, IteratorEnvironment env) throws IOException {
>>>>>>     super.init(source, options, env);
>>>>>>     currentTime = System.currentTimeMillis();
>>>>>>   }
>>>>>>
>>>>>> }
>>>>>>
>>>>>>
>>>>>> On Tue, Nov 5, 2013 at 8:48 PM, William Slacum <
>>>>>> wilhelm.von.cloud@accumulo.net> wrote:
>>>>>>
>>>>>>> If an iterator is only set at scan time, then its logic will only be
>>>>>>> applied when
>>>>>>> a client scans the table. The data will persist through major
>>>>>>> and minor compaction and be visible if you scanned the RFile(s) backing the
>>>>>>> table. "Suppress" is the better word in this case. Would you please open a
>>>>>>> ticket pointing us where to update the documentation?
>>>>>>>
>>>>>>> It looks like you'd want to implement a RowFilter for your use case.
>>>>>>> It has the necessary hooks to avoid reading a whole row into memory and
>>>>>>> handling the logic of determining whether or not to write keys that occur
>>>>>>> before the column you're filtering on (at the cost of reading those keys
>>>>>>> twice).
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Nov 5, 2013 at 6:20 PM, Terry P. wrote:
>>>>>>>
>>>>>>>> Greetings everyone,
>>>>>>>> I'm looking at the AgeOffFilter as a base from which to write a
>>>>>>>> server-side filter / iterator to purge rows when they have aged off based
>>>>>>>> on the value of a specific column in the row (expiry datetime <= now). So
>>>>>>>> this differs from the AgeOffFilter in that the criterion for removal is
>>>>>>>> from the same column in every row (not the Accumulo timestamp for an
>>>>>>>> individual entry), and we need to remove the entire row not just individual
>>>>>>>> entries. For example:
>>>>>>>>
>>>>>>>> Format: Key:CF:CQ:Value
>>>>>>>> abc:data:title:"My fantastic data"
>>>>>>>> abc:data:content:<bytedata>
>>>>>>>> abc:data:creTs:2013-08-04T17:14:12Z
>>>>>>>> abc:data:*expTs*:2013-11-04T17:14:12Z
>>>>>>>> ... 6-8 more columns of data per row ...
>>>>>>>>
>>>>>>>> where *expTs* is the column to determine if the entire row should
>>>>>>>> be removed based on whether its value is <= NOW.
>>>>>>>>
>>>>>>>> This task seemed easy enough as a client program (and it is
>>>>>>>> really), but a server-side iterator would be far more efficient than
>>>>>>>> sending millions of rowkeys across the network just to delete them (we'll
>>>>>>>> be deleting more than a million every hour). But I'm struggling to get
>>>>>>>> there.
>>>>>>>>
>>>>>>>> In looking at AgeOffFilter.java, is the "magic" in the AgeOffFilter
>>>>>>>> class that removes (deletes) an entry from a table the fact that the accept
>>>>>>>> method returns false, combined with the fact that the iterator would be set
>>>>>>>> to run at -majc or -minc time and it is the compaction code that actually
>>>>>>>> deletes the entry? If set to run only at scan time, would AgeOffFilter
>>>>>>>> simply not return the rows during the scan, but not delete them? The
>>>>>>>> wording in the iterator classes varies, some saying "remove" and others
>>>>>>>> "suppress", so it's not clear to me.
>>>>>>>>
>>>>>>>> If that's the case, then I think I know where to implement the
>>>>>>>> logic. The question is, how can I remove all the entries for the row once
>>>>>>>> the accept method has determined it meets the criteria?
>>>>>>>>
>>>>>>>> Or as Mike Drob mentioned in a prior post, will basing my class on
>>>>>>>> the RowFilter class instead of just Filter make things easier? Or the
>>>>>>>> WholeRowIterator? Just trying to find the simplest solution.
>>>>>>>>
>>>>>>>> Sorry for what may be obvious questions but I'm more of a DB
>>>>>>>> Architect that does some coding, and not a Java programmer by trade. With
>>>>>>>> all of the amazing things Accumulo does, honestly I was surprised when I
>>>>>>>> couldn't find a way to delete rows in the shell by criteria other than the
>>>>>>>> rowkey! I'm more used to having a shell to 'delete from table where
>>>>>>>> column <= value'.
>>>>>>>>
>>>>>>>> But looking at it now, everyone's criteria for deletion will likely
>>>>>>>> be different given the flexibility of a key=>value store.
>>>>>>>> If our rowkey
>>>>>>>> had the date/timestamp as a prefix, I know an easy deletemany command in
>>>>>>>> the shell would do the trick -- but the nature of the data is such that
>>>>>>>> initially no expiration timestamp is set, and there is no means to update
>>>>>>>> the key from the client app when expiration timestamp finally gets set (too
>>>>>>>> much rework on that common tool I'm afraid).
>>>>>>>>
>>>>>>>> Thanks in advance.
>>>>>>>>

--089e0160c5444bde1d04ea8d0659--
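
A note for readers following the thread: the acceptRow posted above appears to inspect only the entry the row iterator starts on, while in the layout described in the thread expTs sorts after creTs and data, so a row-level check probably has to walk every entry in the row. Below is a minimal, library-free Java sketch of that decision, together with the kind of check validateOptions could do. It is not the code the poster ended up with. The class RowExpirySketch, the Cell type, the "dateFormat" option name, the UTC time zone and the yyyyMMddHHmmssSSS pattern are all assumptions made for illustration; a real iterator would do the same walk over the SortedKeyValueIterator handed to RowFilter.acceptRow.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TimeZone;

/** Library-free model of the row-level expiry decision discussed in the thread. */
final class RowExpirySketch {

    /** One entry of a row: column family plus value (hypothetical stand-in for Key/Value). */
    static final class Cell {
        final String family;
        final String value;

        Cell(String family, String value) {
            this.family = family;
            this.value = value;
        }
    }

    static final String DEFAULT_FAMILY = "expTs";             // from the thread
    static final String DEFAULT_FORMAT = "yyyyMMddHHmmssSSS"; // assumed 17-digit layout

    /** Rejects a "dateFormat" option (hypothetical name) that is not a valid pattern. */
    static boolean validateOptions(Map<String, String> options) {
        String pattern = options.getOrDefault("dateFormat", DEFAULT_FORMAT);
        try {
            new SimpleDateFormat(pattern);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    /** Parses an expiration value to epoch millis; UTC is an assumption. */
    static long parseExpiry(String value, String pattern) throws ParseException {
        // SimpleDateFormat is not thread-safe, so a fresh instance is built per call.
        SimpleDateFormat df = new SimpleDateFormat(pattern);
        df.setLenient(false);
        df.setTimeZone(TimeZone.getTimeZone("UTC"));
        return df.parse(value).getTime();
    }

    /**
     * Returns false (drop the whole row) when an entry in the expiration family
     * holds a timestamp at or before nowMillis. Every entry is visited, not just
     * the first, because the expiration family may sort after other families.
     */
    static boolean acceptRow(List<Cell> row, String family, String pattern, long nowMillis) {
        for (Cell cell : row) {
            if (!family.equals(cell.family)) {
                continue;
            }
            try {
                if (parseExpiry(cell.value, pattern) <= nowMillis) {
                    return false;
                }
            } catch (ParseException e) {
                // Unparseable value: keep the row rather than drop data on bad input.
                return true;
            }
        }
        return true; // no expiration entry yet, so the row is kept
    }

    public static void main(String[] args) {
        List<Cell> row = Arrays.asList(
            new Cell("creTs", "20130804171412445"),
            new Cell("data", "My fantastic data"),
            new Cell("expTs", "20131104171412445"));
        long now = System.currentTimeMillis();
        System.out.println("keep row? " + acceptRow(row, DEFAULT_FAMILY, DEFAULT_FORMAT, now));
    }
}
```

The sketch compares with <= to match the "<= NOW" criterion stated in the thread, whereas the posted code uses <. As the thread explains, returning false only suppresses the row when the iterator runs at scan time; the entries are physically dropped only when the iterator is also configured for compactions.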