accumulo-user mailing list archives

From Keith Turner <ke...@deenlo.com>
Subject Re: How to remove entire row at the server side?
Date Thu, 07 Nov 2013 21:16:36 GMT
On Thu, Nov 7, 2013 at 3:49 PM, Terry P. <texpilot@gmail.com> wrote:

> Hi Keith,
> No, expTs won't be the first actually -- that'll teach me to try things
> with overly simplistic data!
>

>  There will be 10-12 column families for each row. I take it my simple
> check for column family name isn't enough?
>

You can iterate until you see the column, or seek to it. If you expect
there will always be only a small amount of data before the column occurs,
then iterate.
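
A minimal sketch of the "iterate until you see the column" option, written
in plain Java so it runs without a cluster. The RowExpirySketch class, its
Entry type and the java.util.Iterator are hypothetical stand-ins for the
Key/Value pairs and the row iterator that RowFilter hands to acceptRow; only
the control flow is meant to carry over. It keeps the "yyyyMMddHHmmssS"
format and the "older than now" test from the posted filter, and it assumes
that a row with a missing or unparseable expTs should be kept.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class RowExpirySketch {

    /** Hypothetical stand-in for one entry of a row: column family plus value. */
    public static final class Entry {
        public final String columnFamily;
        public final String value;

        public Entry(String columnFamily, String value) {
            this.columnFamily = columnFamily;
            this.value = value;
        }
    }

    /**
     * Walks the row until the expiration column family is found, instead of
     * assuming it is the first column. Returns false (drop the row) only when
     * the parsed expiration time is earlier than now.
     */
    public static boolean acceptRow(Iterator<Entry> row, String expTsColFam, long now) {
        // SimpleDateFormat is not thread-safe, so this sketch creates one per call.
        SimpleDateFormat df = new SimpleDateFormat("yyyyMMddHHmmssS");
        while (row.hasNext()) {
            Entry e = row.next();
            if (!e.columnFamily.equals(expTsColFam)) {
                continue; // not the expiration column yet, keep scanning the row
            }
            try {
                return df.parse(e.value).getTime() >= now;
            } catch (ParseException ex) {
                return true; // assumption: keep rows whose timestamp cannot be parsed
            }
        }
        return true; // no expiration column in this row: keep it
    }

    public static void main(String[] args) {
        // Columns in sorted order, so expTs is not the first entry of the row.
        List<Entry> row = Arrays.asList(
            new Entry("creTs", "20130804171412445"),
            new Entry("data", "My fantastic data"),
            new Entry("expTs", "20131104171412445"));
        System.out.println("accept row: "
            + acceptRow(row.iterator(), "expTs", System.currentTimeMillis()));
    }
}
```

In the real filter the same loop would advance the row iterator passed to
acceptRow rather than a java.util.Iterator. The seek alternative is not
shown here because it depends on the Accumulo iterator API.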


>
>
> On Thursday, November 7, 2013, Keith Turner wrote:
>
>> Your accept row function assumes that expTs will be the first column in
>> the row, is this always the case?
>>
>>
>> On Wed, Nov 6, 2013 at 3:37 PM, Terry P. <texpilot@gmail.com> wrote:
>>
>> Hi William, many thanks for the explanation of scan time versus
>> compaction time. I'll look through the classes again and note where the
>> remove versus suppress wordings are used and open a ticket.
>>
>> As mentioned, I only dabble in java, but regardless of that fact at this
>> point I'm the one that has to get this done. I've cobbled together my first
>> attempt, but I get the following error where I try to add it as a scan
>> iterator for testing:
>>
>> root@meta> setiter -class
>> org.esa.accumulo.iterators.ExpirationTimestampPurgeFilter -n expTsFilter -p
>> 20 -scan -t itertest
>> 2013-11-06 14:06:34,914 [shell.Shell] ERROR:
>> org.apache.accumulo.core.util.shell.ShellCommandException: Command could
>> not be initialized (Servers are unable to load
>> org.esa.accumulo.iterators.ExpirationTimestampPurgeFilter as type
>> org.apache.accumulo.core.iterators.SortedKeyValueIterator)
>>
>> Here's my source.  Note that the value stored in the expTs ColFam is in
>> the format "yyyyMMddHHmmssS", which I convert to a long for a direct
>> comparison to System.currentTimeMillis(). I only overrode the init and
>> acceptRow methods, hoping the others would work as-is from the base class.
>>
>> One clarification: turns out expTs is the ColumnFamily, and the ingest
>> app does not assign a ColumnQualifier for expTs. So to amend my prior table
>> layout (including the datetime format):
>>
>>
>> Format: Key:CF:CQ:Value
>> abc:data:title:"My fantastic data"
>> abc:data:content:<bytedata>
>> abc:creTs::20130804171412445
>> abc:*expTs*::20131104171412445
>> ... 6-8 more columns of data per row ...
>>
>> where *expTs* is the ColumnFamily to determine if the entire row should
>> be removed based on whether its value is <= NOW.  If a row has not yet been
>> assigned an expiration date, expTs will not be set and the ColumnFamily
>> will not yet be present.  Seems like an odd choice to use distinct Column
>> Families, without Column Qualifiers, but that's how the ingest app was done.
>>
>> I greatly appreciate any advice you can provide.
>>
>> package com.esa.accumulo.iterators;
>>
>> import java.io.IOException;
>> import java.text.ParseException;
>> import java.text.SimpleDateFormat;
>> import java.util.Date;
>> import java.util.Map;
>>
>> import org.apache.accumulo.core.data.Key;
>> import org.apache.accumulo.core.data.Value;
>> import org.apache.accumulo.core.iterators.IteratorEnvironment;
>> import org.apache.accumulo.core.iterators.SortedKeyValueIterator;
>> import org.apache.accumulo.core.iterators.user.RowFilter;
>>
>> /**
>>  * A filter that removes rows based on the column designated as the
>>  * "expiration timestamp" column family.
>>  *
>>  * It removes the row if the value in the expirationTimestamp column is
>>  * less than currentTime.
>>  *
>>  * TODO: The designation of the expirationTimestamp ColumnFamily and its
>>  * DateFormat is set in the iterator options when the iterator is applied
>>  * to the table. (For now it is hardcoded to match the format used in the
>>  * Solr-Accumulo plugin)
>>  */
>> public class ExpirationTimestampPurgeFilter extends RowFilter {
>>   private long currentTime;
>>   // TODO: make accumuloDateFormat settable via Iterator Options
>>   // Date Format for Expiration Timestamp ColumnFamily stored in Accumulo
>>   private String expTsDateFormat = "yyyyMMddHHmmssS";
>>   SimpleDateFormat df = new SimpleDateFormat(expTsDateFormat);
>>
>>   // TODO: make expTs settable via Iterator Options
>>   // ColumnFamily containing Expiration Timestamp value (note ingest app
>>   // did NOT assign a ColumnQualifier, only a ColumnFamily)
>>   private String expTsColFam = "expTs";
>>
>>   @Override
>>   public boolean acceptRow(SortedKeyValueIterator<Key, Value> rowIterator)
>>     throws IOException {
>>
>>     if (rowIterator.getTopKey().getColumnFamily().toString()
>>         .equals(expTsColFam)) {
>>       Date expTsDate = null;
>>       try {
>>         expTsDate = df.parse(rowIterator.getTopValue().toString());
>>         if (expTsDate.getTime() < currentTime)
>>           return false;
>>       } catch (ParseException e) {
>>         // TODO Auto-generated catch block
>>         e.printStackTrace();
>>       }
>>     }
>>     return true;
>>   }
>>
>>   @Override
>>   public void init(SortedKeyValueIterator<Key, Value> source,
>>       Map<String, Str
>>
>>
