accumulo-user mailing list archives

From Brian Loss <bfl...@praxiseng.com>
Subject Re: Modify Keys within iterator
Date Fri, 30 Sep 2016 17:51:54 GMT
Most likely, you see no data because your output keys don’t fall within the seek range provided
to your scanner. Behind the scenes Accumulo will throw away any output keys that don’t fall
within the seek range.
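
To make the point above concrete, here is a toy model in Python. It uses plain tuples rather than the Accumulo API: the key layout `(row, colf, colq)`, the single-row range check, and the helper names `in_row_range` and `transform` are all assumptions made for illustration. It only shows why a key whose row has been rewritten no longer falls inside a scan range built from the original row.

```python
# Toy model, NOT the Accumulo API: a key is a (row, colf, colq) tuple and a
# scan range covers exactly one row, similar to Range(srow=row, erow=row).

def in_row_range(key, row):
    """Return True when the key's row equals the single row being scanned."""
    return key[0] == row

def transform(key):
    """Mimic the non-'fx' branch of the iterator: colq becomes the new row."""
    row, colf, colq = key
    return (colq, "r", row)

source_key = ("r_1", "f", "f_1")
output_key = transform(source_key)

# The source key is inside the range for row "r_1"; the rewritten key is not,
# so a range check like this one would drop it.
source_in_range = in_row_range(source_key, "r_1")
output_in_range = in_row_range(output_key, "r_1")
```

In this model the rewritten key has row `f_1`, so a scan over row `r_1` would discard it.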

In general, you should not attempt to re-write the row in an iterator. If you ever have more
than one tablet, you can get into the case where it is simply not possible—if your transformed
rows belong in a different tablet. You can transform other parts of the key, but there are
memory considerations to keep in mind. Keys must be returned from the iterator in sorted order,
so if you are going to transform keys, you have to buffer up enough to guarantee that you
can return the transformed keys in sorted order. If you are modifying the column family, for
example, then you have to buffer all keys with the same row as the one you are transforming.
That’s fine if you know your data well enough to know that you won’t blow out memory on
the tablet server. However, it is very easy to get into trouble. Have a look at TransformingIterator
in the source. It attempts to take care of all of these issues for you (though you still need
to be aware that it is going to buffer keys and you could easily cause an OOM on your tablet
server if you aren’t careful).
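
The buffering requirement described above can be sketched with another toy model in Python. This is not TransformingIterator or any Accumulo code: entries are `((row, colf, colq), value)` tuples, and `transform_colf` and `rename` are hypothetical names. The sketch buffers one row at a time, rewrites the column family, and re-sorts the buffer before emitting it.

```python
from itertools import groupby

def transform_colf(entries, rename):
    """Rewrite column families while keeping the output sorted.

    entries: iterable of ((row, colf, colq), value), already sorted by key.
    rename:  function mapping an old column family to a new one.
    All entries of one row are buffered in memory before any are emitted.
    """
    for _row, group in groupby(entries, key=lambda e: e[0][0]):
        buffered = [((r, rename(cf), cq), v) for (r, cf, cq), v in group]
        buffered.sort(key=lambda e: e[0])  # restore sorted order within the row
        yield from buffered

entries = [
    (("r1", "a", "q1"), "v1"),
    (("r1", "b", "q1"), "v2"),
    (("r2", "a", "q1"), "v3"),
]

def rename(cf):
    # Hypothetical mapping: column family "a" becomes "z", others unchanged.
    return {"a": "z"}.get(cf, cf)

# Without buffering, the rewritten keys come out in the wrong order.
naive_keys = [(r, rename(cf), cq) for (r, cf, cq), _ in entries]

result = list(transform_colf(entries, rename))
result_keys = [k for k, _ in result]
```

The buffer here holds a whole row, which is exactly the memory cost mentioned above: a very wide row means a very large buffer.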

On Sep 30, 2016, at 1:36 PM, Yamini Joshi <yamini.1691@gmail.com> wrote:

I am using pyaccumulo. Here's the code snippet:

rowIDs = ['r2', 'r10']

hashFilter = KeyModifyIterator(priority=10)
iterator.append(hashFilter)

# One single-row range per row ID
for entry in self.dbconn.batch_scan(table,
        scanranges=(Range(srow=row, erow=row) for row in rowIDs),
        iterators=[hashFilter]):
    print entry


Best regards,
Yamini Joshi

On Fri, Sep 30, 2016 at 12:31 PM, Dan Blum <dblum@bbn.com> wrote:
What code are you using to test the iterator, where you see no output?

From: Yamini Joshi [mailto:yamini.1691@gmail.com]
Sent: Friday, September 30, 2016 1:26 PM
To: user@accumulo.apache.org
Subject: Modify Keys within iterator

Hello Everyone!
I am trying to write an iterator to modify keys within a table (at scan time). My use case is to
select a few records that match a certain criterion and then modify them within the iterator (using
the following class) for some other succeeding iterator/combiner. The problem is that this
iterator does not return any records/keys. I added some primitive prints and found that the key
(this.key) is changed, but the iterator outputs nothing. I'd appreciate it if someone could
give me any insight. I'm sure I'm making a teeny tiny mistake somewhere.
Schema:
        row    colF   colQ   ts   Val
I/P:    r_1    f      f_1         v1
        r_1    fx     f_1         v1
O/P:    f_1    r      r_1         v1
        f_1    fx     fx          v1



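import static java.nio.charset.StandardCharsets.UTF_8;

import java.io.IOException;
import java.util.Collection;
import java.util.Map;

import org.apache.accumulo.core.data.ByteSequence;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.IteratorEnvironment;
import org.apache.accumulo.core.iterators.SortedKeyValueIterator;

public class KeyModifyIterator implements SortedKeyValueIterator<Key,Value> {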

  private SortedKeyValueIterator<Key,Value> source;
  private Key key;
  private Value value;

  @Override
  public void init(SortedKeyValueIterator<Key,Value> source, Map<String,String> options,
      IteratorEnvironment env) throws IOException {
    this.source = source;
  }

  @Override
  public boolean hasTop() {
    return key != null;
  }

  @Override
  public void next() throws IOException {
    if (source.hasTop()) {
      ByteSequence currentRow = source.getTopKey().getRowData();
      ByteSequence currentColf = source.getTopKey().getColumnFamilyData();
      ByteSequence currentColq = source.getTopKey().getColumnQualifierData();
      long ts = source.getTopKey().getTimestamp();
      String v = source.getTopValue().toString();
      System.out.println("Key = " + currentRow.toString() + " Cf = " + currentColf.toString()
          + " Cq = " + currentColq.toString() + " val = " + v.toString());

      if (currentColf.toString().equals("fx")) {
        System.out.println("Updating fx");
        this.key = new Key(currentColq.toArray(), currentColf.toArray(), currentColf.toArray(),
            new byte[0], ts);
        this.value = new Value(v.getBytes(UTF_8));
      } else {
        System.out.println("Updating other");
        this.key = new Key(currentColq.toArray(), "r".getBytes(UTF_8), currentRow.toArray(),
            new byte[0], ts);
        this.value = new Value(v.getBytes(UTF_8));
        System.out.println(this.key.toString());
      }

      source.next();
    } else {
      this.key = null;
      this.value = null;
    }
  }

  @Override
  public void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive)
      throws IOException {
    source.seek(range, columnFamilies, inclusive);
    next();
  }

  @Override
  public Key getTopKey() {
    return key;
  }

  @Override
  public Value getTopValue() {
    return value;
  }

  @Override
  public SortedKeyValueIterator<Key,Value> deepCopy(IteratorEnvironment env) {
    return null;
  }


}


Best regards,
Yamini Joshi

