accumulo-user mailing list archives

From Matthew Purdy <mpurdy1973usergro...@gmail.com>
Subject Re: scan iterator that rolls up col vis
Date Sat, 09 Aug 2014 01:44:19 GMT
sorry for not getting back sooner.

it seems to me a very simple solution is to modify
Combiner.ValueIterator._hasNext() so that it compares on
PartialKey.ROW_COLFAM_COLQUAL instead of PartialKey.ROW_COLFAM_COLQUAL_COLVIS.

the only problem with this is that you lose the CV; however, by adding a new
getKey() method to Combiner.ValueIterator you can build up the CV field from
the set of all CVs within a matching PartialKey.ROW_COLFAM_COLQUAL.

also, with a public setPartialKey() method on Combiner, you could roll up on
any PartialKey very easily.

// note: i am currently using 1.4.4
private boolean _hasNext() {
  return source.hasTop() && !source.getTopKey().isDeleted()
      && topKey.equals(source.getTopKey(), PartialKey.ROW_COLFAM_COLQUAL);
}
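
to make the idea concrete, here is a minimal plain-java sketch of the rollup, not the actual Combiner change. it uses no accumulo classes: Cell, sameRowFamQual() and rollup() are hypothetical names made up for illustration, and joining the distinct CVs with "&" is only one assumed way of building the combined CV (the result then requires every label that contributed to the sum).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// plain-java stand-in for the iterator logic; no accumulo classes are used
class RollupSketch {

  // hypothetical stand-in for an accumulo Key/Value pair
  static final class Cell {
    final String row, colFam, colQual, colVis;
    final long val;

    Cell(String row, String colFam, String colQual, String colVis, long val) {
      this.row = row;
      this.colFam = colFam;
      this.colQual = colQual;
      this.colVis = colVis;
      this.val = val;
    }

    // same idea as Key.equals(other, PartialKey.ROW_COLFAM_COLQUAL)
    boolean sameRowFamQual(Cell o) {
      return row.equals(o.row) && colFam.equals(o.colFam) && colQual.equals(o.colQual);
    }

    @Override
    public String toString() {
      return row + " " + colFam + ":" + colQual + " [" + colVis + "] " + val;
    }
  }

  // input must already be sorted by (row, colfam, colqual, colvis), as a scan returns it
  static List<Cell> rollup(List<Cell> sorted) {
    List<Cell> out = new ArrayList<>();
    int i = 0;
    while (i < sorted.size()) {
      Cell start = sorted.get(i);
      long sum = 0;
      Set<String> seenVis = new LinkedHashSet<>(); // distinct CVs, in encounter order
      while (i < sorted.size() && start.sameRowFamQual(sorted.get(i))) {
        sum += sorted.get(i).val;
        seenVis.add(sorted.get(i).colVis);
        i++;
      }
      // assumption: AND the labels together; labels that are themselves
      // expressions (e.g. containing |) would need parentheses first
      String combinedVis = String.join("&", seenVis);
      out.add(new Cell(start.row, start.colFam, start.colQual, combinedVis, sum));
    }
    return out;
  }

  public static void main(String[] args) {
    // the per-CV totals from the example tables in the original question
    List<Cell> cells = Arrays.asList(
        new Cell("student1", "TAKES_CLASS_WITH", "prof1", "COM_SCI_DEPT", 2),
        new Cell("student1", "TAKES_CLASS_WITH", "prof1", "MATH_DEPT", 2),
        new Cell("student2", "TAKES_CLASS_WITH", "prof1", "COM_SCI_DEPT", 1),
        new Cell("student2", "TAKES_CLASS_WITH", "prof1", "MATH_DEPT", 1));
    for (Cell c : rollup(cells)) {
      System.out.println(c);
    }
  }
}
```

with both departments visible, this gives one cell per (row, colfam, colqual): a total of 4 for student1 and 2 for student2, each carrying the combined CV. a scan that only sees MATH_DEPT cells would get just the math totals, since the other cells never reach the iterator.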


On Wed, Jul 2, 2014 at 11:27 AM, William Slacum <
wilhelm.von.cloud@accumulo.net> wrote:

> you should be able to roll up on keys with a condition similar to:
>
> if( source.hasTop() ) {
>   Key start = new Key(source.getTopKey()); // avoid instance-reuse issues
>   long count = 0;
>   while( source.hasTop() &&
>       start.equals( source.getTopKey(), PartialKey.ROW_COLFAM_COLQUAL_COLVIS ) ) {
>     count += deserialize(source.getTopValue());
>     source.next();
>   }
>   Value new_top_value = serialize(count);
>   // start can represent the top key of the iterator
> }
>
> We can flesh this out further if you run into issues. I think that we may
> need to set the start key's timestamp to 0 so that it sorts after all the
> other cells with a similar prefix.
>
>
> On Tue, Jul 1, 2014 at 10:41 PM, Matthew Purdy <
> mpurdy1973usergroups@gmail.com> wrote:
>
>>
>>
>> USE CASE: on scan only; want to have a "summing combiner" that rolls
>> up by (rowId, colfam, colqual) on all row keys where the client has
>> visibility.
>>
>> below is a simple example that expresses the use case.
>>
>> accumulo table holding student to professor relationship by departments
>>
>>
>> +----------+------------------+-----------+--------------+-----+
>> |  rowId   |       colfam     |  colqual  |    colvis    | val |
>> +----------+------------------+-----------+--------------+-----+
>> | student1 | TAKES_CLASS_WITH |  prof1    | MATH_DEPT    |   1 |
>> | student1 | TAKES_CLASS_WITH |  prof1    | MATH_DEPT    |   1 |
>> | student1 | TAKES_CLASS_WITH |  prof1    | COM_SCI_DEPT |   1 |
>> | student1 | TAKES_CLASS_WITH |  prof1    | COM_SCI_DEPT |   1 |
>> | student2 | TAKES_CLASS_WITH |  prof1    | MATH_DEPT    |   1 |
>> | student2 | TAKES_CLASS_WITH |  prof1    | COM_SCI_DEPT |   1 |
>> +----------+------------------+-----------+--------------+-----+
>>
>>
>> with the summing combiner the results would be
>>
>> +----------+------------------+-----------+--------------+-----+
>> |  rowId   |       colfam     |  colqual  |    colvis    | val |
>> +----------+------------------+-----------+--------------+-----+
>> | student1 | TAKES_CLASS_WITH |  prof1    | MATH_DEPT    |   2 |
>> | student1 | TAKES_CLASS_WITH |  prof1    | COM_SCI_DEPT |   2 |
>> | student2 | TAKES_CLASS_WITH |  prof1    | MATH_DEPT    |   1 |
>> | student2 | TAKES_CLASS_WITH |  prof1    | COM_SCI_DEPT |   1 |
>> +----------+------------------+-----------+--------------+-----+
>>
>> - the math department can only see math department totals
>> - the com sci department can only see com sci department totals
>> - the office of the dean has access to both
>>
>> therefore, when scanning (it wouldn't work for compaction), how
>> can you sum over colvis?
>>
>> assuming you had access to both colvis values, the desired results would be:
>>
>> +----------+------------------+-----------+-----+
>> |  rowId   |       colfam     |  colqual  | val |
>> +----------+------------------+-----------+-----+
>> | student1 | TAKES_CLASS_WITH |  prof1    |   4 |
>> | student2 | TAKES_CLASS_WITH |  prof1    |   2 |
>> +----------+------------------+-----------+-----+
>>
>>
>>
>


-- 
Thank You,
Matthew Purdy

------------------------------------------------------------------------------------------------------------------
Matthew Purdy
mpurdy1973userGroups@gmail.com
443.848.1595
--------------------------------------
"Lead, follow, or get out of the way." -- Thomas Paine
"Make everything as simple as possible, but not simpler." -- Albert Einstein
"The definition of insanity is doing the same thing over and over and
expecting a different result." -- Benjamin Franklin
"We can't solve problems by using the same kind of thinking we used when we
created them." -- Albert Einstein
------------------------------------------------------------------------------------------------------------------
