accumulo-user mailing list archives

From "David O'Gwynn" <dogw...@acm.org>
Subject Re: Thrift proxy: Python WholeRowIterator behavior
Date Mon, 14 Apr 2014 01:13:07 GMT
Ok, so I went back to my IPython console to rerun my scan to prove to
myself that I wasn't crazy. Well, I ran it and it worked like you just
said, contrary to my original point. Started to think I was on the crazy
train.

Then I remembered that I'd removed the versioning iterator from the
table I'd been working on, for some other tests. So I started checking
the priorities of my iterators. Turns out, the issue was the priority
of my WRI.

If the versioning iterator is attached, and the WRI's priority is <=
the versioning iterator's priority, then you see this behavior (the
first row of a WRI scan gets dropped). If you change the priority for
the WRI in your code to <=20, then you'll see it, Josh.

Still not sure why this would be the case; seems an odd behavior.
Anyway, thanks for taking the time to help me suss this out. :-)

On Sun, Apr 13, 2014 at 8:24 PM, Josh Elser <josh.elser@gmail.com> wrote:
> David,
>
> Not quite sure what you're seeing. Using the "plain" python bindings, I
> think I emulated what you described. I created a table with the following
> data:
>
> 1 => ['col1: [] 1397241795 => val1', 'col2: [] 1397241797 => val2', 'col3:
> [] 1397241800 => val3']
> 2 => ['col1: [] 1397241803 => val1', 'col2: [] 1397241806 => val2', 'col3:
> [] 1397241808 => val3']
>
> I then modified the start and end Key (really just row) for the Range with
> the following code:
>
> https://github.com/joshelser/accumulo-python-thrift/blob/master/ReadWholeRow.py
>
> I got the results I would expect (just row1, just row2, and both row1 and
> row2). Perhaps you hit some sort of bug in pyaccumulo? Not sure -- HTH if
> you have more info.
>
>
> On 4/13/14, 5:02 PM, David O'Gwynn wrote:
>>
>> Hi Russ,
>>
>> I ported it:
>>
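>> import struct
>> import StringIO  # Python 2 module
>>
>> def decode_row(cell):
>>      # cell.value holds one WholeRowIterator-encoded row; returns a list
>>      # of (row, cf, cq, cv, timestamp_in_seconds, value) tuples.
>>      value = StringIO.StringIO(cell.value)
>>      numCells = struct.unpack('!i',value.read(4))[0]
>>      key = cell.row
>>      cells = []
>>      for i in range(numCells):
>>          # Stop early if the buffer is exhausted before all cells are read
>>          if value.pos == value.len:
>>              raise Exception(
>>                  'Reached the end of the parsable string without'
>>                  ' having finished unpacking. Likely an error'
>>                  ' of passing a cell that is not from a'
>>                  ' WholeRowIterator.'
>>                  )
>>          # Each field is a 4-byte big-endian length followed by its bytes
>>          cf = value.read(struct.unpack('!i',value.read(4))[0])
>>          cq = value.read(struct.unpack('!i',value.read(4))[0])
>>          cv = value.read(struct.unpack('!i',value.read(4))[0])
>>          # 8-byte big-endian timestamp in milliseconds, converted to seconds
>>          cts = struct.unpack('!q',value.read(8))[0]/1000.
>>          val = value.read(struct.unpack('!i',value.read(4))[0])
>>          cells.append((key, cf, cq, cv, cts, val))
>>      return cells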
>>
>> You'll want the check at the beginning of the for loop; I found out
>> how fast Python can fill my available memory before I put that in.
>>
>> On Sun, Apr 13, 2014 at 4:43 PM, Russ Weeks <rweeks@newbrightidea.com>
>> wrote:
>>>
>>> Just curious, David, did you port the logic of WholeRowIterator.decodeRow
>>> over to Python, or is that functionality available somewhere in the
>>> pyaccumulo API and I just missed it?
>>>
>>> -Russ
>>>
>>>
>>> On Sun, Apr 13, 2014 at 10:48 AM, David O'Gwynn <dogwynn@acm.org> wrote:
>>>>
>>>>
>>>> 1.5.0
>>>>
>>>> Btw, the pyaccumulo library:
>>>>
>>>> https://github.com/accumulo/pyaccumulo
>>>>
>>>> is the basis of my codebase. You should be able to use that to
>>>> replicate the issue.
>>>>
>>>> Thanks for looking into this!
>>>>
>>>> On Sun, Apr 13, 2014 at 12:51 PM, Josh Elser <josh.elser@gmail.com>
>>>> wrote:
>>>>>
>>>>> Ah, gotcha.
>>>>>
>>>>> That definitely does not seem right. I'll see if I can poke around at
>>>>> this today.
>>>>>
>>>>> Are you using 1.5.0 or 1.5.1? (1.5.1 was just released a few weeks ago)
>>>>>
>>>>>
>>>>> On 4/12/14, 4:13 PM, David O'Gwynn wrote:
>>>>>>
>>>>>>
>>>>>> Hi Josh,
>>>>>>
>>>>>> I guess I misspoke, the Range I'm passing is this:
>>>>>>
>>>>>> Range('row0', true, 'row0\0',true)
>>>>>>
>>>>>> Keeping in mind that the Thrift interface only exposes one Range
>>>>>> constructor (Range(Key,bool,Key,bool)), the actual call I'm passing
>>>>>> is this:
>>>>>>
>>>>>> Range( Key('row0',null,...), true, Key('row0\0',null,...), true )
>>>>>>
>>>>>> If I scan for all entries (without WholeRowIterator), I get the full
>>>>>> contents of "row0". However, when I add the WholeRowIterator, it
>>>>>> returns nothing.
>>>>>>
>>>>>> Furthermore, if I were to pass the following:
>>>>>>
>>>>>> Range( Key('row0',null,...), true, Key('row1\0',null,...), true )
>>>>>>
>>>>>> not only do I get both "row0" and "row1" without the WRI, I get "row1"
>>>>>> as a whole row with the WRI (but not "row0"). I.e. the WRI is somehow
>>>>>> interpreting my Range as having startKeyInclusive set to false, which
>>>>>> is clearly not the case.
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>>
>>>>>> On Sat, Apr 12, 2014 at 2:49 PM, Josh Elser <josh.elser@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi David,
>>>>>>>
>>>>>>> Looks like you're just mis-using the Range here.
>>>>>>>
>>>>>>> If you create a range that is ["row0", "row0"] as you denote below,
>>>>>>> that will only include Keys that have a rowId of "row0" with an
>>>>>>> empty colfam, colqual, etc. Since you want to use the
>>>>>>> WholeRowIterator, I can assume you want all columns in "row0". As
>>>>>>> such, ["row0", "row0\0") would be the best range to fetch all of
>>>>>>> the columns in that single row.
>>>>>>>
>>>>>>>
>>>>>>> On 4/12/2014 1:59 PM, David O'Gwynn wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I'm working with the Python Thrift API for the Accumulo proxy
>>>>>>>> service, and I have a bit of odd behavior happening. I'm using
>>>>>>>> Accumulo 1.5 (the standard one from the Accumulo website).
>>>>>>>>
>>>>>>>> Whenever I use the WholeRowIterator with a Scanner, I cannot
>>>>>>>> configure the Range for that Scanner to correctly return the
>>>>>>>> start row for the Range. E.g. for the
>>>>>>>> Range('row0',true,'row0',true) [to pull a single row], it returns
>>>>>>>> zero entries. For Range('row0',true,'row1\0',true), it returns
>>>>>>>> only "row1".
>>>>>>>>
>>>>>>>> From the WholeRowIterator documentation, this behavior implies
>>>>>>>> that the startInclusive bit was set to False, which it clearly
>>>>>>>> wasn't.
>>>>>>>>
>>>>>>>> I've been able to hack around this issue by setting the start
>>>>>>>> key to
>>>>>>>>
>>>>>>>> Key(row=(row[:-1]+chr(ord(row[-1])-1))+'\0', inclusive=False)
>>>>>>>>
>>>>>>>> but I'd really rather understand the correct way of using a Range
>>>>>>>> object in conjunction with a WholeRowIterator.
>>>>>>>>
>>>>>>>> Thanks much,
>>>>>>>>
>>>>>>>> David
>>>>>>>>
>>>>>>>
>>>>>
>>>
>>>
>
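
A note for readers landing here from the archive: the decode_row port
quoted above targets Python 2 (StringIO.StringIO and its pos/len
attributes). Below is a minimal Python 3 sketch of the same parsing. It
assumes the byte layout is exactly what that port reads: a 4-byte
big-endian cell count, then per cell a length-prefixed column family,
qualifier and visibility, an 8-byte timestamp, and a length-prefixed
value. The helper names _read_exact and _read_field and the returned
tuple shape are made up for illustration; they are not part of
pyaccumulo or the proxy API.

```python
import io
import struct


def _read_exact(buf, n):
    """Read exactly n bytes from buf, or raise if the data is short."""
    if n < 0:
        raise ValueError("negative field length; not WholeRowIterator data?")
    data = buf.read(n)
    if len(data) != n:
        raise ValueError(
            "ran out of bytes while unpacking; the cell is probably not "
            "from a WholeRowIterator"
        )
    return data


def _read_field(buf):
    """Read one field: a 4-byte big-endian length followed by that many bytes."""
    (length,) = struct.unpack("!i", _read_exact(buf, 4))
    return _read_exact(buf, length)


def decode_row(row, value):
    """Decode one encoded row value (bytes) into a list of cell tuples.

    Each tuple is (row, cf, cq, cv, timestamp_ms, value), all bytes
    except the integer timestamp.
    """
    buf = io.BytesIO(value)
    (num_cells,) = struct.unpack("!i", _read_exact(buf, 4))
    cells = []
    for _ in range(num_cells):
        cf = _read_field(buf)
        cq = _read_field(buf)
        cv = _read_field(buf)
        (timestamp_ms,) = struct.unpack("!q", _read_exact(buf, 8))
        val = _read_field(buf)
        cells.append((row, cf, cq, cv, timestamp_ms, val))
    return cells
```

The length check in _read_exact plays the same role as the check at the
top of the for loop in the quoted port: a cell that did not come from a
WholeRowIterator fails fast instead of filling memory. The timestamp is
left in milliseconds here, whereas the quoted port divides by 1000.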

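Josh's point earlier in the thread about ["row0", "row0"] versus
["row0", "row0\0") can be illustrated with a toy model. This is only a
sketch: keys are plain (row, colfam, colqual) tuples compared
lexicographically, standing in for Accumulo's Key ordering (real Keys
also compare visibility and timestamp), and single_row_bounds is a
hypothetical helper, not a pyaccumulo or Thrift proxy call.

```python
def single_row_bounds(row):
    """Return (start_row, end_row) for the half-open interval [row, row + '\\0')."""
    return row, row + "\0"


# Toy table: (row, colfam, colqual) tuples in sorted order.
keys = sorted([
    ("row0", "col1", ""),
    ("row0", "col2", ""),
    ("row1", "col1", ""),
])

# Closed range whose start and end key are both ("row0", "", ""):
# only a key with empty columns could match, so nothing is returned.
bare = ("row0", "", "")
closed = [k for k in keys if bare <= k <= bare]

# Half-open range ["row0", "row0\0"): covers every column in row0
# and stops before row1.
start_row, end_row = single_row_bounds("row0")
half_open = [k for k in keys if (start_row, "", "") <= k < (end_row, "", "")]

print(closed)
print(half_open)
```

As the top of the thread explains, the dropped first row turned out to
depend on the WholeRowIterator's priority relative to the versioning
iterator, so this sketch only covers the Range bounds themselves.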