Mailing-List: contact user-help@accumulo.apache.org; run by ezmlm
Reply-To: user@accumulo.apache.org
Message-ID: <534B2AD5.9060309@gmail.com>
Date: Sun, 13 Apr 2014 20:24:53 -0400
From: Josh Elser <josh.elser@gmail.com>
To: user@accumulo.apache.org
Subject: Re: Thrift proxy: Python WholeRowIterator behavior

David,

I'm not quite sure what you're seeing. Using the "plain" Python bindings, I
think I reproduced what you described. I created a table with the
following data:

1 => ['col1: [] 1397241795 => val1', 'col2: [] 1397241797 => val2', 
'col3: [] 1397241800 => val3']
2 => ['col1: [] 1397241803 => val1', 'col2: [] 1397241806 => val2', 
'col3: [] 1397241808 => val3']

I then modified the start and end Key (really just row) for the Range 
with the following code:

https://github.com/joshelser/accumulo-python-thrift/blob/master/ReadWholeRow.py

I got the results I would expect (just row1, just row2, and both row1
and row2). Perhaps you hit some sort of bug in pyaccumulo? I'm not sure.
Hope that helps; let me know if you have more info.

On 4/13/14, 5:02 PM, David O'Gwynn wrote:
> Hi Russ,
>
> I ported it:
>
> import io
> import struct
>
> def decode_row(cell):
>     """Decode a WholeRowIterator value into a list of
>     (row, cf, cq, cv, timestamp_seconds, value) tuples."""
>     data = cell.value
>     value = io.BytesIO(data)
>     numCells = struct.unpack('!i', value.read(4))[0]
>     row = cell.row
>     cells = []
>     for _ in range(numCells):
>         if value.tell() >= len(data):
>             raise ValueError(
>                 'Reached the end of the parsable string without'
>                 ' having finished unpacking. Likely an error'
>                 ' of passing a cell that is not from a'
>                 ' WholeRowIterator.'
>                 )
>         cf = value.read(struct.unpack('!i', value.read(4))[0])
>         cq = value.read(struct.unpack('!i', value.read(4))[0])
>         cv = value.read(struct.unpack('!i', value.read(4))[0])
>         # timestamp is a big-endian long, typically milliseconds
>         cts = struct.unpack('!q', value.read(8))[0] / 1000.
>         val = value.read(struct.unpack('!i', value.read(4))[0])
>         cells.append((row, cf, cq, cv, cts, val))
>     return cells
>
> You'll want the check at the beginning of the for loop; I found out
> how fast Python can fill my available memory before I put that in.
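To exercise decode_row without a running proxy, one can build test values with an encoder that writes the layout the decoder reads: a 4-byte cell count, then per cell a length-prefixed family, qualifier and visibility, an 8-byte timestamp, and a length-prefixed value, all big-endian. This is assumed to mirror WholeRowIterator.encodeRow on the Java side; encode_row is a hypothetical helper, not part of pyaccumulo.

```python
import struct


def encode_row(cells):
    """Hypothetical encoder producing the layout decode_row reads.

    cells: list of (cf, cq, cv, timestamp, value) tuples, where the
    column parts and value are bytes and timestamp is an int.
    """
    out = [struct.pack("!i", len(cells))]  # number of cells
    for cf, cq, cv, ts, val in cells:
        for field in (cf, cq, cv):
            # each column part: 4-byte big-endian length, then the bytes
            out.append(struct.pack("!i", len(field)) + field)
        out.append(struct.pack("!q", ts))  # 8-byte big-endian timestamp
        out.append(struct.pack("!i", len(val)) + val)  # length-prefixed value
    return b"".join(out)


blob = encode_row([(b"col1", b"", b"", 1397241795000, b"val1")])
print(len(blob))  # 4 + (4+4) + 4 + 4 + 8 + (4+4) = 36
```

Wrapping blob in any object with .row and .value attributes gives decode_row something to chew on offline.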
>
> On Sun, Apr 13, 2014 at 4:43 PM, Russ Weeks <rweeks@newbrightidea.com> wrote:
>> Just curious, David, did you port the logic of WholeRowIterator.decodeRow
>> over to Python, or is that functionality available somewhere in the
>> pyaccumulo API and I just missed it?
>>
>> -Russ
>>
>>
>> On Sun, Apr 13, 2014 at 10:48 AM, David O'Gwynn <dogwynn@acm.org> wrote:
>>>
>>> 1.5.0
>>>
>>> Btw, the pyaccumulo library:
>>>
>>> https://github.com/accumulo/pyaccumulo
>>>
>>> is the basis of my codebase. You should be able to use that to
>>> replicate the issue.
>>>
>>> Thanks for looking into this!
>>>
>>> On Sun, Apr 13, 2014 at 12:51 PM, Josh Elser <josh.elser@gmail.com> wrote:
>>>> Ah, gotcha.
>>>>
>>>> That definitely does not seem right. I'll see if I can poke around at
>>>> this today.
>>>>
>>>> Are you using 1.5.0 or 1.5.1? (1.5.1 was just released a few weeks ago)
>>>>
>>>>
>>>> On 4/12/14, 4:13 PM, David O'Gwynn wrote:
>>>>>
>>>>> Hi Josh,
>>>>>
>>>>> I guess I misspoke; the Range I'm passing is this:
>>>>>
>>>>> Range('row0', true, 'row0\0',true)
>>>>>
>>>>> Keeping in mind that the Thrift interface only exposes one Range
>>>>> constructor (Range(Key,bool,Key,bool)), the actual call I'm passing is
>>>>> this:
>>>>>
>>>>> Range( Key('row0',null,...), true, Key('row0\0',null,...), true )
>>>>>
>>>>> If I scan for all entries (without WholeRowIterator), I get the full
>>>>> contents of "row0". However, when I add the WholeRowIterator, it
>>>>> returns nothing.
>>>>>
>>>>> Furthermore, if I were to pass the following:
>>>>>
>>>>> Range( Key('row0',null,...), true, Key('row1\0',null,...), true )
>>>>>
>>>>> not only do I get both "row0" and "row1" without the WRI, I get "row1"
>>>>> as a whole row with the WRI (but not "row0"). I.e. the WRI is somehow
>>>>> interpreting my Range as having startKeyInclusive set to false, which
>>>>> is clearly not the case.
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>
>>>>> On Sat, Apr 12, 2014 at 2:49 PM, Josh Elser <josh.elser@gmail.com> wrote:
>>>>>>
>>>>>> Hi David,
>>>>>>
>>>>>> Looks like you're just misusing the Range here.
>>>>>>
>>>>>> If you create a range that is ["row0", "row0"] as you denote below,
>>>>>> that will only include Keys that have a rowId of "row0" with an empty
>>>>>> colfam, colqual, etc. Since you want to use the WholeRowIterator, I
>>>>>> assume you want all columns in "row0". As such, ["row0", "row0\0")
>>>>>> would be the best range to fetch all of the columns in that single row.
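A small pure-Python model of the key ordering makes this concrete. It assumes Accumulo's sort order (row, family, qualifier, visibility ascending, then timestamp descending; the delete flag is ignored) and that a Thrift Key with only the row set behaves like a Java Key with empty columns and timestamp Long.MAX_VALUE. The Key tuple and in_range function are hypothetical stand-ins, not the proxy's generated classes.

```python
from typing import NamedTuple

LONG_MAX = 2**63 - 1


class Key(NamedTuple):
    """Hypothetical stand-in for an Accumulo Key (not the Thrift class)."""
    row: bytes
    cf: bytes = b""
    cq: bytes = b""
    cv: bytes = b""
    ts: int = LONG_MAX

    def sort_key(self):
        # Row and columns sort ascending; timestamps sort descending.
        return (self.row, self.cf, self.cq, self.cv, -self.ts)


def in_range(key, start, start_incl, stop, stop_incl):
    """Return True if key lies between start and stop."""
    k, s, e = key.sort_key(), start.sort_key(), stop.sort_key()
    after_start = k >= s if start_incl else k > s
    before_stop = k <= e if stop_incl else k < e
    return after_start and before_stop


data = [
    Key(b"row0", b"col1", ts=1),
    Key(b"row0", b"col2", ts=2),
    Key(b"row1", b"col1", ts=3),
]

# ["row0", "row0"]: only an exact empty-column key at row0 could match.
exact = [k for k in data if in_range(k, Key(b"row0"), True, Key(b"row0"), True)]

# ["row0", "row0\0"): every key whose row is exactly row0.
whole = [k for k in data if in_range(k, Key(b"row0"), True, Key(b"row0\x00"), False)]

print(len(exact), len(whole))  # 0 2
```

The trick is that b"row0\x00" is the smallest row that sorts after b"row0", so an exclusive stop there cuts off exactly at the end of the row.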
>>>>>>
>>>>>>
>>>>>> On 4/12/2014 1:59 PM, David O'Gwynn wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I'm working with the Python Thrift API for the Accumulo proxy service,
>>>>>>> and I'm seeing some odd behavior. I'm using Accumulo 1.5 (the standard
>>>>>>> release from the Accumulo website).
>>>>>>>
>>>>>>> Whenever I use the WholeRowIterator with a Scanner, I cannot configure
>>>>>>> the Range for that Scanner to correctly return the start row for the
>>>>>>> Range. E.g. for Range('row0',true,'row0',true) [to pull a single row],
>>>>>>> it returns zero entries. For Range('row0',true,'row1\0',true), it
>>>>>>> returns only "row1".
>>>>>>>
>>>>>>> From the WholeRowIterator documentation, this behavior implies that
>>>>>>> the startInclusive bit was set to False, which it clearly wasn't.
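For reference, a rough Python model of the start-key adjustment that WholeRowIterator's seek appears to make, based on my reading of the 1.5 Java source (treat the exact condition as an assumption; Key and adjust_start are hypothetical stand-ins). If the start key carries only a row, has the maximum timestamp, and is exclusive, the iterator assumes it is resuming after a row it already returned and restarts, inclusively, at the following row. An inclusive start passes through untouched, so the symptom described above is what one would expect if the start-inclusive flag were false by the time the iterator is seeked.

```python
from typing import NamedTuple, Optional

LONG_MAX = 2**63 - 1


class Key(NamedTuple):
    """Hypothetical stand-in for an Accumulo Key."""
    row: bytes
    cf: bytes = b""
    cq: bytes = b""
    cv: bytes = b""
    ts: int = LONG_MAX


def adjust_start(start: Optional[Key], start_incl: bool):
    """Model of the WholeRowIterator seek tweak; returns (start, incl).

    The Java code also returns early when the following row lies past
    the range's end key; that check is omitted here.
    """
    if (start is not None and not start_incl
            and start.cf == start.cq == start.cv == b""
            and start.ts == LONG_MAX):
        # Restart at the row that follows start.row (row + NUL), now
        # inclusive; the original start row itself is skipped.
        return Key(start.row + b"\x00"), True
    return start, start_incl


print(adjust_start(Key(b"row0"), True))   # inclusive: returned unchanged
print(adjust_start(Key(b"row0"), False))  # exclusive: jumps past row0
```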
>>>>>>>
>>>>>>> I've been able to hack around this issue by setting the start key to
>>>>>>>
>>>>>>> Key(row=(row[:-1]+chr(ord(row[-1])-1))+'\0', inclusive=False)
>>>>>>>
>>>>>>> but I'd really rather understand the correct way of using a Range
>>>>>>> object in conjunction with a WholeRowIterator.
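The workaround above can be written as a small standalone helper (the name workaround_start_row is hypothetical, and it works on bytes rather than Python 2 str). It decrements the last byte of the row and appends a NUL, so an exclusive start key at that row sorts just before the wanted row. One caveat worth checking: any row that sorts between that start key and the wanted row, such as b"row/a" for b"row0", falls inside the range as well.

```python
def workaround_start_row(row: bytes) -> bytes:
    """Hypothetical helper for the hack above: decrement the last byte
    of the row and append a NUL byte.

    Not defined for an empty row or one whose last byte is NUL.
    """
    if not row or row[-1] == 0:
        raise ValueError("row must be non-empty and not end in a NUL byte")
    return row[:-1] + bytes([row[-1] - 1]) + b"\x00"


start = workaround_start_row(b"row0")
print(start)  # b'row/\x00'

# The start row sorts before b"row0" ...
assert start < b"row0"
# ... but so does an unrelated row like b"row/a", which would also be scanned.
assert start < b"row/a" < b"row0"
```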
>>>>>>>
>>>>>>> Thanks much,
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>
>>>>
>>
>>