accumulo-user mailing list archives

From "David O'Gwynn" <dogw...@acm.org>
Subject Re: Thrift proxy: Python WholeRowIterator behavior
Date Sun, 13 Apr 2014 21:02:11 GMT
Hi Russ,

I ported it:

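import io
import struct

def decode_row(cell):
    # Python port of WholeRowIterator.decodeRow: returns a list of
    # (row, cf, cq, cv, timestamp_in_seconds, value) tuples.
    data = cell.value
    value = io.BytesIO(data)
    # Leading 4-byte big-endian int: number of entries packed in the row
    numCells = struct.unpack('!i', value.read(4))[0]
    row = cell.row
    entries = []
    for i in range(numCells):
        if value.tell() == len(data):
            raise Exception(
                'Reached the end of the parsable string without'
                ' having finished unpacking. Likely an error'
                ' of passing a cell that is not from a'
                ' WholeRowIterator.'
                )
        # cf, cq, cv: each a 4-byte length prefix followed by the bytes
        cf = value.read(struct.unpack('!i', value.read(4))[0])
        cq = value.read(struct.unpack('!i', value.read(4))[0])
        cv = value.read(struct.unpack('!i', value.read(4))[0])
        # 8-byte timestamp in milliseconds, converted to seconds
        cts = struct.unpack('!q', value.read(8))[0] / 1000.
        val = value.read(struct.unpack('!i', value.read(4))[0])
        entries.append((row, cf, cq, cv, cts, val))
    return entries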

You'll want the check at the beginning of the for loop; I found out
how fast Python can fill my available memory before I put that in.
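A stricter variant of that guard, as a sketch (the names `_read_exact`, `decode_row_strict` and `encode_row` are hypothetical, not part of pyaccumulo): every length-prefixed read goes through one helper that raises as soon as the buffer runs short, and the cell count is checked against the buffer size before looping, since each packed entry needs at least 24 bytes (four 4-byte length prefixes plus the 8-byte timestamp). It assumes the same layout the port above reads: a 4-byte big-endian count, then per entry length-prefixed cf, cq and cv, an 8-byte timestamp, and a length-prefixed value. `encode_row` packs that layout so the decoder can be exercised without a running proxy.

```python
import io
import struct

MIN_CELL_BYTES = 4 * 4 + 8  # four length prefixes (cf, cq, cv, value) + timestamp


def _read_exact(buf, n):
    """Read exactly n bytes from buf, or raise ValueError if it runs short."""
    if n < 0:
        raise ValueError('negative length prefix: %d' % n)
    data = buf.read(n)
    if len(data) != n:
        raise ValueError('wanted %d bytes, got %d' % (n, len(data)))
    return data


def _read_field(buf):
    """Read one field: a 4-byte big-endian length, then that many bytes."""
    (n,) = struct.unpack('!i', _read_exact(buf, 4))
    return _read_exact(buf, n)


def decode_row_strict(row, packed):
    """Decode a WholeRowIterator value into (row, cf, cq, cv, ts_ms, value) tuples."""
    buf = io.BytesIO(packed)
    (num_cells,) = struct.unpack('!i', _read_exact(buf, 4))
    # Reject impossible counts up front, before looping at all
    if num_cells < 0 or num_cells * MIN_CELL_BYTES > len(packed) - 4:
        raise ValueError('cell count %d does not fit in %d bytes'
                         % (num_cells, len(packed)))
    entries = []
    for _ in range(num_cells):
        cf = _read_field(buf)
        cq = _read_field(buf)
        cv = _read_field(buf)
        (ts,) = struct.unpack('!q', _read_exact(buf, 8))
        val = _read_field(buf)
        entries.append((row, cf, cq, cv, ts, val))
    # Extra strictness: anything left over means the input was not one packed row
    if buf.read(1):
        raise ValueError('trailing bytes after %d cells' % num_cells)
    return entries


def encode_row(cells):
    """Pack (cf, cq, cv, ts_ms, value) tuples in the same layout (for testing)."""
    out = [struct.pack('!i', len(cells))]
    for cf, cq, cv, ts, val in cells:
        for field in (cf, cq, cv):
            out.append(struct.pack('!i', len(field)) + field)
        out.append(struct.pack('!q', ts))
        out.append(struct.pack('!i', len(val)) + val)
    return b''.join(out)
```

For example, `decode_row_strict(b'row0', encode_row([(b'cf', b'cq', b'', 1397422931000, b'v1')]))` returns `[(b'row0', b'cf', b'cq', b'', 1397422931000, b'v1')]`. Unlike the port above, this keeps timestamps in milliseconds.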

On Sun, Apr 13, 2014 at 4:43 PM, Russ Weeks <rweeks@newbrightidea.com> wrote:
> Just curious, David, did you port the logic of WholeRowIterator.decodeRow
> over to Python, or is that functionality available somewhere in the
> pyaccumulo API and I just missed it?
>
> -Russ
>
>
> On Sun, Apr 13, 2014 at 10:48 AM, David O'Gwynn <dogwynn@acm.org> wrote:
>>
>> 1.5.0
>>
>> Btw, the pyaccumulo library:
>>
>> https://github.com/accumulo/pyaccumulo
>>
>> is the basis of my codebase. You should be able to use that to
>> replicate the issue.
>>
>> Thanks for looking into this!
>>
>> On Sun, Apr 13, 2014 at 12:51 PM, Josh Elser <josh.elser@gmail.com> wrote:
>> > Ah, gotcha.
>> >
>> > That definitely does not seem right. I'll see if I can poke around at
>> > this today.
>> >
>> > Are you using 1.5.0 or 1.5.1? (1.5.1 was just released a few weeks ago)
>> >
>> >
>> > On 4/12/14, 4:13 PM, David O'Gwynn wrote:
>> >>
>> >> Hi Josh,
>> >>
>> >> I guess I misspoke; the Range I'm passing is this:
>> >>
>> >> Range('row0', true, 'row0\0',true)
>> >>
>> >> Keeping in mind that the Thrift interface only exposes one Range
>> >> constructor (Range(Key,bool,Key,bool)), the actual call I'm passing is
>> >> this:
>> >>
>> >> Range( Key('row0',null,...), true, Key('row0\0',null,...), true )
>> >>
>> >> If I scan for all entries (without WholeRowIterator), I get the full
>> >> contents of "row0". However, when I add the WholeRowIterator, it
>> >> returns nothing.
>> >>
>> >> Furthermore, if I were to pass the following:
>> >>
>> >> Range( Key('row0',null,...), true, Key('row1\0',null,...), true )
>> >>
>> >> I get both "row0" and "row1" without the WRI, but with the WRI I get
>> >> only "row1" as a whole row (not "row0"). I.e. the WRI is somehow
>> >> interpreting my Range as having startKeyInclusive set to false, which
>> >> is clearly not the case.
>> >>
>> >> Thanks,
>> >> David
>> >>
>> >>
>> >> On Sat, Apr 12, 2014 at 2:49 PM, Josh Elser <josh.elser@gmail.com> wrote:
>> >>>
>> >>> Hi David,
>> >>>
>> >> >>> Looks like you're just misusing the Range here.
>> >> >>>
>> >> >>> If you create a range that is ["row0", "row0"] as you denote below,
>> >> >>> that will only include Keys that have a rowId of "row0" with an empty
>> >> >>> colfam, colqual, etc. Since you want to use the WholeRowIterator, I can
>> >> >>> assume you want all columns in "row0". As such, ["row0", "row0\0")
>> >> >>> would be the best range to fetch all of the columns in that single row.
>> >>>
>> >>>
>> >>> On 4/12/2014 1:59 PM, David O'Gwynn wrote:
>> >>>>
>> >>>>
>> >>>> Hi all,
>> >>>>
>> >> >>>> I'm working with the Python Thrift API for the Accumulo proxy
>> >> >>>> service, and I have a bit of odd behavior happening. I'm using
>> >> >>>> Accumulo 1.5 (the standard one from the Accumulo website).
>> >>>>
>> >> >>>> Whenever I use the WholeRowIterator with a Scanner, I cannot
>> >> >>>> configure the Range for that Scanner to correctly return the start
>> >> >>>> row for the Range. E.g. for Range('row0',true,'row0',true) [to pull
>> >> >>>> a single row], it returns zero entries. For
>> >> >>>> Range('row0',true,'row1\0',true), it returns only "row1".
>> >> >>>>
>> >> >>>> From the WholeRowIterator documentation, this behavior implies that
>> >> >>>> the startInclusive bit was set to False, which it clearly wasn't.
>> >>>>
>> >> >>>> I've been able to hack around this issue by setting the start key to
>> >>>>
>> >>>> Key(row=(row[:-1]+chr(ord(row[-1])-1))+'\0', inclusive=False)
>> >>>>
>> >>>> but I'd really rather understand the correct way of using a Range
>> >>>> object in conjunction with a WholeRowIterator.
>> >>>>
>> >>>> Thanks much,
>> >>>>
>> >>>> David
>> >>>>
>> >>>
>> >
>
>

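To see why the Range bounds discussed in this thread behave as they do, here is a small model of Accumulo key ordering, as a sketch under stated assumptions: keys sort by row, column family, qualifier and visibility ascending (byte-wise), then by timestamp descending; a row-only Key carries timestamp Long.MAX_VALUE; the delete flag is ignored. `wri_adjust_start` models the caveat in the WholeRowIterator javadoc that a seek range starting at a non-inclusive, row-only key skips to the next row. All function names are hypothetical, and nothing here talks to the proxy.

```python
LONG_MAX = 2**63 - 1  # Java Long.MAX_VALUE: timestamp of a row-only Key


def sort_key(row, cf=b'', cq=b'', cv=b'', ts=LONG_MAX):
    """Tuple that orders like an Accumulo Key (delete flag ignored).

    row/cf/cq/cv compare ascending as unsigned bytes; the timestamp
    is negated so that newer (larger) timestamps sort first.
    """
    return (row, cf, cq, cv, -ts)


def in_range(k, start, start_incl, stop, stop_incl):
    """True if key tuple k lies between start and stop, honoring inclusivity."""
    after_start = k >= start if start_incl else k > start
    before_stop = k <= stop if stop_incl else k < stop
    return after_start and before_stop


def wri_adjust_start(start, start_incl):
    """Model of the documented WholeRowIterator seek caveat.

    A non-inclusive start key that is a bare row key (empty cf/cq/cv,
    timestamp Long.MAX_VALUE) is moved to the following row, i.e. the
    row with a 0x00 byte appended, and made inclusive.
    """
    row, cf, cq, cv, neg_ts = start
    if not start_incl and (cf, cq, cv) == (b'', b'', b'') and neg_ts == -LONG_MAX:
        return sort_key(row + b'\x00'), True
    return start, start_incl


if __name__ == '__main__':
    row0 = [sort_key(b'row0', b'cf', b'cq', ts=10),
            sort_key(b'row0', b'cf', b'cq2', ts=10)]
    start, end = sort_key(b'row0'), sort_key(b'row0\x00')
    # Closed range [Key('row0'), Key('row0')]: no column of row0 matches
    print(any(in_range(k, start, True, start, True) for k in row0))   # False
    # Half-open range [Key('row0'), Key('row0\0')): every column matches
    print(all(in_range(k, start, True, end, False) for k in row0))    # True
    # Non-inclusive row-only start: the WholeRowIterator skips row0
    print(wri_adjust_start(start, False) == (end, True))              # True
```

Under these assumptions, a closed range on Key('row0') matches only the bare row key, ["row0", "row0\0") covers every column of the row, and a non-inclusive row-only start moves the WholeRowIterator past "row0". That is consistent with the "row0 is skipped" symptom reported in the thread, but the model does not explain why the proxy would produce such a range.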