From: Josh Elser
Date: Sun, 13 Apr 2014 20:24:53 -0400
To: user@accumulo.apache.org
Subject: Re: Thrift proxy: Python WholeRowIterator behavior
Message-ID: <534B2AD5.9060309@gmail.com>
References: <53498AC0.70508@gmail.com> <534AC076.8000206@gmail.com>

David,

Not quite sure what you're seeing. Using the "plain" python bindings, I
think I emulated what you described. I created a table with the
following data:

1 => ['col1: [] 1397241795 => val1', 'col2: [] 1397241797 => val2', 'col3: [] 1397241800 => val3']
2 => ['col1: [] 1397241803 => val1', 'col2: [] 1397241806 => val2', 'col3: [] 1397241808 => val3']

I then modified the start and end Key (really just row) for the Range
with the following code:

https://github.com/joshelser/accumulo-python-thrift/blob/master/ReadWholeRow.py

I got the results I would expect (just row1, just row2, and both row1
and row2). Perhaps you hit some sort of bug in pyaccumulo? Not sure --
HTH if you have more info.

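For anyone following along, here is a rough sketch of why the end Key matters for a Key-based Range like the ones discussed in this thread. It is only a toy model, not Accumulo's implementation: keys are plain (row, colfam, colqual) tuples compared lexicographically, column visibility and timestamp ordering are ignored, and the names make_key, in_range, and scan are made up for illustration. The rows "row0" and "row1" and the three columns are assumed sample data.

```python
# Toy model of Key-based Range matching. This is NOT Accumulo code; it only
# mimics lexicographic ordering on (row, colfam, colqual). Column visibility
# and timestamp ordering are deliberately left out.

def make_key(row, cf=b'', cq=b''):
    """Build a hypothetical key tuple; unset columns default to empty."""
    return (row, cf, cq)


def in_range(key, start, start_inclusive, end, end_inclusive):
    """Return True if key lies between start and end under tuple ordering."""
    after_start = key >= start if start_inclusive else key > start
    before_end = key <= end if end_inclusive else key < end
    return after_start and before_end


# Assumed sample table: two rows with three column families each.
table = [make_key(row, cf)
         for row in (b'row0', b'row1')
         for cf in (b'col1', b'col2', b'col3')]


def scan(start_row, start_inclusive, end_row, end_inclusive):
    """Return the keys from the sample table that fall inside the range."""
    start, end = make_key(start_row), make_key(end_row)
    return [k for k in table
            if in_range(k, start, start_inclusive, end, end_inclusive)]


# ["row0", "row0"]: every real column sorts after the bare end key.
closed = scan(b'row0', True, b'row0', True)

# ["row0", "row0\0"): the end key sorts after every column in row0.
half_open = scan(b'row0', True, b'row0\x00', False)

# ["row0", "row1\0"]: covers all columns of both rows.
both = scan(b'row0', True, b'row1\x00', True)
```

In this model the closed range matches nothing, because a key with a non-empty column family already sorts past the bare "row0" end key, while the half-open range ending at "row0\0" picks up every column of row0.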
On 4/13/14, 5:02 PM, David O'Gwynn wrote:
> Hi Russ,
>
> I ported it:
>
> def decode_row(cell):
>     value = StringIO.StringIO(cell.value)
>     numCells = struct.unpack('!i',value.read(4))[0]
>     key = cell.row
>     for i in range(numCells):
>         if value.pos == value.len:
>             raise Exception(
>                 'Reached the end of the parsable string without'
>                 ' having finished unpacking. Likely an error'
>                 ' of passing a cell that is not from a'
>                 ' WholeRowIterator.'
>             )
>         cf = value.read(struct.unpack('!i',value.read(4))[0])
>         cq = value.read(struct.unpack('!i',value.read(4))[0])
>         cv = value.read(struct.unpack('!i',value.read(4))[0])
>         cts = struct.unpack('!q',value.read(8))[0]/1000.
>         val = value.read(struct.unpack('!i',value.read(4))[0])
>
> You'll want the check at the beginning of the for loop; I found out
> how fast Python can fill my available memory before I put that in.
>
> On Sun, Apr 13, 2014 at 4:43 PM, Russ Weeks wrote:
>> Just curious, David, did you port the logic of WholeRowIterator.decodeRow
>> over to Python, or is that functionality available somewhere in the
>> pyaccumulo API and I just missed it?
>>
>> -Russ
>>
>>
>> On Sun, Apr 13, 2014 at 10:48 AM, David O'Gwynn wrote:
>>>
>>> 1.5.0
>>>
>>> Btw, the pyaccumulo library:
>>>
>>> https://github.com/accumulo/pyaccumulo
>>>
>>> is the basis of my codebase. You should be able to use that to
>>> replicate the issue.
>>>
>>> Thanks for looking into this!
>>>
>>> On Sun, Apr 13, 2014 at 12:51 PM, Josh Elser wrote:
>>>> Ah, gotcha.
>>>>
>>>> That definitely does not seem right. I'll see if I can poke around
>>>> at this today.
>>>>
>>>> Are you using 1.5.0 or 1.5.1? (1.5.1 was just released a few weeks
>>>> ago)
>>>>
>>>>
>>>> On 4/12/14, 4:13 PM, David O'Gwynn wrote:
>>>>>
>>>>> Hi Josh,
>>>>>
>>>>> I guess I misspoke, the Range I'm passing is this:
>>>>>
>>>>> Range('row0', true, 'row0\0',true)
>>>>>
>>>>> Keeping in mind that the Thrift interface only exposes one Range
>>>>> constructor (Range(Key,bool,Key,bool)), the actual call I'm passing is
>>>>> this:
>>>>>
>>>>> Range( Key('row0',null,...), true, Key('row0\0',null,...), true )
>>>>>
>>>>> If I scan for all entries (without WholeRowIterator), I get the full
>>>>> contents of "row0". However, when I add the WholeRowIterator, it
>>>>> returns nothing.
>>>>>
>>>>> Furthermore, if I were to pass the following:
>>>>>
>>>>> Range( Key('row0',null,...), true, Key('row1\0',null,...), true )
>>>>>
>>>>> not only do I get both "row0" and "row1" without the WRI, I get "row1"
>>>>> as a whole row with the WRI (but not "row0"). I.e. the WRI is somehow
>>>>> interpreting my Range as having startKeyInclusive set to false, which
>>>>> is clearly not the case.
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>
>>>>> On Sat, Apr 12, 2014 at 2:49 PM, Josh Elser wrote:
>>>>>>
>>>>>> Hi David,
>>>>>>
>>>>>> Looks like you're just mis-using the Range here.
>>>>>>
>>>>>> If you create a range that is ["row0", "row0"] as you denote below,
>>>>>> that will only include Keys that have a rowId of "row0" with an
>>>>>> empty colfam, colqual, etc. Since you want to use the
>>>>>> WholeRowIterator, I can assume you want all columns in "row0". As
>>>>>> such, ["row0", "row0\0") would be the best range to fetch all of
>>>>>> the columns in that single row.
>>>>>>
>>>>>>
>>>>>> On 4/12/2014 1:59 PM, David O'Gwynn wrote:
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I'm working with the Python Thrift API for the Accumulo proxy
>>>>>>> service, and I have a bit of odd behavior happening. I'm using
>>>>>>> Accumulo 1.5 (the standard one from the Accumulo website).
>>>>>>>
>>>>>>> Whenever I use the WholeRowIterator with a Scanner, I cannot
>>>>>>> configure the Range for that Scanner to correctly return the start
>>>>>>> row for the Range. E.g. for the Range('row0',true,'row0',true) [to
>>>>>>> pull a single row], it returns zero entries. For
>>>>>>> Range('row0',true,'row1\0',true), it returns only "row1".
>>>>>>>
>>>>>>> From the WholeRowIterator documentation, this behavior implies that
>>>>>>> the startInclusive bit was set to False, which it clearly wasn't.
>>>>>>>
>>>>>>> I've been able to hack around this issue by setting the start key to
>>>>>>>
>>>>>>> Key(row=(row[:-1]+chr(ord(row[-1])-1))+'\0', inclusive=False)
>>>>>>>
>>>>>>> but I'd really rather understand the correct way of using a Range
>>>>>>> object in conjunction with a WholeRowIterator.
>>>>>>>
>>>>>>> Thanks much,
>>>>>>>
>>>>>>> David
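
The decode_row port quoted earlier in this thread is Python 2 code (it relies on StringIO.StringIO and its pos/len attributes) and, as quoted, does not show what it returns. Below is a rough Python 3 sketch of the same idea, assuming the byte layout that the quoted code reads: a 4-byte big-endian cell count, then for each cell a length-prefixed column family, qualifier, and visibility, an 8-byte timestamp, and a length-prefixed value. The encode_row helper and the sample cells are made up here purely so the round trip can be checked without a running proxy. The timestamp is returned as the raw long rather than divided by 1000.

```python
import io
import struct


def _read_exact(buf, n):
    """Read exactly n bytes or raise, instead of looping on empty reads."""
    data = buf.read(n)
    if len(data) != n:
        raise ValueError('Ran out of bytes while unpacking; the value is '
                         'probably not from a WholeRowIterator.')
    return data


def _read_field(buf):
    """Read one length-prefixed byte string (4-byte big-endian length)."""
    (length,) = struct.unpack('!i', _read_exact(buf, 4))
    if length < 0:
        raise ValueError('Negative field length: %d' % length)
    return _read_exact(buf, length)


def decode_row(value):
    """Unpack an encoded whole-row value into (cf, cq, cv, ts, val) tuples."""
    buf = io.BytesIO(value)
    (num_cells,) = struct.unpack('!i', _read_exact(buf, 4))
    cells = []
    for _ in range(num_cells):
        cf = _read_field(buf)
        cq = _read_field(buf)
        cv = _read_field(buf)
        (ts,) = struct.unpack('!q', _read_exact(buf, 8))
        val = _read_field(buf)
        cells.append((cf, cq, cv, ts, val))
    return cells


def encode_row(cells):
    """Hypothetical test helper: pack cells in the layout decode_row reads."""
    out = io.BytesIO()
    out.write(struct.pack('!i', len(cells)))
    for cf, cq, cv, ts, val in cells:
        for field in (cf, cq, cv):
            out.write(struct.pack('!i', len(field)))
            out.write(field)
        out.write(struct.pack('!q', ts))
        out.write(struct.pack('!i', len(val)))
        out.write(val)
    return out.getvalue()


# Assumed sample cells, loosely modeled on the test table in this thread.
sample = [(b'col1', b'', b'', 1397241795000, b'val1'),
          (b'col2', b'', b'', 1397241797000, b'val2')]
decoded = decode_row(encode_row(sample))
```

Checking the length of every read plays the same role as the check at the top of the quoted for loop: a value that did not come from a WholeRowIterator fails fast instead of spinning through a bogus cell count.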