Mailing-List: contact dev-help@drill.apache.org; run by ezmlm
MIME-Version: 1.0
Reply-To: adityakishore@gmail.com
From: Aditya
Date: Thu, 17 Mar 2016 15:01:47 -0700
Subject: Re: Drill query does not return all results from HBase
To: Kumiko Yada
Cc: "user@drill.apache.org", "dev@drill.apache.org", "altekrusejason@gmail.com", Ki Kang, Kevin Verhoeven
Content-Type: multipart/alternative; boundary=001a114233760208a7052e45c875

--001a114233760208a7052e45c875
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Hi Kumiko,

I have tried to reproduce this locally with an Apache 1.x release but have
failed so far.

From my mail exchange with Kevin on another thread, it appears that the
HBase scanner stops returning rows after a while, which seems odd. It is
probably unique to the CDH distribution.

I am planning to set up a single-node CDH cluster to see if I can reproduce
it there.

On Thu, Mar 17, 2016 at 2:56 PM, Kumiko Yada wrote:

> Hello,
>
> I provided all the information that was requested; however, I haven't
> heard anything back since February 24.
>
> Is anyone taking a look at this? Are there any workarounds?
>
> https://issues.apache.org/jira/browse/DRILL-4271
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Aditya [mailto:adityakishore@gmail.com]
> Sent: Friday, February 19, 2016 12:48 PM
> To: user
> Cc: altekrusejason@gmail.com; Ki Kang; Kevin Verhoeven
> Subject: Re: Drill query does not return all results from HBase
>
> Hi Kumiko,
>
> I apologize for not chiming in until now, considering that if there is a
> bug here it was most probably put in by me :)
>
> I've assigned the JIRA to myself and am going to take a look.
>
> Would it be possible for you to either attach to the JIRA or send me
> privately the Drill query profiles from both the correct and the
> incorrect executions?
>
> Regards,
> aditya...
>
> On Fri, Feb 19, 2016 at 12:34 PM, Kumiko Yada wrote:
>
> > Hello,
> >
> > Does anyone have any update on this issue,
> > https://issues.apache.org/jira/browse/DRILL-4271? Are there any plans
> > for this to be investigated/fixed?
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Kumiko Yada [mailto:Kumiko.Yada@ds-iq.com]
> > Sent: Thursday, January 14, 2016 3:44 PM
> > To: user@drill.apache.org; altekrusejason@gmail.com
> > Subject: RE: Drill query does not return all results from HBase
> >
> > The query time was very short on the one with the incorrect result.
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Jason Altekruse [mailto:altekrusejason@gmail.com]
> > Sent: Thursday, January 14, 2016 1:25 PM
> > To: user
> > Subject: Fwd: Drill query does not return all results from HBase
> >
> > Thanks for the update, I'm forwarding your message back to the list.
> >
> > Just to confirm, was the query time longer on the one with the
> > incorrect result? In the incorrect case I think we are just misreading
> > the HBase metadata during our optimization to return row counts
> > without reading any data. This should be really fast, and noticeably
> > different than running a complete query, even with a small dataset, as
> > we have to read in your table and run an aggregation over it.
> >
> > This would just be a final confirmation of where the issue is
> > occurring. I will hopefully have time soon to get this fixed, but I'm
> > wrapping up some other things right now.
> >
> > ---------- Forwarded message ----------
> > From: Kumiko Yada
> > Date: Thu, Jan 14, 2016 at 12:53 PM
> > Subject: RE: Drill query does not return all results from HBase
> > To: Jason Altekruse
> >
> > Jason,
> >
> > I’m sorry. My testing was incorrect last night. I’m not sure what I
> > did differently; however, your guess was correct. When I did the
> > one-column count, the row count was correct. Here are the additional
> > testing results.
> >
> > My company has invested in using Drill, and it’s very important
> > for us that this is fixed. Let me know if I can do anything to get
> > this issue fixed.
> > I really appreciate that you are looking into this issue!
> >
> > HBase table (1 column family, 5 columns, 10000000 rows)
> > COUNT(*) - row count is correct
> > 1 column count - row count is correct
> >
> > HBase table (1 column family, 6 columns, 10000000 rows)
> > COUNT(*) - row count is incorrect (returned 6724 rows)
> > 1 column count - row count is correct
> >
> > HBase table (2 column families, 6 columns in each column family,
> > 10000000 rows)
> > COUNT(*) - row count is incorrect (returned 3362 rows)
> > 1 column count - row count is correct
> >
> > HBase table (2 column families, 2 columns in each column family,
> > 10000000 rows)
> > COUNT(*) - row count is correct
> > 1 column count - row count is correct
> >
> > HBase table (2 column families, 4 columns in one column family and 2
> > columns in the other column family, 10000000 rows)
> > COUNT(*) - row count is incorrect (returned 6723 rows)
> > 1 column count - row count is correct
> >
> > HBase table (2 column families, 1 column in one column family and 3
> > columns in the other column family, 10000000 rows)
> > COUNT(*) - row count is correct
> > 1 column count - row count is correct
> >
> > Thanks
> > Kumiko
> >
> > From: Kumiko Yada
> > Sent: Wednesday, January 13, 2016 7:28 PM
> > To: 'Jason Altekruse'
> > Cc: Ki Kang; Kevin Verhoeven <Kevin.Verhoeven@ds-iq.com>
> > Subject: RE: Drill query does not return all results from HBase
> >
> > I also ran the query to display only 1 column with no limit to try to
> > force a full scan, but the result was the same: just 10000 rows
> > selected. With the same table (containing 6 columns), I ran the query
> > to display the row_key, and it displayed all records, 10,000,000 rows.
> >
> > -Kumiko
> >
> > From: Kumiko Yada
> > Sent: Wednesday, January 13, 2016 7:24 PM
> > To: 'Jason Altekruse'
> > Cc: Ki Kang; Kevin Verhoeven <Kevin.Verhoeven@ds-iq.com>
> > Subject: RE: Drill query does not return all results from HBase
> >
> > Jason,
> >
> > I ran the query to display only 1 column for 100000 rows, and it only
> > returned 10000 rows.
> >
> > -Kumiko
> >
> > From: Jason Altekruse [mailto:altekrusejason@gmail.com]
> > Sent: Wednesday, January 13, 2016 6:39 PM
> > To: Kumiko Yada
> > Cc: Ki Kang; Kevin Verhoeven <Kevin.Verhoeven@ds-iq.com>
> > Subject: Re: Drill query does not return all results from HBase
> >
> > I know in a number of cases we have special optimizer rules that try
> > to skip reading the dataset altogether if we have metadata for the
> > number of rows and all that is requested is a count(*). I assume that
> > this is the case with HBase, and this may be where we aren't doing
> > something correctly.
> >
> > Can you try to run a 'sum', or other aggregate query on one of the
> > columns to see if a full scan of the data is operating correctly?
> >
> > On Wed, Jan 13, 2016 at 6:27 PM, Kumiko Yada wrote:
> >
> > Thank you, Jason!
> >
> > Let me know if you need any help on this. I will be glad to help with
> > the repro and/or test the fix.
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Jason Altekruse [mailto:altekrusejason@gmail.com]
> > Sent: Wednesday, January 13, 2016 6:24 PM
> > To: user
> > Cc: Aditya Kishore; Kevin Verhoeven <Kevin.Verhoeven@ds-iq.com>
> > Subject: Re: Drill query does not return all results from HBase
> >
> > Thanks for filing the issue. I haven't worked much with HBase, but
> > this is a critical wrong-results issue, so I will be taking a look at
> > this soon if no one else raises their hand.
> >
> > On Wed, Jan 13, 2016 at 6:20 PM, Kumiko Yada wrote:
> >
> > > I opened the bug on this. Drill is returning the correct rows
> > > when the HBase table contains 5 or fewer columns, but not 6 or more
> > > columns.
> > >
> > > https://issues.apache.org/jira/browse/DRILL-4271
> > >
> > > Thanks
> > > Kumiko
> > >
> > > -----Original Message-----
> > > From: Kumiko Yada [mailto:Kumiko.Yada@ds-iq.com]
> > > Sent: Wednesday, January 13, 2016 4:52 PM
> > > To: user@drill.apache.org
> > > Cc: Aditya Kishore; Kevin Verhoeven <Kevin.Verhoeven@ds-iq.com>
> > > Subject: RE: Drill query does not return all results from HBase
> > >
> > > We are using HBase 1.0.0 and CDH 5.4. I found that the correct
> > > row count is returned when the HBase table contains only 1 column
> > > family with 1 column, but an incorrect row count is returned when
> > > the HBase table contains 1 column family with 6 columns.
> > >
> > > This looks like a Drill issue. Has anyone found any workaround?
> > >
> > > Thanks
> > > Kumiko
> > >
> > > -----Original Message-----
> > > From: Abhishek Girish [mailto:abhishek.girish@gmail.com]
> > > Sent: Tuesday, January 12, 2016 6:51 PM
> > > To: user
> > > Cc: Aditya Kishore
> > > Subject: Re: Drill query does not return all results from HBase
> > >
> > > Well, the major version didn't change if I remember it right, hence
> > > I did not share the info in my previous mail. I'm on HBase 1.1.1
> > > right now and don't see the issue. Also, I am on a MapR setup, which
> > > might not be comparable with their CDH setups.
> > >
> > > On Tue, Jan 12, 2016 at 5:50 PM, Jason Altekruse wrote:
> > >
> > > > Abhishek,
> > > >
> > > > What version of HBase did you have the problem with, and what
> > > > version did you upgrade to that solved the problem? I assume this
> > > > would be useful information to compare your setup with Kevin's and
> > > > Kumiko's.
> > > >
> > > > - Jason
> > > >
> > > > On Tue, Jan 12, 2016 at 10:41 AM, Abhishek Girish
> > > > <abhishek.girish@gmail.com> wrote:
> > > >
> > > > > I hit a very similar issue recently. Via the HBase shell, I was
> > > > > able to fetch all records, whereas I was only able to see a small
> > > > > subset of records when querying from Drill. Each time I inserted
> > > > > 1000 records, only about 50 of those would show up.
> > > > >
> > > > > Although I could repro the problem consistently, it was
> > > > > resolved once I updated my Hadoop setup. My guess is that it was
> > > > > an HBase bug which got resolved. Strange as it seems, it
> > > > > might not have to do with Drill itself.
> > > > >
> > > > > -Abhishek
> > > > >
> > > > > On Tue, Jan 12, 2016 at 7:52 AM, Jason Altekruse
> > > > > <altekrusejason@gmail.com> wrote:
> > > > >
> > > > > > I'm not sure why this is happening; we have tests in our
> > > > > > automated suite that I believe run some pretty large queries
> > > > > > against HBase and verify the results.
> > > > > >
> > > > > > Aditya, do you have some time available to try to reproduce
> > > > > > this and diagnose the problem?
> > > > > >
> > > > > > On Wed, Jan 6, 2016 at 2:03 PM, Kumiko Yada wrote:
> > > > > >
> > > > > > > I'm having the same issue. Is there any workaround for this?
> > > > > > >
> > > > > > > Thanks
> > > > > > > Kumiko
> > > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Kevin Verhoeven [mailto:Kevin.Verhoeven@ds-iq.com]
> > > > > > > Sent: Monday, December 21, 2015 10:37 AM
> > > > > > > To: user@drill.apache.org
> > > > > > > Subject: Drill query does not return all results from HBase
> > > > > > >
> > > > > > > We have a problem where a Drill query against HBase does not
> > > > > > > return all results.
> > > > > > > The following query should return over 100,000 rows, but
> > > > > > > we only get about 1,030 back.
> > > > > > >
> > > > > > > SELECT row_key FROM `hbase`.`customer_staged` WHERE
> > > > > > > customer_number = 800
> > > > > > >
> > > > > > > If we scan directly using the hbase shell we see over
> > > > > > > 100,000 rows, but the same Drill query returns only a small
> > > > > > > fraction of the expected results. We have also run a count
> > > > > > > against the table and Drill returns the same 1,030 number,
> > > > > > > which is far fewer than expected. What could be going wrong?
> > > > > > >
> > > > > > > We are running Drill 1.2 on Ubuntu 14.04 against CDH 5.4.3
> > > > > > > (HBase 1.0). We run HBase on six RegionServers; the table has
> > > > > > > about 1.3 billion rows.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Kevin

--001a114233760208a7052e45c875--
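
For anyone repeating the comparison discussed in this thread, here is a minimal sketch of the diagnostic, not anything taken from Drill itself. It only builds the query strings and compares reported row counts; it does not connect to Drill. The helper names diagnostic_queries and mismatches, and the column family and column names "cf" and "c1", are hypothetical. The column reference assumes Drill's table.family.column addressing for HBase tables.

```python
def diagnostic_queries(table, family, column):
    """Build the Drill queries compared in the thread (strings only)."""
    tbl = f"`hbase`.`{table}`"
    return {
        # Per Jason's hypothesis, this may be answered from HBase
        # metadata without scanning the table.
        "count_star": f"SELECT COUNT(*) FROM {tbl}",
        # Counting a single column should require reading that column.
        # The family and column names are placeholders (assumption).
        "count_column": f"SELECT COUNT(t.`{family}`.`{column}`) FROM {tbl} t",
        # Selecting row_key lists every row key in the table.
        "row_keys": f"SELECT row_key FROM {tbl}",
    }


def mismatches(expected_rows, observed):
    """Return the names of queries whose row count differs from expected."""
    return sorted(
        name for name, rows in observed.items() if rows != expected_rows
    )


if __name__ == "__main__":
    for name, sql in diagnostic_queries("customer_staged", "cf", "c1").items():
        print(f"{name}: {sql}")
    # Counts Kumiko reported for the 1-family, 6-column, 10000000-row table.
    reported = {"count_star": 6724, "count_column": 10_000_000}
    print(mismatches(10_000_000, reported))
```

Run against the counts reported above, only the COUNT(*) query is flagged, which matches the suspicion that the row-count shortcut rather than the full scan is at fault in that test.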