drill-dev mailing list archives

From Aditya <adityakish...@gmail.com>
Subject Re: Drill query does not return all results from HBase
Date Mon, 21 Mar 2016 17:25:43 GMT
Since I suspected that it was a bug in HBase, I tried it with the original
version you reported in the first post in this thread, i.e. CDH 5.4.3.

If it was back-ported to 5.4.7, upgrading should fix this issue.
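
The version reasoning in this thread can be sketched roughly as follows. This is only an illustration, not an official compatibility check. The helper names (`parse_version`, `has_hbase_13262_fix`) are hypothetical, and the sketch assumes that CDH releases after 5.4.7 keep the back-port and that versions are plain dotted numbers.

```python
# Hypothetical helper: guess whether a deployment should already contain the
# HBASE-13262 fix, using only the version numbers mentioned in this thread.

def parse_version(text):
    """Turn a plain dotted version such as '5.4.3' into a tuple: (5, 4, 3)."""
    return tuple(int(part) for part in text.split("."))

# Per the thread: fixed upstream in HBase 1.0.1 and 1.1.0.
UPSTREAM_FIX = parse_version("1.0.1")
# Per Cloudera, as relayed in the thread: back-ported into CDH 5.4.7.
# Assumption: later CDH releases keep the back-port.
CDH_BACKPORT = parse_version("5.4.7")

def has_hbase_13262_fix(hbase_version, cdh_version=None):
    """Return True if the fix is expected to be present."""
    if cdh_version is not None and parse_version(cdh_version) >= CDH_BACKPORT:
        return True
    return parse_version(hbase_version) >= UPSTREAM_FIX

if __name__ == "__main__":
    # The setup from the first post: CDH 5.4.3 with HBase 1.0.0.
    print(has_hbase_13262_fix("1.0.0", "5.4.3"))
    # The same HBase base version on CDH 5.4.7.
    print(has_hbase_13262_fix("1.0.0", "5.4.7"))
```

Versions carrying a vendor suffix would need to be trimmed to the dotted number before being passed in.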

On Mon, Mar 21, 2016 at 10:18 AM, Kevin Verhoeven <Kevin.Verhoeven@ds-iq.com
> wrote:

> Aditya,
>
>
>
> Thank you for your help. What version of CDH are you running? I contacted
> Cloudera and they stated that bug HBASE-13262 is backported into CDH 5.4.7.
>
>
>
> Thanks,
>
>
>
> Kevin
>
>
>
> *From:* Aditya [mailto:adityakishore@gmail.com]
> *Sent:* Sunday, March 20, 2016 10:45 PM
>
> *To:* Kumiko Yada <Kumiko.Yada@ds-iq.com>
> *Cc:* user@drill.apache.org; dev@drill.apache.org;
> altekrusejason@gmail.com; Ki Kang <Ki.Kang@ds-iq.com>; Kevin Verhoeven <
> Kevin.Verhoeven@ds-iq.com>
> *Subject:* Re: Drill query does not return all results from HBase
>
>
>
> Finally managed to reproduce it with the CDH distribution (so far I had been
> testing with HBase 1.1 distributed with MapR, which does not have this bug).
>
> This is essentially an HBase bug, HBASE-13262[1], which has been fixed in
> 1.0.1, 1.1.0.
>
> Please update your HBase distribution.
>
>
> [1] https://issues.apache.org/jira/browse/HBASE-13262
>
>
>
> On Thu, Mar 17, 2016 at 3:19 PM, Kumiko Yada <Kumiko.Yada@ds-iq.com>
> wrote:
>
> Aditya,
>
>
>
> When we were exchanging emails, you mentioned to me that you had
> discovered another issue, in the case where the table is split into multiple
> regions and the first region returned to the client did not have any rows.
> I think this issue is related to the issue that I’m seeing.  Have you
> opened the JIRA for this issue?  Have you investigated/fixed this issue?
>
>
>
> Thanks
>
> Kumiko
>
>
>
> *From:* Aditya [mailto:adityakishore@gmail.com]
> *Sent:* Thursday, March 17, 2016 3:02 PM
> *To:* Kumiko Yada <Kumiko.Yada@ds-iq.com>
> *Cc:* user@drill.apache.org; dev@drill.apache.org;
> altekrusejason@gmail.com; Ki Kang <Ki.Kang@ds-iq.com>; Kevin Verhoeven <
> Kevin.Verhoeven@ds-iq.com>
>
>
> *Subject:* Re: Drill query does not return all results from HBase
>
>
>
> Hi Kumiko,
>
> I have tried to reproduce this locally with an Apache 1.x release but have
> failed so far.
>
> From my mail exchange with Kevin on another thread, it appears that the
> HBase scanner stops returning rows after a while, which seems odd.
>
> It is probably unique to the CDH distribution. I am planning to set up a
> single-node CDH cluster to see if I can reproduce it there.
>
>
>
> On Thu, Mar 17, 2016 at 2:56 PM, Kumiko Yada <Kumiko.Yada@ds-iq.com>
> wrote:
>
> Hello,
>
> I provided all the information that was requested; however, I haven't heard
> anything back since February 24.
>
> Is anyone taking a look at this?  Are there any workarounds?
>
> https://issues.apache.org/jira/browse/DRILL-4271
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Aditya [mailto:adityakishore@gmail.com]
> Sent: Friday, February 19, 2016 12:48 PM
> To: user <user@drill.apache.org>
>
> Cc: altekrusejason@gmail.com; Ki Kang <Ki.Kang@ds-iq.com>; Kevin
> Verhoeven <Kevin.Verhoeven@ds-iq.com>
> Subject: Re: Drill query does not return all results from HBase
>
> Hi Kumiko,
>
> I apologize for not chiming in until now, considering that if there is a
> bug here it was most probably put in by me :)
>
> I've assigned the JIRA to myself and am going to take a look.
>
> Would it be possible for you to either attach to the JIRA or send me
> privately the Drill query profiles from both the correct and the incorrect
> executions?
>
> Regards,
> aditya...
>
> On Fri, Feb 19, 2016 at 12:34 PM, Kumiko Yada <Kumiko.Yada@ds-iq.com>
> wrote:
>
> > Hello,
> >
> > Does anyone have an update on this issue,
> > https://issues.apache.org/jira/browse/DRILL-4271?  Is there any plan
> > for this to be investigated/fixed?
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Kumiko Yada [mailto:Kumiko.Yada@ds-iq.com]
> > Sent: Thursday, January 14, 2016 3:44 PM
> > To: user@drill.apache.org; altekrusejason@gmail.com
> > Subject: RE: Drill query does not return all results from HBase
> >
> > The query time was very short on the one with the incorrect result.
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Jason Altekruse [mailto:altekrusejason@gmail.com]
> > Sent: Thursday, January 14, 2016 1:25 PM
> > To: user <user@drill.apache.org>
> > Subject: Fwd: Drill query does not return all results from HBase
> >
> > Thanks for the update, I'm forwarding your message back to the list.
> >
> > Just to confirm, was the query time longer on the one with the
> > incorrect result? In the incorrect case I think we are just misreading
> > the HBase metadata during our optimization to return row counts
> > without reading any data. This should be really fast, and noticeably
> > different than running a complete query, even with a small dataset as
> > we have to read in your table and run an aggregation over it.
> >
> > This would just be a final confirmation of where the issue is
> > occurring, I will hopefully have time soon to get this fixed but I'm
> > wrapping up some other things right now.
> >
> >
> > ---------- Forwarded message ----------
> > From: Kumiko Yada <Kumiko.Yada@ds-iq.com>
> > Date: Thu, Jan 14, 2016 at 12:53 PM
> > Subject: RE: Drill query does not return all results from HBase
> > To: Jason Altekruse <altekrusejason@gmail.com>
> >
> >
> > Jason,
> >
> >
> >
> > I’m sorry.  My testing was incorrect last night.  I’m not sure what I
> > did differently; however, your guess was correct.  When I did the one
> > column count, the row count was correct.  Here are the additional testing
> > results.
> >
> >
> >
> > My company is invested in using Drill, and it’s very important
> > for us that this is fixed.  Let me know if I can do anything to help get
> > this issue fixed.  I really appreciate that you are looking
> > into this issue!
> >
> > Hbase table (1 column family, 5 columns, 10000000 rows)
> > COUNT(*) - row count is correct
> > 1 column count - row count is correct
> >
> > Hbase table (1 column family, 6 columns, 10000000 rows)
> > COUNT(*) - row count is incorrect (returned 6724 rows)
> > 1 column count - row count is correct
> >
> > Hbase table (2 column families, 6 columns in each column family,
> > 10000000 rows)
> > COUNT(*) - row count is incorrect (returned 3362 rows)
> > 1 column count - row count is correct
> >
> > Hbase table (2 column families, 2 columns in each column family,
> > 10000000 rows)
> > COUNT(*) - row count is correct
> > 1 column count - row count is correct
> >
> > Hbase table (2 column families, 4 columns in one column family and 2
> > columns in the other column family, 10000000 rows)
> > COUNT(*) - row count is incorrect (returned 6723 rows)
> > 1 column count - row count is correct
> >
> > Hbase table (2 column families, 1 column in one column family and 3
> > columns in the other column family, 10000000 rows)
> > COUNT(*) - row count is correct
> > 1 column count - row count is correct
> >
> >
> >
> > Thanks
> >
> > Kumiko
> >
> >
> >
> > *From:* Kumiko Yada
> > *Sent:* Wednesday, January 13, 2016 7:28 PM
> > *To:* 'Jason Altekruse' <altekrusejason@gmail.com>
> > *Cc:* Ki Kang <Ki.Kang@ds-iq.com>; Kevin Verhoeven <
> > Kevin.Verhoeven@ds-iq.com>
> > *Subject:* RE: Drill query does not return all results from HBase
> >
> >
> >
> > I also ran the query to display only 1 column with no limit to try to
> > force a full scan, but the result was the same: just 10000 rows
> > selected.  With the same table (containing 6 columns), I ran the query
> > to display the row_key, and it displayed all records, 10,000,000 rows.
> >
> >
> >
> > -Kumiko
> >
> >
> >
> > *From:* Kumiko Yada
> > *Sent:* Wednesday, January 13, 2016 7:24 PM
> > *To:* 'Jason Altekruse' <altekrusejason@gmail.com>
> > *Cc:* Ki Kang <Ki.Kang@ds-iq.com>; Kevin Verhoeven <
> > Kevin.Verhoeven@ds-iq.com>
> > *Subject:* RE: Drill query does not return all results from HBase
> >
> >
> >
> > Jason
> >
> >
> >
> > I ran the query to display only 1 column for 100000 rows, and it only
> > returned 10000 rows.
> >
> >
> >
> > -Kumiko
> >
> >
> >
> > *From:* Jason Altekruse [mailto:altekrusejason@gmail.com <
> > altekrusejason@gmail.com>]
> > *Sent:* Wednesday, January 13, 2016 6:39 PM
> > *To:* Kumiko Yada <Kumiko.Yada@ds-iq.com>
> > *Cc:* Ki Kang <Ki.Kang@ds-iq.com>; Kevin Verhoeven <
> > Kevin.Verhoeven@ds-iq.com>
> >
> > *Subject:* Re: Drill query does not return all results from HBase
> >
> >
> >
> > I know in a number of cases we have special optimizer rules that try
> > to skip reading the dataset altogether if we have metadata for the
> > number of rows and all that is requested is a count(*). I assume that
> > this is the case with HBase, and this may be where we aren't doing
> > something correctly.
> > Can you try to run a 'sum', or other aggregate query on one of the
> > columns to see if a full scan of the data is operating correctly?
> >
> >
> >
> > On Wed, Jan 13, 2016 at 6:27 PM, Kumiko Yada <Kumiko.Yada@ds-iq.com>
> > wrote:
> >
> > Thank you, Jason!
> >
> > Let me know if you need any help on this. I will be glad to help on
> > repro and/or test the fix.
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Jason Altekruse [mailto:altekrusejason@gmail.com]
> > Sent: Wednesday, January 13, 2016 6:24 PM
> > To: user <user@drill.apache.org>
> >
> > Cc: Aditya Kishore <adityakishore@gmail.com>; Kevin Verhoeven <
> > Kevin.Verhoeven@ds-iq.com>
> > Subject: Re: Drill query does not return all results from HBase
> >
> > Thanks for filing the issue. I haven't worked much with HBase, but
> > this is a critical wrong-results issue, so I will be taking a look at
> > this soon if no one else raises their hand.
> >
> > On Wed, Jan 13, 2016 at 6:20 PM, Kumiko Yada <Kumiko.Yada@ds-iq.com>
> > wrote:
> >
> > > I opened a bug on this.  Drill returns the correct rows
> > > when the HBase table contains 5 or fewer columns, but not with 6 or more.
> > >
> > > https://issues.apache.org/jira/browse/DRILL-4271
> > >
> > > Thanks
> > > Kumiko
> > >
> > > -----Original Message-----
> > > From: Kumiko Yada [mailto:Kumiko.Yada@ds-iq.com]
> > > Sent: Wednesday, January 13, 2016 4:52 PM
> > > To: user@drill.apache.org
> > > Cc: Aditya Kishore <adityakishore@gmail.com>; Kevin Verhoeven <
> > > Kevin.Verhoeven@ds-iq.com>
> > > Subject: RE: Drill query does not return all results from HBase
> > >
> > > We are using HBase 1.0.0 and CDH 5.4.  I found that the correct
> > > row count is returned when the Hbase table contains only 1 column
> > > family and 1 column, but an incorrect row count is returned when the
> > > Hbase table contains 1 column family and 6 columns.
> > >
> > > This looks like a Drill issue.  Has anyone found a workaround?
> > >
> > > Thanks
> > > Kumiko
> > >
> > > -----Original Message-----
> > > From: Abhishek Girish [mailto:abhishek.girish@gmail.com]
> > > Sent: Tuesday, January 12, 2016 6:51 PM
> > > To: user <user@drill.apache.org>
> > > Cc: Aditya Kishore <adityakishore@gmail.com>
> > > Subject: Re: Drill query does not return all results from HBase
> > >
> > > Well, the major version didn't change if I remember it right, hence
> > > I did not share the info in my previous mail. I'm on HBase 1.1.1 right
> > > now and don't see the issue. Also, I am on a MapR setup, which might
> > > not be comparable with their CDH setups.
> > >
> > > On Tue, Jan 12, 2016 at 5:50 PM, Jason Altekruse
> > > <altekrusejason@gmail.com
> > > >
> > > wrote:
> > >
> > > > Abhishek,
> > > >
> > > > What version of HBase did you have the problem with, and what
> > > > version did you upgrade to that solved the problem? I assume this
> > > > would be useful information to compare your setup with Kevin's and
> > > > Kumiko's.
> > > >
> > > > - Jason
> > > >
> > > > On Tue, Jan 12, 2016 at 10:41 AM, Abhishek Girish <
> > > > abhishek.girish@gmail.com
> > > > > wrote:
> > > >
> > > > > I hit a very similar issue recently. Via HBase shell, I was able
> > > > > to fetch all records, whereas I was only able to see a small
> > > > > subset of records when queried from Drill. Each time I inserted
> > > > > 1000 records, only about 50 of those would show up.
> > > > >
> > > > > Although I could repro the problem consistently, it was
> > > > > resolved once I updated my Hadoop setup. My guess is that it was
> > > > > an HBase bug which got resolved. Strange as it seems, it
> > > > > might not have to do with Drill itself.
> > > > >
> > > > > -Abhishek
> > > > >
> > > > > On Tue, Jan 12, 2016 at 7:52 AM, Jason Altekruse <
> > > > altekrusejason@gmail.com
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > I'm not sure why this is happening; we have tests in our
> > > > > > automated suite that I believe run some pretty large queries
> > > > > > against Hbase and verify the results.
> > > > > >
> > > > > > Aditya, do you have some time available to try to reproduce
> > > > > > this and diagnose the problem?
> > > > > >
> > > > > > On Wed, Jan 6, 2016 at 2:03 PM, Kumiko Yada
> > > > > > <Kumiko.Yada@ds-iq.com>
> > > > > wrote:
> > > > > >
> > > > > > > I'm having the same issue.  Is there any workaround for this?
> > > > > > >
> > > > > > > Thanks
> > > > > > > Kumiko
> > > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Kevin Verhoeven [mailto:Kevin.Verhoeven@ds-iq.com]
> > > > > > > Sent: Monday, December 21, 2015 10:37 AM
> > > > > > > To: user@drill.apache.org
> > > > > > > Subject: Drill query does not return all results from HBase
> > > > > > >
> > > > > > > We have a problem where a Drill query against HBase does not
> > > > > > > return all results. The following query should return over
> > > > > > > 100,000 rows, but we only get about 1,030 back.
> > > > > > >
> > > > > > > SELECT row_key FROM `hbase`.`customer_staged` WHERE
> > > > > > > customer_number = 800
> > > > > > >
> > > > > > > If we scan directly using the hbase shell we see over
> > > > > > > 100,000 rows, but the same Drill query returns only a fraction
> > > > > > > of the expected results. We have also run a count against the
> > > > > > > table and Drill returns the same 1,030 number, which is far
> > > > > > > less than expected. What could be going wrong?
> > > > > > >
> > > > > > > We are running Drill 1.2 on Ubuntu 14.04 against CDH 5.4.3
> > > > > > > (HBase 1.0). We run HBase on six RegionServers; the table has
> > > > > > > about 1.3 billion rows.
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Kevin
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
>
>
>
