drill-dev mailing list archives

From Kevin Verhoeven <Kevin.Verhoe...@ds-iq.com>
Subject RE: Drill query does not return all results from HBase
Date Mon, 21 Mar 2016 17:18:20 GMT
Aditya,

Thank you for your help. What version of CDH are you running? I contacted Cloudera and they
stated that bug HBASE-13262 is backported into CDH 5.4.7.

Thanks,

Kevin

From: Aditya [mailto:adityakishore@gmail.com]
Sent: Sunday, March 20, 2016 10:45 PM
To: Kumiko Yada <Kumiko.Yada@ds-iq.com>
Cc: user@drill.apache.org; dev@drill.apache.org; altekrusejason@gmail.com; Ki Kang <Ki.Kang@ds-iq.com>;
Kevin Verhoeven <Kevin.Verhoeven@ds-iq.com>
Subject: Re: Drill query does not return all results from HBase

Finally managed to reproduce it with the CDH distribution (so far I had been testing with HBase 1.1
distributed with MapR, which does not have this bug).
This is essentially an HBase bug, HBASE-13262[1], which has been fixed in 1.0.1 and 1.1.0.
Please update your HBase distribution.

[1] https://issues.apache.org/jira/browse/HBASE-13262

On Thu, Mar 17, 2016 at 3:19 PM, Kumiko Yada <Kumiko.Yada@ds-iq.com> wrote:
Aditya,

When we were exchanging emails, you mentioned that you had discovered another issue
in the case where the table is split into multiple regions and the first region returned to the
client did not have any rows.  I think this issue is related to the issue that I’m seeing.
Have you opened a JIRA for this issue?  Have you investigated or fixed it?

Thanks
Kumiko

From: Aditya [mailto:adityakishore@gmail.com]
Sent: Thursday, March 17, 2016 3:02 PM
To: Kumiko Yada <Kumiko.Yada@ds-iq.com>
Cc: user@drill.apache.org; dev@drill.apache.org; altekrusejason@gmail.com; Ki Kang <Ki.Kang@ds-iq.com>;
Kevin Verhoeven <Kevin.Verhoeven@ds-iq.com>
Subject: Re: Drill query does not return all results from HBase

Hi Kumiko,

I have tried to reproduce this locally with the Apache 1.x release but have failed so far.
From my mail exchange with Kevin on another thread, it appears that the HBase scanner stops
returning rows after a while, which seems odd.
It is probably unique to the CDH distribution. I am planning to set up a single-node CDH cluster
to see if I can reproduce it there.

On Thu, Mar 17, 2016 at 2:56 PM, Kumiko Yada <Kumiko.Yada@ds-iq.com> wrote:
Hello,

I provided all information that was requested; however, I haven't heard back anything since
February 24.

Is anyone taking a look at this?  Are there any workarounds?

https://issues.apache.org/jira/browse/DRILL-4271

Thanks
Kumiko

-----Original Message-----
From: Aditya [mailto:adityakishore@gmail.com]
Sent: Friday, February 19, 2016 12:48 PM
To: user <user@drill.apache.org>
Cc: altekrusejason@gmail.com; Ki Kang <Ki.Kang@ds-iq.com>;
Kevin Verhoeven <Kevin.Verhoeven@ds-iq.com>
Subject: Re: Drill query does not return all results from HBase

Hi Kumiko,

I apologize for not chiming in until now, considering that if there is a bug here it was most
probably put in by me :)

I've assigned the JIRA to myself and am going to take a look.

Would it be possible for you to either attach to the JIRA or send me privately the Drill query
profiles from both the correct and the incorrect executions?

Regards,
aditya...

On Fri, Feb 19, 2016 at 12:34 PM, Kumiko Yada <Kumiko.Yada@ds-iq.com> wrote:

> Hello,
>
> Does anyone have any update on this issue,
> https://issues.apache.org/jira/browse/DRILL-4271?  Is there any plan
> for this to be investigated/fixed?
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Kumiko Yada [mailto:Kumiko.Yada@ds-iq.com]
> Sent: Thursday, January 14, 2016 3:44 PM
> To: user@drill.apache.org; altekrusejason@gmail.com
> Subject: RE: Drill query does not return all results from HBase
>
> The query time was very short on the one with the incorrect result.
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Jason Altekruse [mailto:altekrusejason@gmail.com]
> Sent: Thursday, January 14, 2016 1:25 PM
> To: user <user@drill.apache.org>
> Subject: Fwd: Drill query does not return all results from HBase
>
> Thanks for the update, I'm forwarding your message back to the list.
>
> Just to confirm, was the query time longer on the one with the
> incorrect result? In the incorrect case I think we are just misreading
> the HBase metadata during our optimization to return row counts
> without reading any data. This should be really fast, and noticeably
> different than running a complete query, even with a small dataset as
> we have to read in your table and run an aggregation over it.
>
> This would just be a final confirmation of where the issue is
> occurring, I will hopefully have time soon to get this fixed but I'm
> wrapping up some other things right now.
>
>
> ---------- Forwarded message ----------
> From: Kumiko Yada <Kumiko.Yada@ds-iq.com>
> Date: Thu, Jan 14, 2016 at 12:53 PM
> Subject: RE: Drill query does not return all results from HBase
> To: Jason Altekruse <altekrusejason@gmail.com>
>
>
> Jason,
>
>
>
> I’m sorry.  My testing was incorrect last night.  I’m not sure what I
> did differently; however, your guess was correct.  When I did the one
> column count, the row count was correct.  Here are the additional test results.
>
>
>
> My company has invested in using Drill, and it’s very important
> for us that this is fixed.  Let me know if I can do anything to help get
> this issue fixed.  I really appreciate that you are looking into this issue!
>
> Hbase table (1 column family, 5 columns, 10000000 rows)
>
> COUNT(*) - row count is correct
>
> 1 column count - row count is correct
>
> Hbase table (1 column family, 6 columns, 10000000 rows)
>
> COUNT(*) - row count is incorrect (returned 6724 rows)
>
> 1 column count - row count is correct
>
> Hbase table (2 column families, 6 columns in each column family,
> 10000000 rows)
>
> COUNT(*) - row count is incorrect (returned 3362 rows)
>
> 1 column count - row count is correct
>
> Hbase table (2 column families, 2 columns in each column family,
> 10000000 rows)
>
> COUNT(*) - row count is correct
>
> 1 column count - row count is correct
>
> Hbase table (2 column families, 4 columns in one column family and 2
> columns in the other column family, 10000000 rows)
>
> COUNT(*) - row count is incorrect (returned 6723 rows)
>
> 1 column count - row count is correct
>
> Hbase table (2 column families, 1 column in one column family and 3
> columns in the other column family, 10000000 rows)
>
> COUNT(*) - row count is correct
>
> 1 column count - row count is correct
>
>
>
> Thanks
>
> Kumiko
>
>
>
> *From:* Kumiko Yada
> *Sent:* Wednesday, January 13, 2016 7:28 PM
> *To:* 'Jason Altekruse' <altekrusejason@gmail.com>
> *Cc:* Ki Kang <Ki.Kang@ds-iq.com>; Kevin Verhoeven <Kevin.Verhoeven@ds-iq.com>
> *Subject:* RE: Drill query does not return all results from HBase
>
>
>
> I also ran the query to display only 1 column with no limit to try
> to force a full scan, but the result was the same, just 10000 rows
> selected.  With the same table (contains 6 columns), I ran the query
> to display the row_key, and it displayed all records, 10,000,000 rows.
>
>
>
> -Kumiko
>
>
>
> *From:* Kumiko Yada
> *Sent:* Wednesday, January 13, 2016 7:24 PM
> *To:* 'Jason Altekruse' <altekrusejason@gmail.com>
> *Cc:* Ki Kang <Ki.Kang@ds-iq.com>; Kevin Verhoeven <Kevin.Verhoeven@ds-iq.com>
> *Subject:* RE: Drill query does not return all results from HBase
>
>
>
> Jason
>
>
>
> I ran the query to display only 1 column for 100000 rows, and it only
> returned 10000 rows.
>
>
>
> -Kumiko
>
>
>
> *From:* Jason Altekruse [mailto:altekrusejason@gmail.com]
> *Sent:* Wednesday, January 13, 2016 6:39 PM
> *To:* Kumiko Yada <Kumiko.Yada@ds-iq.com>
> *Cc:* Ki Kang <Ki.Kang@ds-iq.com>; Kevin Verhoeven <Kevin.Verhoeven@ds-iq.com>
>
> *Subject:* Re: Drill query does not return all results from HBase
>
>
>
> I know in a number of cases we have special optimizer rules that try
> to skip reading the dataset altogether if we have metadata for the
> number of rows and all that is requested is a count(*). I assume that
> this is the case with HBase, and this may be where we aren't doing
> something correctly.
> Can you try to run a 'sum', or other aggregate query on one of the
> columns to see if a full scan of the data is operating correctly?
>
>
>
> On Wed, Jan 13, 2016 at 6:27 PM, Kumiko Yada <Kumiko.Yada@ds-iq.com> wrote:
>
> Thank you, Jason!
>
> Let me know if you need any help on this. I will be glad to help on
> repro and/or test the fix.
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Jason Altekruse [mailto:altekrusejason@gmail.com]
> Sent: Wednesday, January 13, 2016 6:24 PM
> To: user <user@drill.apache.org>
> Cc: Aditya Kishore <adityakishore@gmail.com>; Kevin Verhoeven <Kevin.Verhoeven@ds-iq.com>
> Subject: Re: Drill query does not return all results from HBase
>
> Thanks for filing the issue. I haven't worked much with HBase, but
> this is a critical wrong-results issue, so I will be taking a look at
> this soon if no one else raises their hand.
>
> On Wed, Jan 13, 2016 at 6:20 PM, Kumiko Yada <Kumiko.Yada@ds-iq.com> wrote:
>
> > I opened a bug on this.  Drill is returning the correct rows
> > when the HBase table contains 5 or fewer columns, but not 6 or more columns.
> >
> > https://issues.apache.org/jira/browse/DRILL-4271
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Kumiko Yada [mailto:Kumiko.Yada@ds-iq.com]
> > Sent: Wednesday, January 13, 2016 4:52 PM
> > To: user@drill.apache.org
> > Cc: Aditya Kishore <adityakishore@gmail.com>; Kevin Verhoeven <Kevin.Verhoeven@ds-iq.com>
> > Subject: RE: Drill query does not return all results from HBase
> >
> > We are using HBase 1.0.0 and CDH 5.4.  I found that the correct
> > row count is returned when the HBase table contains only 1 column
> > family and 1 column, but an incorrect row count is returned when the
> > HBase table contains 1 column family and 6 columns.
> >
> > This looks like a Drill issue.  Has anyone found a workaround?
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Abhishek Girish [mailto:abhishek.girish@gmail.com]
> > Sent: Tuesday, January 12, 2016 6:51 PM
> > To: user <user@drill.apache.org>
> > Cc: Aditya Kishore <adityakishore@gmail.com>
> > Subject: Re: Drill query does not return all results from HBase
> >
> > Well, the major version didn't change if I remember it right, hence
> > did not share the info in my previous mail. I'm on HBase 1.1.1 right
> > now and don't see the issue. Also, I am on a MapR setup, which might
> > not be comparable with their CDH setups.
> >
> > On Tue, Jan 12, 2016 at 5:50 PM, Jason Altekruse <altekrusejason@gmail.com> wrote:
> >
> > > Abhishek,
> > >
> > > > What version of HBase did you have the problem with, and what
> > > > version did you upgrade to that solved the problem? I assume this
> > > > would be useful information to compare your setup with Kevin's and
> > > > Kumiko's.
> > >
> > > - Jason
> > >
> > > > On Tue, Jan 12, 2016 at 10:41 AM, Abhishek Girish <abhishek.girish@gmail.com> wrote:
> > >
> > > > > I hit a very similar issue recently. Via HBase shell, I was able
> > > > > to fetch all records, whereas I was only able to see a small
> > > > > subset of records when queried from Drill. Each time I inserted
> > > > > 1000 records, only about 50 of those would show up.
> > > > >
> > > > > Although I could repro the problem consistently, it was
> > > > > resolved once I updated my Hadoop setup. My guess is that it was
> > > > > an HBase bug which got resolved. Strange as it seems, it
> > > > > might not have to do with Drill itself.
> > > >
> > > > -Abhishek
> > > >
> > > > > On Tue, Jan 12, 2016 at 7:52 AM, Jason Altekruse <altekrusejason@gmail.com> wrote:
> > > >
> > > > > > I'm not sure why this is happening; we have tests in our
> > > > > > automated suite that I believe run some pretty large queries
> > > > > > against HBase and verify the results.
> > > > >
> > > > > Aditya, do you have some time available to try to reproduce
> > > > > this and diagnose the problem?
> > > > >
> > > > > > On Wed, Jan 6, 2016 at 2:03 PM, Kumiko Yada <Kumiko.Yada@ds-iq.com> wrote:
> > > > >
> > > > > > I'm having the same issue.  Is there any workaround for this?
> > > > > >
> > > > > > Thanks
> > > > > > Kumiko
> > > > > >
> > > > > > -----Original Message-----
> > > > > > > From: Kevin Verhoeven [mailto:Kevin.Verhoeven@ds-iq.com]
> > > > > > > Sent: Monday, December 21, 2015 10:37 AM
> > > > > > > To: user@drill.apache.org
> > > > > > Subject: Drill query does not return all results from HBase
> > > > > >
> > > > > > > We have a problem where a Drill query against HBase does not
> > > > > > > return all results. The following query should return over
> > > > > > > 100,000 rows, but we only get about 1,030 back.
> > > > > > >
> > > > > > > SELECT row_key FROM `hbase`.`customer_staged` WHERE
> > > > > > > customer_number = 800
> > > > > > >
> > > > > > > If we scan directly using the hbase shell we see over 100,000
> > > > > > > rows, but the same Drill query does not return a fraction of
> > > > > > > the expected results. We have also run a count against the
> > > > > > > table and Drill returns the same 1,030 number, which is far
> > > > > > > less than expected. What could be going wrong?
> > > > > > >
> > > > > > > We are running Drill 1.2 on Ubuntu 14.04 against CDH 5.4.3
> > > > > > > (HBase 1.0). We run HBase on six RegionServers; the table has
> > > > > > > about 1.3 billion rows.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Kevin
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

