hive-user mailing list archives

From Guy Doulberg <guy.doulb...@conduit.com>
Subject Re: inconsistent results when doing a select over a join
Date Tue, 10 Jan 2012 09:49:35 GMT
Hi,
Sorry for the late answer.
I ran the query on a small data set but couldn't reproduce the issue.
At the moment I can reproduce it only on data that takes about 1.5 hours to
process.
I am trying to narrow down the amount of data as much as I can while still
reproducing it...

But I think it is clear that the scale of the data is the reason for
the differences.

What do you think?



On Mon 09 Jan 2012 08:14:10 PM IST, Edward Capriolo wrote:
> The CREATE TABLE statements, the query, and a small data set that reproduces the issue
>
> On Monday, January 9, 2012, Guy Doulberg <guy.doulberg@conduit.com> wrote:
> > Thanks, I am trying to reproduce it again.
> >
> > But what should I send to the ML?
> >
> >
> >
> >
> > On Mon 09 Jan 2012 07:54:24 PM IST, Edward Capriolo wrote:
> >>
> >> Can you reproduce the issue, possibly with smaller tables, and send that to the ML?
> >>
> >> Edward
> >>
> >> On Mon, Jan 9, 2012 at 12:46 PM, Guy Doulberg <guy.doulberg@conduit.com> wrote:
> >>
> >>    Hey Dave,
> >>    I didn't understand your question.
> >>
> >>    The inconsistency is slight: the results differ by about 2%.
> >>
> >>    Thanks
> >>
> >>    Guy
> >>
> >>    On 01/09/2012 07:05 PM, David Houston wrote:
> >>>
> >>>    Hi Guy,
> >>>
> >>>    Inconsistent in that the results are totally off, or in that the order is
> >>>    different?
> >>>
> >>>    Thanks
> >>>
> >>>    Dave
> >>>
> >>>    On Jan 9, 2012 5:03 PM, "Guy Doulberg" <guy.doulberg@conduit.com> wrote:
> >>>
> >>>        Hi guys,
> >>>
> >>>        We have been using Hive for a while now, and recently we
> >>>        encountered an issue we just can't understand.
> >>>
> >>>        We are running a select (which includes count(*)) over a join of
> >>>        two big tables.
> >>>
> >>>        We ran the same query twice consecutively over the same two
> >>>        tables, and each time the results were slightly different.
> >>>
> >>>        We don't know how to debug this issue or where to
> >>>        look. Any ideas?
> >>>
> >>>        Thanks
> >>>
> >>>        Guy Doulberg,
> >>>        Data infrastructure engineer,
> >>>        Conduit
> >>>
> >>
> >
