hive-user mailing list archives

From Guy Doulberg <guy.doulb...@conduit.com>
Subject Re: inconsistent results when doing a select over a join
Date Tue, 10 Jan 2012 15:19:59 GMT
Hi guys,
I spent today investigating this issue. It seems the differences occur
when there are many killed tasks.

We are using the fair scheduler. I ran the queries on large data and
with low priority, which caused the tasks of this job to be
preempted (killed) many times.

After I began to suspect this, I gave the query the highest priority.
Doing that reduced the number of killed tasks, and that seemed to
solve the problem.


It is not that there are differences whenever there are killed tasks;
the differences appear when many tasks are killed because of
preemption.
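
To tell whether two runs really differ in content rather than only in row order, the outputs can be compared as multisets. Below is a minimal sketch in Python; it assumes each run's result has been saved as a list of text rows, and the function names `diff_runs` and `mismatch_ratio` are hypothetical helpers, not part of Hive or Hadoop.

```python
from collections import Counter


def diff_runs(rows_a, rows_b):
    """Compare two result sets as multisets, ignoring row order.

    Returns two Counters: rows (with multiplicity) found only in the
    first run, and rows found only in the second run.
    """
    a, b = Counter(rows_a), Counter(rows_b)
    # Counter subtraction keeps only positive counts.
    return a - b, b - a


def mismatch_ratio(rows_a, rows_b):
    """Fraction of rows that do not match, relative to the larger run."""
    only_a, only_b = diff_runs(rows_a, rows_b)
    total = max(len(rows_a), len(rows_b))
    if total == 0:
        return 0.0
    return max(sum(only_a.values()), sum(only_b.values())) / total
```

If both returned counters are empty, the two runs differ only in ordering; otherwise `mismatch_ratio` gives a rough size for the discrepancy.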

What do you say?
    
On Tue 10 Jan 2012 11:49:35 AM IST, Guy Doulberg wrote:
> Hi,
> Sorry for the late answer,
> I ran the query on small data but couldn't reproduce the problem.
> At the moment I can reproduce it only on data that takes about 1.5 hours
> to process.
> I am trying to narrow down the amount of data as much as I can while
> still reproducing it...
>
> But I think it is clear that the scale of the data is the reason for
> the differences.
>
> What do you think?
>
>
>
> On Mon 09 Jan 2012 08:14:10 PM IST, Edward Capriolo wrote:
>> Create table, query, and some small data set to reproduce
>>
>> On Monday, January 9, 2012, Guy Doulberg <guy.doulberg@conduit.com> wrote:
>>> Thanks, I am trying to reproduce it again,
>>>
>>> But what should I send the ML?
>>>
>>>
>>>
>>>
>>> On Mon 09 Jan 2012 07:54:24 PM IST, Edward Capriolo wrote:
>>>>
>>>> Can you reproduce the issue, possibly with smaller tables, and
>>>> send that to the ML?
>>>>
>>>> Edward
>>>>
>>>> On Mon, Jan 9, 2012 at 12:46 PM, Guy Doulberg
>>>> <guy.doulberg@conduit.com> wrote:
>>>>
>>>>     Hey Dave,
>>>>     I didn't understand your question,
>>>>
>>>>     The inconsistency is slight: the results differ by about 2%.
>>>>
>>>>     Thanks
>>>>
>>>>     Guy
>>>>
>>>>     On 01/09/2012 07:05 PM, David Houston wrote:
>>>>>
>>>>>     Hi Guy,
>>>>>
>>>>>     Inconsistent in that the results are totally off, or that the
>>>>>     order is different?
>>>>>
>>>>>     Thanks
>>>>>
>>>>>     Dave
>>>>>
>>>>>     On Jan 9, 2012 5:03 PM, "Guy Doulberg"
>>>>>     <guy.doulberg@conduit.com> wrote:
>>>>>
>>>>>         Hi guys,
>>>>>
>>>>>         We have been using Hive for a while now, and recently we
>>>>>         encountered an issue we just can't understand.
>>>>>
>>>>>         We are selecting (the select includes count(*)) over a join
>>>>>         of two big tables.
>>>>>
>>>>>         We ran the same query twice consecutively over the same two
>>>>>         tables, and each time the results were slightly different.
>>>>>
>>>>>         We don't know how to debug this issue or where to look.
>>>>>         Any ideas?
>>>>>
>>>>>         Thanks
>>>>>
>>>>>         Guy Doulberg,
>>>>>         Data infrastructure engineer,
>>>>>         Conduit
>>>>>
>>>>
>>>
