hive-user mailing list archives

From Tim Spence <>
Subject Re: Inconsistent results from INSERT OVERWRITE TABLE
Date Thu, 12 May 2011 00:05:59 GMT
Thank you.  Are there tools for parsing the Hive logs for errors?  If not,
can you talk about the strategy used at Facebook to deal with detection and
resolution of MR errors?

Perhaps I can write a script to identify errors.  First I have to solve the
mystery of why there are no logs on my hadoop master.
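As a starting point for such a script, here is a minimal Python sketch that scans a Hive log file (for example the `/tmp/<user>/hive.log` file mentioned below) for lines that look like errors. The match patterns are assumptions: a log4j-style `ERROR` level tag, the `FAILED:` prefix the Hive CLI prints for failed queries, and `Broken pipe` exceptions. Adjust them to whatever your logs actually contain.

```python
import re
from pathlib import Path

# Assumed error markers: a log4j-style ERROR level tag, the "FAILED:" prefix
# the Hive CLI prints for failed queries, and broken-pipe exceptions.
ERROR_PATTERN = re.compile(r"\bERROR\b|^FAILED:|Broken pipe")


def find_errors(lines):
    """Return (line_number, text) pairs for lines matching ERROR_PATTERN."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if ERROR_PATTERN.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log(path):
    """Scan one Hive log file and return its suspicious lines."""
    with Path(path).open(encoding="utf-8", errors="replace") as handle:
        return find_errors(handle)
```

Keeping the matching in `find_errors` separate from file handling makes it easy to run the same check over several logs, or over output captured from the Hive CLI.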

I'm trying now to import each day's server logs one at a time (instead of
importing all logs in one Hive command) to see if that solves my issue with
inconsistent results after mass loading of server logs.  I'll post an update
if I find anything useful.
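A minimal sketch of the one-day-at-a-time approach, following the advice quoted below to judge success by the status of the Hive query rather than by the JobTracker. It assumes the `hive` CLI is on the PATH and exits with a non-zero status when a query fails. The table names, column names, and query template are hypothetical placeholders, not the actual transformation query.

```python
import subprocess
from datetime import timedelta

# Hypothetical query: replace page_views, raw_logs and the column list with
# the real target table and transformation query.
QUERY_TEMPLATE = (
    "INSERT OVERWRITE TABLE page_views PARTITION (dt='{day}') "
    "SELECT request_url, status FROM raw_logs WHERE dt='{day}'"
)


def days_between(start, end):
    """Yield every date from start to end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def import_days(start, end, run=subprocess.run):
    """Load one day per Hive invocation and return the days that failed.

    Success is judged by the exit status of each `hive -e` process (assumed
    to be non-zero on failure), not by the MapReduce job status alone.
    """
    failed = []
    for day in days_between(start, end):
        query = QUERY_TEMPLATE.format(day=day.isoformat())
        result = run(["hive", "-e", query])
        if result.returncode != 0:
            failed.append(day)
    return failed
```

For example, `import_days(date(2011, 5, 1), date(2011, 5, 7))` would run seven separate queries and return the dates worth re-running. The `run` parameter exists only so the loop can be exercised without a Hive installation.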

On Wed, May 11, 2011 at 2:33 PM, Ning Zhang <> wrote:

> Hive queries are compiled into different types of tasks (MapReduce, MoveTask,
> etc.), so a successful MR job as shown in the JobTracker doesn't mean the whole
> query succeeded. You need to check the status of the Hive query itself to see
> whether it succeeded. You can also check Hive's log file under
> /tmp/<user>/hive.log to debug a failed query.
> Also, broken pipe errors are mostly caused by the transform script
> crashing during the MapReduce job. In that case the MR job should fail,
> and so should the whole Hive query.
> On May 11, 2011, at 2:16 PM, Tim Spence wrote:
> > I've been using Hive in production for two months now.  We're mainly
> using it for processing server logs, about 1-2GB per day (2-2.5 million
> requests).  Typically we import a day's worth of logs at once.  That said,
> sometimes we decide to tweak a calculated column.  When that happens, we
> modify our transformation script and re-import the entire set of logs (~200
> days) into ~600 partitions.
> >
> > A few days ago I noticed that simple queries, such as a count of page
> views over a given week, were giving results up to 10% higher than they
> yielded just a week before.  I suspected that we may have "found"
> unprocessed log files, so I set up a script to re-import the entire
> inventory of logs and re-run the queries.  I got identical results for some
> weeks, but different results for others.  I repeated this experiment
> and got different results.
> >
> > In the course of this I found that sometimes Hive will create all of the
> partitions but write no data to them while not reporting any errors in the
> job tracker.  Other times it will fail and leave a stack trace blaming a
> broken pipe.
> >
> > Does anyone have any ideas what I may be doing wrong?  I'm willing to change our
> practices in whatever way is needed; all I want is confidence that all of my data has
> been properly imported.
> > Thanks,
> > Tim
> >
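To catch partitions that were created but left empty, one option is to compare per-partition row counts against counts taken from the source logs. The sketch below assumes each source log line should produce exactly one row, and that both sets of counts were gathered separately (hypothetically, `wc -l` per day's logs and `SELECT dt, COUNT(*) FROM page_views GROUP BY dt`, where `page_views` and `dt` are placeholder names).

```python
def audit_partitions(expected, actual):
    """Compare expected row counts per partition with the counts Hive reports.

    expected: partition -> number of source log lines (assumed 1 row per line).
    actual:   partition -> row count from a GROUP BY query over the table.
    Returns partition -> problem description for every mismatch.
    """
    problems = {}
    for partition, want in sorted(expected.items()):
        # A partition absent from the GROUP BY result has zero rows.
        got = actual.get(partition, 0)
        if got == 0:
            problems[partition] = "empty or missing"
        elif got != want:
            problems[partition] = f"expected {want} rows, found {got}"
    return problems
```

A `GROUP BY` never returns a row for an empty partition, so treating absent partitions as zero rows is what surfaces the silent-empty-partition case described above.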
