Subject: Re: Issue with Hive and table with lots of column
From: Edward Capriolo <edlinuxguru@gmail.com>
To: "user@hive.apache.org" <user@hive.apache.org>
Date: Thu, 30 Jan 2014 19:12:46 -0500

Ok, here are the problems. Thrift has frame size limits, and Thrift has to buffer rows into memory.

The Hive Thrift server has a heap size; it needs to be big in this case.

Your client needs a big heap size as well.

The way to do this query, if it is possible at all, may be turning the row lateral, potentially by treating it as a list; it will make queries on it awkward.

Good luck

On Thursday, January 30, 2014, Stephen Sprague <spragues@gmail.com> wrote:
> oh. thinking some more about this i forgot to ask some other basic questions.
>
> a) what storage format are you using for the table (text, sequence, rcfile, orc or custom)? "show create table <table>" would yield that.
>
> b) what command is causing the stack trace?
>
> my thinking here is rcfile and orc are column based (i think) and if you don't select all the columns that could very well limit the size of the "row" being returned and hence the size of the internal ArrayList. OTOH, if you're using "select *", um, you have my sympathies. :)
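To make that column-pruning point concrete, here is a rough sketch (the table and column names are hypothetical, not from this thread): with a columnar format such as ORC, selecting only the columns you need keeps each fetched row small, while select * on a ~15K-column table drags every value of every row through the Thrift fetch path.

    -- Hypothetical wide table stored in a columnar format (ORC, available since Hive 0.11).
    CREATE TABLE wide_table_orc (
      id    BIGINT,
      col_1 STRING,
      col_2 STRING
      -- ... imagine ~15,000 more columns here ...
    )
    STORED AS ORC;

    -- Pruned query: only three values per row are materialized and buffered.
    SELECT id, col_1, col_2
    FROM wide_table_orc
    LIMIT 100;

    -- By contrast, this is the shape of query that tends to blow the heap here:
    -- SELECT * FROM wide_table_orc;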
>
> On Thu, Jan 30, 2014 at 11:33 AM, Stephen Sprague <spragues@gmail.com> wrote:
>
> thanks for the information. Up-to-date hive. Cluster on the smallish side. And, well, sure looks like a memory issue :) rather than an inherent hive limitation, that is.
>
> So. I can only speak as a user (ie. not a hive developer) but what i'd be interested in knowing next is: is this via running hive in local mode, correct? (eg. not through hiveserver1/2). And it looks like it boinks on array processing, which i assume to be internal code arrays and not hive data arrays - your 15K columns are all scalar/simple types, correct? It's clearly fetching results and looks to be trying to store them in a java array - and not just one row but a *set* of rows (ArrayList).
>
> three things to try.
>
> 1. boost the heap-size. try 8192. And I don't know if HADOOP_HEAPSIZE is the controller of that. I woulda hoped it was called something like "HIVE_HEAPSIZE". :) Anyway, can't hurt to try.
>
> 2. trim down the number of columns and see where the breaking point is. is it 10K? is it 5K? The idea is to confirm it's _the number of columns_ that is causing the memory to blow and not some other artifact unbeknownst to us.
>
> 3. Google around the Hive namespace for something that might limit or otherwise control the number of rows stored at once in Hive's internal buffer. I'll snoop around too.
>
> That's all i got for now and maybe we'll get lucky and someone on this list will know something or another about this. :)
>
> cheers,
> Stephen.
>
>
> On Thu, Jan 30, 2014 at 2:32 AM, David Gayou <david.gayou@kxen.com> wrote:
>
> We are using Hive 0.12.0, but it doesn't work better on hive 0.11.0 or hive 0.10.0.
> Our hadoop version is 1.1.2.
> Our cluster is 1 master + 4 slaves with 1 dual-core xeon CPU each (with hyperthreading, so 4 cores per machine) + 16GB RAM each.
>
> The error message I get is:
>
> 2014-01-29 12:41:09,086 ERROR thrift.ProcessFunction (ProcessFunction.java:process(41)) - Internal error processing FetchResults
> java.lang.OutOfMemoryError: Java heap space
>         at java.util.Arrays.copyOf(Arrays.java:2734)
>         at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
>         at java.util.ArrayList.add(ArrayList.java:351)
>         at org.apache.hive.service.cli.Row.<init>(Row.java:47)
>         at org.apache.hive.service.cli.RowSet.addRow(RowSet.java:61)
>         at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:235)
>         at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:170)
>         at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:417)
>         at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:306)
>         at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:386)
>         at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1373)
>         at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1358)
>         at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>         at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>         at org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:58)
>         at org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:55)
>         at java.security.AccessCont

--
Sorry this was sent from mobile. Will do less grammar and spell check than usual.
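For what it's worth, a minimal sketch of suggestion 1 (and of the server/client heap advice at the top), assuming HiveServer2 and the Hive CLI are launched through the stock bin/hive scripts, which read conf/hive-env.sh. As noted above it is not guaranteed that HADOOP_HEAPSIZE is the variable your particular setup honors, so treat this as a starting point to verify rather than a known fix.

    # conf/hive-env.sh -- a sketch; check that your launch scripts actually pick these up.

    # Heap (in MB) for JVMs started via bin/hive, HiveServer2 included.
    export HADOOP_HEAPSIZE=8192

    # Extra JVM options for client-side commands (e.g. the Hive CLI),
    # since the fetched rows get buffered on the client as well.
    export HADOOP_CLIENT_OPTS="-Xmx4096m ${HADOOP_CLIENT_OPTS}"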
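Finally, a minimal sketch of the "turning the row lateral, treating it as a list" idea from the reply at the top, again with hypothetical names: the thousands of scalar columns become entries in a single map column, and, as noted, the queries on it get more awkward.

    -- Hypothetical "lateral" layout: one map column instead of ~15K scalar columns.
    CREATE TABLE wide_table_lateral (
      id    BIGINT,
      attrs MAP<STRING, STRING>   -- original column name -> value
    )
    STORED AS ORC;

    -- Reading one former "column" now means indexing into the map:
    SELECT id, attrs['col_42'] AS col_42
    FROM wide_table_lateral;

    -- Or explode the map into one (id, key, value) row per original cell:
    SELECT id, k, v
    FROM wide_table_lateral
    LATERAL VIEW explode(attrs) a AS k, v;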