hive-user mailing list archives

From Bob Schulze <b.schu...@ecircle.com>
Subject Re: Keeping Data compressed
Date Fri, 20 Mar 2009 16:48:08 GMT
I found that this error ("Failed with exception Cannot load text files
into a table stored as SequenceFile") happens only if I switch on
hive.exec.compress.output=true beforehand. With compression off, the
import works without problems.

To summarize (a command sketch follows the list):

t1(text)             -> t2(text, compressed)   works, but seems suboptimal for MR/HDFS
t1(text)             -> t2(seq)                works, but without the compression I want
t1(seq)              -> t2(seq, compressed)    fails
t1(text, compressed) -> t2(seq, compressed)    fails
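
In commands, the failing case boils down to roughly the following
(schemas and file paths abbreviated, as above):

	CREATE TABLE t1 (...) STORED AS TEXTFILE;
	LOAD DATA ... OVERWRITE INTO TABLE t1;
	CREATE TABLE t2 (...) STORED AS SEQUENCEFILE;
	set hive.exec.compress.output=true;
	INSERT OVERWRITE TABLE t2 SELECT * FROM t1;

The last INSERT then fails in MoveTask with the "Cannot load text files
into a table stored as SequenceFile" error.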

Below is the EXPLAIN output; it does not seem to differ with or without
compression.

(Hadoop 0.19)

Thx,
	Bob

hive> explain from t14 insert overwrite  table t15 select *;
OK
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_TABREF t14)) (TOK_INSERT (TOK_DESTINATION (TOK_TAB t15)) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF))))

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        t14
            Select Operator
              expressions:
                    expr: dt
                    type: string
                    expr: mta
                    type: string
                    expr: caller
                    type: string
                    expr: callerip
                    type: string
                    expr: sender
                    type: string
                    expr: recp
                    type: string
                    expr: port
                    type: int
                    expr: dsn
                    type: string
                    expr: delay
                    type: int
                    expr: attempts
                    type: int
                    expr: relay
                    type: string
                    expr: msgid
                    type: string
              File Output Operator
                compressed: false
                GlobalTableId: 1
                table:
                    input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                    output format: org.apache.hadoop.mapred.SequenceFileOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                    name: t15

  Stage: Stage-0
    Move Operator
      tables:
            replace: true
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.mapred.SequenceFileOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                name: t15

Joydeep Sen Sarma wrote:
> Can't reproduce this. Can you run EXPLAIN on the insert query and post the results?
> 
> -----Original Message-----
> From: Bob Schulze [mailto:b.schulze@ecircle.com] 
> Sent: Thursday, March 19, 2009 3:05 AM
> To: hive-user@hadoop.apache.org
> Subject: Re: Keeping Data compressed
> 
> I repeated it; it fails at the last step
> 
> 	INSERT OVERWRITE TABLE seqtable SELECT * FROM texttable;
> 
> with the same message:
> 
> ...
> Loading data to table t2
> Failed with exception Cannot load text files into a table stored as
> SequenceFile.
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.MoveTask
> ...
> 
> There is indeed such a check in MoveTask.java. MoveTask always seems to
> be chosen, no matter what I try in the SELECT statement.
> 
> Bob
> 
> Zheng Shao wrote:
>> Hi Bob,
>>
>> The reason you see "Failed with exception Cannot load text files into
>> a table stored as SequenceFile" is that you are trying to load text
>> files into a table declared with "stored as sequencefile".
>>
>> Let me put all the commands that you need together:
>>
>> CREATE TABLE texttable (...) STORED AS TEXTFILE;
>> LOAD DATA ... OVERWRITE INTO TABLE texttable;
>> CREATE TABLE seqtable (...) STORED AS SEQUENCEFILE;
>> set hive.exec.compress.output=true;
>> set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
>> set mapred.output.compression.type=BLOCK;
>> INSERT OVERWRITE TABLE seqtable SELECT * FROM texttable;
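>>
>> (To double-check the result, DESCRIBE EXTENDED seqtable should list
>> SequenceFileInputFormat / SequenceFileOutputFormat as the table's
>> input and output formats.)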
>>
>> Let me know if this works or not. If not, please let me know which
>> step goes wrong and what the error message is.
>>
>> Zheng
>>
>> On Thu, Mar 19, 2009 at 1:34 AM, Bob Schulze <b.schulze@ecircle.com> wrote:
>>
>>     Thx Joydeep,
>>
>>            I actually tried it that way; in all combinations (file -> seq
>>     table, file -> text table -> seq table) I end up with a
>>
>>     "Failed with exception Cannot load text files into a table stored as
>>     SequenceFile.
>>     FAILED: Execution Error, return code 1 from
>>     org.apache.hadoop.hive.ql.exec.MoveTask"
>>
>>     The path you propose _is_ working if compression is disabled (I can
>>     see that a sequence file is then created in HDFS). Does the Hadoop
>>     compression setting (mapred.compress.map.output=true) possibly
>>     conflict with the Hive setting (hive.exec.compress.output=true)?
>>
>>     Besides that, I wonder how Hive deals with the key/value records in
>>     a sequence file.
>>
>>     Bob
>>
>>     Joydeep Sen Sarma wrote:
>>     > Hey - not sure if anyone responded.
>>     >
>>     > SequenceFiles are the way to go if you want parallelism on the
>>     > files as well (since gz-compressed files cannot be split).
>>     >
>>     > One simple way to do this is to start with text files, build a
>>     > (potentially external) table on them, and load them into another
>>     > table that is declared to be stored as a SequenceFile. The load can
>>     > simply be an 'insert overwrite table XXX select * from YYY' on the
>>     > first table (YYY). The first table is just a temporary table used
>>     > to do the loading; a rough sketch follows.
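>>     >
>>     > For example (schemas omitted; the location path is a placeholder):
>>     >
>>     >   CREATE EXTERNAL TABLE yyy (...) STORED AS TEXTFILE
>>     >     LOCATION '/path/to/text/files';
>>     >   CREATE TABLE xxx (...) STORED AS SEQUENCEFILE;
>>     >   INSERT OVERWRITE TABLE xxx SELECT * FROM yyy;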
>>     >
>>     > Whether the resulting data is compressed is controlled by the Hive
>>     > option 'hive.exec.compress.output'. If this is set to true, the
>>     > codec used is whatever the Hadoop codec options dictate. The
>>     > relevant options are:
>>     >
>>     > mapred.output.compression.codec
>>     > mapred.output.compression.type
>>     >
>>     > You want to set them to org.apache.hadoop.io.compress.GzipCodec and
>>     > BLOCK, respectively.
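>>     >
>>     > That is, something like:
>>     >
>>     >   set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
>>     >   set mapred.output.compression.type=BLOCK;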
>>     >
>>     > Hope this helps,
>>     >
>>     > Joydeep
>>     >
>>     > -----Original Message-----
>>     > From: Bob Schulze [mailto:b.schulze@ecircle.com]
>>     > Sent: Wednesday, March 18, 2009 8:07 AM
>>     > To: hive-user@hadoop.apache.org
>>     > Subject: Keeping Data compressed
>>     >
>>     > Hi,
>>     >
>>     >       I want to keep data in Hadoop compressed, ready for Hive
>>     > selects to access.
>>     >
>>     > Is using sequencefiles with compression the way to go?
>>     >
>>     > How can I get my data into Hive tables "stored as sequencefile",
>>     > with underlying compression?
>>     >
>>     > Thx for any ideas,
>>     >
>>     >       Bob
>>     >
>>
>>
>>     --
>>
>>            Bob Schulze
>>            Head Software Development
>>            eCircle AG, Munich, Germany
>>            +49-89-12009-703
>>
>>
>>
>>
>> -- 
>> Yours,
>> Zheng
> 
> 


-- 

	Bob Schulze
	Head Software Development
	eCircle AG, Munich, Germany
	+49-89-12009-703
