hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From satyajit vegesna <satyajit.apas...@gmail.com>
Subject Re: Error with flush_length File in Orc, in hive 2.1.0 and mr execution engine.
Date Wed, 26 Oct 2016 23:26:38 GMT
Hi Eugene,

One more observation, in namenode logs when i run select count(*) on
individual tables i still see the same error as before,

org.apache.hadoop.ipc.Server: IPC Server handler 1 on 9000, call
org.apache.hadoop.hdfs.protocol.ClientProtocol.getBlockLocations from
192.168.120.133:47114 Call#4 Retry#0: java.io.FileNotFoundException: File
does not exist: /user/hive/warehouse/mls_public_record_association_
snapshot_orc/delta_0000002_0000002_0000/bucket_00094_flush_length

but i get the count of the tables and they match well with the source data.
So i believe the problem is with joining these tables together.

Any specific logs you want me to debug.

Regards,
Satyajit.

On Wed, Oct 26, 2016 at 4:16 PM, satyajit vegesna <
satyajit.apasprk@gmail.com> wrote:

> Hi Eugene,
>
> select count(*) from mls_public_record_association_snapshot_orc pra  left
> outer join mls_listing_snapshot_orc ml on pra.primary_listing_id = ml.id  left
> outer join attribute a on a.id = ml.standard_status
>
> ran till end and threw the below exception.
>
> MapReduce Total cumulative CPU time: 0 days 1 hours 0 minutes 53 seconds
> 760 msec
> Ended Job = job_1477494091659_0024
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/opt/apache-hive-2.
> 1.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/
> StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.2/
> share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/
> impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.
> Log4jLoggerFactory]
> 2016-10-26 16:09:01 Starting to launch local task to process map join; maximum
> memory = 514850816
> Execution failed with exit status: 2
> Obtaining error information
>
> Task failed!
> Task ID:
>   Stage-9
>
> Logs:
>
> FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.
> exec.mr.MapredLocalTask
> MapReduce Jobs Launched:
> Stage-Stage-1: Map: 300  Reduce: 121   Cumulative CPU: 3654.02 sec   HDFS
> Read: 1771032233 HDFS Write: 1917532703 SUCCESS
> Total MapReduce CPU Time Spent: 0 days 1 hours 0 minutes 54 seconds 20 msec
>
> Explain Plan:
>
> STAGE DEPENDENCIES:
>   Stage-8 is a root stage , consists of Stage-1
>   Stage-1
>   Stage-9 depends on stages: Stage-1
>   Stage-3 depends on stages: Stage-9
>   Stage-0 depends on stages: Stage-3
>
> STAGE PLANS:
>   Stage: Stage-8
>     Conditional Operator
>
>   Stage: Stage-1
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: pra
>             Statistics: Num rows: 99241216 Data size: 9924121600 Basic
> stats: COMPLETE Column stats: NONE
>             Select Operator
>               expressions: primary_listing_id (type: string)
>               outputColumnNames: _col0
>               Statistics: Num rows: 99241216 Data size: 9924121600 Basic
> stats: COMPLETE Column stats: NONE
>               Reduce Output Operator
>                 key expressions: _col0 (type: string)
>                 sort order: +
>                 Map-reduce partition columns: _col0 (type: string)
>                 Statistics: Num rows: 99241216 Data size: 9924121600 Basic
> stats: COMPLETE Column stats: NONE
>           TableScan
>             alias: ml
>             Statistics: Num rows: 201432950 Data size: 20949026816 Basic
> stats: COMPLETE Column stats: NONE
>             Select Operator
>               expressions: id (type: string), standard_status (type: int)
>               outputColumnNames: _col0, _col1
>               Statistics: Num rows: 201432950 Data size: 20949026816 Basic
> stats: COMPLETE Column stats: NONE
>               Reduce Output Operator
>                 key expressions: _col0 (type: string)
>                 sort order: +
>                 Map-reduce partition columns: _col0 (type: string)
>                 Statistics: Num rows: 201432950 Data size: 20949026816
> Basic stats: COMPLETE Column stats: NONE
>                 value expressions: _col1 (type: int)
>       Reduce Operator Tree:
>         Join Operator
>           condition map:
>                Left Outer Join0 to 1
>           keys:
>             0 _col0 (type: string)
>             1 _col0 (type: string)
>           outputColumnNames: _col2
>           Statistics: Num rows: 221576249 Data size: 23043929997 Basic
> stats: COMPLETE Column stats: NONE
>           File Output Operator
>             compressed: false
>             table:
>                 input format: org.apache.hadoop.mapred.
> SequenceFileInputFormat
>                 output format: org.apache.hadoop.hive.ql.io.
> HiveSequenceFileOutputFormat
>                 serde: org.apache.hadoop.hive.serde2.
> lazybinary.LazyBinarySerDe
>
>   Stage: Stage-9 --it is failing in the same mapreduce local work.
>     Map Reduce Local Work
>       Alias -> Map Local Tables:
>         $hdt$_2:a
>           Fetch Operator
>             limit: -1
>       Alias -> Map Local Operator Tree:
>         $hdt$_2:a
>           TableScan
>             alias: a
>             Statistics: Num rows: 12830 Data size: 51322 Basic stats:
> COMPLETE Column stats: NONE
>             Select Operator
>               expressions: id (type: int)
>               outputColumnNames: _col0
>               Statistics: Num rows: 12830 Data size: 51322 Basic stats:
> COMPLETE Column stats: NONE
>               HashTable Sink Operator
>                 keys:
>                   0 _col2 (type: int)
>                   1 _col0 (type: int)
>
>   Stage: Stage-3
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             Map Join Operator
>               condition map:
>                    Left Outer Join0 to 1
>               keys:
>                 0 _col2 (type: int)
>                 1 _col0 (type: int)
>               Statistics: Num rows: 243733879 Data size: 25348323546 Basic
> stats: COMPLETE Column stats: NONE
>               Group By Operator
>                 aggregations: count()
>                 mode: hash
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE
> Column stats: NONE
>                 Reduce Output Operator
>                   sort order:
>                   Statistics: Num rows: 1 Data size: 8 Basic stats:
> COMPLETE Column stats: NONE
>                   value expressions: _col0 (type: bigint)
>       Local Work:
>         Map Reduce Local Work
>       Reduce Operator Tree:
>         Group By Operator
>           aggregations: count(VALUE._col0)
>           mode: mergepartial
>           outputColumnNames: _col0
>           Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE
> Column stats: NONE
>           File Output Operator
>             compressed: false
>             Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE
> Column stats: NONE
>             table:
>                 input format: org.apache.hadoop.mapred.
> SequenceFileInputFormat
>                 output format: org.apache.hadoop.hive.ql.io.
> HiveSequenceFileOutputFormat
>                 serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
>
> Any suggestion in debugging this issue is appreciated.
>
>
> Regards,
> Satyajit.
>
>
>
>
> On Wed, Oct 26, 2016 at 3:34 PM, Eugene Koifman <ekoifman@hortonworks.com>
> wrote:
>
>> If you can run this, then it’s safe to ignore “00094_flush_length” messages
>> and the issue is somewhere else
>>
>> select count(*) from mls_public_record_association_snapshot_orc pra
>>  left outer join mls_listing_snapshot_orc ml on pra.primary_listing_id =
>> ml.id  left outer join attribute a on a.id = ml.standard_status
>>
>> Eugene
>>
>> From: satyajit vegesna <satyajit.apasprk@gmail.com>
>> Date: Wednesday, October 26, 2016 at 2:14 PM
>> To: "user@hive.apache.org" <user@hive.apache.org>, Eugene Koifman <
>> ekoifman@hortonworks.com>
>> Cc: "dev@hive.apache.org" <dev@hive.apache.org>
>> Subject: Re: Error with flush_length File in Orc, in hive 2.1.0 and mr
>> execution engine.
>>
>> Hi Eugene,
>>
>> PFB Transaction table in green and parquet tables in yellow,
>>
>> INSERT INTO access_logs.crawlstats_dpp PARTITION(day="2016-10-23")
>> select pra.url as prUrl,pra.url_type as urlType,CAST(pra.created_at AS
>> timestamp) as prCreated, CAST(pra.updated_at AS timestamp) as prUpdated,
>> CAST(ml.created_at AS timestamp) as mlCreated, CAST(ml.updated_at AS
>> timestamp) as mlUpdated, a.name as status, pra.public_record_id as prId,
>> acl.accesstime as crawledon, pra.id as propId, pra.primary_listing_id as
>> listingId, datediff(CAST(acl.accesstime AS timestamp),CAST(ml.created_at AS
>> timestamp)) as mlcreateage, datediff(CAST(acl.accesstime AS
>> timestamp),CAST(ml.updated_at AS timestamp)) as mlupdateage,
>> datediff(CAST(acl.accesstime AS timestamp),CAST(pra.created_at AS
>> timestamp)) as prcreateage, datediff(CAST(acl.accesstime AS
>> timestamp),CAST(pra.updated_at AS timestamp)) as prupdateage,  (case when
>> (pra.public_record_id is not null and TRIM(pra.public_record_id) <> '')
>>  then (case when (pra.primary_listing_id is null or
>> TRIM(pra.primary_listing_id) = '') then 'PR' else 'PRMLS' END)  else (case
>> when (pra.primary_listing_id is not null and TRIM(pra.primary_listing_id)
>> <> '') then 'MLS' else 'UNKNOWN' END) END) as listingType,
>>  acl.httpstatuscode,  acl.httpverb,  acl.requesttime,
>> acl.upstreamheadertime , acl.upstreamresponsetime,  acl.page_id,  useragent
>> AS user_agent,  substring(split(pra.url,'/')[0],
>> 0,length(split(pra.url,'/')[0])-3) as city,
>>  substring(split(pra.url,'/')[0], length(split(pra.url,'/')[0])-1,2) as
>> state,  ml.mls_id  FROM access_logs.loadbalancer_accesslogs acl  inner
>> join mls_public_record_association_snapshot_orc pra on acl.listing_url =
>> pra.url  left outer join mls_listing_snapshot_orc ml on
>> pra.primary_listing_id = ml.id  left outer join attribute a on a.id =
>> ml.standard_status  WHERE acl.accesstimedate="2016-10-23";
>>
>>
>> Any clue, or something that you would want me to focus on to debug the
>> issue.
>>
>> Regards,
>> Satyajit.
>>
>>
>>
>> On Tue, Oct 25, 2016 at 8:49 PM, Eugene Koifman <ekoifman@hortonworks.com
>> > wrote:
>>
>>> Which of your tables are are transactional?  Can you provide the DDL?
>>>
>>> I don’t think “File does not exist” error is causing your queries to
>>> fail.  It’s an INFO level msg.
>>> There should be some other error.
>>>
>>> Eugene
>>>
>>>
>>> From: satyajit vegesna <satyajit.apasprk@gmail.com>
>>> Reply-To: "user@hive.apache.org" <user@hive.apache.org>
>>> Date: Tuesday, October 25, 2016 at 5:46 PM
>>> To: "user@hive.apache.org" <user@hive.apache.org>, "dev@hive.apache.org"
>>> <dev@hive.apache.org>
>>> Subject: Error with flush_length File in Orc, in hive 2.1.0 and mr
>>> execution engine.
>>>
>>> HI All,
>>>
>>> i am using hive 2.1.0 , hadoop 2.7.2 , but  when i try running queries
>>> like simple insert,
>>>
>>> set mapreduce.job.queuename=default;set hive.exec.dynamic.partition=true;set
>>> hive.exec.dynamic.partition.mode=nonstrict;set
>>> hive.exec.max.dynamic.partitions.pernode=400;set
>>> hive.exec.max.dynamic.partitions=2000;set mapreduce.map.memory.mb=5120;set
>>> mapreduce.reduce.memory.mb=5120;set mapred.tasktracker.map.tasks.maximum=30;set
>>> mapred.tasktracker.reduce.tasks.maximum=20;set
>>> mapred.reduce.child.java.opts=-Xmx2048m;set
>>> mapred.map.child.java.opts=-Xmx2048m; set
>>> hive.support.concurrency=true; set hive.txn.manager=org.apache.ha
>>> doop.hive.ql.lockmgr.DbTxnManager; set hive.compactor.initiator.on=false;
>>> set hive.compactor.worker.threads=1;set mapreduce.job.queuename=default;set
>>> hive.exec.dynamic.partition=true;set hive.exec.dynamic.partition.mode=nonstrict;INSERT
>>> INTO access_logs.crawlstats_dpp PARTITION(day="2016-10-23") select pra.url
>>> as prUrl,pra.url_type as urlType,CAST(pra.created_at AS timestamp) as
>>> prCreated, CAST(pra.updated_at AS timestamp) as prUpdated,
>>> CAST(ml.created_at AS timestamp) as mlCreated, CAST(ml.updated_at AS
>>> timestamp) as mlUpdated, a.name as status, pra.public_record_id as
>>> prId, acl.accesstime as crawledon, pra.id as propId,
>>> pra.primary_listing_id as listingId, datediff(CAST(acl.accesstime AS
>>> timestamp),CAST(ml.created_at AS timestamp)) as mlcreateage,
>>> datediff(CAST(acl.accesstime AS timestamp),CAST(ml.updated_at AS
>>> timestamp)) as mlupdateage, datediff(CAST(acl.accesstime AS
>>> timestamp),CAST(pra.created_at AS timestamp)) as prcreateage,
>>> datediff(CAST(acl.accesstime AS timestamp),CAST(pra.updated_at AS
>>> timestamp)) as prupdateage,  (case when (pra.public_record_id is not null
>>> and TRIM(pra.public_record_id) <> '')  then (case when
>>> (pra.primary_listing_id is null or TRIM(pra.primary_listing_id) = '') then
>>> 'PR' else 'PRMLS' END)  else (case when (pra.primary_listing_id is not null
>>> and TRIM(pra.primary_listing_id) <> '') then 'MLS' else 'UNKNOWN' END)
END)
>>> as listingType,  acl.httpstatuscode,  acl.httpverb,  acl.requesttime,
>>> acl.upstreamheadertime , acl.upstreamresponsetime,  acl.page_id,  useragent
>>> AS user_agent,  substring(split(pra.url,'/')[0],
>>> 0,length(split(pra.url,'/')[0])-3) as city,
>>>  substring(split(pra.url,'/')[0], length(split(pra.url,'/')[0])-1,2) as
>>> state,  ml.mls_id  FROM access_logs.loadbalancer_accesslogs acl  inner
>>> join mls_public_record_association_snapshot_orc pra on acl.listing_url
>>> = pra.url  left outer join mls_listing_snapshot_orc ml on
>>> pra.primary_listing_id = ml.id  left outer join attribute a on a.id =
>>> ml.standard_status  WHERE acl.accesstimedate="2016-10-23";
>>>
>>> i finally end up getting below error,
>>>
>>> 2016-10-25 17:40:18,725 Stage-2 map = 100%,  reduce = 52%, Cumulative
>>> CPU 1478.96 sec
>>> 2016-10-25 17:40:19,761 Stage-2 map = 100%,  reduce = 62%, Cumulative
>>> CPU 1636.58 sec
>>> 2016-10-25 17:40:20,794 Stage-2 map = 100%,  reduce = 64%, Cumulative
>>> CPU 1764.97 sec
>>> 2016-10-25 17:40:21,820 Stage-2 map = 100%,  reduce = 69%, Cumulative
>>> CPU 1879.61 sec
>>> 2016-10-25 17:40:22,842 Stage-2 map = 100%,  reduce = 80%, Cumulative
>>> CPU 2051.38 sec
>>> 2016-10-25 17:40:23,872 Stage-2 map = 100%,  reduce = 90%, Cumulative
>>> CPU 2151.49 sec
>>> 2016-10-25 17:40:24,907 Stage-2 map = 100%,  reduce = 93%, Cumulative
>>> CPU 2179.67 sec
>>> 2016-10-25 17:40:25,944 Stage-2 map = 100%,  reduce = 94%, Cumulative
>>> CPU 2187.86 sec
>>> 2016-10-25 17:40:29,062 Stage-2 map = 100%,  reduce = 95%, Cumulative
>>> CPU 2205.22 sec
>>> 2016-10-25 17:40:30,107 Stage-2 map = 100%,  reduce = 100%, Cumulative
>>> CPU 2241.25 sec
>>> MapReduce Total cumulative CPU time: 37 minutes 21 seconds 250 msec
>>> Ended Job = job_1477437520637_0009
>>> SLF4J: Class path contains multiple SLF4J bindings.
>>> SLF4J: Found binding in [jar:file:/opt/apache-hive-2.1
>>> .0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/Static
>>> LoggerBinder.class]
>>> SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.2/sh
>>> are/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/im
>>> pl/StaticLoggerBinder.class]
>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
>>> explanation.
>>> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4
>>> jLoggerFactory]
>>> 2016-10-25 17:40:35Starting to launch local task to process map join;maximum
>>> memory = 514850816
>>> Execution failed with exit status: 2
>>> Obtaining error information
>>>
>>> Task failed!
>>> Task ID:
>>>   Stage-14
>>>
>>> Logs:
>>>
>>> FAILED: Execution Error, return code 2 from
>>> org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
>>> MapReduce Jobs Launched:
>>> Stage-Stage-1: Map: 106  Reduce: 45   Cumulative CPU: 3390.11 sec   HDFS
>>> Read: 8060555201 HDFS Write: 757253756 SUCCESS
>>> Stage-Stage-2: Map: 204  Reduce: 85   Cumulative CPU: 2241.25 sec   HDFS
>>> Read: 2407914653 HDFS Write: 805874953 SUCCESS
>>> Total MapReduce CPU Time Spent: 0 days 1 hours 33 minutes 51 seconds 360
>>> msec
>>>
>>> Could not find any errors in logs, but when i check namenode logs , oi
>>> get the following error,
>>>
>>> 2016-10-25 17:01:51,923 INFO org.apache.hadoop.ipc.Server: IPC Server
>>> handler 1 on 9000, call org.apache.hadoop.hdfs.protoco
>>> l.ClientProtocol.getBlockLocations from 192.168.120.133:47114 Call#4
>>> Retry#0: java.io.FileNotFoundException: File does not exist:
>>> /user/hive/warehouse/mls_public_record_association_snapshot_
>>> orc/delta_0000002_0000002_0000/bucket_00094_flush_length
>>> 2016-10-25 17:01:52,779 INFO org.apache.hadoop.ipc.Server: IPC Server
>>> handler 1 on 9000, call org.apache.hadoop.hdfs.protoco
>>> l.ClientProtocol.getBlockLocations from 192.168.120.132:43008 Call#4
>>> Retry#0: java.io.FileNotFoundException: File does not exist:
>>> /user/hive/warehouse/mls_public_record_association_snapshot_
>>> orc/delta_0000002_0000002_0000/bucket_00095_flush_length
>>> 2016-10-25 17:01:52,984 INFO org.apache.hadoop.ipc.Server: IPC Server
>>> handler 0 on 9000, call org.apache.hadoop.hdfs.protoco
>>> l.ClientProtocol.getBlockLocations from 192.168.120.133:47260 Call#4
>>> Retry#0: java.io.FileNotFoundException: File does not exist:
>>> /user/hive/warehouse/mls_public_record_association_snapshot_
>>> orc/delta_0000002_0000002_0000/bucket_00096_flush_length
>>> 2016-10-25 17:01:53,381 INFO org.apache.hadoop.ipc.Server: IPC Server
>>> handler 0 on 9000, call org.apache.hadoop.hdfs.protoco
>>> l.ClientProtocol.getBlockLocations from 192.168.120.132:43090 Call#4
>>> Retry#0: java.io.FileNotFoundException: File does not exist:
>>> /user/hive/warehouse/mls_public_record_association_snapshot_
>>> orc/delta_0000002_0000002_0000/bucket_00097_flush_length
>>> 2016-10-25 17:01:53,971 INFO org.apache.hadoop.ipc.Server: IPC Server
>>> handler 1 on 9000, call org.apache.hadoop.hdfs.protoco
>>> l.ClientProtocol.getBlockLocations from 192.168.120.134:37444 Call#4
>>> Retry#0: java.io.FileNotFoundException: File does not exist:
>>> /user/hive/warehouse/mls_public_record_association_snapshot_
>>> orc/delta_0000002_0000002_0000/bucket_00098_flush_length
>>> 2016-10-25 17:01:54,092 INFO org.apache.hadoop.ipc.Server: IPC Server
>>> handler 2 on 9000, call org.apache.hadoop.hdfs.protoco
>>> l.ClientProtocol.getBlockLocations from 192.168.120.133:47300 Call#4
>>> Retry#0: java.io.FileNotFoundException: File does not exist:
>>> /user/hive/warehouse/mls_public_record_association_snapshot_
>>> orc/delta_0000002_0000002_0000/bucket_00099_flush_length
>>> 2016-10-25 17:01:55,094 INFO org.apache.hadoop.ipc.Server: IPC Server
>>> handler 8 on 9000, call org.apache.hadoop.hdfs.protoco
>>> l.ClientProtocol.getBlockLocations from 192.168.120.134:37540 Call#4
>>> Retry#0: java.io.FileNotFoundException: File does not exist:
>>> /user/hive/warehouse/mls_public_record_association_snapshot_
>>> orc/delta_0000002_0000002_0000/bucket_00012_flush_length
>>> 2016-10-25 17:02:11,269 INFO org.apache.hadoop.ipc.Server: IPC Server
>>> handler 5 on 9000, call org.apache.hadoop.hdfs.protoco
>>> l.ClientProtocol.getBlockLocations from 192.168.120.133:47378 Call#4
>>> Retry#0: java.io.FileNotFoundException: File does not exist:
>>> /user/hive/warehouse/mls_public_record_association_snapshot_
>>> orc/delta_0000002_0000002_0000/bucket_00075_flush_length
>>>
>>> i also search for find the flush_length files in the above mentioned
>>> location, but i only see buckets but no files ending with flush_length.
>>>
>>> Any clue or help would be highly appreciated.
>>>
>>> Regards,
>>> Satyajit.
>>>
>>>
>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message