hive-user mailing list archives

From MIS <misapa...@gmail.com>
Subject Re: Accessing individual columns from a Hive table which is row delimited by RegexSerde
Date Fri, 09 Sep 2011 10:47:53 GMT
Exactly what I meant. Can someone please let me know the right way of doing
this, or am I missing something?

-MIS.
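
One possible cause worth ruling out (an assumption; nothing in the logs
quoted below confirms it): SELECT * on a plain table is typically answered
without launching a MapReduce job, whereas selecting a named column launches
one, as the "Launching Job 1 out of 1" output below shows. The map tasks then
need the contrib SerDe class on their classpath. A minimal sketch, using a
hypothetical jar path that has to be adjusted to the local install:

```sql
-- Hypothetical path: point this at the hive-contrib jar shipped with your Hive build.
ADD JAR /path/to/hive/lib/hive-contrib-0.7.0.jar;

-- Re-run the failing query in the same session.
SELECT host FROM serde_regex;
```

If the query still fails, the log of the failed map task (reachable from the
Tracking URL) should show the underlying exception.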

On Fri, Sep 9, 2011 at 4:13 PM, Adriaan Tijsseling
<adriaan@tijsseling.com> wrote:

> I can replicate it using this example. A
> SELECT * FROM serde_regex;
> works, but
> SELECT host FROM serde_regex;
> doesn't:
>
> hive> SELECT host FROM serde_regex;
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_201109091148_0003, Tracking URL =
> http://localhost:50030/jobdetails.jsp?jobid=job_201109091148_0003
> Kill Command = /Users/adriaant/CodeBox/hadoop/libexec/../bin/hadoop job
>  -Dmapred.job.tracker=localhost:9001 -kill job_201109091148_0003
> 2011-09-09 12:40:28,638 Stage-1 map = 0%,  reduce = 0%
> 2011-09-09 12:40:58,883 Stage-1 map = 100%,  reduce = 100%
> Ended Job = job_201109091148_0003 with errors
> FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.MapRedTask
>
> Hive log:
> hive.log
> 2011-09-09 12:40:18,907 ERROR DataNucleus.Plugin
> (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> "org.eclipse.core.resources" but it cannot be resolved.
> 2011-09-09 12:40:18,907 ERROR DataNucleus.Plugin
> (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> "org.eclipse.core.resources" but it cannot be resolved.
> 2011-09-09 12:40:18,911 ERROR DataNucleus.Plugin
> (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> "org.eclipse.core.runtime" but it cannot be resolved.
> 2011-09-09 12:40:18,911 ERROR DataNucleus.Plugin
> (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> "org.eclipse.core.runtime" but it cannot be resolved.
> 2011-09-09 12:40:18,912 ERROR DataNucleus.Plugin
> (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> "org.eclipse.text" but it cannot be resolved.
> 2011-09-09 12:40:18,912 ERROR DataNucleus.Plugin
> (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> "org.eclipse.text" but it cannot be resolved.
> 2011-09-09 12:40:21,326 WARN  mapred.JobClient
> (JobClient.java:copyAndConfigureFiles(624)) - Use GenericOptionsParser for
> parsing the arguments. Applications should implement Tool for the same.
> 2011-09-09 12:40:58,886 ERROR exec.MapRedTask
> (SessionState.java:printError(343)) - Ended Job = job_201109091148_0003 with
> errors
> 2011-09-09 12:40:58,893 ERROR ql.Driver (SessionState.java:printError(343))
> - FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.MapRedTask
>
> Job log:
> SessionStart SESSION_ID="adriaant_201109091240" TIME="1315564815373"
> QueryStart QUERY_STRING="SELECT host FROM serde_regex"
> QUERY_ID="adriaant_20110909124040_b1fdc58d-aade-4c12-8cdd-0423a0f6e3cb"
> TIME="1315564821088"
> Counters
> plan="{"queryId":"adriaant_20110909124040_b1fdc58d-aade-4c12-8cdd-0423a0f6e3cb","queryType":null,"queryAttributes":{"queryString":"SELECT
> host FROM
> serde_regex"},"queryCounters":"null","stageGraph":{"nodeType":"STAGE","roots":"null","adjacencyList":"]"},"stageList":[{"stageId":"Stage-1","stageType":"MAPRED","stageAttributes":"null","stageCounters":"}","taskList":[{"taskId":"Stage-1_MAP","taskType":"MAP","taskAttributes":"null","taskCounters":"null","operatorGraph":{"nodeType":"OPERATOR","roots":"null","adjacencyList":[{"node":"TS_0","children":["SEL_1"],"adjacencyType":"CONJUNCTIVE"},{"node":"SEL_1","children":["FS_2"],"adjacencyType":"CONJUNCTIVE"}]},"operatorList":[{"operatorId":"TS_0","operatorType":"TABLESCAN","operatorAttributes":"null","operatorCounters":"null","done":"false","started":"false"},{"operatorId":"SEL_1","operatorType":"SELECT","operatorAttributes":"null","operatorCounters":"null","done":"false","started":"false"},{"operatorId":"FS_2","operatorType":"FILESINK","operatorAttributes":"null","operatorCounters":"null","done":"false","started":"false"}],"done":"false","started":"false"}],"done":"false","started":"false"}],"done":"false","started":"true"}"
> TIME="1315564821100"
> TaskStart TASK_NAME="org.apache.hadoop.hive.ql.exec.MapRedTask"
> TASK_ID="Stage-1"
> QUERY_ID="adriaant_20110909124040_b1fdc58d-aade-4c12-8cdd-0423a0f6e3cb"
> TIME="1315564821101"
> Counters
> plan="{"queryId":"adriaant_20110909124040_b1fdc58d-aade-4c12-8cdd-0423a0f6e3cb","queryType":null,"queryAttributes":{"queryString":"SELECT
> host FROM
> serde_regex"},"queryCounters":"null","stageGraph":{"nodeType":"STAGE","roots":"null","adjacencyList":"]"},"stageList":[{"stageId":"Stage-1","stageType":"MAPRED","stageAttributes":"null","stageCounters":"}","taskList":[{"taskId":"Stage-1_MAP","taskType":"MAP","taskAttributes":"null","taskCounters":"null","operatorGraph":{"nodeType":"OPERATOR","roots":"null","adjacencyList":[{"node":"TS_0","children":["SEL_1"],"adjacencyType":"CONJUNCTIVE"},{"node":"SEL_1","children":["FS_2"],"adjacencyType":"CONJUNCTIVE"}]},"operatorList":[{"operatorId":"TS_0","operatorType":"TABLESCAN","operatorAttributes":"null","operatorCounters":"null","done":"false","started":"false"},{"operatorId":"SEL_1","operatorType":"SELECT","operatorAttributes":"null","operatorCounters":"null","done":"false","started":"false"},{"operatorId":"FS_2","operatorType":"FILESINK","operatorAttributes":"null","operatorCounters":"null","done":"false","started":"false"}],"done":"false","started":"false"}],"done":"false","started":"true"}],"done":"false","started":"true"}"
> TIME="1315564821104"
> TaskProgress TASK_HADOOP_PROGRESS="2011-09-09 12:40:28,638 Stage-1 map =
> 0%,  reduce = 0%" TASK_NAME="org.apache.hadoop.hive.ql.exec.MapRedTask"
> TASK_COUNTERS="org.apache.hadoop.hive.ql.exec.Operator$ProgressCounter.CREATED_FILES:0,Job
> Counters .SLOTS_MILLIS_MAPS:4309,Job Counters .Launched map tasks:2,Job
> Counters .Data-local map tasks:2" TASK_ID="Stage-1"
> QUERY_ID="adriaant_20110909124040_b1fdc58d-aade-4c12-8cdd-0423a0f6e3cb"
> TASK_HADOOP_ID="job_201109091148_0003" TIME="1315564828639"
> Counters
> plan="{"queryId":"adriaant_20110909124040_b1fdc58d-aade-4c12-8cdd-0423a0f6e3cb","queryType":null,"queryAttributes":{"queryString":"SELECT
> host FROM
> serde_regex"},"queryCounters":"null","stageGraph":{"nodeType":"STAGE","roots":"null","adjacencyList":"]"},"stageList":[{"stageId":"Stage-1","stageType":"MAPRED","stageAttributes":"null","stageCounters":{"CNTR_NAME_Stage-1_REDUCE_PROGRESS":"0","CNTR_NAME_Stage-1_MAP_PROGRESS":"0"},"taskList":[{"taskId":"Stage-1_MAP","taskType":"MAP","taskAttributes":"null","taskCounters":"null","operatorGraph":{"nodeType":"OPERATOR","roots":"null","adjacencyList":[{"node":"TS_0","children":["SEL_1"],"adjacencyType":"CONJUNCTIVE"},{"node":"SEL_1","children":["FS_2"],"adjacencyType":"CONJUNCTIVE"}]},"operatorList":[{"operatorId":"TS_0","operatorType":"TABLESCAN","operatorAttributes":"null","operatorCounters":"}","done":"false","started":"false"},{"operatorId":"SEL_1","operatorType":"SELECT","operatorAttributes":"null","operatorCounters":"null","done":"false","started":"false"},{"operatorId":"FS_2","operatorType":"FILESINK","operatorAttributes":"null","operatorCounters":"null","done":"false","started":"false"}],"done":"false","started":"false"}],"done":"false","started":"true"}],"done":"false","started":"true"}"
> TIME="1315564828639"
> TaskProgress TASK_HADOOP_PROGRESS="2011-09-09 12:40:58,883 Stage-1 map =
> 100%,  reduce = 100%" TASK_NAME="org.apache.hadoop.hive.ql.exec.MapRedTask"
> TASK_COUNTERS="org.apache.hadoop.hive.ql.exec.Operator$ProgressCounter.CREATED_FILES:0,Job
> Counters .SLOTS_MILLIS_MAPS:52534,Job Counters .Total time spent by all
> reduces waiting after reserving slots (ms):0,Job Counters .Total time spent
> by all maps waiting after reserving slots (ms):0,Job Counters .Launched map
> tasks:8,Job Counters .Data-local map tasks:8,Job Counters
> .SLOTS_MILLIS_REDUCES:0,Job Counters .Failed map tasks:1" TASK_ID="Stage-1"
> QUERY_ID="adriaant_20110909124040_b1fdc58d-aade-4c12-8cdd-0423a0f6e3cb"
> TASK_HADOOP_ID="job_201109091148_0003" TIME="1315564858883"
> Counters
> plan="{"queryId":"adriaant_20110909124040_b1fdc58d-aade-4c12-8cdd-0423a0f6e3cb","queryType":null,"queryAttributes":{"queryString":"SELECT
> host FROM
> serde_regex"},"queryCounters":"null","stageGraph":{"nodeType":"STAGE","roots":"null","adjacencyList":"]"},"stageList":[{"stageId":"Stage-1","stageType":"MAPRED","stageAttributes":"null","stageCounters":{"CNTR_NAME_Stage-1_REDUCE_PROGRESS":"100","CNTR_NAME_Stage-1_MAP_PROGRESS":"100"},"taskList":[{"taskId":"Stage-1_MAP","taskType":"MAP","taskAttributes":"null","taskCounters":"null","operatorGraph":{"nodeType":"OPERATOR","roots":"null","adjacencyList":[{"node":"TS_0","children":["SEL_1"],"adjacencyType":"CONJUNCTIVE"},{"node":"SEL_1","children":["FS_2"],"adjacencyType":"CONJUNCTIVE"}]},"operatorList":[{"operatorId":"TS_0","operatorType":"TABLESCAN","operatorAttributes":"null","operatorCounters":"}","done":"true","started":"true"},{"operatorId":"SEL_1","operatorType":"SELECT","operatorAttributes":"null","operatorCounters":"null","done":"true","started":"true"},{"operatorId":"FS_2","operatorType":"FILESINK","operatorAttributes":"null","operatorCounters":"null","done":"true","started":"true"}],"done":"true","started":"true"}],"done":"false","started":"true"}],"done":"false","started":"true"}"
> TIME="1315564858884"
> Counters
> plan="{"queryId":"adriaant_20110909124040_b1fdc58d-aade-4c12-8cdd-0423a0f6e3cb","queryType":null,"queryAttributes":{"queryString":"SELECT
> host FROM
> serde_regex"},"queryCounters":"null","stageGraph":{"nodeType":"STAGE","roots":"null","adjacencyList":"]"},"stageList":[{"stageId":"Stage-1","stageType":"MAPRED","stageAttributes":"null","stageCounters":{"CNTR_NAME_Stage-1_REDUCE_PROGRESS":"100","CNTR_NAME_Stage-1_MAP_PROGRESS":"100"},"taskList":[{"taskId":"Stage-1_MAP","taskType":"MAP","taskAttributes":"null","taskCounters":"null","operatorGraph":{"nodeType":"OPERATOR","roots":"null","adjacencyList":[{"node":"TS_0","children":["SEL_1"],"adjacencyType":"CONJUNCTIVE"},{"node":"SEL_1","children":["FS_2"],"adjacencyType":"CONJUNCTIVE"}]},"operatorList":[{"operatorId":"TS_0","operatorType":"TABLESCAN","operatorAttributes":"null","operatorCounters":"}","done":"true","started":"true"},{"operatorId":"SEL_1","operatorType":"SELECT","operatorAttributes":"null","operatorCounters":"null","done":"true","started":"true"},{"operatorId":"FS_2","operatorType":"FILESINK","operatorAttributes":"null","operatorCounters":"null","done":"true","started":"true"}],"done":"true","started":"true"}],"done":"true","started":"true"}],"done":"false","started":"true"}"
> TIME="1315564858886"
> Counters
> plan="{"queryId":"adriaant_20110909124040_b1fdc58d-aade-4c12-8cdd-0423a0f6e3cb","queryType":null,"queryAttributes":{"queryString":"SELECT
> host FROM
> serde_regex"},"queryCounters":"null","stageGraph":{"nodeType":"STAGE","roots":"null","adjacencyList":"]"},"stageList":[{"stageId":"Stage-1","stageType":"MAPRED","stageAttributes":"null","stageCounters":{"CNTR_NAME_Stage-1_REDUCE_PROGRESS":"100","CNTR_NAME_Stage-1_MAP_PROGRESS":"100"},"taskList":[{"taskId":"Stage-1_MAP","taskType":"MAP","taskAttributes":"null","taskCounters":"null","operatorGraph":{"nodeType":"OPERATOR","roots":"null","adjacencyList":[{"node":"TS_0","children":["SEL_1"],"adjacencyType":"CONJUNCTIVE"},{"node":"SEL_1","children":["FS_2"],"adjacencyType":"CONJUNCTIVE"}]},"operatorList":[{"operatorId":"TS_0","operatorType":"TABLESCAN","operatorAttributes":"null","operatorCounters":"}","done":"true","started":"true"},{"operatorId":"SEL_1","operatorType":"SELECT","operatorAttributes":"null","operatorCounters":"null","done":"true","started":"true"},{"operatorId":"FS_2","operatorType":"FILESINK","operatorAttributes":"null","operatorCounters":"null","done":"true","started":"true"}],"done":"true","started":"true"}],"done":"true","started":"true"}],"done":"false","started":"true"}"
> TIME="1315564858893"
> QueryEnd QUERY_STRING="SELECT host FROM serde_regex"
> QUERY_ID="adriaant_20110909124040_b1fdc58d-aade-4c12-8cdd-0423a0f6e3cb"
> QUERY_NUM_TASKS="1" TIME="1315564858893"
>
>
> On 2011/09/09, at 11:46, MIS wrote:
>
> > The issue can be reproduced by following the example in contrib:
> > *hive/contrib/src/test/queries/clientpositive/serde_regex.q*
> >
> > sample log data can be obtained from there itself.
> >
> > By the way, I am using Hive 0.7.0 and Hadoop 0.20.2.
> >
> > Thanks,
> > MIS.
> >
> >
> >
> > On Fri, Sep 9, 2011 at 2:55 PM, Ankit Jain <ankitjaincs06@gmail.com> wrote:
> >
> >> Can you provide me with some sample log data?
> >>
> >> Thanks,
> >> Ankit
> >>
> >>
> >> On Fri, Sep 9, 2011 at 2:33 AM, MIS <misapache@gmail.com> wrote:
> >>
> >>> I want to access individual columns from a table whose row format is
> >>> defined by RegexSerDe.
> >>>
> >>> For example,
> >>> I have created a table as below:
> >>>
> >>> create table test ( col1 STRING, col2 STRING )
> >>>  ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
> >>>  WITH SERDEPROPERTIES ("input.regex" = "([^ ]*) ([^ ]*)",
> >>>  "output.format.string" = "%1$s %2$s");
> >>>
> >>> After loading valid data,
> >>> *select * from test* works fine.
> >>>
> >>> However, *select col1 from test*, *select col2 from test*, and *select
> >>> col1, col2 from test* all fail.
> >>>
> >>> How do I achieve the above?
> >>>
> >>> Thanks,
> >>> MIS.
> >>>
> >>
> >>
>
>
