hive-dev mailing list archives

From "siddharth ramanan (JIRA)" <>
Subject [jira] [Commented] (HIVE-2289) NumberFormatException with respect to _offsets when running a query with index
Date Tue, 19 Jul 2011 17:40:57 GMT


siddharth ramanan commented on HIVE-2289:

@Syed and John, thanks for helping; it is resolved now. Can you point me to documentation
on Hive indexing? I want to understand how the index works, because the query currently
takes around 12 seconds without the index and 11 seconds with it. I want to try various
options to speed up the response time. For example, I want to check how performance changes
when the index is built on a table created with the clustered command using buckets.
Is there good documentation on indexing in Hive that you would recommend?


> NumberFormatException with respect to _offsets when running a query with index
> ------------------------------------------------------------------------------
>                 Key: HIVE-2289
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>          Components: Indexing
>    Affects Versions: 0.7.0
>         Environment: RedHat 5
>            Reporter: siddharth ramanan
> I have a table named foo with columns origin, destination and information.
> These are the steps I followed to create an index named foosample on foo and query with it:
> 1)create index foosample on table foo(origin) as 'compact' with deferred rebuild;
> 2)alter index foosample on foo rebuild;
> 3)insert overwrite directory "/tmp/index_result" select '_bucketname','_offsets' from default__foo_foosample__ where origin='WAW';
> 4)set hive.index.compact.file=/tmp/index_result;
> 5)set hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;
> 6)select * from foo where origin='WAW';
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks is set to 0 since there's no reduce operator
> java.lang.NumberFormatException: For input string: "_offsets"
>     at java.lang.NumberFormatException.forInputString(
>     at java.lang.Long.parseLong(
>     at java.lang.Long.parseLong(
>     at org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexResult.add(
>     at org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexResult.<init>(
>     at org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat.getSplits(
>     at org.apache.hadoop.mapred.JobClient.writeOldSplits(
>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(
>     at org.apache.hadoop.mapred.JobClient.submitJob(
>     at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(
>     at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(
>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(
>     at org.apache.hadoop.hive.ql.Driver.launchTask(
>     at org.apache.hadoop.hive.ql.Driver.execute(
>     at
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(
>     at org.apache.hadoop.hive.cli.CliDriver.main(
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(
>     at java.lang.reflect.Method.invoke(
>     at org.apache.hadoop.util.RunJar.main(
> Job Submission failed with exception 'java.lang.NumberFormatException(For input string:
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
> Steps 2 and 3 each ran a successful MapReduce job, and the index table
> default__foo_foosample__ has data in three columns: origin, _bucketname and _offsets.
> Thanks,
> Siddharth
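
The comment above says the issue is resolved but does not record the fix, so the following is only a likely explanation based on the quoted steps. In step 3 the names '_bucketname' and '_offsets' are in single quotes, which HiveQL reads as string literals rather than column references. The result file in /tmp/index_result would then contain the literal text "_offsets", which matches the "For input string" in the Long.parseLong failure. A minimal sketch of the same steps with backtick-quoted column names, assuming the table, index and path from the report:

```sql
-- Step 3 with backticks, so the names are read as columns of the index
-- table instead of string literals.
INSERT OVERWRITE DIRECTORY "/tmp/index_result"
SELECT `_bucketname`, `_offsets`
FROM default__foo_foosample__
WHERE origin = 'WAW';

-- Steps 4 to 6 as in the original report.
SET hive.index.compact.file=/tmp/index_result;
SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;
SELECT * FROM foo WHERE origin = 'WAW';
```

Looking at the files written to /tmp/index_result before running step 6 is a quick check: each line should hold a file path followed by numeric offsets, not the column names themselves.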

This message is automatically generated by JIRA.