hbase-dev mailing list archives

From Doug Meil <doug.m...@explorysmedical.com>
Subject Re: local bulk loading?
Date Fri, 27 Apr 2012 12:13:21 GMT

Doh!  Thanks Dave and JD.  I'll update the RefGuide with this fact.





On 4/26/12 6:32 PM, "Jean-Daniel Cryans" <jdcryans@apache.org> wrote:

>Yep, same old problem that was asked a bunch of times on the user list :)
>
>On Thu, Apr 26, 2012 at 3:29 PM, Dave Revell <dave@urbanairship.com> wrote:
>> Hi Doug,
>>
>> When I hit this problem, I concluded that HFileOutputFormat cannot be used
>> in standalone mode since it requires DistributedCache, which doesn't work
>> with the local job runner.
>>
>> So you're not the only one :(
>>
>> -Dave
>>
>> On Thu, Apr 26, 2012 at 1:52 PM, Doug Meil <doug.meil@explorysmedical.com> wrote:
>>
>>>
>>> Hi Devs-
>>>
>>> I'm coding up a local bulkloading example for the RefGuide but I've been
>>> banging my head on this...
>>>
>>>
>>>  WARN [Thread-8] (LocalJobRunner.java:295) - job_local_0001
>>>
>>> java.lang.IllegalArgumentException: Can't read partitions file
>>>   at org.apache.hadoop.hbase.mapreduce.hadoopbackport.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:111)
>>>   at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
>>>   at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>>>   at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:552)
>>>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:631)
>>>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:315)
>>>   at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
>>> Caused by: java.io.FileNotFoundException: File _partition.lst does not exist.
>>>   at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:372)
>>>   at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
>>>   at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:751)
>>>   at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
>>>   at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1419)
>>>   at org.apache.hadoop.hbase.mapreduce.hadoopbackport.TotalOrderPartitioner.readPartitions(TotalOrderPartitioner.java:296)
>>>   at org.apache.hadoop.hbase.mapreduce.hadoopbackport.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:82)
>>>
>>> ... does bulk loading work with the local job runner?  Obviously, you're not
>>> going to run a production cluster off your laptop but it's nice to at least
>>> be able to test your code.
>>>
>>> I know the DistributedCache doesn't work with the LocalJobRunner (and
>>> TotalOrderPartitioner uses the DistributedCache) and then there's this log
>>> message..
>>>
>>>
>>>  WARN [main] (LocalJobRunner.java:134) - LocalJobRunner does not support symlinking into current working dir.
>>>
>>> ... so I'm wondering how this actually works, if it does work locally.
>>>
>>> Coincidentally, this exact error is in the troubleshooting chapter..
>>>
>>> http://hbase.apache.org/book.html#trouble.mapreduce
>>>
>>> ... but it came up in a different context.  In the context that the guy was
>>> asking the question he thought he was remote, but he was really local.
>>>
>>> Doug Meil
>>> Chief Software Architect, Explorys
>>> doug.meil@explorys.com
>>>
>>>
>
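
Editor's note: the conclusion of the thread (HFileOutputFormat needs the DistributedCache, which the LocalJobRunner does not provide) suggests failing fast before a bulk-load job is submitted in local mode. The following is only a minimal sketch, not code from HBase or Hadoop. LocalModeGuard and its methods are hypothetical names, and a plain Map stands in for the job configuration so the example runs without Hadoop on the classpath. It assumes Hadoop 1.x behaviour, where mapred.job.tracker set to "local" (also the shipped default) selects the LocalJobRunner.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of HBase or Hadoop.
final class LocalModeGuard {
    // Hadoop 1.x property naming the JobTracker address.
    static final String JOB_TRACKER_KEY = "mapred.job.tracker";
    // Assumption: the value "local" means the job runs in-process
    // with the LocalJobRunner.
    static final String LOCAL = "local";

    private LocalModeGuard() {}

    // Returns true when the settings would select the LocalJobRunner.
    // An unset property is treated as "local", matching the Hadoop 1.x default.
    static boolean isLocalJobRunner(Map<String, String> settings) {
        String tracker = settings.getOrDefault(JOB_TRACKER_KEY, LOCAL);
        return LOCAL.equals(tracker.trim());
    }

    // Throws before job submission instead of letting the job fail later
    // with "Can't read partitions file".
    static void requireCluster(Map<String, String> settings) {
        if (isLocalJobRunner(settings)) {
            throw new IllegalStateException(
                JOB_TRACKER_KEY + " is 'local': the bulk-load job needs the "
                + "DistributedCache, which the local job runner does not provide");
        }
    }

    public static void main(String[] args) {
        Map<String, String> settings = new HashMap<>();
        // Placeholder address; substitute a real JobTracker.
        settings.put(JOB_TRACKER_KEY, "jobtracker.example.com:8021");
        requireCluster(settings); // passes: not local
        System.out.println("local mode? " + isLocalJobRunner(settings));
    }
}
```

In a real job driver the same check could read the property from the job's Configuration, for example conf.get("mapred.job.tracker", "local"), before configuring HFileOutputFormat. Later Hadoop versions select the runner through a different property, so the key checked here is an assumption tied to the Hadoop 1.x line discussed in this thread.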


