hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Nick Dimiduk <ndimi...@gmail.com>
Subject Re: Question on Reading Sequencefile data using mapreduce
Date Tue, 14 Apr 2015 16:02:31 GMT
Heya devs,

Looking for an appropriate answer for this question, it seems we don't have
a public Cell implementation that can be used for generating HFiles. How
are folks expected to generate Cells instances if KeyValue is
@InterfaceAudience.Private ?

-n

On Tue, Apr 14, 2015 at 9:00 AM, Nick Dimiduk <ndimiduk@gmail.com> wrote:

> What version of HBase are you using?
>
> In recent versions, HFileOutputFormat is a deprecated class (replaced by
> HFileOutputFormat2), and KeyValue is an internal API (see the class
> annotation @InterfaceAudience.Private, basically, use at your own peril).
> The javadoc on the KeyValue constructor you're using says "Creates a
> KeyValue from the start of the specified byte array. *Presumes bytes
> content is formatted as a KeyValue blob*." (emphasis my own) It looks
> like the value you're using for bytes is not in the KeyValue blob format.
>
> Try instead one of the other KeyValue constructors, such as
> KeyValue(byte[] row, byte[] family, byte[] qualifier, byte[] value).
>
> Thanks,
> Nick
>
>
> On Thu, Apr 9, 2015 at 12:23 PM, yuantao peng <pengyuantao@gmail.com>
> wrote:
>
>> I am learning how to upload binary data to HBASE using mapreduce.   Here
>> are the steps I am following assuming my binary file is testlist
>> (1) wrote a sequencefilewrite.java to read the local testlist file and
>> save
>> a sequence file to HDFS
>> (2) wrote a MapReduce program to read the generated sequence file, and
>> generate a HFile
>> (3) bulk import this HFile to HBASE
>>
>> I am stuck at step (2) as I keep getting exception.  I am absolutely new
>> to
>> hadoop/hbase,   code is posted below,  any comments or suggestions are
>> appreciated!!!
>>
>> Sequencewrite.java is like this:
>>
>> public class SequenceFileWrite
>> {
>>     public static void main(String[] args) throws IOException {
>>     String uri = args[1];
>>     Configuration conf = new Configuration();
>>     FileSystem fs = FileSystem.get(conf);
>>     Path path = new Path(uri);
>>     File infile = new File(args[0]);
>>     SequenceFile.Writer writer = null;
>>     try
>>     {
>>
>>       BytesWritable key, value;
>>       writer = SequenceFile.createWriter(fs, conf,  path,
>> BytesWritable.class, BytesWritable.class);
>>       FileInputStream fin = new FileInputStream(infile);
>>       for(int i=0; i<10; ++i) {
>>         key   = new BytesWritable();
>>         value = new BytesWritable();
>>         byte[] keybuf = new byte[2];
>>         byte[] valbuf = new byte[2];
>>         fin.read(keybuf);
>>         fin.read(valbuf);
>>         key.set(keybuf,0,2);
>>         value.set(valbuf,0,2);
>>         writer.append(key,value);
>>       }
>>     } finally {
>>            IOUtils.closeStream(writer);
>>         }
>>     }
>> }
>>
>> And my mapper is like this:
>>
>> public class HBaseTkrHdrMapper extends Mapper<BytesWritable,
>> BytesWritable,
>> ImmutableBytesWritable, KeyValue> {
>>
>>   int tipOffSeconds = 0;
>>   String tableName = "";
>>
>>   ImmutableBytesWritable hKey = new ImmutableBytesWritable();
>>   KeyValue kv;
>>
>>   @Override
>>   protected void setup(Context context) throws IOException,
>>  InterruptedException {
>>     Configuration c = context.getConfiguration();
>>     tipOffSeconds   = c.getInt("epoch.seconds.tipoff", 0);
>>     tableName       = c.get("hbase.table.mrtest");
>>   }
>>
>>   @Override
>>   protected void map(BytesWritable key, BytesWritable value, Context
>> context)  throws IOException, InterruptedException {
>>     ImmutableBytesWritable hkey = new
>> ImmutableBytesWritable(key.getBytes());
>>     KeyValue               hval = new KeyValue(value.getBytes());
>>     context.write(hkey, hval);
>>   }
>> }
>>
>> Driver code is as follows:
>>
>> public class Driver {
>>   public static void main(String[] args) throws Exception {
>>     Configuration conf = new Configuration();
>>     args = new GenericOptionsParser(conf, args).getRemainingArgs();
>>
>>     @SuppressWarnings("deprecation")
>>     Job job = new Job(conf, "Bulk Import");
>>     job.setJarByClass(HBaseTkrHdrMapper.class);
>>
>>     job.setMapperClass(HBaseTkrHdrMapper.class);
>>     job.setMapOutputKeyClass(ImmutableBytesWritable.class);
>>     job.setMapOutputValueClass(KeyValue.class);
>>     job.setInputFormatClass(SequenceFileInputFormat.class);
>>
>>     HTable hTable = new HTable(conf, args[2]);
>>
>>     // Auto configure partitioner and reducer
>>     HFileOutputFormat.configureIncrementalLoad(job, hTable);
>>
>>     FileInputFormat.addInputPath(job, new Path(args[0]));
>>     FileOutputFormat.setOutputPath(job, new Path(args[1]));
>>
>>     job.waitForCompletion(true);
>>   }
>> }
>>
>>
>> The exception I got is :
>>
>>
>> Error: java.lang.IllegalArgumentException: offset (0) + length (4) exceed
>> the capacity of the array: 3
>>         at
>>
>> org.apache.hadoop.hbase.util.Bytes.explainWrongLengthOrOffset(Bytes.java:602)
>>         at org.apache.hadoop.hbase.util.Bytes.toInt(Bytes.java:751)
>>         at org.apache.hadoop.hbase.util.Bytes.toInt(Bytes.java:737)
>>         at org.apache.hadoop.hbase.KeyValue.getLength(KeyValue.java:972)
>>         at org.apache.hadoop.hbase.KeyValue.<init>(KeyValue.java:276)
>>         at org.apache.hadoop.hbase.KeyValue.<init>(KeyValue.java:265)
>>         at
>>
>> com.bloomberg.tickerplant.hbase.HBaseTkrHdrMapper.map(HBaseTkrHdrMapper.java:41)
>>         at
>>
>> com.bloomberg.tickerplant.hbase.HBaseTkrHdrMapper.map(HBaseTkrHdrMapper.java:23)
>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
>>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at javax.security.auth.Subject.doAs(Subject.java:415)
>>         at
>>
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
>>
>>
>> Exception in thread "main" java.io.IOException:
>> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException):
>> java.lang.NullPointerException
>>         at
>>
>> org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.getTaskAttemptCompletionEvents(HistoryClientService.java:269)
>>         at
>>
>> org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getTaskAttemptCompletionEvents(MRClientProtocolPBServiceImpl.java:173)
>>         at
>>
>> org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:283)
>>         at
>>
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2053)
>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at javax.security.auth.Subject.doAs(Subject.java:415)
>>         at
>>
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2047)
>>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message