hbase-dev mailing list archives

From: Nick Dimiduk <ndimi...@gmail.com>
Subject: Re: Question on Reading Sequencefile data using mapreduce
Date: Thu, 16 Apr 2015 00:08:25 GMT
Of course. Thanks Gary.

On Tue, Apr 14, 2015 at 11:18 AM, Gary Helmling <ghelmling@gmail.com> wrote:

> I believe the recommended approach would be to use CellUtil.  It is marked
> as Public/Evolving and exposes a number of static createCell() methods that
> will generate Cell instances for different combinations of parameters.
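>
> A minimal sketch of what that could look like (untested; the row, family,
> qualifier, and value bytes below are just placeholders):
>
>   import org.apache.hadoop.hbase.Cell;
>   import org.apache.hadoop.hbase.CellUtil;
>   import org.apache.hadoop.hbase.KeyValue;
>   import org.apache.hadoop.hbase.util.Bytes;
>
>   // Build a Put-type Cell without calling the KeyValue constructors directly.
>   Cell cell = CellUtil.createCell(
>       Bytes.toBytes("row1"),          // row
>       Bytes.toBytes("f"),             // family
>       Bytes.toBytes("q"),             // qualifier
>       System.currentTimeMillis(),     // timestamp
>       KeyValue.Type.Put.getCode(),    // type
>       Bytes.toBytes("v"));            // value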
>
> On Tue, Apr 14, 2015 at 9:04 AM Nick Dimiduk <ndimiduk@gmail.com> wrote:
>
> > Heya devs,
> >
> > Looking for an appropriate answer for this question, it seems we don't
> > have a public Cell implementation that can be used for generating HFiles.
> > How are folks expected to generate Cell instances if KeyValue is
> > @InterfaceAudience.Private?
> >
> > -n
> >
> > On Tue, Apr 14, 2015 at 9:00 AM, Nick Dimiduk <ndimiduk@gmail.com> wrote:
> >
> > > What version of HBase are you using?
> > >
> > > In recent versions, HFileOutputFormat is a deprecated class (replaced by
> > > HFileOutputFormat2), and KeyValue is an internal API (see the class
> > > annotation @InterfaceAudience.Private, basically, use at your own peril).
> > > The javadoc on the KeyValue constructor you're using says "Creates a
> > > KeyValue from the start of the specified byte array. *Presumes bytes
> > > content is formatted as a KeyValue blob*." (emphasis my own) It looks
> > > like the value you're using for bytes is not in the KeyValue blob format.
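> > >
> > > On the driver side the swap is mechanical; a rough sketch, assuming your
> > > client ships HFileOutputFormat2, and reusing the HTable you already open:
> > >
> > >   import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
> > >
> > >   // Same call shape as the deprecated class: configures the partitioner,
> > >   // sort reducer, and output format for bulk-load-ready HFiles.
> > >   HFileOutputFormat2.configureIncrementalLoad(job, hTable);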
> > >
> > > Try instead one of the other KeyValue constructors, such as
> > > KeyValue(byte[] row, byte[] family, byte[] qualifier, byte[] value).
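> > >
> > > A rough, untested sketch of map() using that constructor; the family and
> > > qualifier below are made-up placeholders, and copying only getLength()
> > > bytes avoids the padding in BytesWritable's backing array:
> > >
> > >   @Override
> > >   protected void map(BytesWritable key, BytesWritable value, Context context)
> > >       throws IOException, InterruptedException {
> > >     // BytesWritable.getBytes() returns the (possibly padded) backing buffer.
> > >     byte[] row = Arrays.copyOf(key.getBytes(), key.getLength());
> > >     byte[] val = Arrays.copyOf(value.getBytes(), value.getLength());
> > >     KeyValue kv = new KeyValue(row, Bytes.toBytes("f"), Bytes.toBytes("q"), val);
> > >     context.write(new ImmutableBytesWritable(row), kv);
> > >   }
> > >
> > > (That needs java.util.Arrays and org.apache.hadoop.hbase.util.Bytes on the
> > > import list.)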
> > >
> > > Thanks,
> > > Nick
> > >
> > >
> > > On Thu, Apr 9, 2015 at 12:23 PM, yuantao peng <pengyuantao@gmail.com> wrote:
> > >
> > >> I am learning how to upload binary data to HBase using MapReduce.  Here
> > >> are the steps I am following, assuming my binary file is testlist:
> > >> (1) wrote a SequenceFileWrite.java to read the local testlist file and
> > >> save a sequence file to HDFS
> > >> (2) wrote a MapReduce program to read the generated sequence file and
> > >> generate an HFile
> > >> (3) bulk import this HFile into HBase
> > >>
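> > >> (For reference, step (3) can be done with the LoadIncrementalHFiles tool
> > >> that ships with HBase. A rough, untested sketch, where hfileDir and
> > >> tableName are placeholders for the MR output directory and target table:
> > >>
> > >>     Configuration conf = HBaseConfiguration.create();
> > >>     HTable table = new HTable(conf, tableName);
> > >>     // Moves the HFiles written by the MR job into the table's regions.
> > >>     new LoadIncrementalHFiles(conf).doBulkLoad(new Path(hfileDir), table);
> > >>
> > >> The same step can also be run from the command line with the
> > >> completebulkload tool.)
> > >>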
> > >> I am stuck at step (2) as I keep getting an exception.  I am absolutely
> > >> new to hadoop/hbase; code is posted below, and any comments or suggestions
> > >> are appreciated!!!
> > >>
> > >> SequenceFileWrite.java is like this:
> > >>
> > >> public class SequenceFileWrite
> > >> {
> > >>     public static void main(String[] args) throws IOException {
> > >>     String uri = args[1];
> > >>     Configuration conf = new Configuration();
> > >>     FileSystem fs = FileSystem.get(conf);
> > >>     Path path = new Path(uri);
> > >>     File infile = new File(args[0]);
> > >>     SequenceFile.Writer writer = null;
> > >>     try
> > >>     {
> > >>
> > >>       BytesWritable key, value;
> > >>       writer = SequenceFile.createWriter(fs, conf, path,
> > >>           BytesWritable.class, BytesWritable.class);
> > >>       FileInputStream fin = new FileInputStream(infile);
> > >>       for(int i=0; i<10; ++i) {
> > >>         key   = new BytesWritable();
> > >>         value = new BytesWritable();
> > >>         byte[] keybuf = new byte[2];
> > >>         byte[] valbuf = new byte[2];
> > >>         fin.read(keybuf);
> > >>         fin.read(valbuf);
> > >>         key.set(keybuf,0,2);
> > >>         value.set(valbuf,0,2);
> > >>         writer.append(key,value);
> > >>       }
> > >>     } finally {
> > >>            IOUtils.closeStream(writer);
> > >>         }
> > >>     }
> > >> }
> > >>
> > >> And my mapper is like this:
> > >>
> > >> public class HBaseTkrHdrMapper
> > >>     extends Mapper<BytesWritable, BytesWritable, ImmutableBytesWritable, KeyValue> {
> > >>
> > >>   int tipOffSeconds = 0;
> > >>   String tableName = "";
> > >>
> > >>   ImmutableBytesWritable hKey = new ImmutableBytesWritable();
> > >>   KeyValue kv;
> > >>
> > >>   @Override
> > >>   protected void setup(Context context)
> > >>       throws IOException, InterruptedException {
> > >>     Configuration c = context.getConfiguration();
> > >>     tipOffSeconds   = c.getInt("epoch.seconds.tipoff", 0);
> > >>     tableName       = c.get("hbase.table.mrtest");
> > >>   }
> > >>
> > >>   @Override
> > >>   protected void map(BytesWritable key, BytesWritable value, Context context)
> > >>       throws IOException, InterruptedException {
> > >>     ImmutableBytesWritable hkey = new ImmutableBytesWritable(key.getBytes());
> > >>     KeyValue               hval = new KeyValue(value.getBytes());
> > >>     context.write(hkey, hval);
> > >>   }
> > >> }
> > >>
> > >> Driver code is as follows:
> > >>
> > >> public class Driver {
> > >>   public static void main(String[] args) throws Exception {
> > >>     Configuration conf = new Configuration();
> > >>     args = new GenericOptionsParser(conf, args).getRemainingArgs();
> > >>
> > >>     @SuppressWarnings("deprecation")
> > >>     Job job = new Job(conf, "Bulk Import");
> > >>     job.setJarByClass(HBaseTkrHdrMapper.class);
> > >>
> > >>     job.setMapperClass(HBaseTkrHdrMapper.class);
> > >>     job.setMapOutputKeyClass(ImmutableBytesWritable.class);
> > >>     job.setMapOutputValueClass(KeyValue.class);
> > >>     job.setInputFormatClass(SequenceFileInputFormat.class);
> > >>
> > >>     HTable hTable = new HTable(conf, args[2]);
> > >>
> > >>     // Auto configure partitioner and reducer
> > >>     HFileOutputFormat.configureIncrementalLoad(job, hTable);
> > >>
> > >>     FileInputFormat.addInputPath(job, new Path(args[0]));
> > >>     FileOutputFormat.setOutputPath(job, new Path(args[1]));
> > >>
> > >>     job.waitForCompletion(true);
> > >>   }
> > >> }
> > >>
> > >>
> > >> The exception I got is:
> > >>
> > >> Error: java.lang.IllegalArgumentException: offset (0) + length (4) exceed the capacity of the array: 3
> > >>         at org.apache.hadoop.hbase.util.Bytes.explainWrongLengthOrOffset(Bytes.java:602)
> > >>         at org.apache.hadoop.hbase.util.Bytes.toInt(Bytes.java:751)
> > >>         at org.apache.hadoop.hbase.util.Bytes.toInt(Bytes.java:737)
> > >>         at org.apache.hadoop.hbase.KeyValue.getLength(KeyValue.java:972)
> > >>         at org.apache.hadoop.hbase.KeyValue.<init>(KeyValue.java:276)
> > >>         at org.apache.hadoop.hbase.KeyValue.<init>(KeyValue.java:265)
> > >>         at com.bloomberg.tickerplant.hbase.HBaseTkrHdrMapper.map(HBaseTkrHdrMapper.java:41)
> > >>         at com.bloomberg.tickerplant.hbase.HBaseTkrHdrMapper.map(HBaseTkrHdrMapper.java:23)
> > >>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
> > >>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
> > >>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
> > >>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
> > >>         at java.security.AccessController.doPrivileged(Native Method)
> > >>         at javax.security.auth.Subject.doAs(Subject.java:415)
> > >>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
> > >>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
> > >>
> > >>
> > >> Exception in thread "main" java.io.IOException: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
> > >>         at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.getTaskAttemptCompletionEvents(HistoryClientService.java:269)
> > >>         at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getTaskAttemptCompletionEvents(MRClientProtocolPBServiceImpl.java:173)
> > >>         at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:283)
> > >>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
> > >>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
> > >>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2053)
> > >>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
> > >>         at java.security.AccessController.doPrivileged(Native Method)
> > >>         at javax.security.auth.Subject.doAs(Subject.java:415)
> > >>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
> > >>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2047)
> > >>
> > >
> > >
> >
>
