hbase-user mailing list archives

From Igor Gatis <igorga...@gmail.com>
Subject Re: How to convert SequenceFile into HFile?
Date Sat, 07 Dec 2013 14:39:03 GMT
Hi JM,

My usage is the following: I want to write a C++ program which will answer
RPC requests. Each request has a list of keys and responses will contain
values. I want to use HFile because it has an efficient key-based index and
because there is a whole set of tools in hadoop to produce this kind of
file.

So, my usage is totally unrelated to HBase. I only have keys and values.
Family and qualifier make no sense in my design -- specifying empty values
for those is a waste of space in my case.
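
To put a rough number on that waste, here is a minimal sketch. It assumes
the serialized KeyValue layout as I understand it from the HBase
documentation: key length (int), value length (int), row length (short),
row, family length (byte), family, qualifier, timestamp (long), type (byte),
value. The class and method names are hypothetical, it uses only java.nio
rather than the HBase classes, and the timestamp and type constants are
placeholders for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: mimics the assumed KeyValue byte layout without HBase.
class KeyValueOverhead {

    // Serializes one cell in the assumed layout:
    // <keyLen:int><valueLen:int><rowLen:short><row><famLen:byte><family>
    // <qualifier><timestamp:long><type:byte><value>
    public static byte[] encode(byte[] row, byte[] family, byte[] qualifier,
                                long timestamp, byte type, byte[] value) {
        int keyLength = 2 + row.length + 1 + family.length
                + qualifier.length + 8 + 1;
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + keyLength + value.length);
        buf.putInt(keyLength);
        buf.putInt(value.length);
        buf.putShort((short) row.length);
        buf.put(row);
        buf.put((byte) family.length);
        buf.put(family);
        buf.put(qualifier);
        buf.putLong(timestamp);
        buf.put(type);
        buf.put(value);
        return buf.array();
    }

    // Bytes spent on everything except the raw key and value,
    // with family and qualifier left empty.
    public static int overhead(byte[] row, byte[] value) {
        byte[] empty = new byte[0];
        // Long.MAX_VALUE and type 4 are placeholder values (assumption).
        byte[] cell = encode(row, empty, empty, Long.MAX_VALUE, (byte) 4, value);
        return cell.length - (row.length + value.length);
    }

    public static void main(String[] args) {
        byte[] row = "key-0001".getBytes(StandardCharsets.UTF_8);
        byte[] value = "some value".getBytes(StandardCharsets.UTF_8);
        // prints "overhead bytes per entry: 20"
        System.out.println("overhead bytes per entry: " + overhead(row, value));
    }
}
```

Under those assumptions the empty family and qualifier themselves cost only
the one-byte family length, but together with the timestamp and type that is
10 bytes per entry with no counterpart in a plain key/value record.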

TFile is a replacement for Hadoop's MapFile (see
https://issues.apache.org/jira/browse/HADOOP-3315).
HFile was designed after TFile.

Sounds like TFile better fits my use case then.

-Gatis



On Fri, Dec 6, 2013 at 7:54 PM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:

> Hi Igor,
>
>
> Have you looked at this constructor?
>
>   /**
>    * Constructs KeyValue structure filled with null value.
>    * @param row - row key (arbitrary byte array)
>    * @param family family name
>    * @param qualifier column qualifier
>    */
>   public KeyValue(final byte [] row, final byte [] family,
>       final byte [] qualifier, final byte [] value)
>
> You need to specify the column family and the column qualifier. That's in
> your table definition. And then you give your value.
>
> Is that not what you are looking for? Also, what is a TFile?
>
> JM
>
>
> 2013/12/6 Igor Gatis <igorgatis@gmail.com>
>
> > Sounds like hbase's HFileOutputFormat depends on KeyValue's "family" field.
> > I don't want that.
> >
> > All I want is to keep keys and values in an indexed file. TFile would work
> > as well. But it seems there is no TFileOutputFormat available.
> >
> >
> > On Fri, Dec 6, 2013 at 4:47 PM, Igor Gatis <igorgatis@gmail.com> wrote:
> >
> > > That's the kind of solution I'm looking for.
> > >
> > > Here is what I have:
> > >
> > >     String jobName = "Seq2HFile";
> > >     Job job = new Job(getConf(), jobName);
> > >     job.setJarByClass(Seq2HFile.class);
> > >
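> > >     job.setMapperClass(MyIdentityMapper.class);
> > >     // Map output types must match what the mapper emits and what
> > >     // KeyValueSortReducer consumes.
> > >     job.setMapOutputKeyClass(ImmutableBytesWritable.class);
> > >     job.setMapOutputValueClass(KeyValue.class);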
> > >
> > >     job.setPartitionerClass(TotalOrderPartitioner.class);
> > >
> > >     job.setReducerClass(KeyValueSortReducer.class);
> > >     job.setOutputKeyClass(ImmutableBytesWritable.class);
> > >     job.setOutputValueClass(KeyValue.class);
> > >     job.setNumReduceTasks(1);
> > >
> > >     job.setInputFormatClass(SequenceFileInputFormat.class);
> > >     SequenceFileInputFormat.addInputPaths(job, inputPath);
> > >
> > >     job.setOutputFormatClass(HFileOutputFormat.class);
> > >     HFileOutputFormat.setOutputPath(job, new Path(outputPath));
> > >
> > >     job.submit();
> > >     job.waitForCompletion(true);
> > >
> > > The bit I'm stuck on is MyIdentityMapper. My input is a
> > > SequenceFile<BytesWritable, BytesWritable>. According to the
> > > HFileOutputFormat signature, the output key is ImmutableBytesWritable
> > > and the value is KeyValue.
> > >
> > > I guess BytesWritable -> ImmutableBytesWritable is straightforward. But
> > > I've got no clue how to fill KeyValue.
> > >
> > >   public static class MyIdentityMapper
> > >       extends Mapper<BytesWritable, BytesWritable,
> > >           ImmutableBytesWritable, KeyValue> {
> > >     public void map(BytesWritable key, BytesWritable value, Context context)
> > >         throws IOException, InterruptedException {
> > >       // What do I write here?
> > >     }
> > >   }
> > >
> > >
> > >
> > > On Fri, Dec 6, 2013 at 12:31 PM, Jean-Marc Spaggiari <
> > > jean-marc@spaggiari.org> wrote:
> > >
> > >> Hi Igor,
> > >>
> > >> I would say MapReduce.
> > >>
> > >> SequenceFileInputFormat
> > >> HFileOutputFormat
> > >>
> > >> JM
> > >>
> > >>
> > >> 2013/12/5 Igor Gatis <igorgatis@gmail.com>
> > >>
> > >> > I have SequenceFiles I'd like to convert to HFile. How do I do that?
> > >> >
> > >>
> > >
> > >
> >
>
