hbase-user mailing list archives

From Bill Graham <billgra...@gmail.com>
Subject Re: Best way to Import data from Cassandra to HBase
Date Tue, 14 Jun 2011 20:36:54 GMT
Also, you might want to look at HBASE-3880, which is committed but not
released yet. It allows you to specify a custom Mapper class when running
ImportTsv. It seems like a similar patch to make the input format pluggable
would be needed in your case, though.
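
As a rough sketch, a run with a custom mapper might look like the following.
The importtsv.mapper.class property name is taken from the HBASE-3880 patch
and com.example.MyMapper is a placeholder; verify both against whatever
release the patch ships in:

    ./hadoop jar hbase.jar importtsv \
        -Dimporttsv.columns=HBASE_ROW_KEY,f1:b,f1:c \
        -Dimporttsv.mapper.class=com.example.MyMapper \
        -Dimporttsv.bulk.output=output t1 input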


On Tue, Jun 14, 2011 at 9:53 AM, Todd Lipcon <todd@cloudera.com> wrote:

> Hi,
>
> Unfortunately I don't think importtsv will work in "local job runner"
> mode. Try running it on an MR cluster (it could be pseudo-distributed).
>
> -Todd
>
> On Tue, Jun 14, 2011 at 2:01 AM, King JKing <beuking@gmail.com> wrote:
>
> > Thanks for your reply.
> >
> > I just tested importtsv and got this warning:
> >
> > java.lang.IllegalArgumentException: Can't read partitions file
> >     at org.apache.hadoop.hbase.mapreduce.hadoopbackport.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:111)
> >     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
> >     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
> >     at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:527)
> >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> >     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> > Caused by: java.io.FileNotFoundException: File _partition.lst does not
> > exist.
> >     at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
> >     at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
> >     at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:676)
> >     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
> >     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1419)
> >     at org.apache.hadoop.hbase.mapreduce.hadoopbackport.TotalOrderPartitioner.readPartitions(TotalOrderPartitioner.java:296)
> >     at org.apache.hadoop.hbase.mapreduce.hadoopbackport.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:82)
> >     ... 6 more
> >
> > Here is my command line:
> >
> > ./hadoop jar hbase-0.90.0.jar importtsv \
> >     -Dimporttsv.columns=HBASE_ROW_KEY,f1:b,f1:c \
> >     -Dimporttsv.bulk.output=output t1 input
> >
> > Here, 't1' and 'f1' are the table and column family in HBase.
> >
> > No data is written to the 'output' folder.
> >
> > Could you give me some advice?
> >
> > Thank you in advance.
> >
> > On Tue, Jun 14, 2011 at 10:44 AM, Todd Lipcon <todd@cloudera.com> wrote:
> >
> > > On Mon, Jun 13, 2011 at 8:17 PM, King JKing <beuking@gmail.com> wrote:
> > >
> > > > Dear all,
> > > >
> > > > I want to import data from Cassandra to HBase.
> > > >
> > > >
> > > That's what we like to hear! ;-)
> > >
> > >
> > > > I think the way might be to customize ImportTsv.java to read the
> > > > Cassandra data files (*.dbf), convert them to HBase data files, and
> > > > then use the completebulkload tool.
> > > >
> > > >
> > > Sounds about right. I don't know what the .dbf format is, but if you
> > > can make an InputFormat that supports them, you can write a mapper to
> > > translate from those records into HBase Puts, and then use
> > > HFileOutputFormat and bulk loads just like ImportTsv.
> > >
> > > -Todd
> > > --
> > > Todd Lipcon
> > > Software Engineer, Cloudera
> > >
> >
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
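
For reference, the InputFormat -> mapper -> HFileOutputFormat pipeline Todd
describes above could be sketched roughly like this against the 0.90-era
API. CassandraRecordInputFormat is a hypothetical class you would have to
write, and the record parsing is a placeholder:

    import java.io.IOException;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class CassandraToHFiles {

      static class CassandraToPutMapper
          extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
        @Override
        protected void map(LongWritable offset, Text record, Context context)
            throws IOException, InterruptedException {
          // Parsing is application-specific; for illustration assume each
          // record is "rowkey<TAB>b-value<TAB>c-value".
          String[] fields = record.toString().split("\t");
          byte[] row = Bytes.toBytes(fields[0]);
          Put put = new Put(row);
          put.add(Bytes.toBytes("f1"), Bytes.toBytes("b"), Bytes.toBytes(fields[1]));
          put.add(Bytes.toBytes("f1"), Bytes.toBytes("c"), Bytes.toBytes(fields[2]));
          context.write(new ImmutableBytesWritable(row), put);
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = new Job(HBaseConfiguration.create(), "cassandra-to-hfiles");
        job.setJarByClass(CassandraToHFiles.class);
        // Hypothetical: an InputFormat you write to read Cassandra data files.
        // job.setInputFormatClass(CassandraRecordInputFormat.class);
        job.setMapperClass(CassandraToPutMapper.class);
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(Put.class);
        FileOutputFormat.setOutputPath(job, new Path("output"));
        // Wires up HFileOutputFormat and the TotalOrderPartitioner against
        // table 't1', just as ImportTsv does when importtsv.bulk.output is set.
        HFileOutputFormat.configureIncrementalLoad(job, new HTable("t1"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

After the job finishes, the HFiles under 'output' can be loaded with the
completebulkload tool as in the thread above.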
