hbase-user mailing list archives

From Shahab Yunus <shahab.yu...@gmail.com>
Subject Re: Java Null Pointer Exception!
Date Mon, 19 Aug 2013 14:11:26 GMT
I think you should not try to join the tables this way. It goes against
the recommended design patterns of both HBase (joins in HBase by themselves
go against its design) and M/R. You should first pre-process the data, maybe
through another M/R job or a Pig script, and massage it into a uniform
structure that fits the M/R architecture (maybe convert the tables into
text files first?). Have you looked into the recommended M/R join
strategies?

Some links to start with:

http://codingjunkie.net/mapreduce-reduce-joins/
http://chamibuddhika.wordpress.com/2012/02/26/joins-with-map-reduce/
http://blog.matthewrathbone.com/2013/02/09/real-world-hadoop-implementing-a-left-outer-join-in-hadoop-map-reduce.html
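
For a rough idea, a plain reduce-side join of two tab-delimited text inputs
could look like the sketch below (class, path and key names are made up for
illustration; each input line is assumed to be "key<TAB>value"):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ReduceSideJoin {

    // Tags records from the left input so the reducer can tell the sides apart.
    public static class LeftMapper extends Mapper<Object, Text, Text, Text> {
        @Override
        protected void map(Object key, Text value, Context ctx)
                throws IOException, InterruptedException {
            String[] parts = value.toString().split("\t", 2);
            if (parts.length == 2) {
                ctx.write(new Text(parts[0]), new Text("L\t" + parts[1]));
            }
        }
    }

    // Same for the right input, with a different tag.
    public static class RightMapper extends Mapper<Object, Text, Text, Text> {
        @Override
        protected void map(Object key, Text value, Context ctx)
                throws IOException, InterruptedException {
            String[] parts = value.toString().split("\t", 2);
            if (parts.length == 2) {
                ctx.write(new Text(parts[0]), new Text("R\t" + parts[1]));
            }
        }
    }

    // All records for one join key arrive at the same reducer call;
    // buffer the two sides and pair them up.
    public static class JoinReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context ctx)
                throws IOException, InterruptedException {
            List<String> left = new ArrayList<String>();
            List<String> right = new ArrayList<String>();
            for (Text v : values) {
                String s = v.toString();
                if (s.startsWith("L\t")) {
                    left.add(s.substring(2));
                } else {
                    right.add(s.substring(2));
                }
            }
            for (String l : left) {              // inner join output
                for (String r : right) {
                    ctx.write(key, new Text(l + "\t" + r));
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "reduce-side join");
        job.setJarByClass(ReduceSideJoin.class);
        MultipleInputs.addInputPath(job, new Path(args[0]),
                TextInputFormat.class, LeftMapper.class);
        MultipleInputs.addInputPath(job, new Path(args[1]),
                TextInputFormat.class, RightMapper.class);
        job.setReducerClass(JoinReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileOutputFormat.setOutputPath(job, new Path(args[2]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}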

Regards,
Shahab


On Mon, Aug 19, 2013 at 9:43 AM, Pavan Sudheendra <pavan0591@gmail.com> wrote:

> I'm basically trying to do a join across 3 tables in the mapper. In the
> reducer I am doing a group-by and writing the output to another table.
>
> Although I agree that my code is pathetic, what I could actually do is
> create an HTable object once and pass it as an extra argument to the map
> function. But would that solve the problem?
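>
> Something like this is what I mean, maybe (just a rough sketch with made-up
> names, creating the table once in setup() instead of once per map call):
>
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.client.HTable;
> import org.apache.hadoop.hbase.client.Result;
> import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
> import org.apache.hadoop.hbase.mapreduce.TableMapper;
> import org.apache.hadoop.io.Text;
>
> public static class AnalyzeMapper extends TableMapper<Text, Text> {
>     private HTable contentidxTable;    // created once per map task, reused
>
>     @Override
>     protected void setup(Context context) throws IOException {
>         Configuration conf = context.getConfiguration();
>         contentidxTable = new HTable(conf, "ContentidxTable");
>     }
>
>     @Override
>     protected void cleanup(Context context) throws IOException {
>         if (contentidxTable != null) {
>             contentidxTable.close();   // release the table when the task ends
>         }
>     }
>
>     @Override
>     public void map(ImmutableBytesWritable row, Result columns, Context context)
>             throws IOException, InterruptedException {
>         // do the lookups with contentidxTable.get(...) here instead of
>         // constructing a new HTable on every call
>     }
> }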
>
> Roughly, these are my tables, and the code flows like this:
> Mapper -> Table1 -> Contentidx -> Content -> Mapper aggregates the values ->
> Reducer.
>
>
> Table1 - 19 million rows.
> Contentidx table - 150k rows.
> Content table - 93k rows.
>
> Yes, I have looked at the map-reduce example on the HBase website, and
> that is the approach I am following.
>
>
>
> On Mon, Aug 19, 2013 at 7:05 PM, Shahab Yunus <shahab.yunus@gmail.com> wrote:
>
> > Can you please explain or show the flow of the code a bit more? Why are
> > you creating the HTable object again and again in the mapper? Where is
> > ContentidxTable (the name of the table, I believe?) defined? What is
> > your actual requirement?
> >
> > Also, have you looked into this, the API for wiring HBase tables into
> > M/R jobs?
> > http://hbase.apache.org/book/mapreduce.example.html
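> >
> > The wiring part from that example looks roughly like this (the table and
> > class names below are placeholders for yours):
> >
> > Configuration conf = HBaseConfiguration.create();
> > Job job = new Job(conf, "AnalyzeJob");
> > job.setJarByClass(AnalyzeJob.class);    // class containing mapper/reducer
> >
> > Scan scan = new Scan();
> > scan.setCaching(500);        // 1 is the Scan default, bad for M/R jobs
> > scan.setCacheBlocks(false);  // don't fill the block cache from a full scan
> >
> > TableMapReduceUtil.initTableMapperJob(
> >     "Table1",               // input HBase table
> >     scan,
> >     AnalyzeMapper.class,    // your TableMapper
> >     Text.class,             // mapper output key
> >     Text.class,             // mapper output value
> >     job);
> > TableMapReduceUtil.initTableReducerJob(
> >     "OutputTable",          // output HBase table
> >     AnalyzeReducer.class,   // your TableReducer
> >     job);
> >
> > System.exit(job.waitForCompletion(true) ? 0 : 1);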
> >
> > Regards,
> > Shahab
> >
> >
> > On Mon, Aug 19, 2013 at 9:05 AM, Pavan Sudheendra <pavan0591@gmail.com> wrote:
> >
> > > Also, the same code works perfectly fine when I run it on a single-node
> > > cluster. I've added the HBase classpath to HADOOP_CLASSPATH and have set
> > > all the other env variables as well.
> > >
> > >
> > > On Mon, Aug 19, 2013 at 6:33 PM, Pavan Sudheendra <pavan0591@gmail.com> wrote:
> > >
> > > > Hi all,
> > > > I'm getting the following error message every time I run the
> > > > map-reduce job on the multi-node Hadoop cluster:
> > > >
> > > > java.lang.NullPointerException
> > > >     at org.apache.hadoop.hbase.util.Bytes.toBytes(Bytes.java:414)
> > > >     at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:170)
> > > >     at com.company$AnalyzeMapper.contentidxjoin(MRjobt.java:153)
> > > >
> > > >
> > > > Here's the code:
> > > >
> > > > public void map(ImmutableBytesWritable row, Result columns, Context context)
> > > >         throws IOException {
> > > > ...
> > > > ...
> > > > public static String contentidxjoin(String contentId) {
> > > >     Configuration conf = HBaseConfiguration.create();
> > > >     HTable table;
> > > >     try {
> > > >         table = new HTable(conf, ContentidxTable);
> > > >         if (table != null) {
> > > >             Get get1 = new Get(Bytes.toBytes(contentId));
> > > >             get1.addColumn(Bytes.toBytes(ContentidxTable_ColumnFamily),
> > > >                     Bytes.toBytes(ContentidxTable_ColumnQualifier));
> > > >             Result result1 = table.get(get1);
> > > >             byte[] val1 = result1.getValue(
> > > >                     Bytes.toBytes(ContentidxTable_ColumnFamily),
> > > >                     Bytes.toBytes(ContentidxTable_ColumnQualifier));
> > > >             if (val1 != null) {
> > > >                 LOGGER.info("Fetched data from BARB-Content table");
> > > >             } else {
> > > >                 LOGGER.error("Error fetching data from BARB-Content table");
> > > >             }
> > > >             return_value = contentjoin(Bytes.toString(val1), contentId);
> > > >         }
> > > >     } catch (Exception e) {
> > > >         LOGGER.error("Error inside contentidxjoin method");
> > > >         e.printStackTrace();
> > > >     }
> > > >     return return_value;
> > > > }
> > > > }
> > > >
> > > > Assume all variables are defined.
> > > >
> > > > Can anyone please tell me why the table never gets instantiated or
> > > > entered? I set up breakpoints, and this function gets called many times
> > > > while the mapper executes; every time it logs *Error inside
> > > > contentidxjoin method*. I'm 100% sure there are rows in ContentidxTable,
> > > > so I'm not sure why it's not able to fetch the value from it.
> > > >
> > > > Please help!
> > > >
> > > >
> > > > --
> > > > Regards-
> > > > Pavan
> > > >
> > >
> > >
> > >
> > > --
> > > Regards-
> > > Pavan
> > >
> >
>
>
>
> --
> Regards-
> Pavan
>
