hbase-user mailing list archives

From Pavan Sudheendra <pavan0...@gmail.com>
Subject Re: Java Null Pointer Exception!
Date Thu, 22 Aug 2013 14:45:16 GMT
How long do you think the MR application would take to process 19 million
records in one table and 4.5 million records in another?


On Tue, Aug 20, 2013 at 1:33 AM, Shahab Yunus <shahab.yunus@gmail.com> wrote:

> Theoretically it is possible, but it goes against the design of the HBase
> and M/R architectures. By 'goes against' I do not mean that it is
> impossible, but that you can face extreme performance degradation, a
> system that is hard to maintain and inflexible, and poor robustness: the
> issues you run into when you misuse a
> concept/architecture/framework/paradigm/tool.
>
> Coming to the question of using M/R, I am confused about where exactly your
> supervisor wants you to use M/R. In the whole project? Anywhere in the
> project? Do you have to join HBase tables, or use M/R to join HBase
> tables (which would be quite surprising)? Because, as I said earlier, you
> can break your high-level application/system into a chain of dependent M/R
> jobs, where one job (or set of jobs) feeds the next with data. E.g. the
> first job(s) read data from HBase, perform some transformation and persist
> it in HDFS as flat files. Then your second job reads those files and applies
> more logic to them, possibly joining them with another available data set.
> Here I am just giving you an idea: there are many options for breaking down
> your system into smaller chunks while still using HBase and M/R. It all
> depends on your requirements and on designing your set of jobs
> (application) accordingly. This might require some creative thinking on
> your part.
>
> These are just my 2 cents.
>
> Regards,
> Shahab
>
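For illustration, the two-stage idea described above (export each table to flat records, then join them in a second job) can be sketched in plain Java without any Hadoop or HBase dependency. This is only a simulation of the map, shuffle and reduce steps of a reduce-side inner join. The class name `ReduceSideJoinSketch`, the `Tagged` record, the source tags and the sample rows are all made up for this sketch and do not come from the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of a reduce-side inner join; all names are hypothetical.
class ReduceSideJoinSketch {

    /** What a mapper would emit for one input row: join key, source tag, payload. */
    static final class Tagged {
        final String key;
        final String source;
        final String value;

        Tagged(String key, String source, String value) {
            this.key = key;
            this.source = source;
            this.value = value;
        }
    }

    /** "Shuffle" step: group the tagged records by join key, sorted by key. */
    static Map<String, List<Tagged>> shuffle(List<Tagged> emitted) {
        Map<String, List<Tagged>> groups = new TreeMap<>();
        for (Tagged t : emitted) {
            groups.computeIfAbsent(t.key, k -> new ArrayList<>()).add(t);
        }
        return groups;
    }

    /** "Reduce" step for one key: pair every left-side value with every right-side value. */
    static List<String> reduce(String key, List<Tagged> group, String leftSource) {
        List<String> left = new ArrayList<>();
        List<String> right = new ArrayList<>();
        for (Tagged t : group) {
            if (t.source.equals(leftSource)) {
                left.add(t.value);
            } else {
                right.add(t.value);
            }
        }
        List<String> joined = new ArrayList<>();
        for (String l : left) {
            for (String r : right) {
                joined.add(key + "\t" + l + "\t" + r);
            }
        }
        return joined; // empty when the key appears on one side only (inner join)
    }

    /** Runs shuffle and reduce over everything the mappers emitted. */
    static List<String> join(List<Tagged> emitted, String leftSource) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Tagged>> e : shuffle(emitted).entrySet()) {
            out.addAll(reduce(e.getKey(), e.getValue(), leftSource));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tagged> emitted = new ArrayList<>();
        emitted.add(new Tagged("c1", "table1", "view-a"));   // made-up sample rows
        emitted.add(new Tagged("c2", "table1", "view-b"));
        emitted.add(new Tagged("c1", "contentidx", "idx-9"));
        for (String line : join(emitted, "table1")) {
            System.out.println(line);
        }
    }
}
```

In a real job the grouping would be done by the framework's shuffle rather than by an in-memory map, so this only shows the shape of the logic, not how it scales.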
>
> On Mon, Aug 19, 2013 at 10:22 AM, Pavan Sudheendra <pavan0591@gmail.com> wrote:
>
> > But there's a lot of processing happening on the table data before it is
> > sent over to the reducer. Theoretically speaking, it should be possible.
> >
> > Our supervisor strictly wants an MR application to do this.
> >
> > Do you want to see more code? I'm just baffled as to why it's throwing a
> > null pointer exception when there clearly is data.
> >
> > Regards,
> > Pavan
> > On Aug 19, 2013 7:41 PM, "Shahab Yunus" <shahab.yunus@gmail.com> wrote:
> >
> > > I think you should not try to join the tables this way. It will be against
> > > the recommended design/pattern of HBase (joins in HBase alone go against
> > > the design) and M/R. You should first, maybe through another M/R job or a
> > > Pig script, for example, pre-process the data and massage it into a uniform
> > > or appropriate structure conforming to the M/R architecture (maybe convert
> > > it into text files first?). Have you looked into the recommended M/R join
> > > strategies?
> > >
> > > Some links to start with:
> > >
> > > http://codingjunkie.net/mapreduce-reduce-joins/
> > > http://chamibuddhika.wordpress.com/2012/02/26/joins-with-map-reduce/
> > >
> > >
> >
> http://blog.matthewrathbone.com/2013/02/09/real-world-hadoop-implementing-a-left-outer-join-in-hadoop-map-reduce.html
> > >
> > > Regards,
> > > Shahab
> > >
> > >
> > > On Mon, Aug 19, 2013 at 9:43 AM, Pavan Sudheendra <pavan0591@gmail.com> wrote:
> > >
> > > > I'm basically trying to do a join across 3 tables in the mapper. In the
> > > > reducer I am doing a group-by and writing the output to another table.
> > > >
> > > > Although I agree that my code is pathetic, what I could actually do is
> > > > create an HTable object once and pass it as an extra argument to the map
> > > > function. But would that solve the problem?
> > > >
> > > > Roughly, these are my tables, and the code flows like this:
> > > > Mapper -> Table1 -> Contentidx -> Content -> Mapper aggregates the values
> > > > -> Reducer.
> > > >
> > > >
> > > > Table1 -> 19 Million rows.
> > > > Contentidx table - 150k rows.
> > > > Content table - 93k rows.
> > > >
> > > > Yes, I have looked at the map-reduce example given on the HBase website
> > > > and that is what I am following.
> > > >
> > > >
> > > >
> > > > On Mon, Aug 19, 2013 at 7:05 PM, Shahab Yunus <shahab.yunus@gmail.com> wrote:
> > > >
> > > > > Can you please explain or show the flow of the code a bit more? Why are
> > > > > you creating the HTable object again and again in the mapper? Where is
> > > > > ContentidxTable (the name of the table, I believe?) defined? What is
> > > > > your actual requirement?
> > > > >
> > > > > Also, have you looked into this, the API for wiring HBase tables with
> > > > > M/R jobs?
> > > > > http://hbase.apache.org/book/mapreduce.example.html
> > > > >
> > > > > Regards,
> > > > > Shahab
> > > > >
> > > > >
> > > > > On Mon, Aug 19, 2013 at 9:05 AM, Pavan Sudheendra <pavan0591@gmail.com> wrote:
> > > > >
> > > > > > Also, the same code works perfectly fine when I run it on a single-node
> > > > > > cluster. I've added the HBase classpath to HADOOP_CLASSPATH and have set
> > > > > > all the other env variables as well.
> > > > > >
> > > > > >
> > > > > > On Mon, Aug 19, 2013 at 6:33 PM, Pavan Sudheendra <pavan0591@gmail.com> wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > > I'm getting the following error every time I run the map-reduce
> > > > > > > job across multiple hadoop clusters:
> > > > > > >
> > > > > > > java.lang.NullPointerException
> > > > > > >     at org.apache.hadoop.hbase.util.Bytes.toBytes(Bytes.java:414)
> > > > > > >     at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:170)
> > > > > > >     at com.company$AnalyzeMapper.contentidxjoin(MRjobt.java:153)
> > > > > > >
> > > > > > >
> > > > > > > Here's the code:
> > > > > > >
> > > > > > > public void map(ImmutableBytesWritable row, Result columns, Context context)
> > > > > > >     throws IOException {
> > > > > > > ...
> > > > > > > ...
> > > > > > > public static String contentidxjoin(String contentId) {
> > > > > > >     Configuration conf = HBaseConfiguration.create();
> > > > > > >     HTable table;
> > > > > > >     try {
> > > > > > >         table = new HTable(conf, ContentidxTable);
> > > > > > >         if (table != null) {
> > > > > > >             Get get1 = new Get(Bytes.toBytes(contentId));
> > > > > > >             get1.addColumn(Bytes.toBytes(ContentidxTable_ColumnFamily),
> > > > > > >                     Bytes.toBytes(ContentidxTable_ColumnQualifier));
> > > > > > >             Result result1 = table.get(get1);
> > > > > > >             byte[] val1 = result1.getValue(Bytes.toBytes(ContentidxTable_ColumnFamily),
> > > > > > >                     Bytes.toBytes(ContentidxTable_ColumnQualifier));
> > > > > > >             if (val1 != null) {
> > > > > > >                 LOGGER.info("Fetched data from BARB-Content table");
> > > > > > >             } else {
> > > > > > >                 LOGGER.error("Error fetching data from BARB-Content table");
> > > > > > >             }
> > > > > > >             return_value = contentjoin(Bytes.toString(val1), contentId);
> > > > > > >         }
> > > > > > >     } catch (Exception e) {
> > > > > > >         LOGGER.error("Error inside contentidxjoin method");
> > > > > > >         e.printStackTrace();
> > > > > > >     }
> > > > > > >     return return_value;
> > > > > > > }
> > > > > > > }
> > > > > > >
> > > > > > > Assume all variables are defined.
> > > > > > >
> > > > > > > Can anyone please tell me why the table never gets instantiated or
> > > > > > > entered? I had set up breakpoints, and this function gets called many
> > > > > > > times while the mapper executes. Every time it says *Error inside
> > > > > > > contentidxjoin method*. I'm 100% sure there are rows in the
> > > > > > > ContentidxTable, so I'm not sure why it's not able to fetch the value
> > > > > > > from it.
> > > > > > >
> > > > > > > Please help!
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Regards-
> > > > > > > Pavan
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Regards-
> > > > > > Pavan
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Regards-
> > > > Pavan
> > > >
> > >
> >
>
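A note on the stack trace quoted above: `Bytes.toBytes` is reached from the `HTable` constructor, which suggests (this is a reading of the trace, not something confirmed in the thread) that the table name passed to `new HTable(conf, ContentidxTable)` was null inside the task. One possible explanation, again an assumption, is that the name lives in a static field assigned in the driver, which a local single-node run can see but separate task JVMs cannot. The plain-Java sketch below only simulates the pattern of reading the name from per-job settings once, failing with a clear message if it is missing, and reusing one handle for every `map` call. `FakeTable`, `MapperSketch` and the key `contentidx.table` are hypothetical stand-ins, not HBase or Hadoop APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java simulation; FakeTable and MapperSketch are hypothetical stand-ins.
class OpenOnceSketch {

    /** Stand-in for a table handle; it is NOT the real HTable API. */
    static final class FakeTable {
        private final Map<String, String> rows;

        FakeTable(String name, Map<String, String> rows) {
            if (name == null) {
                // Fail with a readable message instead of an NPE deep in a library call.
                throw new IllegalStateException("table name is missing from the job settings");
            }
            this.rows = rows;
        }

        String get(String rowKey) {
            return rows.get(rowKey); // null when the row does not exist
        }
    }

    /** Mimics a mapper: setup() runs once per task, map() once per input row. */
    static final class MapperSketch {
        int opens = 0;               // how many handles this task has created
        private FakeTable contentIdx;

        void setup(Map<String, String> jobSettings, Map<String, String> rows) {
            // Read the name from settings shipped with the job, not from a static field.
            contentIdx = new FakeTable(jobSettings.get("contentidx.table"), rows);
            opens++;
        }

        String map(String contentId) {
            return contentIdx.get(contentId); // reuses the handle opened in setup()
        }
    }

    public static void main(String[] args) {
        Map<String, String> settings = new HashMap<>();
        settings.put("contentidx.table", "Contentidx");
        Map<String, String> rows = new HashMap<>();
        rows.put("c1", "k1");        // made-up sample row

        MapperSketch mapper = new MapperSketch();
        mapper.setup(settings, rows);
        System.out.println(mapper.map("c1"));
        System.out.println(mapper.map("c1"));
        System.out.println("handles opened: " + mapper.opens);
    }
}
```

With 19 million input rows, opening the handle once per task rather than once per row is the point of the `setup()` step here; whether that also removes the exception depends on the null-name guess above being right.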



-- 
Regards-
Pavan
