hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: Java Null Pointer Exception!
Date Thu, 22 Aug 2013 15:19:18 GMT
Uhmm... not exactly.  It depends on how you view HBase and your use case...

The short answer is that Sudheendra is basically correct: you really need to rethink using
HBase if you're doing a lot of joins, because HBase is more of a persistent object store than
a relational database.  The longer answer is that even though HBase lacks the internals
to handle JOINs effectively, it can be made to do joins.


Ok.... you have to remember that JOINs are expensive.  If you don't have indexes, it's going
to be a map/reduce problem.
If you have indexes, you can join against them by comparing the ordered key sets and taking
the intersection, using inverted tables and then a FK index table.
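
Something like this, a minimal sketch rather than production code: it assumes
two inverted-index tables whose row keys are the FK values, so each scanner
already returns keys in sorted order.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.util.Bytes;

public class IndexIntersect {
    // Merge-walk two sorted key streams, keeping only keys present in both.
    public static List<byte[]> intersect(ResultScanner a, ResultScanner b)
            throws IOException {
        List<byte[]> out = new ArrayList<byte[]>();
        Result ra = a.next();
        Result rb = b.next();
        while (ra != null && rb != null) {
            int cmp = Bytes.compareTo(ra.getRow(), rb.getRow());
            if (cmp == 0) {
                out.add(ra.getRow());   // key appears in both indexes
                ra = a.next();
                rb = b.next();
            } else if (cmp < 0) {
                ra = a.next();
            } else {
                rb = b.next();
            }
        }
        return out;
    }
}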

There are some issues that you have to work around... 

1) A row can't exceed the size of a region, so you will need to work out how to split a row
while still maintaining sort order (a key sketch follows this list).
2) You will probably want to launch the query from an edge node (some call it a gateway node)
which is on the same subnet as your cluster.
3) Such a solution is going to work when you want fast reads but can accept slower writes.
4) Coprocessors need to be tweaked a bit, and you would want to decouple the writes to the
secondary index tables from the base table write.
5) If you rely on your Hadoop vendor to auto-tune your cluster... you will have to make some
manual tweaks.
6) This is not for the beginner or the faint of heart.
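
For (1), one trick is to chunk the logical row and append a fixed-width,
zero-padded counter to the key so the chunks still sort together. Purely a
sketch (the separator and counter width are arbitrary choices):

import org.apache.hadoop.hbase.util.Bytes;

public class ChunkedRowKey {
    // "user123" -> "user123|00000000", "user123|00000001", ...
    // Fixed-width counters keep the chunks lexicographically ordered
    // right after one another in the table.
    public static byte[] chunkKey(String logicalRow, int chunkNo) {
        return Bytes.toBytes(String.format("%s|%08d", logicalRow, chunkNo));
    }
}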

But yes, in a nutshell, it can be done. 

Also a side note: if you want to use Lucene as your secondary index, you could do it... but
I haven't thought through that problem yet...


On a different side note:
This is why the current model of indexes may work ok for limiting results against a single
table... it won't work well when you want to do joins across tables.
(And you will want to do joins in HBase eventually....)

HTH

-Mike


On Aug 19, 2013, at 9:11 AM, Shahab Yunus <shahab.yunus@gmail.com> wrote:

> I think you should not try to join the tables this way. It goes against
> the recommended design/patterns of HBase (joins in HBase alone go against
> the design) and M/R. You should first pre-process the data, maybe through
> another M/R job or a Pig script, for example, and massage it into a uniform
> or appropriate structure that conforms to the M/R architecture (maybe
> convert the tables into text files first?). Have you looked into the
> recommended M/R join strategies?
> 
> Some links to start with (a rough sketch of the reduce-side idea follows them):
> 
> http://codingjunkie.net/mapreduce-reduce-joins/
> http://chamibuddhika.wordpress.com/2012/02/26/joins-with-map-reduce/
> http://blog.matthewrathbone.com/2013/02/09/real-world-hadoop-implementing-a-left-outer-join-in-hadoop-map-reduce.html
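> 
> A minimal sketch of the reduce-side tagging idea from those posts (the CSV
> layout, class name, and tag scheme here are assumptions, not code from the
> job in question):
> 
> import java.io.IOException;
> import org.apache.hadoop.io.LongWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapreduce.Mapper;
> 
> public class TaggedJoinMapper extends Mapper<LongWritable, Text, Text, Text> {
>     @Override
>     protected void map(LongWritable key, Text value, Context context)
>             throws IOException, InterruptedException {
>         // assume CSV input where fields[0] is the join key
>         String[] fields = value.toString().split(",");
>         // the "A|" tag records which input this row came from; the reducer
>         // groups by join key and matches "A|" values against "B|" values
>         context.write(new Text(fields[0]), new Text("A|" + value.toString()));
>     }
> }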
> 
> Regards,
> Shahab
> 
> 
> On Mon, Aug 19, 2013 at 9:43 AM, Pavan Sudheendra <pavan0591@gmail.com> wrote:
> 
>> I'm basically trying to do a join across 3 tables in the mapper.. In the
>> reducer I am doing a group-by and writing the output to another table..
>> 
>> Although I agree that my code is pathetic, what I could actually do is
>> create an HTable object once and pass it as an extra argument to the map
>> function (sketch below).. But would that solve the problem?
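>> 
>> A rough sketch of that "create it once" idea (the config key and field
>> names are assumptions): open the table in setup() rather than on every
>> map() call, and close it in cleanup():
>> 
>> import java.io.IOException;
>> import org.apache.hadoop.conf.Configuration;
>> import org.apache.hadoop.hbase.client.HTable;
>> import org.apache.hadoop.hbase.mapreduce.TableMapper;
>> import org.apache.hadoop.io.Text;
>> 
>> public class AnalyzeMapper extends TableMapper<Text, Text> {
>>     private HTable contentidxTable;
>> 
>>     @Override
>>     protected void setup(Context context) throws IOException {
>>         Configuration conf = context.getConfiguration();
>>         // read the table name from the job configuration so every
>>         // task sees the same value ("contentidx.table.name" is a
>>         // hypothetical key)
>>         contentidxTable = new HTable(conf, conf.get("contentidx.table.name"));
>>     }
>> 
>>     // map(...) stays as before, but reuses contentidxTable instead of
>>     // constructing a new HTable on every call
>> 
>>     @Override
>>     protected void cleanup(Context context) throws IOException {
>>         contentidxTable.close();
>>     }
>> }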
>> 
>> Roughly these are my tables, and the code flows like this:
>> Mapper -> Table1 -> Contentidx -> Content -> Mapper aggregates the values ->
>> Reducer.
>> 
>> Table1 - 19 million rows.
>> Contentidx table - 150k rows.
>> Content table - 93k rows.
>> 
>> Yes, I have looked at the map-reduce example on the HBase website, and
>> that is what I am following.
>> 
>> 
>> 
>> On Mon, Aug 19, 2013 at 7:05 PM, Shahab Yunus <shahab.yunus@gmail.com> wrote:
>> 
>>> Can you please explain or show the flow of the code a bit more? Why are
>>> you creating the HTable object again and again in the mapper? Where is
>>> ContentidxTable (the name of the table, I believe?) defined? What is your
>>> actual requirement?
>>> 
>>> Also, have you looked into this, the API for wiring HBase tables with M/R
>>> jobs? (There's a rough sketch after the link.)
>>> http://hbase.apache.org/book/mapreduce.example.html
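>>> 
>>> Roughly, the wiring looks like this (a bare-bones sketch; "Table1" is a
>>> placeholder, and AnalyzeMapper is the mapper class from your stack trace):
>>> 
>>> import java.io.IOException;
>>> import org.apache.hadoop.hbase.client.Scan;
>>> import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
>>> import org.apache.hadoop.io.Text;
>>> import org.apache.hadoop.mapreduce.Job;
>>> 
>>> public class JobWiring {
>>>     public static void wire(Job job) throws IOException {
>>>         Scan scan = new Scan();
>>>         scan.setCaching(500);       // fetch 500 rows per scanner RPC
>>>         scan.setCacheBlocks(false); // recommended for full-table MR scans
>>>         TableMapReduceUtil.initTableMapperJob(
>>>             "Table1", scan, AnalyzeMapper.class,
>>>             Text.class, Text.class, job);
>>>     }
>>> }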
>>> 
>>> Regards,
>>> Shahab
>>> 
>>> 
>>> On Mon, Aug 19, 2013 at 9:05 AM, Pavan Sudheendra <pavan0591@gmail.com> wrote:
>>> 
>>>> Also, the same code works perfectly fine when I run it on a single-node
>>>> cluster. I've added the HBase classpath to HADOOP_CLASSPATH and have set
>>>> all the other env variables too..
>>>> 
>>>> 
>>>> On Mon, Aug 19, 2013 at 6:33 PM, Pavan Sudheendra <pavan0591@gmail.com> wrote:
>>>> 
>>>>> Hi all,
>>>>> I'm getting the following error messages every time I run the map-reduce
>>>>> job on a multi-node Hadoop cluster:
>>>>> 
>>>>> java.lang.NullPointerException
>>>>>     at org.apache.hadoop.hbase.util.Bytes.toBytes(Bytes.java:414)
>>>>>     at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:170)
>>>>>     at com.company$AnalyzeMapper.contentidxjoin(MRjobt.java:153)
>>>>> 
>>>>> 
>>>>> Here's the code:
>>>>> 
>>>>> public void map(ImmutableBytesWritable row, Result columns, Context context)
>>>>>     throws IOException {
>>>>> ...
>>>>> ...
>>>>> public static String contentidxjoin(String contentId) {
>>>>>     Configuration conf = HBaseConfiguration.create();
>>>>>     HTable table;
>>>>>     try {
>>>>>         table = new HTable(conf, ContentidxTable);
>>>>>         if (table != null) {
>>>>>             Get get1 = new Get(Bytes.toBytes(contentId));
>>>>>             get1.addColumn(Bytes.toBytes(ContentidxTable_ColumnFamily),
>>>>>                 Bytes.toBytes(ContentidxTable_ColumnQualifier));
>>>>>             Result result1 = table.get(get1);
>>>>>             byte[] val1 = result1.getValue(
>>>>>                 Bytes.toBytes(ContentidxTable_ColumnFamily),
>>>>>                 Bytes.toBytes(ContentidxTable_ColumnQualifier));
>>>>>             if (val1 != null) {
>>>>>                 LOGGER.info("Fetched data from BARB-Content table");
>>>>>             } else {
>>>>>                 LOGGER.error("Error fetching data from BARB-Content table");
>>>>>             }
>>>>>             return_value = contentjoin(Bytes.toString(val1), contentId);
>>>>>         }
>>>>>     } catch (Exception e) {
>>>>>         LOGGER.error("Error inside contentidxjoin method");
>>>>>         e.printStackTrace();
>>>>>     }
>>>>>     return return_value;
>>>>> }
>>>>> }
>>>>> 
>>>>> Assume all variables are defined.
>>>>> 
>>>>> Can anyone please tell me why the table never gets instantiated or
>>>>> entered? I had set up breakpoints, and this function gets called many
>>>>> times while the mapper executes.. every time it says *Error inside
>>>>> contentidxjoin method*.. I'm 100% sure there are rows in the
>>>>> ContentidxTable, so I'm not sure why it's not able to fetch the value
>>>>> from it..
>>>>> 
>>>>> Please help!
>>>>> 
>>>>> 
>>>>> --
>>>>> Regards-
>>>>> Pavan
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Regards-
>>>> Pavan
>>>> 
>>> 
>> 
>> 
>> 
>> --
>> Regards-
>> Pavan
>> 

The opinions expressed here are mine, while they may reflect a cognitive thought, that is
purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com





