hbase-user mailing list archives

From Paulo Ricardo Paz Vital <pvi...@linux.vnet.ibm.com>
Subject Re: Wrong HBase Sort Order with Pig
Date Fri, 13 Sep 2013 17:51:56 GMT
John, yes, the first option looks better.
Glad you solved the problem.

Best regards, Paulo Vital

On Fri, 2013-09-13 at 19:11 +0200, John wrote:
> Hi, thanks for your answer. I solved the problem. Here is the answer from
> another mailing list:
> 
> The problem is that HBaseStorage maps
> column families into a HashMap, so the sort ordering is completely lost.
> 
> You have two options:
> 
> 1. Modify HBaseStorage to use a SortedMap data structure (e.g. TreeMap) and
> use the modified HBaseStorage (or make it configurable).
> 2. Since you convert the map to a bag, you can sort the bag in a nested
> foreach statement.
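A minimal Java sketch of the difference between the two options, using only java.util. It does not modify or call HBaseStorage; the class name ColumnOrderSketch and its methods are hypothetical, and the sample columns are the a/c/d values from the example later in this thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical illustration only; not part of Pig or HBaseStorage.
class ColumnOrderSketch {

    // Option 1: hold the columns in a SortedMap so iteration follows key order.
    static SortedMap<String, String> asSortedMap(Map<String, String> columns) {
        return new TreeMap<>(columns);
    }

    // Option 2: leave the map unordered and sort the keys afterwards.
    static List<String> sortedKeys(Map<String, String> columns) {
        List<String> keys = new ArrayList<>(columns.keySet());
        Collections.sort(keys);
        return keys;
    }

    public static void main(String[] args) {
        Map<String, String> columns = new HashMap<>();
        columns.put("c", "val");
        columns.put("d", "val");
        columns.put("a", "val");

        System.out.println(asSortedMap(columns).keySet()); // [a, c, d]
        System.out.println(sortedKeys(columns));           // [a, c, d]
    }
}
```

Note that TreeMap orders String keys by String.compareTo, while HBase orders column qualifiers as raw bytes. For plain ASCII qualifiers like the ones here the two orders agree; for other data a byte-wise comparator would probably be needed.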
> 
> I prefer option 1 myself because it would be more performant than option 2.
> 
> Thanks anyway!
> 
> 
> 2013/9/13 Paulo Ricardo Paz Vital <pvital@linux.vnet.ibm.com>
> 
> > Hello John,
> >
> > Are you running HBase and Pig with IBM Java?
> >
> > We found an error in one Pig unit test when building with IBM Java, and it
> > looks like the same problem you are reporting. Please check the
> > JIRA [1], which explains the problem in Pig and the solution there.
> >
> > [1] https://issues.apache.org/jira/browse/PIG-3309
> >
> > If the error is the same and you are using IBM Java, the problem is how
> > IBM's HashMap implementation orders the map: it's different from
> > Oracle's (Sun) implementation.
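The java.util.HashMap documentation makes no guarantee about iteration order, so two JVMs can legitimately print the same map differently. A minimal sketch of one way to get output that does not depend on the JVM, by copying into a TreeMap before rendering. The class name MapOrderCheck is hypothetical, and this is not claimed to be the fix applied in PIG-3309.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not taken from the Pig code base.
class MapOrderCheck {

    // Renders the map with keys in sorted order, so the text is the same
    // regardless of how the underlying HashMap happens to iterate.
    static String stableToString(Map<String, String> map) {
        return new TreeMap<>(map).toString();
    }

    public static void main(String[] args) {
        Map<String, String> columns = new HashMap<>();
        columns.put("c", "val");
        columns.put("d", "val");
        columns.put("a", "val");

        // columns.toString() may differ between JVM implementations.
        System.out.println(stableToString(columns)); // {a=val, c=val, d=val}
    }
}
```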
> >
> > Best regards,
> > Paulo Vital
> >
> > On Fri, 2013-09-13 at 16:38 +0200, John wrote:
> > > Hi, I already asked this on the Pig mailing list, but because I'm not
> > > sure whether it is a Pig or an HBase issue, I will ask here too, since the
> > > Pig function uses an HBase scan operation. Here is my question:
> > >
> > > I have created an HBase table in the HBase shell and added some data.
> > > http://hbase.apache.org/book/dm.sort.html says that the data is first
> > > sorted by the row key and then by the column. So I tried this in the
> > > HBase shell: http://pastebin.com/gLVAX0rJ
> > >
> > > Everything looks fine. I got the right order a -> c -> d, as expected.
> > >
> > > Now I tried the same with Apache Pig in Java: http://pastebin.com/jdTpj4Fu
> > >
> > > I got this result:
> > >
> > > (key1,[c#val,d#val,a#val])
> > >
> > > So now the order is c -> d -> a. That seems a little odd to me; shouldn't
> > > it be the same as in HBase? It's important for me to get the right order
> > > because I transform the map into a bag afterwards and then join it with
> > > other tables. If both inputs are sorted, I can use a merge join without
> > > sorting the two datasets first. So does anyone know how to get a sorted
> > > map (or bag) of the columns?
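To illustrate why the order matters for a merge join, here is a minimal Java sketch of the general technique on two key lists. It is not Pig's merge join implementation; the class name MergeJoinSketch is hypothetical, and it assumes both inputs are sorted ascending with no duplicate keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of a merge join on keys; not Pig's implementation.
class MergeJoinSketch {

    // Returns the keys present in both lists. Correct only when both lists
    // are sorted ascending and contain no duplicates.
    static List<String> mergeJoin(List<String> left, List<String> right) {
        List<String> matches = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i).compareTo(right.get(j));
            if (cmp == 0) {
                matches.add(left.get(i));
                i++;
                j++;
            } else if (cmp < 0) {
                i++; // left key is smaller, advance left
            } else {
                j++; // right key is smaller, advance right
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> other = Arrays.asList("a", "b", "d");

        // Sorted input: both common keys are found.
        System.out.println(mergeJoin(Arrays.asList("a", "c", "d"), other)); // [a, d]

        // Input in the c -> d -> a order: the match on "a" is silently missed.
        System.out.println(mergeJoin(Arrays.asList("c", "d", "a"), other)); // [d]
    }
}
```

The single pass over each input is what makes the join cheap, and it is also why an unsorted input produces wrong results rather than an error.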
> > >
> > >
> > > thanks
> >
> > --
> > Paulo Ricardo Paz Vital <pvital@linux.vnet.ibm.com>
> > IBM Linux Technology Center
> >
> >


