hbase-user mailing list archives

From Damien Hardy <dha...@viadeoteam.com>
Subject Re: scan filtering column familly return wrong cell
Date Mon, 12 Nov 2012 09:38:24 GMT
I don't know whether the HBase shell scan command uses ColumnCountGetFilter.
The absence of compaction could explain why the same cell is displayed twice.
But when I filter on one column family, I get only 1 cell ... from the wrong
column family (as if the cell were stored in the wrong HFile) ...

When I add a clone of each KeyValue to my Put in the reducer, the data is
written correctly (both of my column families are filled).

It seems strange that a MapReduce client can make such a mess in the storage...
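
A likely explanation, stated here as an assumption rather than something confirmed in this thread: Hadoop's reduce-side iterator hands out one value object and refills it on every step, so each p.add(kv) stores a reference to the same KeyValue, and the Put ends up holding several references to whatever was read last. The sketch below reproduces that effect in plain Java with no Hadoop or HBase dependency. Cell, copy(), reusingIterable and collect are hypothetical names invented for this illustration; copy() plays the role of kv.clone().

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ReusedValueDemo {

    // Hypothetical stand-in for a mutable value object such as KeyValue.
    static final class Cell {
        String family;
        String qualifier;

        void set(String family, String qualifier) {
            this.family = family;
            this.qualifier = qualifier;
        }

        // Independent copy; plays the role of kv.clone() in the thread.
        Cell copy() {
            Cell c = new Cell();
            c.set(family, qualifier);
            return c;
        }

        @Override
        public String toString() {
            return family + ":" + qualifier;
        }
    }

    // Returns an Iterable that refills and returns ONE shared Cell on each
    // step (assumed here to mimic how reduce() values are delivered).
    static Iterable<Cell> reusingIterable(final String[][] data) {
        final Cell shared = new Cell();
        return () -> new Iterator<Cell>() {
            private int i = 0;

            @Override
            public boolean hasNext() {
                return i < data.length;
            }

            @Override
            public Cell next() {
                shared.set(data[i][0], data[i][1]);
                i++;
                return shared;
            }
        };
    }

    // Keeps every cell seen during iteration, either by reference or by copy,
    // then renders what was actually retained.
    static List<String> collect(Iterable<Cell> cells, boolean copyEach) {
        List<Cell> kept = new ArrayList<>();
        for (Cell c : cells) {
            kept.add(copyEach ? c.copy() : c);
        }
        List<String> out = new ArrayList<>();
        for (Cell c : kept) {
            out.add(c.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] data = {{"visiting_tl", "a"}, {"visited_tl", "b"}};
        // Without copying: both entries alias the last value read.
        System.out.println(collect(reusingIterable(data), false)); // [visited_tl:b, visited_tl:b]
        // With copying: each entry keeps its own contents.
        System.out.println(collect(reusingIterable(data), true));  // [visiting_tl:a, visited_tl:b]
    }
}
```

If the same thing happens inside Put, it would account for both symptoms: the duplicated cell, and a visited_tl cell being filed under visiting_tl, since the family is looked up when the cell is added while the contents are overwritten afterwards.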

Regards,

-- 
Damien

2012/11/11 Varun Sharma <varun@pinterest.com>

> I have not looked at this in detail, but does this eventually use the
> ColumnCountGetFilter? If yes, then this will actually also include up to
> one older version, since filters run before version tracking. See JIRA
> https://issues.apache.org/jira/browse/HBASE-5257, which has a fix.
> Remember that versions are always kept in the memstore and are only cleaned
> up when the memstore is flushed out as an HFile.
>
> On Fri, Nov 9, 2012 at 8:52 AM, Damien Hardy <dhardy@viadeoteam.com>
> wrote:
>
> > Ok I can reply to myself ...
> >
> > you have to add a clone of the KeyValue in the Put. So
> >   p.add(kv);
> > becomes
> >   p.add(kv.clone());
> >
> > Otherwise, I suppose only the last one is added to HBase (but the result
> > is quite weird and should be fixed IMO).
> >
> > Cheers,
> >
> > --
> > Damien
> >
> >
> > 2012/11/9 Damien Hardy <dhardy@viadeoteam.com>
> >
> > > Hello,
> > >
> > > I am a bit confused here...
> > >
> > > I am trying to run an M/R job to import data into the HBase table
> > > 'Consultation'.
> > >
> > > Running on CDH4.1.2.
> > >
> > > The map function emits context.write(ImmutableBytesWritable, KeyValue).
> > >
> > > Configuration summary:
> > >     job.setOutputFormatClass(TableOutputFormat.class);
> > >     job.setInputFormatClass(DataDrivenDBInputFormat.class);
> > >     job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, "Consultation");
> > >     job.setOutputKeyClass(ImmutableBytesWritable.class);
> > >     job.setOutputValueClass(KeyValue.class);
> > >
> > >
> > > The reduce class is :
> > >
> > >   static class ImportReducer
> > >       extends TableReducer<ImmutableBytesWritable, KeyValue, ImmutableBytesWritable> {
> > >     @Override
> > >     public void reduce(
> > >         ImmutableBytesWritable row,
> > >         Iterable<KeyValue> kvs,
> > >         Reducer<ImmutableBytesWritable, KeyValue, ImmutableBytesWritable, Writable>.Context context)
> > >         throws java.io.IOException, InterruptedException {
> > >       Put p = new Put(row.copyBytes());
> > >       int i = 0;
> > >       byte[] rk = null;
> > >       for (KeyValue kv : kvs) {
> > >         p.add(kv);
> > >         if (Bytes.compareTo(CF_VISITED, 0, CF_VISITED.length,
> > >             kv.getBuffer(), kv.getFamilyOffset(), kv.getFamilyLength()) == 0) {
> > >           i++;
> > >         }
> > >       }
> > >       p.add(CF_COUNTER, QA_COUNTER, Bytes.toBytes(i));
> > >       context.write(new ImmutableBytesWritable(row), p);
> > >     }
> > >   }
> > >
> > >
> > > hbase(main):038:0> scan 'Consultation', {COLUMNS=> *'visiting_tl'*, LIMIT => 10 }
> > > ROW                                                 COLUMN+CELL
> > >  00070db1aa26d1906a078a1e03f788cb-\x00\x13\x80\x15  column=*visited_tl:*\x7F\xFF\xFE\xD9\x00\xFC\xDB\xB7\x001\xC5\xA7, timestamp=1266998781000, value=\x00\x00\x00\x00
> > >  001316263fc8b454bbd86dff1587a347-\x00>t\x05        column=*visited_tl:*\x7F\xFF\xFE\xD7\x0F\xB8u_\x00\x08\xE1\xA0, timestamp=1275341540000, value=\x00\x00\x00\x00
> > >  001497e68d7c71a3cd281860484fa6be-\x00/\x0E^        column=*visited_tl:*\x7F\xFF\xFE\xD8\x06\x9B\xB0\xB7\x00(3S, timestamp=1271199453000, value=\x00\x00\x00\x00
> > >  001845aac2462a1c24b36eb90ab698cf-\x00\x04\x1E\xF5  column=*visited_tl:*\x7F\xFF\xFE\xD6\xA8\xB9-\xEF\x002Po, timestamp=1277069546000, value=\x00\x00\x00\x01
> > >  0019cec2c1f38c42b1c540ef7708c6a9-\x00;\xE0\x97     column=*visited_tl:*\x7F\xFF\xFE\xD8\xF9\xC7\x0C_\x00\x02?., timestamp=1267119748000, value=\x00\x00\x00\x00
> > >  001de6b92754b0ef44ee10bf2bdfe3c3-\x00%\x1AV        column=*visited_tl:*\x7F\xFF\xFE\xD6\xE4H\x99\xC7\x00\x0F\x7F9, timestamp=1276070291000, value=\x00\x00\x00\x01
> > >  00217f082f96eb12108c139b99a3ccb7-\x00\x02w\x08     column=*visited_tl:*\x7F\xFF\xFE\xD8\xEB\x1B\x95\xEF\x00\x0A7\x19, timestamp=1267365866000, value=\x00\x00\x00\x00
> > >  0021cbfd559f56dd298e4b4fee7626a9-\x00r\xBF\xFA     column=*visited_tl:*\x7F\xFF\xFE\xD6\xA1\x0B-\x0F\x00\x03\xBC\x8B, timestamp=1277198390000, value=\x00\x00\x00\x02
> > >  00266c02a60f9a6efb5d24317e6032a0-\x00\x0E]+        column=*visited_tl:*\x7F\xFF\xFE\xD6\xBC\x0D\xD1\x7F\x00/ q, timestamp=1276745232000, value=\x00\x00\x00\x01
> > >  0026dbbd6562da5b79f1b09e94e3b973-\x00C[\x93        column=*visited_tl:*\x7F\xFF\xFE\xD7\xB0\xFA\xB7/\x00\x02~\x09, timestamp=1272636066000, value=\x00\x00\x00\x01
> > >
> > > 10 row(s) in 2.1130 seconds
> > >
> > >
> > > hbase(main):036:0> get 'Consultation', "00070db1aa26d1906a078a1e03f788cb-\x00\x13\x80\x15"
> > > COLUMN                                                       CELL
> > >  *visited_tl:\x7F\xFF\xFE\xD9\x00\xFC\xDB\xB7\x001\xC5\xA7*  timestamp=1266998781000, value=\x00\x00\x00\x00
> > >  *visited_tl:\x7F\xFF\xFE\xD9\x00\xFC\xDB\xB7\x001\xC5\xA7*  timestamp=1266998781000, value=\x00\x00\x00\x00
> > >  visits_count:_counter                                       timestamp=1352475456545, value=\x00\x00\x02\xA1
> > >
> > > 3 row(s) in 0.3260 seconds
> > >
> > > hbase(main):037:0> get 'Consultation', "00070db1aa26d1906a078a1e03f788cb-\x00\x13\x80\x15", *'visiting_tl:'*
> > > COLUMN                                                       CELL
> > >  *visited_tl:*\x7F\xFF\xFE\xD9\x00\xFC\xDB\xB7\x001\xC5\xA7  timestamp=1266998781000, value=\x00\x00\x00\x00
> > >
> > > 1 row(s) in 0.1650 seconds
> > >
> > > So I have 3 problems:
> > >
> > >  * The table has only 1 VERSION enabled: how can I get the cell
> > > visited_tl:\x7F\xFF\xFE\xD9\x00\xFC\xDB\xB7\x001\xC5\xA7 twice for a
> > > single row?
> > >  * When I explicitly query for CF 'visiting_tl:', I get a 'visited_tl:'
> > > cell ... WTF?
> > >  * The counter is (int)673 ... where are my 673 visited_tl cells? (673 is
> > > the correct value according to my source)
> > >
> > > Cheers,
>
