hbase-dev mailing list archives

From J Mohamed Zahoor <jmo...@gmail.com>
Subject Re: [potential bug]Find rows which do not have any of the given columns
Date Tue, 07 Aug 2012 09:57:26 GMT
Hi

Nice one. But I think this is valid behavior.
Time ranges are something which qualifies certain rows to be made available
to the client (something which is related to MVCC).
Once certain rows are qualified, the filters are applied to them.

The fact that both can be set simultaneously on a "Scan" object hints that
they are orthogonal.
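
To make the ordering concrete, here is a minimal sketch in plain Java. It is not HBase code: the class TimeRangeThenFilterSketch, the Cell type and the method rowsWithoutColumns are hypothetical names made up for illustration, and the sketch only assumes the order described above (time range first, then the filter). It also assumes the half-open [min, max) range that Scan.setTimeRange uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical simulation, not the HBase implementation.
public class TimeRangeThenFilterSketch {

    /** A simplified cell: row key, column qualifier and timestamp. */
    static final class Cell {
        final String row;
        final String qualifier;
        final long ts;

        Cell(String row, String qualifier, long ts) {
            this.row = row;
            this.qualifier = qualifier;
            this.ts = ts;
        }
    }

    /**
     * Returns the rows that show no cell in any unwanted column, looking only
     * at cells whose timestamp falls in [minTs, maxTs).
     */
    static List<String> rowsWithoutColumns(List<Cell> cells, Set<String> unwanted,
                                           long minTs, long maxTs) {
        Map<String, Boolean> rowPasses = new LinkedHashMap<>();
        for (Cell c : cells) {
            // Step 1: the time range decides which cells are visible at all.
            if (c.ts < minTs || c.ts >= maxTs) {
                continue;
            }
            // Step 2: the filter only ever sees the cells that survived step 1.
            boolean ok = !unwanted.contains(c.qualifier);
            rowPasses.merge(c.row, ok, Boolean::logicalAnd);
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Boolean> e : rowPasses.entrySet()) {
            if (e.getValue()) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Cell> cells = Arrays.asList(
                new Cell("row1", "a", 100L),  // unwanted column, old timestamp
                new Cell("row1", "b", 200L),
                new Cell("row2", "b", 200L));
        Set<String> unwanted = new HashSet<>(Arrays.asList("a"));

        // Filter only: row1 is dropped because it has a cell in column "a".
        System.out.println(rowsWithoutColumns(cells, unwanted, 0L, Long.MAX_VALUE));
        // Time range [150, 300): the "a" cell is invisible, so row1 comes back.
        System.out.println(rowsWithoutColumns(cells, unwanted, 150L, 300L));
    }
}
```

With the time range set, row1 is returned even though it has a cell in the unwanted column, because that cell lies outside the range and the filter never sees it.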

./zahoor

On Tue, Aug 7, 2012 at 2:10 AM, Shrijeet Paliwal <shrijeet@rocketfuel.com> wrote:

> - user
> +dev
>
> Hi Devs,
>
> Please follow the discussion to get full context. tl;dr: "Did a scan with
> a time range and filters; scan output was incorrect. Repeated the scan with
> the filter only; scan output was correct."
>
> HBase version : 0.90.3
> Hadoop : CDH3u0
> Issues:
> A scan set with both a time range and a filter can behave in
> an unintuitive way. I am calling it unintuitive instead of wrong, since I do
> not know if this is a known limitation of scan. Picture a filter setup like
> mine - "Filter rows which have cells pertaining to certain columns". This
> filter is set on a scan which has a time range constraint as well. AFAIK
> we skip HFiles based on metadata when dealing with time ranges. Suppose a
> region has two HFiles. If one of the HFiles has cells for unwanted columns
> but the other one does not, we may get an incorrect result depending on how
> the time range is set (if the time range scan optimizer skips the HFile
> containing unwanted cells).
>
> Does this sound like a valid issue? Also, I can see this happening with more
> than one kind of SkipFilter.
>
> -Shrijeet
>
>
> On Mon, Aug 6, 2012 at 11:38 AM, Shrijeet Paliwal
> <shrijeet@rocketfuel.com> wrote:
>
> > It seems setting the time range is the problem. I was doing
> > scan.setTimeRange(Long.valueOf(args[4]), Long.valueOf(args[5]));
> >
> > I was working on the assumption that filter logic works before scan logic,
> > in other words that a KV dropped by the filter will not make it to the scan.
> > In the case of a time range this might not be true.
> >
> > -Shrijeet
> >
> >
> > On Mon, Aug 6, 2012 at 9:25 AM, jmozah <jmozah@gmail.com> wrote:
> >
> >> Hmmm.. Missed it. Otherwise I don't spot anything wrong in this.
> >> Are you sure about the column names?
> >>
> >> ./zahoor
> >>
> >>
> >> On 06-Aug-2012, at 9:34 PM, Shrijeet Paliwal <shrijeet@rocketfuel.com>
> >> wrote:
> >>
> >> > I am using FilterList. Could you elaborate?
> >> >
> >> > On Mon, Aug 6, 2012 at 8:48 AM, jmozah <jmozah@gmail.com> wrote:
> >> >
> >> >>
> >> >>
> >> >> Use FilterList instead of List of Filters.
> >> >>
> >> >> ./Zahoor
> >> >>
> >> >> On 06-Aug-2012, at 12:12 PM, Shrijeet Paliwal <shrijeet@rocketfuel.com>
> >> >> wrote:
> >> >>
> >> >>> Hi All,
> >> >>>
> >> >>> I am writing a job which finds rows that do not have a cell
> >> >>> corresponding to any of the columns in the given set of columns.
> >> >>> This is how I have configured my scan (a combination of
> >> >>> QualifierFilters and SkipFilter)
> >> >>>
> >> >>>   // columns is a CSV string containing column names
> >> >>>   Iterable<String> columnsSet = Splitter.on(',').split(columns);
> >> >>>   List<Filter> qualifierFilters = new ArrayList<Filter>();
> >> >>>   for (String qual : columnsSet) {
> >> >>>     qualifierFilters.add(new QualifierFilter(CompareOp.NOT_EQUAL,
> >> >>>         new BinaryComparator(Bytes.toBytes(qual))));
> >> >>>   }
> >> >>>   Filter skipFilter = new SkipFilter(
> >> >>>       new FilterList(Operator.MUST_PASS_ALL, qualifierFilters));
> >> >>>   Scan scan = new Scan();
> >> >>>   scan.addFamily(Bytes.toBytes(family));
> >> >>>   scan.setCacheBlocks(false);
> >> >>>   scan.setCaching(1000);
> >> >>>   scan.setFilter(skipFilter);
> >> >>>   scan.setTimeRange(Long.valueOf(args[4]), Long.valueOf(args[5]));
> >> >>>
> >> >>> In my test table the scan worked as expected. But in the production
> >> >>> run, I got rows which had cells containing one of the given qualifiers
> >> >>> (not expected).
> >> >>> Can someone help me spot the mistake?
> >> >>>
> >> >>> -Shrijeet
> >> >>
> >> >>
> >>
> >>
> >
>
