Mailing-List: contact user-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hbase.apache.org
Received-SPF: pass (nike.apache.org: domain of dalia.mohsobhy@hotmail.com
 designates 157.55.1.143 as permitted sender)
Message-ID: <DUB114-W94AD28FA3CED13EE8E68D6853A0@phx.gbl>
Content-Type: multipart/alternative;
	boundary="_835d6147-4d58-4b66-b95f-70f418661be8_"
From: Dalia Sobhy <dalia.mohsobhy@hotmail.com>
To: "user@hbase.apache.org" <user@hbase.apache.org>
Subject: RE: Hbase Count Aggregate Function
Date: Tue, 25 Dec 2012 18:42:52 +0200
Importance: Normal
In-Reply-To: <DUB114-W1233AE95DCF983747460079853B0@phx.gbl>
References: 
 <DUB114-W2876C970434C8F55871C8853B0@phx.gbl>,<CAAT7MkrufJT9vJirSmLQDfB2eyzBWt9+8yFCDxTd7axun08WOQ@mail.gmail.com>,<DUB114-W80762CA4C0EA6BC0D1DA0F853B0@phx.gbl>,<CAAT7Mkp6BLA4gM-dzMSn+Dnau6HF5eYJH1cuyUD+hrQx-yugeQ@mail.gmail.com>,<DUB114-W974D1E8D1A1E6D979E3511853B0@phx.gbl>,<CAAT7MkpPYoXaAJ1GvA59D1h+YgsuOwUWno3LXAwvy6sRkHj=Dw@mail.gmail.com>,<DUB114-W1233AE95DCF983747460079853B0@phx.gbl>
MIME-Version: 1.0

--_835d6147-4d58-4b66-b95f-70f418661be8_
Content-Type: text/plain; charset="windows-1256"
Content-Transfer-Encoding: 8bit


Do you mean I should implement a new rowCount method in the AggregationClient class?

I don't quite follow. Could you illustrate with a code sample, Ram?
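
To make the question concrete, here is a minimal, dependency-free Java sketch of the counting logic under discussion. It is a model only, not HBase code: the Cell class, the method names, and the "name" qualifier are hypothetical stand-ins. countAllRows mimics a count that only checks that a row has a first cell and never consults the user's filter, while countMatchingRows applies the kind of per-cell check a custom filter would presumably need to make on the diagnosis qualifier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FilteredCountSketch {

    /** Hypothetical stand-in for one stored cell (qualifier + value); not an HBase class. */
    static final class Cell {
        final String qualifier;
        final String value;

        Cell(String qualifier, String value) {
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    /** Counts every non-empty row, ignoring any user-supplied condition. */
    static long countAllRows(List<List<Cell>> rows) {
        long count = 0;
        for (List<Cell> row : rows) {
            if (!row.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    /** Counts a row only when the given qualifier's value contains the substring. */
    static long countMatchingRows(List<List<Cell>> rows, String qualifier, String substring) {
        long count = 0;
        for (List<Cell> row : rows) {
            for (Cell cell : row) {
                if (cell.qualifier.equals(qualifier) && cell.value.contains(substring)) {
                    count++;
                    break; // one matching cell is enough; move on to the next row
                }
            }
        }
        return count;
    }

    /** Builds a toy table with n "cardiac" rows and n "renal" rows. */
    static List<List<Cell>> sampleTable(int n) {
        List<List<Cell>> rows = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            // "name" is a made-up extra qualifier so that diagnosis is not the first cell.
            rows.add(Arrays.asList(new Cell("name", "p" + i), new Cell("diagnosis", "cardiac")));
            rows.add(Arrays.asList(new Cell("name", "q" + i), new Cell("diagnosis", "renal")));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<List<Cell>> table = sampleTable(5);
        System.out.println("unfiltered count: " + countAllRows(table));
        System.out.println("filtered count:   " + countMatchingRows(table, "diagnosis", "cardiac"));
    }
}
```

With five rows per diagnosis, the first count covers all ten rows while the second covers only the five cardiac ones, which has the same shape as the 100,000 versus 50,000 discrepancy in the shell.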

> > Date: Tue, 25 Dec 2012 00:21:14 +0530
> > Subject: Re: Hbase Count Aggregate Function
> > From: ramkrishna.s.vasudevan@gmail.com
> > To: user@hbase.apache.org
> > 
> > Hi
> > You could implement a custom filter similar to
> > FirstKeyOnlyFilter.
> > Implement the filterKeyValue method so that it matches your KeyValue
> > (the specific qualifier that you are looking for).
> > 
> > Deploy it in your cluster.  It should work.
> > 
> > Regards
> > Ram
> > 
> > On Mon, Dec 24, 2012 at 10:35 PM, Dalia Sobhy <dalia.mohsobhy@hotmail.com> wrote:
> > 
> > >
> > > So do you have a suggestion for how to make the filter take effect?
> > >
> > > > Date: Mon, 24 Dec 2012 22:22:49 +0530
> > > > Subject: Re: Hbase Count Aggregate Function
> > > > From: ramkrishna.s.vasudevan@gmail.com
> > > > To: user@hbase.apache.org
> > > >
> > > > OK, having looked at the shell script and the code, it seems that when you
> > > > use this counter, the user's filter is not taken into account.
> > > > It adds a FirstKeyOnlyFilter and proceeds with the scan. :(
> > > >
> > > > Regards
> > > > Ram
> > > >
> > > > On Mon, Dec 24, 2012 at 10:11 PM, Dalia Sobhy <dalia.mohsobhy@hotmail.com> wrote:
> > > >
> > > > >
> > > > > Yes, scan gives the correct number of rows, while count returns the
> > > > > total number of rows.
> > > > >
> > > > > Both use the same filter. I even tried it with the Java API, using the
> > > > > rowCount method.
> > > > >
> > > > > rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan);
> > > > >
> > > > > I get the total number of rows, not the number of rows that pass the filter.
> > > > >
> > > > > Any ideas?
> > > > >
> > > > > Thanks Ram :)
> > > > >
> > > > > > Date: Mon, 24 Dec 2012 21:57:54 +0530
> > > > > > Subject: Re: Hbase Count Aggregate Function
> > > > > > From: ramkrishna.s.vasudevan@gmail.com
> > > > > > To: user@hbase.apache.org
> > > > > >
> > > > > > So you find that a scan with a filter and a count with the same filter
> > > > > > are giving you different results?
> > > > > >
> > > > > > Regards
> > > > > > Ram
> > > > > >
> > > > > > On Mon, Dec 24, 2012 at 8:33 PM, Dalia Sobhy <dalia.mohsobhy@hotmail.com> wrote:
> > > > > >
> > > > > > >
> > > > > > > Dear all,
> > > > > > >
> > > > > > > I have 50,000 rows with the diagnosis qualifier = "cardiac", and another
> > > > > > > 50,000 rows with "renal".
> > > > > > >
> > > > > > > When I type this in the HBase shell,
> > > > > > >
> > > > > > > import org.apache.hadoop.hbase.filter.CompareFilter
> > > > > > > import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
> > > > > > > import org.apache.hadoop.hbase.filter.SubstringComparator
> > > > > > > import org.apache.hadoop.hbase.util.Bytes
> > > > > > >
> > > > > > > scan 'patient', { COLUMNS => "info:diagnosis", FILTER =>
> > > > > > >     SingleColumnValueFilter.new(Bytes.toBytes('info'),
> > > > > > >          Bytes.toBytes('diagnosis'),
> > > > > > >          CompareFilter::CompareOp.valueOf('EQUAL'),
> > > > > > >          SubstringComparator.new('cardiac'))}
> > > > > > >
> > > > > > > Output = 50,000 rows
> > > > > > >
> > > > > > > import org.apache.hadoop.hbase.filter.CompareFilter
> > > > > > > import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
> > > > > > > import org.apache.hadoop.hbase.filter.SubstringComparator
> > > > > > > import org.apache.hadoop.hbase.util.Bytes
> > > > > > >
> > > > > > > count 'patient', { COLUMNS => "info:diagnosis", FILTER =>
> > > > > > >     SingleColumnValueFilter.new(Bytes.toBytes('info'),
> > > > > > >          Bytes.toBytes('diagnosis'),
> > > > > > >          CompareFilter::CompareOp.valueOf('EQUAL'),
> > > > > > >          SubstringComparator.new('cardiac'))}
> > > > > > > Output = 100,000 rows
> > > > > > >
> > > > > > > I even tried it with the HBase Java API, using an AggregationClient
> > > > > > > instance, and I enabled the aggregation coprocessor for the table:
> > > > > > > rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan)
> > > > > > >
> > > > > > > Also, when I measure performance after adding more nodes, the
> > > > > > > operation takes the same time.
> > > > > > >
> > > > > > > So any advice, please?
> > > > > > >
> > > > > > > I have been stuck on this for a couple of weeks.
> > > > > > >
> > > > > > > Thanks,
> > > > >
> > > > >
> > >
> > >
--_835d6147-4d58-4b66-b95f-70f418661be8_--