Mailing-List: contact user-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hbase.apache.org
Received-SPF: pass (nike.apache.org: domain of octo47@gmail.com designates
 209.85.161.41 as permitted sender)
DomainKey-Signature: a=rsa-sha1; c=nofws;
        d=gmail.com; s=gamma;
        h=mime-version:in-reply-to:references:from:date:message-id:subject:to
         :content-type;
        b=mb0F6GuXR67VuqyB8CiVPxfZ206cRCXBX6mviuZ4osDRjaaPNsHsGG1zHVDuSzwoHu
         faSEXI6JgKpcPFSmU678pMIhcrdsyiZXfqqqDk2y/KGkF6ImlXRpdaQFsXSEi9yRG0bv
         V6543P6J9CYxRSTGyaQNJOoSkbJpUEc/1Kmbc=
MIME-Version: 1.0
In-Reply-To: <AANLkTikAtaPcO1OL09g-t1mTh+0SqoSX2++1gRYud38=@mail.gmail.com>
References: <AANLkTimo9SKbhuvSMfLNx+j2=+xuDdmkPmHT99_b+18n@mail.gmail.com>
 <AANLkTi=xS8BvUvmMoPMdo_vLZwt3uDFjGQMBG5dO0a6P@mail.gmail.com>
 <AANLkTinv1Du8J+x4Jha+cRDakJgfjfxqD0K=A+PkrRjS@mail.gmail.com>
 <AANLkTi=TWS1zXQ+Nhp_3tmqOejuPt=4jvaHLopoqpD+M@mail.gmail.com>
 <AANLkTi=yjP-sUP2yFfpUhFKdGVAfa82FQTKEzkNYeHXb@mail.gmail.com>
 <AANLkTikAtaPcO1OL09g-t1mTh+0SqoSX2++1gRYud38=@mail.gmail.com>
From: Andrey Stepachev <octo47@gmail.com>
Date: Sun, 9 Jan 2011 00:59:19 +0300
Message-ID: <AANLkTi=XvLUeA9iNcumd4EbjpGzQhGtr7NnEL8PfjrMG@mail.gmail.com>
Subject: Re: question about merge-join (or AND operator betwween colums)
To: user@hbase.apache.org
Content-Type: multipart/alternative; boundary=001636c5a860f26a3004995cd9f0

--001636c5a860f26a3004995cd9f0
Content-Type: text/plain; charset=UTF-8

You can read more details on binary sorting here:
http://brunodumon.wordpress.com/2010/02/17/building-indexes-using-hbase-mapping-strings-numbers-and-dates-onto-bytes/
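
As a rough illustration of why the encoding matters (my own minimal sketch in
Python, not code from the linked article or from HBase): HBase orders row keys
as raw bytes, so any number in a key has to be encoded so that byte order
matches numeric order. The helper names pad_key and encode_int64 are made up
for this example.

```python
import struct

def pad_key(days: int, uid: str, width: int = 2) -> bytes:
    """Zero-pad the day count so that byte order matches numeric order."""
    return f"{days:0{width}d}days:{uid}".encode("ascii")

def encode_int64(n: int) -> bytes:
    """Big-endian with the sign bit flipped: bytewise order == numeric order."""
    return struct.pack(">Q", (n + (1 << 63)) & 0xFFFFFFFFFFFFFFFF)

# Unpadded decimal strings sort incorrectly as bytes: b"10" comes before b"9".
naive = sorted(str(d).encode("ascii") for d in (9, 10))

# Zero-padded keys keep numeric order when sorted as bytes.
padded = sorted(pad_key(d, "uid") for d in (10, 0, 9))

# The fixed-width binary encoding keeps numeric order too, negatives included.
values = [5, -1, 0, -300, 1 << 40]
binary_ok = sorted(values) == sorted(values, key=encode_int64)
```

The zero-padded form is what the 00days/09days/10days keys quoted below rely
on; the binary form is the more compact option when the key part is a plain
integer.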

2011/1/8 Jack Levin <magnito@gmail.com>

> Basic problem described:
>
> A user uploads 1 image and creates some text 10 days ago, then creates 1000
> text messages between 9 days ago and today:
>
> row key    | fm:type --> value
> 00days:uid | type:text --> text_id
> .
> .
> 09days:uid | type:text --> text_id
> 10days:uid | type:photo --> URL
>            | type:text --> text_id
>
> Skip all the way to the 10days:uid row without reading the 00days:uid
> through 09days:uid rows.
> Ideally we do not want to read all 1000 entries that have _only_ text. We
> want to get to the last entry in the most efficient way possible.
>
>
> -Jack
>
>
>
>
> On Sat, Jan 8, 2011 at 11:43 AM, Stack <stack@duboce.net> wrote:
> > Strike that.  This is a Scan, so can't do blooms + filter.  Sorry.
> > Sounds like a coprocessor then.  You'd have your query 'lean' on the
> > column that you know has the lesser items and then per item, you'd do
> > a get inside the coprocessor against the column of many entries.  The
> > get would go via blooms.
> >
> > St.Ack
> >
> >
> > On Sat, Jan 8, 2011 at 11:39 AM, Stack <stack@duboce.net> wrote:
> >> On Sat, Jan 8, 2011 at 11:35 AM, Jack Levin <magnito@gmail.com> wrote:
> >>> Yes, we thought about using filters. The issue is that if one family
> >>> column has 1 million values and the second family column has 10 values
> >>> at the bottom, we would end up scanning and filtering 999,990 records
> >>> and throwing them away, which seems inefficient.
> >>
> >> Blooms+filters?
> >> St.Ack
> >>
> >
>
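
To make Stack's suggestion above concrete, here is a toy sketch in Python. It
uses plain in-memory collections as stand-ins for the two columns; it is not
the HBase or coprocessor API, and the names lean_join, text_rows and
photo_rows are hypothetical. It drives the join from the small side and does
one point lookup per key against the large side, instead of scanning the
large side.

```python
def lean_join(small_side, large_side_has):
    """Return keys from the small side that also exist on the large side.

    small_side: iterable of row keys from the column with few entries.
    large_side_has: callable doing a point lookup (stand-in for a Get).
    """
    matches, lookups = [], 0
    for key in small_side:
        lookups += 1
        if large_side_has(key):
            matches.append(key)
    return matches, lookups

# 11 rows carry text (days 00..10); only the day-10 row also has a photo.
text_rows = {f"{d:02d}days:uid" for d in range(11)}
photo_rows = ["10days:uid"]

matches, lookups = lean_join(photo_rows, text_rows.__contains__)
```

The work is proportional to the size of the small side (one lookup here), not
to the number of text rows, which is the point of leaning on the column with
fewer entries.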

--001636c5a860f26a3004995cd9f0--