incubator-blur-user mailing list archives

From Ravikumar Govindarajan <ravikumar.govindara...@gmail.com>
Subject Re: Possible to have a META shard?
Date Mon, 14 Jul 2014 06:31:24 GMT
Well,

we don't have any need for a row-query in our model. All queries return
individual records...

For example, we always assume RowId=userId, so we are only interested in
getting records for a matching row-id/user-id.

In SQL terms, it will always be "SELECT * FROM .... WHERE ... AND
RowId=<XYZ> LIMIT N"

Forming a row-query based scoring should also be possible, no? If I
remember correctly, I had submitted a very rough draft of row-query scoring in
https://issues.apache.org/jira/browse/BLUR-290 [RowDocsCollector,
BlurRowCodec etc...]

Do you think such a Codec based approach will work for row-queries?

--
Ravi


On Fri, Jul 11, 2014 at 5:58 PM, Tim Williams <williamstw@gmail.com> wrote:

> On Thu, Jul 10, 2014 at 1:00 PM, Ravikumar Govindarajan
> <ravikumar.govindarajan@gmail.com> wrote:
> > Aaron,
> >
> > This is a lengthy post. Please bear...
> >
> > We are looking at Blur slightly differently. No Map-Red ops, No immutable
> > RowId data etc... Just plain online-search like regular lucene/SOLR/ES
> >
> > Our use-case mandates that Documents for a RowId will arrive
> > incrementally. We don't have the luxury of dropping the whole-row and
> > re-indexing it, as a given Row will have hundreds of thousands of docs...
> >
> > A single row-id will always be found in one shard, but spread across
> > segments. We have modified blur sources on both indexing/search side to
> > support this requirement
> >
> > In other words, we support ADD_RECORDS thrift-op to an existing Row..
> >
> > We actually are now testing a sharding strategy similar to databases in
> > Blur
> >
> > 1. Initially we start with, let's say, 300 shards per table, aka
> >     base-shards
> > 2. Each shard has a fixed size, let's say 16 GB. Client will watch for
> >     this and spawn a new shard when size exceeds. {An alias-shard in ES
> >     terms}
> > 3. ZK will hold the Base --> List-of-Alias shards
> > 4. A RowId will be allocated a shard that has least number of alias
> >     shards. This mapping will never change in the lifetime of a Row
> > 5. ADD_RECORDS op will go to the latest alias, while DEL/UPDATE will go
> >     to all aliases+base shards.
> > 6. Once all 300 base-shards have spawned aliases, admins can create new
> >     base shards on the cluster. Newer RowIds will auto-allocate to
> >     freshly created shards
> > 7. Both horizontal & vertical scaling of shards can be supported easily
> >     by this approach
> >
> > Now all these are possible only if the RowId -> Base-Shard mapping is
> > maintained externally.
>
> Hi Ravi,
> Can you explain how searching across records in a row works in this
> case?  For example, the row query example in the docs[1]?
>
> Thanks,
> --tim
>
> [1] -
> http://incubator.apache.org/blur/docs/0.2.2/data-model.html#row_query
>
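
Steps 1 to 5 of the sharding strategy quoted above can be illustrated with a
minimal in-memory sketch. This is not Blur code and not the modified sources
mentioned in the thread: ShardRouter, BaseShard, the alias naming scheme and
the tie-break by shard name are all hypothetical, and plain dictionaries stand
in for both the ZK-held Base --> List-of-Alias map and the externally
maintained RowId -> Base-Shard map. As a simplification, the sketch spawns an
alias just before a write would push a shard past the 16 GB cap, rather than
having a client watch shard sizes.

```python
from dataclasses import dataclass, field

MAX_SHARD_BYTES = 16 * 1024**3  # the "let's say 16 GB" cap from the post


@dataclass
class BaseShard:
    name: str
    aliases: list = field(default_factory=list)  # alias names, oldest first
    sizes: dict = field(default_factory=dict)    # shard name -> bytes stored

    def __post_init__(self):
        self.sizes.setdefault(self.name, 0)

    def latest(self):
        # Newest alias if any exist, otherwise the base shard itself.
        return self.aliases[-1] if self.aliases else self.name

    def all_shards(self):
        return [self.name] + self.aliases


class ShardRouter:
    """Hypothetical stand-in for the ZK map and the external RowId map."""

    def __init__(self, base_names):
        self.bases = {n: BaseShard(n) for n in base_names}
        self.row_to_base = {}  # RowId -> base shard name, never reassigned

    def base_for(self, row_id):
        if row_id not in self.row_to_base:
            # Step 4: pick the base with the fewest aliases.
            # Breaking ties by name is an assumption, not from the post.
            chosen = min(self.bases.values(),
                         key=lambda b: (len(b.aliases), b.name))
            self.row_to_base[row_id] = chosen.name
        return self.bases[self.row_to_base[row_id]]

    def route_add(self, row_id, nbytes):
        # Step 5 (ADD_RECORDS): write to the latest alias, spawning a new
        # alias (step 2) if this write would exceed the size cap.
        base = self.base_for(row_id)
        target = base.latest()
        if base.sizes[target] + nbytes > MAX_SHARD_BYTES:
            target = f"{base.name}-alias-{len(base.aliases) + 1}"
            base.aliases.append(target)
            base.sizes[target] = 0
        base.sizes[target] += nbytes
        return target

    def route_mutation(self, row_id):
        # Step 5 (DEL/UPDATE): fan out to the base shard and all aliases.
        return self.base_for(row_id).all_shards()
```

With two base shards, a row that receives two 10 GB batches would have the
second batch land in a freshly spawned alias, and a later DEL/UPDATE for that
row would fan out to both the base shard and the alias. A record-level query
filtered by RowId would need the same fan-out, since older records stay in the
base shard.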
