incubator-blur-dev mailing list archives

From Aaron McCurry <amccu...@gmail.com>
Subject Re: Reworking the data model
Date Sun, 13 Oct 2013 13:52:42 GMT
On Sun, Oct 13, 2013 at 9:45 AM, Tim Williams <williamstw@gmail.com> wrote:

> On Sun, Oct 13, 2013 at 8:19 AM, Aaron McCurry <amccurry@gmail.com> wrote:
> > I had another thought yesterday that might be even simpler while being able
> > to maintain all current features.
> >
> > Instead of having:
> >
> > Row with rowId (DocumentCollection with docCollectionId)
> > Record with recordId (Document with docId)
> >    - Dropping Family
> > Column with name and value (Field with name and value)
> >
> > We drop Row/DocumentCollection altogether and we don't require docId to be
> > unique.
> >
> > So it would be:
> >
> > Document with docId
> > Field with name and value
> >
> > And the new rule would be that wherever there are documents that share the
> > same document id, you get the same effects as the Row/DocumentCollection.
> > This would remove the need for multiple ids (rowId and recordId), and it
> > would be logically the same as normal Lucene.  The difference that Blur
> > would add is the ability to join on documentId by default.  We could also
> > configure the table to allow for duplicate document ids or not, that way
> > users can choose whether or not they need the document id join capability.
> >
> > What do you all think?
>
> The idea of getting rid of the "container" as a first class construct
> is compelling.  I don't find grouping by docid intuitive.  Maybe leave
> docid as a user field - typically distinct - and use a docGroupId to
> bind them?
>

Can't really do that, because the docGroupId (in your suggestion) would have
to be the key we distribute on, so that all the documents in a group are
co-located in the same shard.  And I feel that having 2 different ids is a big
part of the confusion: what they are and when to use which.
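
A minimal sketch of the co-location constraint described above. It assumes a
simple hash-modulo partitioner; `shard_for`, `route`, `NUM_SHARDS`, and the use
of CRC32 are hypothetical names and choices for illustration, not Blur's actual
routing code. The point is only that the id used for grouping must also be the
id used for distribution, otherwise documents in one group can land on
different shards.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical shard count, for illustration only


def shard_for(doc_id, num_shards=NUM_SHARDS):
    """Pick a shard from the id alone, so equal ids always land together."""
    # CRC32 is used here only because it is stable across runs; it is an
    # assumption of this sketch, not the hash Blur uses.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def route(documents, num_shards=NUM_SHARDS):
    """Group documents by the shard their docId hashes to."""
    shards = defaultdict(list)
    for doc in documents:
        shards[shard_for(doc["docId"], num_shards)].append(doc)
    return shards


docs = [
    {"docId": "user-1", "fields": {"name": "alice"}},
    {"docId": "user-1", "fields": {"city": "boston"}},  # same id, same shard
    {"docId": "user-2", "fields": {"name": "bob"}},
]
shards = route(docs)
```

With a separate docGroupId, `shard_for` would have to hash that id instead,
which is what brings back the second id.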

If we added an attribute on the table to allow duplicate docIds or not,
that would at least let the end user decide whether it's a grouping table or
not.
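
A rough sketch of how such a table attribute could behave, assuming an
in-memory stand-in for a table. `Table`, `allow_duplicate_doc_ids`, `add`,
`group`, and `DuplicateDocIdError` are hypothetical names for illustration and
are not part of the Blur API.

```python
from collections import defaultdict


class DuplicateDocIdError(ValueError):
    """Raised when a non-grouping table sees a docId for the second time."""


class Table:
    """In-memory stand-in for a table (hypothetical, not the Blur API)."""

    def __init__(self, name, allow_duplicate_doc_ids):
        self.name = name
        self.allow_duplicate_doc_ids = allow_duplicate_doc_ids
        self._docs = defaultdict(list)  # docId -> list of field dicts

    def add(self, doc_id, fields):
        # A plain table rejects a repeated docId; a grouping table keeps
        # every document and treats the shared docId as the join key.
        if not self.allow_duplicate_doc_ids and doc_id in self._docs:
            raise DuplicateDocIdError(
                "docId %r already exists in table %r" % (doc_id, self.name)
            )
        self._docs[doc_id].append(dict(fields))

    def group(self, doc_id):
        """Return all documents sharing doc_id (the old Row, in effect)."""
        return list(self._docs.get(doc_id, []))


grouping = Table("people", allow_duplicate_doc_ids=True)
grouping.add("user-1", {"name": "alice"})
grouping.add("user-1", {"city": "boston"})

plain = Table("articles", allow_duplicate_doc_ids=False)
plain.add("doc-1", {"title": "hello"})
```

In this sketch a second `plain.add("doc-1", ...)` raises
`DuplicateDocIdError`, while `grouping.group("user-1")` returns both documents.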

I don't know where to go with this.  I'm trying to make this as intuitive as
possible for the typical case, which is just plain old Lucene documents,
while still supporting the current features.

Aaron


>
> --tim
>
