incubator-blur-dev mailing list archives

From Aaron McCurry <amccu...@gmail.com>
Subject Re: Reworking the data model
Date Sun, 13 Oct 2013 12:19:01 GMT
I had another thought yesterday that might be even simpler while being able
to maintain all current features.

Instead of having:

Row with rowId (DocumentCollection with docCollectionId)
Record with recordId (Document with docId)
   - Dropping Family
Column with name and value (Field with name and value)

We drop Row/DocumentCollection altogether and we don't require docId to be
unique.

So it would be:

Document with docId
Field with name and value

And the new rule would be that wherever there are documents that share the
same document id, you get the same behavior as a Row/DocumentCollection.
This would remove the need for multiple ids (rowId and recordId), and it
would be logically the same as normal Lucene. The difference that Blur
would add is the ability to join on documentId by default. We could also
configure the table to allow duplicate document ids or not, so that
users can choose whether or not they need the document id join capability.

What do you all think?

Aaron
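The docId join proposed above, and the point in the Sep 30 message below that ids keep deletes from being broadcast to every shard, both depend on routing each document to a shard by its id. The sketch assumes simple hash-modulo routing; DocIdRoutingSketch and its methods are hypothetical, and this is not necessarily how Blur partitions data:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: hypothetical hash routing of documents to shards by docId.
class DocIdRoutingSketch {

    // Pick a shard from the docId. Math.floorMod keeps the result in
    // [0, shardCount) even when hashCode() is negative.
    static int shardFor(String docId, int shardCount) {
        return Math.floorMod(docId.hashCode(), shardCount);
    }

    // With a known docId, a delete touches exactly one shard.
    static List<Integer> shardsForDelete(String docId, int shardCount) {
        return List.of(shardFor(docId, shardCount));
    }

    // Without an id (e.g. a delete by query), every shard must be asked.
    static List<Integer> shardsForDeleteByQuery(int shardCount) {
        List<Integer> all = new ArrayList<>();
        for (int i = 0; i < shardCount; i++) {
            all.add(i);
        }
        return all;
    }

    public static void main(String[] args) {
        int shards = 8;
        System.out.println(shardsForDelete("user-1", shards));     // one shard
        System.out.println(shardsForDeleteByQuery(shards).size()); // 8
    }
}
```

Because every document with a given docId lands on the same shard under this scheme, a docId join can be evaluated shard-locally, and a delete by docId touches one shard instead of all of them.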



On Sat, Oct 12, 2013 at 11:07 PM, Gagan Juneja <gagandeepjuneja@gmail.com> wrote:

> +1. It's a much cleaner approach. I struggled a lot to understand what
> family is for and how it should be translated to Lucene
> documents while writing data to Lucene (Aaron knows!).
>
> I think document collection is required, but in some use cases, where
> the user just wants to use Blur as a scalable Lucene and only store
> documents, document grouping is overhead. So, as suggested, it should
> be optional.
>
> Regards,
> Gagan
>
> On Sun, Oct 13, 2013 at 12:15 AM, Aaron McCurry <amccurry@gmail.com> wrote:
> > Perhaps, but the interesting thing is that I think the grouping
> > functionality is actually very similar. It's just a static structure
> > instead of a dynamic one, at least if I understand the Solr feature
> > correctly.
> >
> > Maybe we should call it a DocumentCollection, since it's a collection of
> > documents.
> >
> > Aaron
> >
> >
> > On Tue, Oct 1, 2013 at 10:04 PM, Otis Gospodnetic <otis.gospodnetic@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> Note that Solr and Lucene both have grouping functionality, which some
> >> people may confuse with the DocGroups you are talking about here.
> >>
> >> Otis
> >> --
> >> Solr & ElasticSearch Support -- http://sematext.com/
> >> Performance Monitoring -- http://sematext.com/spm
> >>
> >>
> >>
> >> On Mon, Sep 30, 2013 at 1:09 PM, Aaron McCurry <amccurry@gmail.com> wrote:
> >> > While I don't really like the idea of changing all the code to rename
> >> > Row and Record, I think it is necessary to help people who are new to
> >> > Blur transition from Lucene (or any other document store, for that
> >> > matter).
> >> >
> >> > I think that having Doc and DocGroup both be first-class objects is
> >> > also critical. I think that for most implementations DocGroup is
> >> > overkill and Document is the only thing needed. I have some ideas on
> >> > how to make this possible in the API.
> >> >
> >> > Here's an example of what we could do. This is raw Thrift, which can
> >> > be ugly, but with some helper/utility classes it can be made better:
> >> >
> >> > Doc doc = new Doc();
> >> > doc.setDocId(new Value(_Fields.LONG_VAL, 1234L));
> >> > doc.addToFields(new Field("int_fieldname",
> >> >     new Value(_Fields.INT_VAL, 1234)));
> >> > doc.addToFields(new Field("string_fieldname",
> >> >     new Value(_Fields.STRING_VAL, "value1")));
> >> > doc.addToFields(new Field("text_fieldname",
> >> >     new Value(_Fields.TEXT_VAL, "this is full text indexed.")));
> >> >
> >> > DocGroup docGroup = new DocGroup();
> >> > docGroup.setDocGroupId(new Value(_Fields.STRING_VAL, "groupid12345"));
> >> > docGroup.addToDocs(doc);
> >> >
> >> > At this point I think I would like to keep the docId and docGroupId.
> >> > I know that Lucene itself doesn't require them, but if we don't have
> >> > them, deletes/updates become a lot more expensive: they would have to
> >> > broadcast the delete to all the shards of a table, which would kill
> >> > near-real-time (NRT) updates.
> >> >
> >> > Thoughts?
> >> >
> >> > Aaron
> >> >
> >> >
> >> >
> >> > On Mon, Sep 30, 2013 at 12:49 PM, Garrett Barton <garrett.barton@gmail.com> wrote:
> >> >
> >> >> +1 here.
> >> >>
> >> >> I also agree with Colton about making DocGroup/Row optional. I know
> >> >> it's not easy in the current design, but I remember Aaron saying that
> >> >> in the branch it might be possible to specify any column as the id,
> >> >> which makes me think it might be possible not to have one at all.
> >> >> On Sep 30, 2013 10:41 AM, "Colton McInroy" <colton@dosarrest.com> wrote:
> >> >>
> >> >> > I personally think that the Row/Record/Column model makes sense. If
> >> >> > you had some documentation on the site listing the Lucene
> >> >> > equivalents of the Blur concepts, it would probably head off these
> >> >> > kinds of questions in the future. With that explanation in place,
> >> >> > you could leave the model as it is and avoid the chaos of making a
> >> >> > bunch of changes.
> >> >> >
> >> >> > Glad the Family attribute is being dropped. I kind of came in at the
> >> >> > end of its lifespan, I guess, because it doesn't really make much
> >> >> > sense to me. How long until it's actually dropped from the code,
> >> >> > though?
> >> >> >
> >> >> > One thing I would like to see is Row being optional. In my current
> >> >> > Lucene implementation I don't use rows at all, because they don't
> >> >> > really make sense for what I am working with. I also don't recall
> >> >> > DocGroups being required in Lucene, and I never worked with them, so
> >> >> > that kind of threw me off when I ran into it.
> >> >> >
> >> >> > Thanks,
> >> >> > Colton McInroy
> >> >> >
> >> >> > Director of Security Engineering
> >> >> >
> >> >> > Phone (Toll Free)
> >> >> > US: (888)-818-1344, Press 2
> >> >> > UK: 0-800-635-0551, Press 2
> >> >> > My Extension: 101
> >> >> > 24/7 Support: support@dosarrest.com
> >> >> > Email: colton@dosarrest.com
> >> >> > Website: http://www.dosarrest.com
> >> >> >
> >> >> > On 9/30/2013 6:45 AM, Tim Williams wrote:
> >> >> >
> >> >> >> Hi Devs,
> >> >> >> I'm wondering if we should go ahead and endure the [painful] move
> >> >> >> to a more intuitive data model in Blur. Here are some observations:
> >> >> >>
> >> >> >> 1) New folks coming to Blur have a background in Lucene, not
> >> >> >> necessarily a NoSQL data store, and want to know where their
> >> >> >> "Documents" are.
> >> >> >>
> >> >> >> 2) For folks aware of NoSQL stores, the Row/Record model can be
> >> >> >> misleading in terms of design tradeoffs.
> >> >> >>
> >> >> >> 3) The Row/Record model seems to bring a significant explanation
> >> >> >> burden.
> >> >> >>
> >> >> >> In the past we've talked about a model that's more aligned with
> >> >> >> Lucene's Documents. Aaron did some API work on a branch a while
> >> >> >> back, and it's come up in an issue again recently.
> >> >> >>
> >> >> >> So I'm wondering if now is the time to just endure a short-ish
> >> >> >> period of pain and change everything over. The idea would be
> >> >> >> something like:
> >> >> >>
> >> >> >> Row -> DocGroup
> >> >> >> Record -> Document
> >> >> >> Column -> Field
> >> >> >> Family -> (dropped)
> >> >> >>
> >> >> >> I think this will alleviate some confusion and provide a solid
> >> >> >> foundation for the long term, enabling a shorter learning curve.
> >> >> >>
> >> >> >> Such a big change would be good to get done while we're still a
> >> >> >> small-ish community, but I think it's important that everyone is
> >> >> >> on board, as it will no doubt create lots of short-term chaos and
> >> >> >> confusion...
> >> >> >>
> >> >> >> Thoughts?
> >> >> >>
> >> >> >> Thanks,
> >> >> >> --tim
> >> >> >>
> >> >> >
> >> >> >
> >> >>
> >>
>
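Aaron's Sep 30 message notes that the raw Thrift calls are verbose and could be improved with helper/utility classes. A sketch of one such helper follows. It uses simplified, hypothetical stand-ins for the proposed Value, Field, Doc, and DocGroup types (the real ones would be Thrift-generated), and DocBuilder, Kind, and group() are invented names, not part of Blur:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: hypothetical stand-ins for the proposed Thrift types, plus a small
// fluent helper that hides the Value/_Fields boilerplate.
class DocBuilderSketch {

    // Mirrors the value kinds used in the raw Thrift example.
    enum Kind { INT_VAL, LONG_VAL, STRING_VAL, TEXT_VAL }

    record Value(Kind kind, Object value) {}
    record Field(String name, Value value) {}
    record Doc(Value docId, List<Field> fields) {}
    record DocGroup(Value docGroupId, List<Doc> docs) {}

    static final class DocBuilder {
        private final Value docId;
        private final List<Field> fields = new ArrayList<>();

        private DocBuilder(Value docId) {
            this.docId = docId;
        }

        static DocBuilder withId(long id) {
            return new DocBuilder(new Value(Kind.LONG_VAL, id));
        }

        DocBuilder intField(String name, int v) { return add(name, Kind.INT_VAL, v); }
        DocBuilder stringField(String name, String v) { return add(name, Kind.STRING_VAL, v); }
        DocBuilder textField(String name, String v) { return add(name, Kind.TEXT_VAL, v); }

        // Wrap the raw value in a typed Value and append it as a Field.
        private DocBuilder add(String name, Kind kind, Object v) {
            fields.add(new Field(name, new Value(kind, v)));
            return this;
        }

        Doc build() {
            return new Doc(docId, List.copyOf(fields));
        }
    }

    // Group one or more docs under a string group id.
    static DocGroup group(String groupId, Doc... docs) {
        return new DocGroup(new Value(Kind.STRING_VAL, groupId), List.of(docs));
    }

    public static void main(String[] args) {
        Doc doc = DocBuilder.withId(1234L)
            .intField("int_fieldname", 1234)
            .stringField("string_fieldname", "value1")
            .textField("text_fieldname", "this is full text indexed.")
            .build();
        DocGroup docGroup = group("groupid12345", doc);
        System.out.println(docGroup.docs().get(0).fields().size()); // 3
    }
}
```

The builder reproduces the same document as the raw Thrift example, with the Value/_Fields wrapping hidden behind one method per field type.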
