incubator-blur-dev mailing list archives

From Aaron McCurry <amccu...@gmail.com>
Subject Re: Reworking the data model
Date Sat, 12 Oct 2013 18:45:12 GMT
Perhaps, but the interesting thing is that I think that grouping
functionality is actually very similar.  It's just a static structure
instead of being dynamic.  At least if I understand the Solr feature
correctly.

Maybe we should call it a DocumentCollection.  Since it's a collection of
documents.

Aaron


On Tue, Oct 1, 2013 at 10:04 PM, Otis Gospodnetic <
otis.gospodnetic@gmail.com> wrote:

> Hi,
>
> Note that Solr and Lucene both have grouping functionality, which some
> people may confuse with DocGroups you are talking about here.
>
> Otis
> --
> Solr & ElasticSearch Support -- http://sematext.com/
> Performance Monitoring -- http://sematext.com/spm
>
>
>
> On Mon, Sep 30, 2013 at 1:09 PM, Aaron McCurry <amccurry@gmail.com> wrote:
> > While I don't really like the idea of changing all the code to rename Row
> > and Record, I think it is necessary to help people who are new to Blur
> > transition from Lucene (or any other document store for that matter).
> >
> > I think that having Doc and DocGroup both be first class objects is also
> > critical.  I think that for most implementations DocGroup is overkill and
> > Document is the only thing needed.  I have some ideas on how to make this
> > possible in the API.
> >
> > Here's an example of what we could do. This is raw Thrift, which can be
> > ugly, but with some helper/utility classes it can be made better:
> >
> > Doc doc = new Doc();
> > doc.setDocId(new Value(_Fields.LONG_VAL, 1234L));
> > doc.addToFields(new Field("int_fieldname", new Value(_Fields.INT_VAL, 1234)));
> > doc.addToFields(new Field("string_fieldname", new Value(_Fields.STRING_VAL, "value1")));
> > doc.addToFields(new Field("text_fieldname", new Value(_Fields.TEXT_VAL, "this is full text indexed.")));
> >
> >
> > DocGroup docGroup = new DocGroup();
> > docGroup.setDocGroupId(new Value(_Fields.STRING_VAL, "groupid12345"));
> > docGroup.addToDocs(doc);
> >
> > At this point I think I would like to keep the docId and docGroupId.  I
> > know that Lucene itself doesn't require them, but if we don't have them
> > deletes/updates become a lot more expensive.  They would have to broadcast
> > the delete to all the shards of a table, which would kill NRT updates.
> >
> > Thoughts?
> >
> > Aaron
> >
> >
> >
> > On Mon, Sep 30, 2013 at 12:49 PM, Garrett Barton
> > <garrett.barton@gmail.com> wrote:
> >
> >> +1 here.
> >>
> >> I also agree with Colton about making docgroup/row optional. I know in the
> >> current design it's not easy, but I remember Aaron saying in the branch it
> >> might be possible to specify any column as the id, making me think it might
> >> be possible to not have one at all.
> >> On Sep 30, 2013 10:41 AM, "Colton McInroy" <colton@dosarrest.com> wrote:
> >>
> >> > I personally think that the Row/Record/Column model makes sense. If you
> >> > have some documentation on the site saying here are the Lucene equivalents
> >> > to Blur it would probably avoid having those types of questions in the
> >> > future. If you have an explanation of this, you could leave the model the
> >> > same to avoid having to make a bunch of changes and cause chaos.
> >> >
> >> > Glad the Family attribute is being dropped, I kinda came in at the end of
> >> > its lifespan I guess, because it doesn't really make much sense to me. How
> >> > long till it's actually dropped from the code though?
> >> >
> >> > One thing I would like to see is Row be an option. In my current
> >> > implementation of Lucene code I don't use them at all, because what I am
> >> > working with makes no sense to have rows really. I also don't recall
> >> > DocGroups being required in Lucene, and I never worked with them, so that
> >> > kinda threw me off when I ran into it.
> >> >
> >> > Thanks,
> >> > Colton McInroy
> >> >
> >> >  * Director of Security Engineering
> >> >
> >> >
> >> > Phone
> >> > (Toll Free)
> >> > _US_    (888)-818-1344 Press 2
> >> > _UK_    0-800-635-0551 Press 2
> >> >
> >> > My Extension    101
> >> > 24/7 Support    support@dosarrest.com
> >> > Email   colton@dosarrest.com
> >> > Website         http://www.dosarrest.com
> >> >
> >> > On 9/30/2013 6:45 AM, Tim Williams wrote:
> >> >
> >> >> Hi Devs,
> >> >> I'm wondering if we should go ahead and endure the [painful] move to a
> >> >> more intuitive data model in Blur?  Here are some observations:
> >> >>
> >> >> 1) New folks coming to Blur have a background in Lucene - not
> >> >> necessarily a NoSQL data store - and want to know where their
> >> >> "Documents" are.
> >> >>
> >> >> 2) For folks aware of NoSQL stores, the Row/Record model can be
> >> >> misleading in terms of design tradeoffs.
> >> >>
> >> >> 3) The Row/Record model seems to bring a significant explanation burden.
> >> >>
> >> >> In the past we've talked about a model that's more aligned with
> >> >> Lucene's Documents.  Aaron did some API work on a branch a while back
> >> >> and it's come up in an issue again recently.
> >> >>
> >> >> So, I'm wondering if now is the time to just endure some shortish
> >> >> period of pain changing everything over now?  The idea being something
> >> >> like:
> >> >>
> >> >> Row -> DocGroup
> >> >> Record -> Document
> >> >> Column -> Field
> >> >> Family -> (dropped)
> >> >>
> >> >> I think this will alleviate some confusion and provide a solid
> >> >> foundation for the long term; enabling a shorter learning curve and
> >> >> less confusion.
> >> >>
> >> >> Such a big change would be good to get done while we're still a
> >> >> small-ish community but I think it's important that everyone is on
> >> >> board - as it will no doubt create lots of short term chaos and
> >> >> confusion...
> >> >>
> >> >> Thoughts?
> >> >>
> >> >> Thanks,
> >> >> --tim
> >> >>
> >> >
> >> >
> >>
>
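
Editorial note on the docId/docGroupId point in the quoted 30 Sep message: the
argument is that a known id lets a delete or update go to one shard instead of
being broadcast to every shard of a table. Below is a minimal sketch of that
idea, assuming ids are hashed to pick a shard. The class ShardRouting, its
method names, and the hash-modulo rule are all hypothetical and chosen for
illustration only; nothing in the thread says this is how Blur routes
requests.

```java
import java.util.ArrayList;
import java.util.List;

public class ShardRouting {

    // Hypothetical routing rule (an assumption, not Blur's actual code):
    // hash the id into the range [0, numShards).
    public static int shardFor(String id, int numShards) {
        return Math.floorMod(id.hashCode(), numShards);
    }

    // With a known id, the delete only needs to reach the one owning shard.
    public static List<Integer> shardsForDeleteById(String id, int numShards) {
        List<Integer> targets = new ArrayList<>();
        targets.add(shardFor(id, numShards));
        return targets;
    }

    // Without an id the owning shard is unknown, so every shard is a target.
    public static List<Integer> shardsForDeleteWithoutId(int numShards) {
        List<Integer> targets = new ArrayList<>();
        for (int shard = 0; shard < numShards; shard++) {
            targets.add(shard);
        }
        return targets;
    }

    public static void main(String[] args) {
        int numShards = 8;
        // Number of shards contacted with an id versus without one.
        System.out.println(shardsForDeleteById("groupid12345", numShards).size());
        System.out.println(shardsForDeleteWithoutId(numShards).size());
    }
}
```

In this sketch the cost of an id-less delete grows with the number of shards,
while the id-based delete stays at a single target, which is the trade-off
the message describes.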
