incubator-blur-dev mailing list archives

From Aaron McCurry <amccu...@gmail.com>
Subject Re: Reworking the data model
Date Mon, 14 Oct 2013 13:33:09 GMT
I'm going to put some code together today so that we can take a look at
different issues and see what they look like.

Aaron


On Mon, Oct 14, 2013 at 2:51 AM, Otis Gospodnetic <otis.gospodnetic@gmail.com> wrote:

> Hi,
>
> I missed emails/issues where this functionality is described, so I'm
> commenting only on naming, trying to point out possible confusion with
> other search projects.  "Collection" in Solr has a specific meaning -
> it's a logical index in a Solr(Cloud) cluster.  So maybe that term
> could be avoided here, too.
>
> Otis
> --
> Solr & ElasticSearch Support -- http://sematext.com/
> Performance Monitoring -- http://sematext.com/spm
>
>
>
> On Sat, Oct 12, 2013 at 2:45 PM, Aaron McCurry <amccurry@gmail.com> wrote:
> > Perhaps, but the interesting thing is that I think the grouping
> > functionality is actually very similar; it's just a static structure
> > instead of a dynamic one.  At least, that's if I understand the Solr
> > feature correctly.
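A rough Java sketch of the static-versus-dynamic distinction described above. It uses hypothetical Doc and DocGroup records (not Blur's, Lucene's, or Solr's actual classes). Query-time grouping buckets the hits of a search by a field value after the search runs. A DocGroup's membership, by contrast, is fixed when it is indexed.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class GroupingSketch {
  // Hypothetical document: an id plus one field that results can be grouped on.
  record Doc(String docId, String category) {}

  // Hypothetical DocGroup: members are chosen at index time, independent of any query.
  record DocGroup(String docGroupId, List<Doc> docs) {}

  // Query-time ("dynamic") grouping: bucket one search's hits by a field value.
  // This only imitates the idea of result grouping; it is not the Lucene/Solr API.
  static Map<String, List<String>> groupHitsByField(List<Doc> hits) {
    return hits.stream()
        .collect(Collectors.groupingBy(
            Doc::category,
            TreeMap::new,
            Collectors.mapping(Doc::docId, Collectors.toList())));
  }

  public static void main(String[] args) {
    List<Doc> hits = List.of(new Doc("d1", "a"), new Doc("d2", "b"), new Doc("d3", "a"));
    // Dynamic: the groups depend on which documents this particular query matched.
    System.out.println(groupHitsByField(hits)); // {a=[d1, d3], b=[d2]}
    // Static: the group exists as stored, whatever the query.
    DocGroup group = new DocGroup("groupid12345", List.of(hits.get(0), hits.get(1)));
    System.out.println(group.docs().size()); // 2
  }
}
```

Collectors.groupingBy stands in for result grouping here. The real grouping APIs operate on search results and are not modeled.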
> >
> > Maybe we should call it a DocumentCollection, since it's a collection of
> > documents.
> >
> > Aaron
> >
> >
> > On Tue, Oct 1, 2013 at 10:04 PM, Otis Gospodnetic <otis.gospodnetic@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> Note that Solr and Lucene both have grouping functionality, which some
> >> people may confuse with DocGroups you are talking about here.
> >>
> >> Otis
> >>
> >>
> >>
> >> On Mon, Sep 30, 2013 at 1:09 PM, Aaron McCurry <amccurry@gmail.com> wrote:
> >> > While I don't really like the idea of changing all the code to rename Row
> >> > and Record, I think it is necessary to help people who are new to Blur
> >> > transition from Lucene (or any other document store, for that matter).
> >> >
> >> > I think that having Doc and DocGroup both be first-class objects is also
> >> > critical.  I think that for most implementations DocGroup is overkill and
> >> > Document is the only thing needed.  I have some ideas on how to make this
> >> > possible in the API.
> >> >
> >> > Here's an example of what we could do. This is raw Thrift, which can be
> >> > ugly, but with some helper/utility classes it can be made better:
> >> >
> >> > Doc doc = new Doc();
> >> > doc.setDocId(new Value(_Fields.LONG_VAL, 1234L));
> >> > doc.addToFields(new Field("int_fieldname", new Value(_Fields.INT_VAL, 1234)));
> >> > doc.addToFields(new Field("string_fieldname", new Value(_Fields.STRING_VAL, "value1")));
> >> > doc.addToFields(new Field("text_fieldname", new Value(_Fields.TEXT_VAL, "this is full text indexed.")));
> >> >
> >> > DocGroup docGroup = new DocGroup();
> >> > docGroup.setDocGroupId(new Value(_Fields.STRING_VAL, "groupid12345"));
> >> > docGroup.addToDocs(doc);
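Here is one way the "helper/utility classes" might look: a hypothetical fluent DocBuilder over stand-in types that mirror the Thrift structs in the example (Value, Field, Doc). These stand-ins are assumptions, because the generated classes from the branch aren't shown. A real helper would wrap those classes instead. The STRING/TEXT indexing behavior in the comments is also assumed.

```java
import java.util.ArrayList;
import java.util.List;

class DocBuilderSketch {
  // Hypothetical stand-ins for the Thrift structs; the generated Value is a union,
  // so a sealed interface plays that role here.
  sealed interface Value permits LongVal, IntVal, StringVal, TextVal {}
  record LongVal(long v) implements Value {}
  record IntVal(int v) implements Value {}
  record StringVal(String v) implements Value {} // assumed: indexed as one token
  record TextVal(String v) implements Value {}   // assumed: full-text analyzed
  record Field(String name, Value value) {}
  record Doc(Value docId, List<Field> fields) {}

  // Hypothetical helper that hides the union plumbing behind typed methods.
  static final class DocBuilder {
    private final Value docId;
    private final List<Field> fields = new ArrayList<>();

    private DocBuilder(Value docId) { this.docId = docId; }

    static DocBuilder doc(long id) { return new DocBuilder(new LongVal(id)); }

    DocBuilder intField(String name, int v) {
      fields.add(new Field(name, new IntVal(v)));
      return this;
    }

    DocBuilder stringField(String name, String v) {
      fields.add(new Field(name, new StringVal(v)));
      return this;
    }

    DocBuilder textField(String name, String v) {
      fields.add(new Field(name, new TextVal(v)));
      return this;
    }

    // Returns an immutable snapshot of the fields added so far.
    Doc build() { return new Doc(docId, List.copyOf(fields)); }
  }

  public static void main(String[] args) {
    Doc doc = DocBuilder.doc(1234L)
        .intField("int_fieldname", 1234)
        .stringField("string_fieldname", "value1")
        .textField("text_fieldname", "this is full text indexed.")
        .build();
    System.out.println(doc.fields().size()); // 3
  }
}
```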
> >> >
> >> > At this point I think I would like to keep the docId and docGroupId.  I
> >> > know that Lucene itself doesn't require them, but if we don't have them,
> >> > deletes/updates become a lot more expensive: they would have to broadcast
> >> > the delete to all the shards of a table, which would kill NRT updates.
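The following is a minimal sketch of why the ids matter for deletes. It assumes ids are hash-partitioned across shards. Blur's actual partitioner may differ, and shardFor and shardsForDelete are hypothetical names. With an id, a delete goes to exactly one shard. Without one, it has to go to every shard.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.IntStream;

class DeleteRoutingSketch {
  // Assumed hash-mod partitioning; floorMod keeps the result non-negative.
  static int shardFor(String id, int shardCount) {
    return Math.floorMod(id.hashCode(), shardCount);
  }

  // With an id the delete targets one shard; without one it is broadcast to all.
  static List<Integer> shardsForDelete(Optional<String> id, int shardCount) {
    return id.map(i -> List.of(shardFor(i, shardCount)))
        .orElseGet(() -> IntStream.range(0, shardCount).boxed().toList());
  }

  public static void main(String[] args) {
    System.out.println(shardsForDelete(Optional.of("groupid12345"), 8).size()); // 1
    System.out.println(shardsForDelete(Optional.empty(), 8).size());            // 8
  }
}
```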
> >> >
> >> > Thoughts?
> >> >
> >> > Aaron
> >> >
> >> >
> >> >
> >> > On Mon, Sep 30, 2013 at 12:49 PM, Garrett Barton <garrett.barton@gmail.com> wrote:
> >> >
> >> >> +1 here.
> >> >>
> >> >> I also agree with Colton about making docgroup/row optional. I know it's
> >> >> not easy in the current design, but I remember Aaron saying that in the
> >> >> branch it might be possible to specify any column as the ID, which makes
> >> >> me think it might be possible to not have one at all.
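Here is a hypothetical sketch of an optional DocGroup. It assumes the table declares one field as its id. A document routes by its group id when it has one, and otherwise by that field's value, so a standalone document can still be deleted on a single shard. Document, routingKey, and the "external_id" field are illustrative names, not part of any Blur branch.

```java
import java.util.Map;
import java.util.Optional;

class OptionalGroupSketch {
  // Hypothetical document: an optional group id plus named string fields.
  record Document(Optional<String> docGroupId, Map<String, String> fields) {}

  // Assumed rule: use the group id if present, else the configured id field's value.
  static String routingKey(Document doc, String idFieldName) {
    return doc.docGroupId().orElseGet(() -> {
      String id = doc.fields().get(idFieldName);
      if (id == null) {
        throw new IllegalArgumentException("missing id field: " + idFieldName);
      }
      return id;
    });
  }

  public static void main(String[] args) {
    Document standalone = new Document(Optional.empty(), Map.of("external_id", "ext-1"));
    Document grouped = new Document(Optional.of("groupid12345"), Map.of("external_id", "ext-2"));
    System.out.println(routingKey(standalone, "external_id")); // ext-1
    System.out.println(routingKey(grouped, "external_id"));    // groupid12345
  }
}
```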
> >> >> On Sep 30, 2013 10:41 AM, "Colton McInroy" <colton@dosarrest.com> wrote:
> >> >>
> >> >> > I personally think that the Row/Record/Column model makes sense. If you
> >> >> > had some documentation on the site listing the Lucene equivalents of
> >> >> > Blur's concepts, it would probably head off these kinds of questions in
> >> >> > the future. With that explanation in place, you could leave the model as
> >> >> > it is and avoid making a bunch of changes that cause chaos.
> >> >> >
> >> >> > Glad the Family attribute is being dropped; I kinda came in at the end of
> >> >> > its lifespan I guess, because it doesn't really make much sense to me. How
> >> >> > long till it's actually dropped from the code, though?
> >> >> >
> >> >> > One thing I would like to see is Row being optional. In my current
> >> >> > Lucene implementation I don't use them at all, because rows really make
> >> >> > no sense for what I am working with. I also don't recall DocGroups being
> >> >> > required in Lucene, and I never worked with them, so that kinda threw me
> >> >> > off when I ran into it.
> >> >> >
> >> >> > Thanks,
> >> >> > Colton McInroy
> >> >> >
> >> >> >  * Director of Security Engineering
> >> >> >
> >> >> >
> >> >> > Phone
> >> >> > (Toll Free)
> >> >> > _US_    (888)-818-1344 Press 2
> >> >> > _UK_    0-800-635-0551 Press 2
> >> >> >
> >> >> > My Extension    101
> >> >> > 24/7 Support    support@dosarrest.com
> >> >> > Email   colton@dosarrest.com
> >> >> > Website         http://www.dosarrest.com
> >> >> >
> >> >> > On 9/30/2013 6:45 AM, Tim Williams wrote:
> >> >> >
> >> >> >> Hi Devs,
> >> >> >> I'm wondering if we should go ahead and endure the [painful] move to a
> >> >> >> more intuitive data model in Blur?  Here are some observations:
> >> >> >>
> >> >> >> 1) New folks coming to Blur have a background in Lucene - not
> >> >> >> necessarily a NoSQL data store - and want to know where their
> >> >> >> "Documents" are.
> >> >> >>
> >> >> >> 2) For folks aware of NoSQL stores, the Row/Record model can be
> >> >> >> misleading in terms of design tradeoffs.
> >> >> >>
> >> >> >> 3) The Row/Record model seems to bring a significant explanation burden.
> >> >> >>
> >> >> >> In the past we've talked about a model that's more aligned with
> >> >> >> Lucene's Documents.  Aaron did some API work on a branch a while back,
> >> >> >> and it's come up in an issue again recently.
> >> >> >>
> >> >> >> So, I'm wondering if now is the time to just endure some shortish
> >> >> >> period of pain changing everything over?  The idea being something
> >> >> >> like:
> >> >> >>
> >> >> >> Row -> DocGroup
> >> >> >> Record -> Document
> >> >> >> Column -> Field
> >> >> >> Family -> (dropped)
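The proposed renaming can be sketched as a conversion between hypothetical record types for the current and proposed models. All names are assumed; LegacyRecord is used to avoid clashing with java.lang.Record. Following the mapping, a record's family is simply not carried over.

```java
import java.util.List;

class ModelMappingSketch {
  // Hypothetical shapes of the current model.
  record LegacyColumn(String name, String value) {}
  record LegacyRecord(String recordId, String family, List<LegacyColumn> columns) {}
  record LegacyRow(String rowId, List<LegacyRecord> records) {}

  // Hypothetical shapes of the proposed model.
  record Field(String name, String value) {}
  record Document(String docId, List<Field> fields) {}
  record DocGroup(String docGroupId, List<Document> docs) {}

  // Row -> DocGroup, Record -> Document, Column -> Field; family is dropped.
  static DocGroup toDocGroup(LegacyRow row) {
    List<Document> docs = row.records().stream()
        .map(r -> new Document(
            r.recordId(),
            r.columns().stream().map(c -> new Field(c.name(), c.value())).toList()))
        .toList();
    return new DocGroup(row.rowId(), docs);
  }

  public static void main(String[] args) {
    LegacyRow row = new LegacyRow("row-1", List.of(
        new LegacyRecord("rec-1", "person", List.of(new LegacyColumn("name", "alice")))));
    System.out.println(toDocGroup(row));
  }
}
```

Dropping the family means two columns with the same name in different families would collide as fields. That is one practical cost of the change worth checking against existing tables.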
> >> >> >>
> >> >> >> I think this will alleviate some confusion and provide a solid
> >> >> >> foundation for the long term, enabling a shorter learning curve.
> >> >> >>
> >> >> >> Such a big change would be good to get done while we're still a
> >> >> >> small-ish community, but I think it's important that everyone is on
> >> >> >> board, as it will no doubt create lots of short-term chaos and
> >> >> >> confusion...
> >> >> >>
> >> >> >> Thoughts?
> >> >> >>
> >> >> >> Thanks,
> >> >> >> --tim
> >> >> >>
> >> >> >
> >> >> >
> >> >>
> >>
>
