incubator-blur-dev mailing list archives

From Aaron McCurry <amccu...@gmail.com>
Subject Re: Reworking the data model
Date Tue, 15 Oct 2013 01:28:50 GMT
After working through some of the code, I have found myself in several
situations where the API gets ugly when trying to get rid of the
Row/DocumentCollection object.  So maybe we should take a different
approach for clarity: perhaps we should just rename Row to RecordCollection
or something like that.

It seems that the biggest point of confusion in the Blur data model is Row
and Record. If we just rename Row to something better (or Record, for that
matter), maybe most of the confusion will go away.

What do you guys think?

Aaron


On Mon, Oct 14, 2013 at 9:33 AM, Aaron McCurry <amccurry@gmail.com> wrote:

> I'm going to put some code together today so that we can take a look at
> different issues and see what they look like.
>
> Aaron
>
>
> On Mon, Oct 14, 2013 at 2:51 AM, Otis Gospodnetic <
> otis.gospodnetic@gmail.com> wrote:
>
>> Hi,
>>
>> I missed emails/issues where this functionality is described, so I'm
>> commenting only on naming, trying to point out possible confusion with
>> other search projects.  "Collection" in Solr has a specific meaning -
>> it's a logical index in a Solr(Cloud) cluster.  So maybe that term
>> could be avoided here, too.
>>
>> Otis
>> --
>> Solr & ElasticSearch Support -- http://sematext.com/
>> Performance Monitoring -- http://sematext.com/spm
>>
>>
>>
>> On Sat, Oct 12, 2013 at 2:45 PM, Aaron McCurry <amccurry@gmail.com>
>> wrote:
>> > Perhaps, but the interesting thing is that I think that grouping
>> > functionality is actually very similar.  It's just a static structure
>> > instead of being dynamic.  At least if I understand the Solr feature
>> > correctly.
>> >
>> > Maybe we should call it a DocumentCollection, since it's a collection
>> > of documents.
>> >
>> > Aaron
>> >
>> >
>> > On Tue, Oct 1, 2013 at 10:04 PM, Otis Gospodnetic <
>> > otis.gospodnetic@gmail.com> wrote:
>> >
>> >> Hi,
>> >>
>> >> Note that Solr and Lucene both have grouping functionality, which some
>> >> people may confuse with DocGroups you are talking about here.
>> >>
>> >> Otis
>> >> --
>> >> Solr & ElasticSearch Support -- http://sematext.com/
>> >> Performance Monitoring -- http://sematext.com/spm
>> >>
>> >>
>> >>
>> >> On Mon, Sep 30, 2013 at 1:09 PM, Aaron McCurry <amccurry@gmail.com> wrote:
>> >> > While I don't really like the idea of changing all the code to rename
>> >> > Row and Record, I think it is necessary to help people who are new to
>> >> > Blur transition from Lucene (or any other document store for that matter).
>> >> >
>> >> > I think that having Doc and DocGroup both be first-class objects is also
>> >> > critical.  I think that for most implementations DocGroup is overkill
>> >> > and Document is the only thing needed.  I have some ideas on how to make
>> >> > this possible in the API.
>> >> >
>> >> > Here's an example of what we could do.  This is raw Thrift, which can be
>> >> > ugly, but with some helper/utility classes it can be made better:
>> >> >
>> >> > Doc doc = new Doc();
>> >> > doc.setDocId(new Value(_Fields.LONG_VAL, 1234L));
>> >> > doc.addToFields(new Field("int_fieldname",
>> >> >     new Value(_Fields.INT_VAL, 1234)));
>> >> > doc.addToFields(new Field("string_fieldname",
>> >> >     new Value(_Fields.STRING_VAL, "value1")));
>> >> > doc.addToFields(new Field("text_fieldname",
>> >> >     new Value(_Fields.TEXT_VAL, "this is full text indexed.")));
>> >> >
>> >> >
>> >> > DocGroup docGroup = new DocGroup();
>> >> > docGroup.setDocGroupId(new Value(_Fields.STRING_VAL, "groupid12345"));
>> >> > docGroup.addToDocs(doc);
>> >> >
>> >> > At this point I think I would like to keep the docId and docGroupId.  I
>> >> > know that Lucene itself doesn't require them, but if we don't have them,
>> >> > deletes/updates become a lot more expensive.  Each delete would have to
>> >> > be broadcast to all the shards of a table, which would kill NRT updates.
>> >> >
>> >> > Thoughts?
>> >> >
>> >> > Aaron
>> >> >
>> >> >
>> >> >
>> >> > On Mon, Sep 30, 2013 at 12:49 PM, Garrett Barton
>> >> > <garrett.barton@gmail.com> wrote:
>> >> >
>> >> >> +1 here.
>> >> >>
>> >> >> I also agree with Colton about making docgroup/row optional. I know in
>> >> >> the current design it's not easy, but I remember Aaron saying in the
>> >> >> branch it might be possible to specify any column as the id, making me
>> >> >> think it might be possible to not have one at all.
>> >> >> On Sep 30, 2013 10:41 AM, "Colton McInroy" <colton@dosarrest.com> wrote:
>> >> >>
>> >> >> > I personally think that the Row/Record/Column model makes sense. If
>> >> >> > you have some documentation on the site saying here are the Lucene
>> >> >> > equivalents to Blur, it would probably avoid having those types of
>> >> >> > questions in the future. If you have an explanation of this, you could
>> >> >> > leave the model the same to avoid having to make a bunch of changes
>> >> >> > and cause chaos.
>> >> >> >
>> >> >> > Glad the Family attribute is being dropped. I kinda came in at the
>> >> >> > end of its lifespan I guess, because it doesn't really make much sense
>> >> >> > to me. How long till it's actually dropped from the code though?
>> >> >> >
>> >> >> > One thing I would like to see is Row be an option. In my current
>> >> >> > implementation of Lucene code I don't use them at all, because what I
>> >> >> > am working with makes no sense to have rows really. I also don't
>> >> >> > recall DocGroups being required in Lucene, and I never worked with
>> >> >> > them, so that kinda threw me off when I ran into it.
>> >> >> >
>> >> >> > Thanks,
>> >> >> > Colton McInroy
>> >> >> >
>> >> >> >  * Director of Security Engineering
>> >> >> >
>> >> >> >
>> >> >> > Phone
>> >> >> > (Toll Free)
>> >> >> > _US_    (888)-818-1344 Press 2
>> >> >> > _UK_    0-800-635-0551 Press 2
>> >> >> >
>> >> >> > My Extension    101
>> >> >> > 24/7 Support    support@dosarrest.com <mailto:support@dosarrest.com>
>> >> >> > Email   colton@dosarrest.com <mailto:colton@dosarrest.com>
>> >> >> > Website         http://www.dosarrest.com
>> >> >> >
>> >> >> > On 9/30/2013 6:45 AM, Tim Williams wrote:
>> >> >> >
>> >> >> >> Hi Devs,
>> >> >> >> I'm wondering if we should go ahead and endure the [painful] move to
>> >> >> >> a more intuitive data model in Blur?  Here are some observations:
>> >> >> >>
>> >> >> >> 1) New folks coming to Blur have a background in Lucene - not
>> >> >> >> necessarily a NoSQL data store - and want to know where their
>> >> >> >> "Documents" are.
>> >> >> >>
>> >> >> >> 2) For folks aware of NoSQL stores, the Row/Record model can be
>> >> >> >> misleading in terms of design tradeoffs.
>> >> >> >>
>> >> >> >> 3) The Row/Record model seems to bring a significant explanation
>> >> >> >> burden.
>> >> >> >>
>> >> >> >> In the past we've talked about a model that's more aligned with
>> >> >> >> Lucene's Documents.  Aaron did some API work on a branch a while
>> >> >> >> back and it's come up in an issue again recently.
>> >> >> >>
>> >> >> >> So, I'm wondering if now is the time to just endure some shortish
>> >> >> >> period of pain changing everything over?  The idea being
>> >> >> >> something like:
>> >> >> >>
>> >> >> >> Row -> DocGroup
>> >> >> >> Record -> Document
>> >> >> >> Column -> Field
>> >> >> >> Family -> (dropped)
>> >> >> >>
>> >> >> >> I think this will alleviate some confusion and provide a solid
>> >> >> >> foundation for the long term; enabling a shorter learning curve and
>> >> >> >> less confusion.
>> >> >> >>
>> >> >> >> Such a big change would be good to get done while we're still a
>> >> >> >> small-ish community but I think it's important that everyone is on
>> >> >> >> board - as it will no doubt create lots of short term chaos and
>> >> >> >> confusion...
>> >> >> >>
>> >> >> >> Thoughts?
>> >> >> >>
>> >> >> >> Thanks,
>> >> >> >> --tim
>> >> >> >>
>> >> >> >
>> >> >> >
>> >> >>
>> >>
>>
>
>
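
The Sep 30 message quoted above says the raw Thrift calls could be made less ugly "with some helper/utility classes". The following is only a rough sketch of what such a helper might look like. Nothing here is Blur's actual API: DocBuilderSketch, its nested Doc, DocGroup, Field and Type classes, and the doc()/group() methods are all hypothetical stand-ins so that the example compiles on its own, and the docId is assumed to be a long as in the quoted snippet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper sketch; none of these types exist in Blur.
public final class DocBuilderSketch {

    private DocBuilderSketch() {}

    // Stand-in for the value kinds used in the quoted Thrift example.
    public enum Type { INT, STRING, TEXT }

    // Stand-in for the proposed Field struct: a name plus a typed value.
    public static final class Field {
        public final String name;
        public final Type type;
        public final Object value;

        Field(String name, Type type, Object value) {
            this.name = name;
            this.type = type;
            this.value = value;
        }
    }

    // Stand-in for the proposed Doc struct: an id plus its fields.
    public static final class Doc {
        public final long docId;
        public final List<Field> fields;

        Doc(long docId, List<Field> fields) {
            this.docId = docId;
            this.fields = Collections.unmodifiableList(new ArrayList<>(fields));
        }
    }

    // Stand-in for the proposed DocGroup struct: an id plus its docs.
    public static final class DocGroup {
        public final String docGroupId;
        public final List<Doc> docs;

        DocGroup(String docGroupId, List<Doc> docs) {
            this.docGroupId = docGroupId;
            this.docs = Collections.unmodifiableList(new ArrayList<>(docs));
        }
    }

    // Fluent builder that hides the Field/Value wrapping.
    public static final class DocBuilder {
        private final long docId;
        private final List<Field> fields = new ArrayList<>();

        DocBuilder(long docId) {
            this.docId = docId;
        }

        public DocBuilder intField(String name, int value) {
            fields.add(new Field(name, Type.INT, value));
            return this;
        }

        public DocBuilder stringField(String name, String value) {
            fields.add(new Field(name, Type.STRING, value));
            return this;
        }

        public DocBuilder textField(String name, String value) {
            fields.add(new Field(name, Type.TEXT, value));
            return this;
        }

        public Doc build() {
            return new Doc(docId, fields);
        }
    }

    public static DocBuilder doc(long docId) {
        return new DocBuilder(docId);
    }

    public static DocGroup group(String docGroupId, Doc... docs) {
        return new DocGroup(docGroupId, Arrays.asList(docs));
    }

    public static void main(String[] args) {
        // Same data as the quoted raw Thrift example.
        DocGroup g = group("groupid12345",
            doc(1234L)
                .intField("int_fieldname", 1234)
                .stringField("string_fieldname", "value1")
                .textField("text_fieldname", "this is full text indexed.")
                .build());
        System.out.println(g.docs.get(0).fields.size()); // prints 3
    }
}
```

A builder of this kind would also leave room for the "Document only" case raised in the thread, since doc(...).build() can be used without ever calling group(...).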

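The same Sep 30 message argues for keeping docId and docGroupId because, without them, a delete would have to be broadcast to every shard of a table. The sketch below illustrates that reasoning only. It assumes simple hash partitioning on the group id, which the thread does not confirm as Blur's actual routing scheme, and ShardRoutingSketch, shardFor and shardsForDelete are hypothetical names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: hash partitioning is an assumption, not Blur's confirmed scheme.
public final class ShardRoutingSketch {

    private ShardRoutingSketch() {}

    // Maps a group id to exactly one shard index in [0, numShards).
    public static int shardFor(String docGroupId, int numShards) {
        if (numShards <= 0) {
            throw new IllegalArgumentException("numShards must be positive");
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(docGroupId.hashCode(), numShards);
    }

    // Returns the shards that must process a delete.
    // With an id, that is one shard; without one (null), it is every shard.
    public static List<Integer> shardsForDelete(String docGroupId, int numShards) {
        List<Integer> targets = new ArrayList<>();
        if (docGroupId != null) {
            targets.add(shardFor(docGroupId, numShards));
        } else {
            for (int shard = 0; shard < numShards; shard++) {
                targets.add(shard);
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        System.out.println(shardsForDelete("groupid12345", 8).size()); // prints 1
        System.out.println(shardsForDelete(null, 8).size());           // prints 8
    }
}
```

Under this assumption the cost of an id-less delete grows with the number of shards, while an id-routed delete touches a single shard regardless of table size.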