incubator-blur-dev mailing list archives

From Otis Gospodnetic <otis.gospodne...@gmail.com>
Subject Re: Reworking the data model
Date Mon, 14 Oct 2013 06:51:30 GMT
Hi,

I missed emails/issues where this functionality is described, so I'm
commenting only on naming, trying to point out possible confusion with
other search projects.  "Collection" in Solr has a specific meaning -
it's a logical index in a Solr(Cloud) cluster.  So maybe that term
could be avoided here, too.

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm



On Sat, Oct 12, 2013 at 2:45 PM, Aaron McCurry <amccurry@gmail.com> wrote:
> Perhaps, but the interesting thing is that I think that grouping
> functionality is actually very similar.  It's just a static structure
> instead of being dynamic.  At least if I understand the Solr feature
> correctly.
>
> Maybe we should call it a DocumentCollection.  Since it's a collection of
> documents.
>
> Aaron
>
>
> On Tue, Oct 1, 2013 at 10:04 PM, Otis Gospodnetic <otis.gospodnetic@gmail.com> wrote:
>
>> Hi,
>>
>> Note that Solr and Lucene both have grouping functionality, which some
>> people may confuse with DocGroups you are talking about here.
>>
>> Otis
>> --
>> Solr & ElasticSearch Support -- http://sematext.com/
>> Performance Monitoring -- http://sematext.com/spm
>>
>>
>>
>> On Mon, Sep 30, 2013 at 1:09 PM, Aaron McCurry <amccurry@gmail.com> wrote:
>> > While I don't really like the idea of changing all the code to rename Row
>> > and Record, I think it is necessary to help people who are new to Blur
>> > transition from Lucene (or any other document store for that matter).
>> >
>> > I think that having Doc and DocGroup both be first class objects is also
>> > critical.  I think that for most implementations DocGroup is overkill and
>> > Document is the only thing needed.  I have some ideas on how to make this
>> > possible in the API.
>> >
>> > Here's an example of what we could do. This is raw Thrift, which can be
>> > ugly, but with some helper/utility classes it can be made better:
>> >
>> > Doc doc = new Doc();
>> > doc.setDocId(new Value(_Fields.LONG_VAL, 1234L));
>> > doc.addToFields(new Field("int_fieldname", new Value(_Fields.INT_VAL, 1234)));
>> > doc.addToFields(new Field("string_fieldname", new Value(_Fields.STRING_VAL, "value1")));
>> > doc.addToFields(new Field("text_fieldname", new Value(_Fields.TEXT_VAL, "this is full text indexed.")));
>> >
>> >
>> > DocGroup docGroup = new DocGroup();
>> > docGroup.setDocGroupId(new Value(_Fields.STRING_VAL, "groupid12345"));
>> > docGroup.addToDocs(doc);
>> >
>> > At this point I think I would like to keep the docId and docGroupId.  I
>> > know that Lucene itself doesn't require them, but if we don't have them
>> > deletes/updates become a lot more expensive.  Each delete would have to be
>> > broadcast to all the shards of a table, which would kill NRT updates.
>> >
>> > Thoughts?
>> >
>> > Aaron
>> >
>> >
>> >
>> > On Mon, Sep 30, 2013 at 12:49 PM, Garrett Barton
>> > <garrett.barton@gmail.com> wrote:
>> >
>> >> +1 here.
>> >>
>> >> I also agree with Colton about making docgroup/row optional. I know in the
>> >> current design it's not easy, but I remember Aaron saying in the branch it
>> >> might be possible to specify any column as the id, making me think it might
>> >> be possible to not have one at all.
>> >>
>> >> On Sep 30, 2013 10:41 AM, "Colton McInroy" <colton@dosarrest.com> wrote:
>> >>
>> >> > I personally think that the Row/Record/Column model makes sense. If you
>> >> > have some documentation on the site saying here are the Lucene equivalents
>> >> > to Blur it would probably avoid having those types of questions in the
>> >> > future. If you have an explanation of this, you could leave the model the
>> >> > same to avoid having to make a bunch of changes and cause chaos.
>> >> >
>> >> > Glad the Family attribute is being dropped, I kinda came in at the end of
>> >> > its lifespan I guess, because it doesn't really make much sense to me. How
>> >> > long till it's actually dropped from the code though?
>> >> >
>> >> > One thing I would like to see is Row be an option. In my current
>> >> > implementation of Lucene code I don't use them at all, because what I am
>> >> > working with makes no sense to have rows really. I also don't recall
>> >> > DocGroups being required in Lucene, and I never worked with them, so that
>> >> > kinda threw me off when I ran into it.
>> >> >
>> >> > Thanks,
>> >> > Colton McInroy
>> >> >
>> >> >  * Director of Security Engineering
>> >> >
>> >> >
>> >> > Phone
>> >> > (Toll Free)
>> >> > _US_    (888)-818-1344 Press 2
>> >> > _UK_    0-800-635-0551 Press 2
>> >> >
>> >> > My Extension    101
>> >> > 24/7 Support    support@dosarrest.com
>> >> > Email   colton@dosarrest.com
>> >> > Website         http://www.dosarrest.com
>> >> >
>> >> > On 9/30/2013 6:45 AM, Tim Williams wrote:
>> >> >
>> >> >> Hi Devs,
>> >> >> I'm wondering if we should go ahead and endure the [painful] move to a
>> >> >> more intuitive data model in Blur?  Here are some observations:
>> >> >>
>> >> >> 1) New folks coming to Blur have a background in Lucene - not
>> >> >> necessarily a NoSQL data store - and want to know where their
>> >> >> "Documents" are.
>> >> >>
>> >> >> 2) For folks aware of NoSQL stores, the Row/Record model can be
>> >> >> misleading in terms of design tradeoffs.
>> >> >>
>> >> >> 3) The Row/Record model seems to bring a significant explanation burden.
>> >> >>
>> >> >> In the past we've talked about a model that's more aligned with
>> >> >> Lucene's Documents.  Aaron did some API work on a branch a while back
>> >> >> and it's come up in an issue again recently.
>> >> >>
>> >> >> So, I'm wondering if now is the time to just endure some shortish
>> >> >> period of pain changing everything over?  The idea being something
>> >> >> like:
>> >> >>
>> >> >> Row -> DocGroup
>> >> >> Record -> Document
>> >> >> Column -> Field
>> >> >> Family -> (dropped)
>> >> >>
>> >> >> I think this will alleviate some confusion and provide a solid
>> >> >> foundation for the long term; enabling a shorter learning curve and
>> >> >> less confusion.
>> >> >>
>> >> >> Such a big change would be good to get done while we're still a
>> >> >> small-ish community but I think it's important that everyone is on
>> >> >> board - as it will no doubt create lots of short term chaos and
>> >> >> confusion...
>> >> >>
>> >> >> Thoughts?
>> >> >>
>> >> >> Thanks,
>> >> >> --tim
>> >> >>
>> >> >
>> >> >
>> >>
>>
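
Aaron's point about keeping docId and docGroupId (a delete without an id has to go to every shard of a table) can be illustrated with a small sketch. This assumes ids are assigned to shards by hashing, which the thread does not state; the class and method names below are hypothetical and are not part of Blur's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not Blur code, and the hash-based
// shard assignment is an assumption made for this sketch.
public class ShardRouting {

    // Assumed routing rule: hash the id and map it onto one shard.
    static int shardFor(String docGroupId, int numShards) {
        return Math.floorMod(docGroupId.hashCode(), numShards);
    }

    // With an id, a delete only needs to visit the single shard that owns it.
    static List<Integer> shardsForDeleteById(String docGroupId, int numShards) {
        List<Integer> targets = new ArrayList<>();
        targets.add(shardFor(docGroupId, numShards));
        return targets;
    }

    // Without an id, nothing says where the document lives,
    // so the delete has to be sent to every shard of the table.
    static List<Integer> shardsForDeleteWithoutId(int numShards) {
        List<Integer> targets = new ArrayList<>();
        for (int shard = 0; shard < numShards; shard++) {
            targets.add(shard);
        }
        return targets;
    }

    public static void main(String[] args) {
        int numShards = 8;
        System.out.println("by id: "
                + shardsForDeleteById("groupid12345", numShards).size() + " shard(s)");
        System.out.println("without id: "
                + shardsForDeleteWithoutId(numShards).size() + " shard(s)");
    }
}
```

Under this assumption the cost of a delete by id stays constant as the table grows, while a delete without an id grows with the number of shards.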
