incubator-blur-dev mailing list archives

From Otis Gospodnetic <otis.gospodne...@gmail.com>
Subject Re: Reworking the data model
Date Wed, 02 Oct 2013 02:04:09 GMT
Hi,

Note that Solr and Lucene both have grouping functionality, which some
people may confuse with the DocGroups you are talking about here.

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm



On Mon, Sep 30, 2013 at 1:09 PM, Aaron McCurry <amccurry@gmail.com> wrote:
> While I don't really like the idea of changing all the code to rename Row
> and Record, I think it is necessary to help people who are new to Blur
> transition from Lucene (or any other document store for that matter).
>
> I think that having Doc and DocGroup both be first class objects is also
> critical.  I think that for most implementations DocGroup is overkill and
> Document is the only thing needed.  I have some ideas on how to make this
> possible in the API.
>
> Here's an example of what we could do. This is raw Thrift, which can be
> ugly, but with some helper/utility classes it can be made better:
>
> Doc doc = new Doc();
> doc.setDocId(new Value(_Fields.LONG_VAL, 1234L));
> doc.addToFields(new Field("int_fieldname",
>     new Value(_Fields.INT_VAL, 1234)));
> doc.addToFields(new Field("string_fieldname",
>     new Value(_Fields.STRING_VAL, "value1")));
> doc.addToFields(new Field("text_fieldname",
>     new Value(_Fields.TEXT_VAL, "this is full text indexed.")));
>
>
> DocGroup docGroup = new DocGroup();
> docGroup.setDocGroupId(new Value(_Fields.STRING_VAL, "groupid12345"));
> docGroup.addToDocs(doc);
>
> At this point I think I would like to keep the docId and docGroupId.  I
> know that Lucene itself doesn't require it but if we don't have them
> deletes/updates become a lot more expensive.  They would have to broadcast
> the delete to all the shards of a table which would kill NRT updates.
>
> Thoughts?
>
> Aaron
>
>
>
> On Mon, Sep 30, 2013 at 12:49 PM, Garrett Barton
> <garrett.barton@gmail.com> wrote:
>
>> +1 here.
>>
>> I also agree with Colton about making docgroup/row optional. I know in the
>> current design it's not easy, but I remember Aaron saying in the branch it
>> might be possible to specify any column as the id, making me think it might
>> be possible to not have one at all.
>> On Sep 30, 2013 10:41 AM, "Colton McInroy" <colton@dosarrest.com> wrote:
>>
>> > I personally think that the Row/Record/Column model makes sense. If you
>> > have some documentation on the site saying here are the Lucene
>> > equivalents to Blur it would probably avoid having those types of
>> > questions in the future. If you have an explanation of this, you could
>> > leave the model the same to avoid having to make a bunch of changes and
>> > cause chaos.
>> >
>> > Glad the Family attribute is being dropped. I kinda came in at the end of
>> > its lifespan I guess, because it doesn't really make much sense to me.
>> > How long till it's actually dropped from the code though?
>> >
>> > One thing I would like to see is Row being optional. In my current
>> > implementation of Lucene code I don't use them at all, because rows
>> > make no sense for the data I am working with. I also don't recall
>> > DocGroups being required in Lucene, and I never worked with them, so that
>> > kinda threw me off when I ran into it.
>> >
>> > Thanks,
>> > Colton McInroy
>> >
>> >  * Director of Security Engineering
>> >
>> >
>> > Phone
>> > (Toll Free)
>> > _US_    (888)-818-1344 Press 2
>> > _UK_    0-800-635-0551 Press 2
>> >
>> > My Extension    101
>> > 24/7 Support    support@dosarrest.com <mailto:support@dosarrest.com>
>> > Email   colton@dosarrest.com <mailto:colton@dosarrest.com>
>> > Website         http://www.dosarrest.com
>> >
>> > On 9/30/2013 6:45 AM, Tim Williams wrote:
>> >
>> >> Hi Devs,
>> >> I'm wondering if we should go ahead and endure the [painful] move to a
>> >> more intuitive data model in Blur?  Here are some observations:
>> >>
>> >> 1) New folks coming to Blur have a background in Lucene - not
>> >> necessarily a NoSQL data store - and want to know where their
>> >> "Documents" are.
>> >>
>> >> 2) For folks aware of NoSQL stores, the Row/Record model can be
>> >> misleading in terms of design tradeoffs.
>> >>
>> >> 3) The Row/Record model seems to bring a significant explanation burden.
>> >>
>> >> In the past we've talked about a model that's more aligned with
>> >> Lucene's Documents.  Aaron did some API work on a branch a while back
>> >> and it's come up in an issue again recently.
>> >>
>> >> So, I'm wondering if now is the time to just endure some shortish
>> >> period of pain changing everything over?  The idea being something
>> >> like:
>> >>
>> >> Row -> DocGroup
>> >> Record -> Document
>> >> Column -> Field
>> >> Family -> (dropped)
>> >>
>> >> I think this will provide a solid foundation for the long term,
>> >> enabling a shorter learning curve and less confusion.
>> >>
>> >> Such a big change would be good to get done while we're still a
>> >> small-ish community but I think it's important that everyone is on
>> >> board - as it will no doubt create lots of short term chaos and
>> >> confusion...
>> >>
>> >> Thoughts?
>> >>
>> >> Thanks,
>> >> --tim
>> >>
>> >
>> >
>>
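
A minimal sketch of the routing argument Aaron makes above for keeping docId
and docGroupId, assuming a table's shards are chosen by hashing the id. The
names ShardRouting, shardFor and shardsForDelete are hypothetical and are not
Blur's API; Blur's actual partitioning may work differently. The sketch only
shows why a known id lets a delete go to one shard, while a delete without an
id has to be sent to every shard.

```java
import java.util.ArrayList;
import java.util.List;

public class ShardRouting {

    // Hypothetical routing rule: hash the id onto one of numShards shards.
    // floorMod keeps the result non-negative even for negative hash codes.
    public static int shardFor(String docGroupId, int numShards) {
        return Math.floorMod(docGroupId.hashCode(), numShards);
    }

    // Returns the shards that must receive a delete request.
    // With an id, exactly one shard is contacted; with no id (null),
    // the request has to be broadcast to all shards.
    public static List<Integer> shardsForDelete(String docGroupId, int numShards) {
        List<Integer> targets = new ArrayList<>();
        if (docGroupId != null) {
            targets.add(shardFor(docGroupId, numShards));
        } else {
            for (int shard = 0; shard < numShards; shard++) {
                targets.add(shard);
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        int numShards = 8; // assumed shard count for illustration
        System.out.println("with id:    " + shardsForDelete("groupid12345", numShards));
        System.out.println("without id: " + shardsForDelete(null, numShards));
    }
}
```

With an id the delete touches a single shard regardless of table size; without
one, the cost grows with the number of shards, which is the concern raised
about NRT updates.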
