incubator-blur-dev mailing list archives

From Gagan Juneja <gagandeepjun...@gmail.com>
Subject Re: Reworking the data model
Date Sun, 13 Oct 2013 03:07:05 GMT
+1. It's a much cleaner approach. I struggled a lot to understand what
the use of family is and how it should be translated to Lucene
documents while writing data to Lucene (Aaron knows!).

I think document collection is required. But in some use cases, where a
user just wants to use Blur as a scalable Lucene and store only
documents, document grouping is overhead. So, as suggested, it should
be optional.

Regards,
Gagan

On Sun, Oct 13, 2013 at 12:15 AM, Aaron McCurry <amccurry@gmail.com> wrote:
> Perhaps, but the interesting thing is that I think that grouping
> functionality is actually very similar.  It's just a static structure
> instead of being dynamic.  At least if I understand the Solr feature
> correctly.
>
> Maybe we should call it a DocumentCollection.  Since it's a collection of
> documents.
>
> Aaron
>
>
> On Tue, Oct 1, 2013 at 10:04 PM, Otis Gospodnetic <
> otis.gospodnetic@gmail.com> wrote:
>
>> Hi,
>>
>> Note that Solr and Lucene both have grouping functionality, which some
>> people may confuse with DocGroups you are talking about here.
>>
>> Otis
>> --
>> Solr & ElasticSearch Support -- http://sematext.com/
>> Performance Monitoring -- http://sematext.com/spm
>>
>>
>>
>> On Mon, Sep 30, 2013 at 1:09 PM, Aaron McCurry <amccurry@gmail.com> wrote:
>> > While I don't really like the idea of changing all the code to rename Row
>> > and Record, I think it is necessary to help people who are new to Blur
>> > transition from Lucene (or any other document store for that matter).
>> >
>> > I think that having Doc and DocGroup both be first class objects is also
>> > critical.  I think that for most implementations DocGroup is overkill and
>> > Document is the only thing needed.  I have some ideas on how to make this
>> > possible in the API.
>> >
>> > Here's an example of what we could do.  This is raw Thrift, which can be
>> > ugly, but with some helper/utility classes it can be made better:
>> >
>> > Doc doc = new Doc();
>> > doc.setDocId(new Value(_Fields.LONG_VAL, 1234L));
>> > doc.addToFields(new Field("int_fieldname",
>> >     new Value(_Fields.INT_VAL, 1234)));
>> > doc.addToFields(new Field("string_fieldname",
>> >     new Value(_Fields.STRING_VAL, "value1")));
>> > doc.addToFields(new Field("text_fieldname",
>> >     new Value(_Fields.TEXT_VAL, "this is full text indexed.")));
>> >
>> >
>> > DocGroup docGroup = new DocGroup();
>> > docGroup.setDocGroupId(new Value(_Fields.STRING_VAL, "groupid12345"));
>> > docGroup.addToDocs(doc);
>> >
>> > At this point I think I would like to keep the docId and docGroupId.  I
>> > know that Lucene itself doesn't require them, but if we don't have them
>> > deletes/updates become a lot more expensive.  We would have to broadcast
>> > the delete to all the shards of a table, which would kill NRT updates.
>> >
>> > Thoughts?
>> >
>> > Aaron
>> >
>> >
>> >
>> > On Mon, Sep 30, 2013 at 12:49 PM, Garrett Barton
>> > <garrett.barton@gmail.com> wrote:
>> >
>> >> +1 here.
>> >>
>> >> I also agree with Colton about making docgroup/row optional. I know in the
>> >> current design it's not easy, but I remember Aaron saying that in the branch
>> >> it might be possible to specify any column as the id, making me think it
>> >> might be possible to not have one at all.
>> >> On Sep 30, 2013 10:41 AM, "Colton McInroy" <colton@dosarrest.com> wrote:
>> >>
>> >> > I personally think that the Row/Record/Column model makes sense. If you
>> >> > have some documentation on the site saying here are the Lucene equivalents
>> >> > to Blur it would probably avoid having those types of questions in the
>> >> > future. If you have an explanation of this, you could leave the model the
>> >> > same to avoid having to make a bunch of changes and cause chaos.
>> >> >
>> >> > Glad the Family attribute is being dropped. I kinda came in at the end of
>> >> > its lifespan I guess, because it doesn't really make much sense to me. How
>> >> > long till it's actually dropped from the code though?
>> >> >
>> >> > One thing I would like to see is Row be an option. In my current
>> >> > implementation of Lucene code I don't use them at all, because what I am
>> >> > working with makes no sense to have rows really. I also don't recall
>> >> > DocGroups being required in Lucene, and I never worked with them, so that
>> >> > kinda threw me off when I ran into it.
>> >> >
>> >> > Thanks,
>> >> > Colton McInroy
>> >> >
>> >> >  * Director of Security Engineering
>> >> >
>> >> >
>> >> > Phone
>> >> > (Toll Free)
>> >> > _US_    (888)-818-1344 Press 2
>> >> > _UK_    0-800-635-0551 Press 2
>> >> >
>> >> > My Extension    101
>> >> > 24/7 Support    support@dosarrest.com <mailto:support@dosarrest.com>
>> >> > Email   colton@dosarrest.com <mailto:colton@dosarrest.com>
>> >> > Website         http://www.dosarrest.com
>> >> >
>> >> > On 9/30/2013 6:45 AM, Tim Williams wrote:
>> >> >
>> >> >> Hi Devs,
>> >> >> I'm wondering if we should go ahead and endure the [painful] move to a
>> >> >> more intuitive data model in Blur?  Here are some observations:
>> >> >>
>> >> >> 1) New folks coming to Blur have a background in Lucene - not
>> >> >> necessarily a NoSQL data store - and want to know where their
>> >> >> "Documents" are.
>> >> >>
>> >> >> 2) For folks aware of NoSQL stores, the Row/Record model can be
>> >> >> misleading in terms of design tradeoffs.
>> >> >>
>> >> >> 3) The Row/Record model seems to bring a significant explanation burden.
>> >> >>
>> >> >> In the past we've talked about a model that's more aligned with
>> >> >> Lucene's Documents.  Aaron did some API work on a branch a while back
>> >> >> and it's come up in an issue again recently.
>> >> >>
>> >> >> So, I'm wondering if now is the time to just endure some shortish
>> >> >> period of pain changing everything over?  The idea being something
>> >> >> like:
>> >> >>
>> >> >> Row -> DocGroup
>> >> >> Record -> Document
>> >> >> Column -> Field
>> >> >> Family -> (dropped)
>> >> >>
>> >> >> I think this will alleviate some confusion and provide a solid
>> >> >> foundation for the long term; enabling a shorter learning curve and
>> >> >> less confusion.
>> >> >>
>> >> >> Such a big change would be good to get done while we're still a
>> >> >> small-ish community but I think it's important that everyone is on
>> >> >> board - as it will no doubt create lots of short term chaos and
>> >> >> confusion...
>> >> >>
>> >> >> Thoughts?
>> >> >>
>> >> >> Thanks,
>> >> >> --tim
>> >> >>
>> >> >
>> >> >
>> >>
>>
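
The thread's point about keeping docId and docGroupId (without them, a delete
has to be broadcast to every shard of a table) can be illustrated with a small
sketch. It assumes a document group is placed on a shard by hashing its id;
that placement scheme and every name below (DeleteRouting, shardFor, and so
on) are hypothetical and are not Blur's actual partitioner or API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: not Blur's actual partitioner or API.
final class DeleteRouting {

    // Assumed scheme: a document group lives on the shard chosen by hashing its id.
    static int shardFor(String docGroupId, int numShards) {
        return Math.floorMod(docGroupId.hashCode(), numShards);
    }

    // With an id, a delete can be sent to exactly one shard.
    static List<Integer> shardsForDeleteById(String docGroupId, int numShards) {
        List<Integer> targets = new ArrayList<>();
        targets.add(shardFor(docGroupId, numShards));
        return targets;
    }

    // Without an id nothing says where the document lives,
    // so the delete has to go to every shard of the table.
    static List<Integer> shardsForDeleteWithoutId(int numShards) {
        List<Integer> targets = new ArrayList<>();
        for (int shard = 0; shard < numShards; shard++) {
            targets.add(shard);
        }
        return targets;
    }

    public static void main(String[] args) {
        int numShards = 8;
        System.out.println("delete by id touches "
                + shardsForDeleteById("groupid12345", numShards).size() + " shard(s)");
        System.out.println("delete without id touches "
                + shardsForDeleteWithoutId(numShards).size() + " shard(s)");
    }
}
```

Under these assumptions the cost of an id-less delete grows with the number of
shards, while an id-based delete stays at one shard regardless of table size.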

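Aaron's message mentions that helper/utility classes could make the raw Thrift
calls less ugly, and several replies ask for the group to be optional. A rough
sketch of what such a helper might look like follows. The classes here are
plain-Java stand-ins written for illustration, not the Thrift-generated types
from the branch, and all helper names (DocHelperSketch, intField, textField,
and so on) are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for the Thrift-generated Doc/DocGroup types.
// None of these helper names come from Blur; they only illustrate the idea.
final class DocHelperSketch {

    enum Kind { LONG, INT, STRING, TEXT }

    static final class Value {
        final Kind kind;
        final Object value;
        Value(Kind kind, Object value) { this.kind = kind; this.value = value; }
    }

    static final class Doc {
        final Value docId;
        // Simplification: one value per field name.
        final Map<String, Value> fields = new LinkedHashMap<>();

        Doc(long docId) { this.docId = new Value(Kind.LONG, docId); }

        // Fluent helpers hide the (kind, value) pairing of the raw calls.
        Doc intField(String name, int v) { fields.put(name, new Value(Kind.INT, v)); return this; }
        Doc stringField(String name, String v) { fields.put(name, new Value(Kind.STRING, v)); return this; }
        Doc textField(String name, String v) { fields.put(name, new Value(Kind.TEXT, v)); return this; }
    }

    // Optional wrapper: a Doc is usable on its own without ever creating a group.
    static final class DocGroup {
        final String docGroupId;
        final List<Doc> docs = new ArrayList<>();
        DocGroup(String docGroupId) { this.docGroupId = docGroupId; }
        DocGroup add(Doc doc) { docs.add(doc); return this; }
    }

    // Builds the same document as the example in the thread.
    static Doc exampleDoc() {
        return new Doc(1234L)
                .intField("int_fieldname", 1234)
                .stringField("string_fieldname", "value1")
                .textField("text_fieldname", "this is full text indexed.");
    }

    public static void main(String[] args) {
        Doc doc = exampleDoc();
        DocGroup group = new DocGroup("groupid12345").add(doc);
        System.out.println(doc.fields.size() + " fields, " + group.docs.size() + " doc in group");
    }
}
```

A caller who only wants plain documents would stop after building the Doc; the
DocGroup step is needed only when documents should be grouped.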