lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: revisit naming for grouping/join?
Date Mon, 04 Jul 2011 19:09:20 GMT
OK I'm sold!

I agree: let's rename this new module according to the most likely use
case, not according to its "logical function", and I agree nested
documents is the compelling use case here.  Then fully generic joins
can go to a new module/join.

Maybe modules/nesteddocuments (I think that's more descriptive than
subdocuments)?

How about NestedDocumentQuery?  And NestedDocumentCollector?

See, you can use NestedDocumentQuery but collect it with any ordinary
collector if you don't care about the "nesting" (ie, you are only
interested in matches in the parent document space).  The
NestedDocumentCollector also collects all the nested docs matching
each parent hit.
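
A minimal sketch of that distinction, using a toy in-memory model rather
than the actual Lucene classes. The class and method names are
hypothetical, and the sketch assumes each block stores its nested docs
first and the parent doc last, with a bit set marking the parent docids.
Reading only the keys gives the "parent document space" view that an
ordinary collector would see; the full map is closer to what a
nesting-aware collector would hand back.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: no Lucene classes are used here. */
final class NestedCollectSketch {

    /**
     * Groups matching nested docids under their parent docid.
     * Assumption: the parent of a nested doc is the next set bit in parentBits.
     */
    static Map<Integer, List<Integer>> joinToParents(BitSet parentBits, int[] nestedHits) {
        Map<Integer, List<Integer>> byParent = new LinkedHashMap<>();
        for (int nested : nestedHits) {
            int parent = parentBits.nextSetBit(nested);
            if (parent < 0 || parent == nested) {
                continue; // no enclosing block, or the hit is itself a parent
            }
            byParent.computeIfAbsent(parent, k -> new ArrayList<>()).add(nested);
        }
        return byParent;
    }

    public static void main(String[] args) {
        // Hypothetical index: docids 0..2 are pages of book 3, docids 4..5 are pages of book 6.
        BitSet parents = new BitSet();
        parents.set(3);
        parents.set(6);
        int[] nestedHits = {0, 2, 5}; // made-up matches of the wrapped query

        Map<Integer, List<Integer>> grouped = joinToParents(parents, nestedHits);
        System.out.println("parent-space hits: " + grouped.keySet());
        System.out.println("with nested docs:  " + grouped);
    }
}
```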

You can of course still use this Query/Collector for any kind of
join, as long as your app is able to do this join at indexing time
and index all joined docs to a single row of the primary table as a
doc block.  But this will presumably be a less common use case so
I agree we should just name this feature according to its common use
case.
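
To make "do this join at indexing time" concrete, here is a rough
sketch of flattening each primary row plus its joined rows into one
contiguous doc block. It is a toy model, not the module's API: the
names are hypothetical, docs are plain strings, and placing the parent
last in each block is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: docs are strings and the "index" is a list. */
final class IndexTimeJoinSketch {

    /**
     * Flattens each primary row and its joined rows into one contiguous block.
     * Assumption: joined (nested) docs come first, the primary (parent) doc last.
     */
    static List<String> toDocBlocks(Map<String, List<String>> joinedRowsByPrimary) {
        List<String> docs = new ArrayList<>();
        for (Map.Entry<String, List<String>> row : joinedRowsByPrimary.entrySet()) {
            for (String joined : row.getValue()) {
                docs.add("child:" + joined);
            }
            docs.add("parent:" + row.getKey());
        }
        return docs;
    }

    public static void main(String[] args) {
        // Hypothetical data: two books and their pages.
        Map<String, List<String>> books = new LinkedHashMap<>();
        books.put("book-A", Arrays.asList("page-1", "page-2"));
        books.put("book-B", Arrays.asList("page-1"));

        List<String> docs = toDocBlocks(books);
        for (int docId = 0; docId < docs.size(); docId++) {
            System.out.println(docId + " " + docs.get(docId));
        }
    }
}
```

Because the join is baked into the docid order, changing any one doc in
a block means rebuilding and re-adding that whole block.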

Mike McCandless

http://blog.mikemccandless.com

On Mon, Jul 4, 2011 at 1:34 PM, Chris Hostetter
<hossman_lucene@fucit.org> wrote:
>
> : In my example the city was parent -- I raised this example to explain
> : that index-time joining is more general than just nested docs (ie, I
> : think we should keep the name "join" for this module... also because
> : we should factor out more general search-time-only join capabilities
> : into it).
>
> i think that may be the wrong approach to take when discussing "examples",
> while it's great to say "there are dozens of usecases that these features
> can all support in dozens of diff ways" we should really focus on
> naming/demoing these use cases in the ways where they really make the most
> sense.
>
> In other words, i don't think we should say "All of these types of problems
> are different types of nails, and all of these modules are specialty
> hammers that are slightly distinct from each other in how they work, but
> you can use any of these hammers on any of these nails"  instead we should
> say "here are some specialty hammers, you can use them for lots of
> types of nails, but for each hammer here is the type of nail where it
> really shines"
>
>
> "block-index-join" as i understand it requires all the docs you want to
> join up to be in one contiguous range of docids in the index, so if you want to
> re-index one doc in a block you have to re-index the entire block -- so
> the city/doctor example doesn't sound like a good generic example of
> when/why to use this (because a doctor might change his office
> hours, or address -- maybe even leaving the city completely, while a
> city might change its population w/o the doctor being affected at all).
>
> The "book and pages" example seems much more appropriate, since in the
> real world these things change in lock step -- pages aren't added/removed to
> a book; pages don't change w/o the book itself being fundamentally
> changed.  the fields of a page document are the text of that page, and
> that is inherently data about the book -- the fields of a doctor
> document are metadata about the doctor, and that is not inherently data
> about the city the doctor lives in.
>
> as for the name ... i understand why it's called "module/join" and i
> understand why the classes are called "BlockJoinQuery" and
> "BlockJoinCollector" but i don't think those names really stand out and
> convey to end users what they do and how/why they are useful.
>
> Personally i think better names would be "modules/subdocuments",
> "ParentDocumentQuery" and "ChildDocumentsCollector"
>
> I know mccandless isn't a fan of the name "Nested Documents" because this
> functionality *can* be used for use cases where the data being modeled is
> not strictly organized in a nested relationship, but that doesn't mean
> it's *optimal* or easy for a user to apply to other usecases, because they
> have to design their model (and their indexing strategy) in such a way
> that they think of them as nested or hierarchical documents.
>
> Naming it "module/subdocuments" would not only emphasize the usecase where
> it really shines, it would more importantly draw attention to how users
> have to model their data in order to take advantage of it -- and using
> "ParentDocument" and "ChildDocuments" in the names of the Query/Collector
> would make it clear what they "match" on relative to the underlying query
> that they wrap/collect
>
> it would also help distinguish from more general joins like what solr
> does today -- it seems like that should eventually take the name
> "module/join"
>
> At a minimum we should rename what we have now to "modules/block-join" or
> "modules/index-join" (but the latter is confusing) and eventually add
> "modules/query-join"  (yes, yes, block joins provide a query, but the
> difference is when you have to make a decision about how you want to
> join your model, at index time or at query time).
>
>
> -Hoss
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>
