Date: Mon, 4 Jul 2011 15:09:20 -0400
Subject: Re: revisit naming for grouping/join?
From: Michael McCandless
To: dev@lucene.apache.org

OK I'm sold!

I agree: let's rename this new module according to the most likely use
case, not according to its "logical function", and I agree nested
documents is the compelling use case here. Then fully generic joins can
go to a new module/join.

Maybe modules/nesteddocuments (I think that's more descriptive than
subdocuments)? How about NestedDocumentQuery? And
NestedDocumentCollector?

See, you can use NestedDocumentQuery but collect it with any ordinary
collector if you don't care about the "nesting" (i.e., you are only
interested in matches in the parent document space). The
NestedDocumentCollector also collects all the nested docs matching each
parent hit.

You can of course still use this Query/Collector for any kind of join,
as long as your app is able to do the join at indexing time and index
each row of the primary table, together with all the docs joined to it,
as a single doc block. But this will presumably be a less common use
case, so I agree we should just name this feature according to its
common use case.

Mike McCandless

http://blog.mikemccandless.com

On Mon, Jul 4, 2011 at 1:34 PM, Chris Hostetter wrote:
>
> : In my example the city was the parent; I raised this example to explain
> : that index-time joining is more general than just nested docs (ie, I
> : think we should keep the name "join" for this module... also because
> : we should factor out more general search-time-only join capabilities
> : into it).
>
> I think that may be the wrong approach to take when discussing
> "examples": while it's great to say "there are dozens of use cases that
> these features can all support in dozens of different ways", we should
> really focus on naming/demoing these use cases in the ways where they
> really make the most sense.
>
> In other words, I don't think we should say "All of these types of
> problems are different types of nails, and all of these modules are
> specialty hammers that are slightly distinct from each other in how
> they work, but you can use any of these hammers on any of these
> nails"; instead we should say "Here are some specialty hammers; you
> can use them on lots of types of nails, but for each hammer here is
> the type of nail where it really shines."
>
>
> "Block-index-join", as I understand it, requires all the docs you want
> to join up to be in one contiguous range of docids in the index, so if
> you want to re-index one doc in a block you have to re-index the
> entire block. So the city/doctor example doesn't sound like a good
> generic example of when/why to use this: a doctor might change his
> office hours or address (maybe even leaving the city completely),
> while a city might change its population without the doctor being
> affected at all.
>
> The "book and pages" example seems much more appropriate, since in the
> real world these things change in lockstep: pages aren't added to or
> removed from a book, and pages don't change without the book itself
> being fundamentally changed. The fields of a page document are the
> text of that page, and that is inherently data about the book; the
> fields of a doctor document are metadata about the doctor, and that is
> not inherently data about the city the doctor lives in.
>
> As for the name: I understand why it's called "module/join" and why
> the classes are called "BlockJoinQuery" and "BlockJoinCollector", but
> I don't think those names really stand out and convey to end users
> what they do and how/why they are useful.
>
> Personally I think better names would be "modules/subdocuments",
> "ParentDocumentQuery" and "ChildDocumentsCollector".
>
> I know mccandless isn't a fan of the name "Nested Documents" because
> this functionality *can* be used for use cases where the data being
> modeled is not strictly organized in a nested relationship, but that
> doesn't mean it's *optimal* or easy for a user to apply to other use
> cases, because they have to design their model (and their indexing
> strategy) in such a way that they think of them as nested or
> hierarchical documents.
>
> Naming it "module/subdocuments" would not only emphasize the use case
> where it really shines; more importantly, it would draw attention to
> how users have to model their data in order to take advantage of it.
> And using "ParentDocument" and "ChildDocuments" in the names of the
> Query/Collector would make it clear what they "match" relative to the
> underlying query that they wrap/collect.
>
> It would also help distinguish this from more general joins like what
> Solr does today; it seems like that should eventually take the name
> "module/join".
>
> At a minimum we should rename what we have now to "modules/block-join"
> or "modules/index-join" (but the latter is confusing) and eventually
> add "modules/query-join" (yes, yes, block joins provide a query, but
> the difference is when you have to make a decision about how you want
> to join your model: at index time or at query time).
>
>
> -Hoss
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
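The naming debate in this thread turns on how an index-time ("block") join lays out documents: each block occupies one contiguous range of docids, so changing any doc in a block means re-indexing the whole block. The Java below is a minimal in-memory sketch of that layout, assuming each parent is indexed immediately after its children. DocBlockIndex and its methods (addBlock, parentOf, blockStart, replaceBlock, joinChildHits) are hypothetical names for illustration only; this is a toy, not Lucene's BlockJoinQuery/BlockJoinCollector API. In joinChildHits, the key set plays the role of the query's parent hits, and each value list plays the role of the nested docs a collector would gather for that parent.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical toy model of index-time ("block") joins; NOT Lucene's API.
// Assumption: each block is a contiguous run of doc IDs, children first, parent last.
final class DocBlockIndex {
    private final List<String> docs = new ArrayList<>(); // doc text, indexed by doc ID
    private final BitSet parentBits = new BitSet();       // set bit = doc is a parent
    private final BitSet deleted = new BitSet();          // set bit = doc was deleted

    // Appends one block (children, then parent) and returns the parent's doc ID.
    int addBlock(String parent, List<String> children) {
        docs.addAll(children);
        docs.add(parent);
        int parentDoc = docs.size() - 1;
        parentBits.set(parentDoc);
        return parentDoc;
    }

    // Blocks are contiguous and parent-last, so a child's parent is the
    // first parent doc at or after it.
    int parentOf(int childDoc) {
        return parentBits.nextSetBit(childDoc);
    }

    // First doc ID of the block ending at parentDoc: one past the previous parent
    // (previousSetBit returns -1 when there is none, giving 0).
    int blockStart(int parentDoc) {
        return parentBits.previousSetBit(parentDoc - 1) + 1;
    }

    // Changing any doc in a block means deleting the whole block and appending
    // a new one; the new parent gets a new doc ID, which is returned.
    int replaceBlock(int parentDoc, String parent, List<String> children) {
        deleted.set(blockStart(parentDoc), parentDoc + 1); // toIndex is exclusive
        return addBlock(parent, children);
    }

    // Maps each parent that has at least one live matching child to those children.
    Map<Integer, List<Integer>> joinChildHits(Predicate<String> childMatches) {
        Map<Integer, List<Integer>> hits = new LinkedHashMap<>();
        for (int doc = 0; doc < docs.size(); doc++) {
            if (parentBits.get(doc) || deleted.get(doc) || !childMatches.test(docs.get(doc))) {
                continue; // skip parents, deleted docs, and non-matching children
            }
            hits.computeIfAbsent(parentOf(doc), p -> new ArrayList<>()).add(doc);
        }
        return hits;
    }
}
```

For example, indexing a book block (two page docs, then the book) followed by a second book with one page gives parent IDs 2 and 4. Replacing the first block marks docs 0 through 2 deleted and appends the new version as docs 5 through 7, even if only one page changed: the whole-block re-indexing cost that makes "book and pages" a better fit than "city and doctors".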