lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: revisit naming for grouping/join?
Date Tue, 05 Jul 2011 10:46:12 GMT
On Mon, Jul 4, 2011 at 3:38 PM, Chris Hostetter
<hossman_lucene@fucit.org> wrote:
>
> : Maybe modules/nesteddocuments (I think that's more descriptive than
> : subdocuments)?
>
> either way ... subdocuments has the advantage of being a shorter directory
> name.

Yeah both are rather long...

Maybe modules/nested? modules/nesteddocs?

> i kinda wonder about first impressions and the etymology of "nested" ...
> it makes me think of bird nests and Russian dolls, neither of which
> really convey the point: nesting in birds is about protecting/incubating
> and is only a single layer; while Russian nesting dolls are singular
> wrappers around wrappers around wrappers.
>
> subdocuments seems like it might be better because it conveys more of a
> hierarchical nature (to me anyway).

Hmm... sub feels like it undersells, ie emphasizes "under" or
"inferior to" and de-emphasizes the strong cooperation w/ the parent.

Also, the nesting is not just one level -- it can support an arbitrary
star join.  So you can join from main to table1 and then from table1
to table2 (parent, child, grandchild).  You can also join to multiple
child tables from the main table.
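
To make the multi-level case concrete, here is a minimal sketch of joining twice, from table2 matches up to table1 and then up to main. It rests on assumptions that are not stated in this thread: every document is indexed before the document it nests under, and plain java.util.BitSet values over docIDs stand in for the query matches and for the filters that mark each level. TwoLevelJoinSketch and joinUp are hypothetical names, not Lucene API.

```java
import java.util.BitSet;

// Hypothetical sketch, not Lucene API: plain BitSets over docIDs stand in
// for query matches and for the filters marking each level's documents.
final class TwoLevelJoinSketch {

    // Maps each matching docID to the next docID set in `level`.
    // Assumes every document is indexed before the document it nests under.
    static BitSet joinUp(BitSet matches, BitSet level) {
        BitSet joined = new BitSet();
        for (int doc = matches.nextSetBit(0); doc >= 0;
                 doc = matches.nextSetBit(doc + 1)) {
            int owner = level.nextSetBit(doc);
            if (owner >= 0) {
                joined.set(owner);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // Block 1: docs 0,1 nest under table1 doc 2; doc 3 nests under
        //          table1 doc 4; docs 2 and 4 nest under main doc 5.
        // Block 2: doc 6 nests under table1 doc 7, which nests under main doc 8.
        BitSet table1Docs = new BitSet();
        table1Docs.set(2);
        table1Docs.set(4);
        table1Docs.set(7);

        BitSet mainDocs = new BitSet();
        mainDocs.set(5);
        mainDocs.set(8);

        // Pretend the innermost query matched table2 docs 1 and 6.
        BitSet table2Matches = new BitSet();
        table2Matches.set(1);
        table2Matches.set(6);

        BitSet table1Hits = joinUp(table2Matches, table1Docs); // grandchild -> child
        BitSet mainHits = joinUp(table1Hits, mainDocs);        // child -> parent
        System.out.println(table1Hits + " -> " + mainHits);    // prints {2, 7} -> {5, 8}
    }
}
```

Joining a second child table to main would reuse the same mainDocs bits with a different set of matches.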

I think nested/nesting has strong enough meaning among programmers
that most will understand what it means in this context.

> : How about NestedDocumentQuery?  And NestedDocumentCollector?
> :
> : See, you can use NestedDocumentQuery but collect it with any ordinary
> : collector if you don't care about the "nesting" (ie, you are only
> : interested in matches in the parent document space).  The
> : NestedDocumentCollector also collects all the nested docs matching
> : each parent hit.
>
> Hmmm...
>
> My suggestion of ParentDocumentQuery was based on the understanding that
> the simplest usecase was...
>
>  Query inner = getSomethingThatMatchesSomeChildDocs();
>  Filter parents = someFilterThatMatchesAllKnownParentDocs();
>  Query outer = new ParentDocumentQuery(inner, parents);
>  TopDocs results = searcher.search(outer);
>
> ...and in this case "results" will contain the parents of the child
> documents that match inner.  is that correct?

Correct.
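
As an illustration of that behavior, here is a minimal, self-contained sketch of mapping matching child docs to their parents. It assumes (this is not stated in the thread) that each parent is indexed immediately after its children in one contiguous block, so the parent of a child is the next set bit in the parents bitset. ParentJoinSketch and parentsOf are hypothetical names, and java.util.BitSet stands in for the inner query's matches and for the parents Filter.

```java
import java.util.BitSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch, not Lucene API: docIDs are plain ints and
// BitSets stand in for the child query matches and the parent filter.
final class ParentJoinSketch {

    // Returns the parent docID of every matching child, in docID order.
    // Assumes each parent is indexed right after its children, so the
    // parent of a child is the next set bit in `parents`.
    static Set<Integer> parentsOf(BitSet childMatches, BitSet parents) {
        Set<Integer> result = new LinkedHashSet<>();
        for (int child = childMatches.nextSetBit(0); child >= 0;
                 child = childMatches.nextSetBit(child + 1)) {
            int parent = parents.nextSetBit(child);
            if (parent >= 0) {
                result.add(parent);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Two blocks: docs 0-2 are children of parent 3,
        // docs 4-5 are children of parent 6.
        BitSet parents = new BitSet();
        parents.set(3);
        parents.set(6);

        // Pretend the inner query matched child docs 1, 2 and 5.
        BitSet childMatches = new BitSet();
        childMatches.set(1);
        childMatches.set(2);
        childMatches.set(5);

        System.out.println(parentsOf(childMatches, parents)); // prints [3, 6]
    }
}
```

Children 1 and 2 collapse into the single parent 3, which is why an ordinary collector only ever sees hits in the parent document space.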

> if so, then independent of the Collector, "ParentDocumentQuery" (or
> ParentDocumentQueryWrapper) still seems like it makes the most sense.

Hmm, but that doesn't convey that it handles this nesting, ie, that
it's joining child docs with parent docs.

Also, these queries can be nested (from 2nd join in the star join),
and so it could be ChildAndGrandChildrenQuery.

I guess Wrapper would make sense since it wraps a query matching the
nested docs.  I think Document is redundant/implied?

Maybe NestedQueryWrapper?

> For the Collector, i realize now that i totally misunderstood its API --
> for some reason i thought it would wrap another Collector and proxy to the
> inner collector only the parents, independently collecting/recording the
> groups of parent->children info which could be asked for later.
>
> "ChildDocumentsCollector" definitely doesn't make sense -- it's not
> just collecting children, it's collecting Groups made up of parents
> and children ... GroupCollector is obviously too general though ... i
> would toss out "ParentChildrenTopGroupCollector" to make it clear that:
>  a) what you can get out of it are instances of TopGroups
>  b) the Groups consist of Parents and Children
>
> ...but that may be trying to convey too much in a classname.

I agree we want Top in the name, since it's collecting Top hits
according to provided Sort... I don't think we should put Groups in
the name just because this class (TopGroups) is used to represent the
returned hits.  Really in this context they aren't "groups" in the
grouping module sense; they are the nested docs (parent + children),
just using TopGroups to represent that for now.

In fact, once we generalize TopDocs so that the type of each hit can
be parameterized then this collector would return TopDocs<NestedDoc>
and each NestedDoc would have parent docID, maybe sort field values,
and then the TopDocs<ScoreDoc> holding the child hits.  (But I'm
scared of the generics required here!).
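
A rough sketch of what that parameterized shape might look like follows. Every type here is hypothetical and exists only for illustration: GenericTopDocs is a stand-in for a generalized TopDocs, ScoredHit is a stand-in for ScoreDoc, and the fields of NestedDoc simply follow the description above. None of this is existing Lucene API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical types for illustration only; none of these exist in Lucene.
final class NestedDocsSketch {

    // A plain scored hit, standing in for ScoreDoc.
    static final class ScoredHit {
        final int doc;
        final float score;
        ScoredHit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // A top-N result whose hit type is a type parameter.
    static final class GenericTopDocs<T> {
        final int totalHits;
        final List<T> hits;
        GenericTopDocs(int totalHits, List<T> hits) {
            this.totalHits = totalHits;
            this.hits = Collections.unmodifiableList(hits);
        }
    }

    // One parent hit plus the top child hits nested under it.
    static final class NestedDoc {
        final int parentDoc;
        final Object[] sortValues; // parent sort field values, may be empty
        final GenericTopDocs<ScoredHit> children;
        NestedDoc(int parentDoc, Object[] sortValues,
                  GenericTopDocs<ScoredHit> children) {
            this.parentDoc = parentDoc;
            this.sortValues = sortValues;
            this.children = children;
        }
    }

    // Builds a one-parent result: parent doc 3 with child hits 1 and 2.
    static GenericTopDocs<NestedDoc> sample() {
        GenericTopDocs<ScoredHit> children = new GenericTopDocs<>(
            2, Arrays.asList(new ScoredHit(1, 0.9f), new ScoredHit(2, 0.4f)));
        NestedDoc hit = new NestedDoc(3, new Object[0], children);
        return new GenericTopDocs<>(1, Collections.singletonList(hit));
    }

    public static void main(String[] args) {
        for (NestedDoc nested : sample().hits) {
            System.out.println("parent " + nested.parentDoc + " has "
                + nested.children.totalHits + " child hits");
        }
    }
}
```

Only one type parameter is needed in this toy version; the harder part would be threading it through collectors and sort comparators.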

So I guess I would keep Top but drop Groups, and replace
ParentChildren with NestedDocs and move the Top in front:
TopNestedDocsCollector.

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

