lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: revisit naming for grouping/join?
Date Wed, 06 Jul 2011 12:40:24 GMT
Right -- searching XML docs is a great fit for
BlockJoinQuery/Collector (and an XML doc is really just a single-doc star
join).

But I agree picking XML search as the "dominant use case we use to
name this module and its classes" is dangerous because we don't (yet?)
do all the other things (eg XQuery) that one would expect with XML
search.

I think this is also another good reason to fall back to the
functional names (because this Query/Collector is just "finishing" a
join, where the join was done during indexing) instead of the use-case
names, ie, "XML search" is also a dominant use case here, just like
"nested docs".

SKU search, obviously a common use case in e-commerce, is another example.

So maybe we should not be trying to name these classes on any single
use case... maybe instead name them by their function ("join"), and
then make good examples that show how each of these use cases can be
handled as index-time joins.
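
To make the "index-time join" idea concrete, here is a minimal sketch in
plain Java. It does not use Lucene's actual classes: BlockJoinSketch,
addBlock, parentsOf and docId are hypothetical names, and it assumes each
block is indexed as the child docs followed by their parent. It replays
the w/x/y/z example quoted below.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.TreeSet;

// Illustrative stand-in for an index; not a Lucene API.
final class BlockJoinSketch {
    // Doc IDs are simply positions in this list.
    private final List<String> names = new ArrayList<>();
    // One bit per doc ID, set only for parent docs.
    private final BitSet parentBits = new BitSet();

    // The "join" happens here, at index time: children first, parent last.
    void addBlock(String parent, String... children) {
        for (String child : children) {
            names.add(child);
        }
        parentBits.set(names.size());
        names.add(parent);
    }

    // Look up a doc ID by name (names are assumed unique in this sketch).
    int docId(String name) {
        return names.indexOf(name);
    }

    // "Finishing" the join at search time: each matching child maps to the
    // first parent doc at or after it, so only parents are returned.
    TreeSet<String> parentsOf(BitSet childMatches) {
        TreeSet<String> parents = new TreeSet<>();
        for (int doc = childMatches.nextSetBit(0); doc >= 0;
                doc = childMatches.nextSetBit(doc + 1)) {
            int parentDoc = parentBits.nextSetBit(doc);
            if (parentDoc >= 0) {
                parents.add(names.get(parentDoc));
            }
        }
        return parents;
    }

    public static void main(String[] args) {
        BlockJoinSketch index = new BlockJoinSketch();
        index.addBlock("w", "w5");
        index.addBlock("x", "x1", "x2");
        index.addBlock("y", "y1", "y2");
        index.addBlock("z", "z3");

        // queryA matches w5, x1, x2 and z3.
        BitSet queryA = new BitSet();
        for (String hit : new String[] {"w5", "x1", "x2", "z3"}) {
            queryA.set(index.docId(hit));
        }

        // queryB wraps queryA and yields the parents only.
        System.out.println(index.parentsOf(queryA)); // prints [w, x, z]
    }
}
```

Nothing in the sketch depends on whether the blocks represent XML
elements, nested docs or SKUs, which is the argument for naming by
function rather than by use case.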

Mike McCandless

http://blog.mikemccandless.com

On Wed, Jul 6, 2011 at 5:09 AM, mark harwood <markharw00d@yahoo.co.uk> wrote:
> Of course you could take this whole hierarchical thing down the XML route i.e.
> Lucene as an XML store.
> Everyone knows what XML is and it is open to representing lots of things and
> there is a lot of data already out there in this form.
> I believe this is MarkLogic's schtick.
>
> There are several disadvantages though - XML may bring with it too much baggage
> in the form of existing standards e.g. people wanting XQuery support, includes,
> schemas etc.
> XML attributes vs nested elements and how they map to physical Lucene docs is
> also not necessarily a neat fit or one that can easily be automated.
> Existing XML query languages are probably too binary in their matching logic (it
> matches or it doesn't - no relevance ranking etc) and also too restrictive in
> their query types, so I wouldn't bother implementing the XQuery syntax.
>
>
>
>
> ----- Original Message ----
> From: Chris Hostetter <hossman_lucene@fucit.org>
> To: dev@lucene.apache.org
> Sent: Tue, 5 July, 2011 21:58:35
> Subject: Re: revisit naming for grouping/join?
>
>
> : Maybe modules/nested? modules/nesteddocs?
>
>    modules/subdocs
>    modules/nesteddocs
>    modules/nested
>
> None of them scream "this is the perfect name" to me, but none of them
> scream "dear lord this is a terrible idea" either.
>
> Instinct says "All other factors being equal, pick the shortest name"
>
> : Hmm... sub feels like it undersells, ie emphasizes "under" or
> : "inferior to" and de-emphasizes the strong cooperation w/ the parent.
>
> hmmm, maybe.  Coming from a background where the taxonomy was king i would
> never think that about "subdocuments", but i respect that my perception
> may be unique that way.
>
> : Also, the nesting is not just one level -- it can support an arbitrary
> : star join.  So you can join from main to table1 and then from table1
> : to table2 (parent, child, grandchild).  You can also join to multiple
> : child tables from the main table.
>
> Sure, and i agree my russian doll image fits with that part -- my concern
> is that "nesting" in that context suggests a single child (which may have
> its own singular child, etc...).
>
> 6 of 1, half dozen of the other.
>
> : > if so, then independent of the Collector, "ParentDocumentQuery" (or
> : > ParentDocumentQueryWrapper) still seems like it makes the most sense.
> :
> : Hmm, but that doesn't convey that it handles this nesting, ie, that
> : it's joining child docs with parent docs.
>
> you lost me there ... i feel like i'm not understanding your use of
> "joining"
>
> If...
>  * x is the parent of x1, x2, etc...
>  * y is the parent of y1, y2, etc...
>  * queryA matches w5, x1, x2, and z3
>  * queryB = BlockJoinQuery(queryA)
>
> ...then doesn't queryB only match w, x, and z?
>
> Isn't it just a query that wraps another query and returns the parents of
> the docs matched by the wrapped query?
>
> : Also, these queries can be nested (from 2nd join in the star join),
> : and so it could be ChildAndGrandChildrenQuery.
>
> a) wouldn't that require you to wrap multiple of them? (ie: new
> BlockJoinQuery(new BlockJoinQuery(childQ, ...), ...))
>
> b) the use of child suggests that if the base query matches parents, then
> the wrapper will match children ... i think you really want it to clarify
> that the relationship goes the other way -- it returns parents (and grand
> parents, etc..) of the wrapped query ?
>
> : I guess Wrapper would make sense since it wraps a query matching the
> : nested docs.  I think Document is redundant/implied?
> :
> : Maybe NestedQueryWrapper?
>
> ugh... that seems like it might be really confusing ... "nested what?" ...
> "nested query?" .. "of course it's a wrapper, it wraps a nested query"
>
> another reason why naming the query after the 'parents' might better
> explain what it does.
>
> if i'm misunderstanding the docs, and it can go all the way up the
> taxonomy w/o having to nest instances inside other instances then
> maybe "AncestorDocumentQueryWrapper" or "OuterDocumentQueryWrapper" could
> make sense?
>
> Dare i suggest "WrappingDocumentQueryWrapper" ?
>
> (i hate naming shit ... good names are too fucking hard .. and i can't
> find any antonyms for "nested" in the context we mean)
>
> : In fact, once we generalize TopDocs so that the type of each hit can
> : be parameterized then this collector would return TopDocs<NestedDoc>
>    ...
> : So I guess I would keep Top but drop Groups, and replace
> : ParentChildren with NestedDocs and move the Top in front:
> : TopNestedDocsCollector.
>
> yep, yep ... sounds just ... just feel like we need
> something better than anything we've come up with so far for the query,
> something to adequately explain that it (essentially) does the inverse of
> the collector -- going up the taxonomy and matching the parents/wrappers
> of the nested documents matched by the base query.
>
>
> -Hoss
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org