lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: revisit naming for grouping/join?
Date Sun, 03 Jul 2011 15:25:44 GMT
On Fri, Jul 1, 2011 at 9:28 AM, mark harwood <markharw00d@yahoo.co.uk> wrote:
>>> I think what would be best is a smallish but feature complete demo,
>
> For the nested stuff I had a reasonable demo on LUCENE-2454 that was based
> around resumes - that use case has the one-to-many characteristics that lend
> themselves to nesting, e.g. a person has many different qualifications and records of
> employment.
> This scenario was illustrated
> here: http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene
>
> I also had the "book search" type scenario where a book has many sections and
> for the purposes of efficient highlighting/summarisation these sections were
> treated as child docs which could be read quickly (rather than highlighting a
> whole book)

I think both resumes and book search, and also others like the
variants of a product SKU, would all make good examples for the nested
docs use case.

> I'm not sure what the "parent" was in your doctor and cities example, Mike. If a
> doctor is in only one city then there is no point making city a child doc as the
> one city info can happily be combined with the doctor info into a single
> document with no conflict (doctors have different properties to cities).
> If the city is the parent with many child doctor docs that makes more sense but
> feels like a less likely use case e.g. "find me a city with doctor x and a
> different doctor y"
> Searching for a person with excellent java and preferably good lucene skills
> feels like a more real-world example.

In my example the city was parent -- I raised this example to explain
that index-time joining is more general than just nested docs (ie, I
think we should keep the name "join" for this module... also because
we should factor out more general search-time-only join capabilities
into it).
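
To make the index-time idea concrete, here is a minimal sketch in plain
Java. It assumes each city's doctor docs are indexed as one contiguous
block immediately before the city (parent) doc, and that a bit set marks
which doc IDs are parents. The class and method names are made up for
illustration and are not Lucene's API.

```java
import java.util.BitSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration only; not Lucene's API.
final class BlockJoinSketch {

    // Maps each matching child doc ID to the parent doc that closes its block.
    // Assumes children are indexed immediately before their parent and that
    // `parents` has exactly one bit set per parent doc ID.
    static Set<Integer> parentsOf(BitSet parents, int[] childHits) {
        Set<Integer> result = new LinkedHashSet<>();
        for (int child : childHits) {
            // The first parent bit at or after the child is that child's parent.
            int parent = parents.nextSetBit(child);
            if (parent >= 0) {
                result.add(parent);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Doc IDs 0-2: doctors in city A; doc 3: city A (parent).
        // Doc IDs 4-5: doctors in city B; doc 6: city B (parent).
        BitSet parents = new BitSet();
        parents.set(3);
        parents.set(6);
        int[] doctorHits = {1, 2, 5};
        System.out.println(parentsOf(parents, doctorHits)); // prints [3, 6]
    }
}
```

Nothing in the sketch cares whether the parent is a resume, a book or a
city, which is the sense in which the join is more general than nesting.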

> It feels like documenting some of the trade-offs behind index design choices is
> useful too e.g. nesting is not too great for very volatile content with
> constantly changing children while search-time join is more costly in RAM and
> 2-pass processing

+1, especially once we've factored out generic joins.
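
For contrast, a rough sketch of what a search-time join has to do, again
in plain Java with made-up names rather than any Lucene API: one pass
over the "from" side to collect join keys into memory, then a second
pass over the "to" side to filter by those keys. Docs are modelled as
simple field-name to value maps, which is an assumption made only to
keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not Lucene's API.
final class SearchTimeJoinSketch {

    // Pass 1: collect the join key of every "from" doc whose field matches.
    // Pass 2: keep the "to" docs whose join key was collected in pass 1.
    static List<Map<String, String>> join(List<Map<String, String>> fromDocs,
                                          String fromField, String fromValue,
                                          String fromKey,
                                          List<Map<String, String>> toDocs,
                                          String toKey) {
        Set<String> keys = new HashSet<>(); // held in RAM for the whole query
        for (Map<String, String> doc : fromDocs) {
            String key = doc.get(fromKey);
            if (key != null && fromValue.equals(doc.get(fromField))) {
                keys.add(key);
            }
        }
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> doc : toDocs) {
            String key = doc.get(toKey);
            if (key != null && keys.contains(key)) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```

The key set grows with the number of matching "from" docs, which is
where the RAM cost comes from. In exchange, either side can be
reindexed on its own without rewriting a whole block.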

Mike
