lucene-dev mailing list archives

From: Michael McCandless
Subject: Re: revisit naming for grouping/join?
Date: Sun, 03 Jul 2011 15:25:44 GMT

On Fri, Jul 1, 2011 at 9:28 AM, mark harwood wrote:
>>> I think what would be best is a smallish but feature complete demo,
> For the nested stuff I had a reasonable demo on LUCENE-2454 that was based
> around resumes - that use case has the one-to-many characteristics that lend
> themselves to nesting, e.g. a person has many different qualifications and records of
> employment.
> This scenario was illustrated
> here:
> I also had the "book search" type scenario where a book has many sections and,
> for the purposes of efficient highlighting/summarisation, these sections were
> treated as child docs which could be read quickly (rather than highlighting a
> whole book).

I think both resumes and book search, and also others like the
variants of a product SKU, would all make good examples for the nested
docs use case.
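
To make the resume case concrete, here is a rough toy sketch in plain Python. It is not Lucene code and every name in it is hypothetical. It assumes each resume is stored as a block of child docs followed by its parent doc, and compares that with a naive flattened document where skills and levels become independent multi-valued fields:

```python
# Toy model of index-time nesting. Not Lucene code; all names are hypothetical.
# Assumption: each block is stored as its child docs followed by the parent doc.

def build_index(resumes):
    """Flatten resumes into one doc list: children first, then the parent."""
    docs = []
    for person, skills in resumes:
        for skill, level in skills:
            docs.append({"type": "child", "skill": skill, "level": level})
        docs.append({"type": "parent", "name": person})
    return docs

def parents_with_child(docs, skill, level):
    """Names of parents with at least one child matching BOTH skill and level."""
    hits = []
    matched = False
    for doc in docs:
        if doc["type"] == "child":
            if doc["skill"] == skill and doc["level"] == level:
                matched = True
        else:
            if matched:
                hits.append(doc["name"])
            matched = False  # reset at the block boundary
    return hits

def flat_match(resumes, skill, level):
    """Naive flattened doc: skill and level are matched independently."""
    return [person for person, skills in resumes
            if skill in {s for s, _ in skills}
            and level in {l for _, l in skills}]

# Made-up sample data for illustration only.
resumes = [
    ("alice", [("java", "excellent"), ("lucene", "good")]),
    ("bob", [("java", "good"), ("lucene", "excellent")]),
]
docs = build_index(resumes)
nested_hits = parents_with_child(docs, "java", "excellent")
flat_hits = flat_match(resumes, "java", "excellent")
print(nested_hits, flat_hits)
```

In this toy, the nested lookup returns only alice, while the flattened version also returns bob, because his "excellent" belongs to a different skill. That cross-matching is the problem the child docs avoid.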

> I'm not sure what the "parent" was in your doctor and cities example, Mike. If a
> doctor is in only one city then there is no point making city a child doc as the
> one city info can happily be combined with the doctor info into a single
> document with no conflict (doctors have different properties to cities).
> If the city is the parent with many child doctor docs that makes more sense but
> feels like a less likely use case e.g. "find me a city with doctor x and a
> different doctor y"
> Searching for a person with excellent java and preferably good lucene skills
> feels like a more real-world example.

In my example the city was parent -- I raised this example to explain
that index-time joining is more general than just nested docs (ie, I
think we should keep the name "join" for this module... also because
we should factor out more general search-time-only join capabilities
into it).

> It feels like documenting some of the trade-offs behind index design choices is
> useful too e.g. nesting is not too great for very volatile content with
> constantly changing children while search-time join is more costly in RAM and
> 2-pass processing

+1, especially once we've factored out generic joins.
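
For contrast, a search-time join could be sketched roughly like this. Again this is a plain Python toy, not Lucene code, and the function, field names, and sample data are all hypothetical. It uses the doctor/city shape from above, with doctors and cities kept as independent docs linked by a key:

```python
# Toy search-time join. Not Lucene code; all names and data are hypothetical.
# Pass 1 collects join keys from the matching "from" docs.
# Pass 2 keeps the "to" docs whose key is in that set.

def search_time_join(from_docs, to_docs, from_pred, from_key, to_key):
    # The collected key set lives in memory for the whole query.
    keys = {d[from_key] for d in from_docs if from_pred(d)}
    return [d for d in to_docs if d[to_key] in keys]

doctors = [
    {"name": "dr_x", "city_id": 1},
    {"name": "dr_y", "city_id": 2},
]
cities = [
    {"id": 1, "city": "Boston"},
    {"id": 2, "city": "London"},
]

# "Find cities that have doctor x."
result = search_time_join(
    doctors, cities, lambda d: d["name"] == "dr_x", "city_id", "id"
)
print(result)

# Changing one doctor touches only that doc; no block needs rebuilding.
doctors[1]["city_id"] = 1
```

The sketch shows both sides of the trade-off: the key set and the second pass are the extra query-time cost, but a doctor can change without re-adding the city and all of its other children, which the block layout would require.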

