incubator-blur-dev mailing list archives

From Aaron McCurry <amccu...@gmail.com>
Subject Re: the feature formerly known as superquery
Date Sun, 27 Jan 2013 14:19:49 GMT
Ok, after giving this some thought, I will offer up a solution that I have
discussed with others before. It's sort of in between Lucene documents and
the previous Blur API (Rows and Records).

Suppose we simply added a list of subdocuments to the Document object, where
each subdocument behaves exactly like a regular document. The structs would
look like this:

Current:

struct Document {
 list<Field> fields
}

Proposed:

struct SubDocument {
 list<Field> fields
}

struct Document {
 list<Field> fields,
 list<SubDocument> subDocs
}

Obviously subDocs could be null, in which case a Document would behave more
or less like a Lucene document. However, if you added subdocuments to a
document, you would get behavior closer to the Rows and Records of 0.1.x,
and you would also have your prime document idea: the Document's own fields
could be stored as a standalone document in the group, holding different
(group-level) information, thus giving you your prime document behavior.
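To make the proposal concrete, here is a minimal Java sketch, assuming Java 16+ records as stand-ins for the Thrift structs. The names Field, SubDocument and Document come from the structs above; reducing Field to a name/value pair and the toGroup helper are hypothetical. It also assumes the parent Document's own fields are stored as the first entry of the group (the prime document), followed by one entry per subdocument.

```java
import java.util.ArrayList;
import java.util.List;

class SubDocumentSketch {
    // Hypothetical Java mirrors of the Thrift structs; Field is reduced to name/value.
    record Field(String name, String value) {}
    record SubDocument(List<Field> fields) {}
    // subDocs may be null, in which case this is just a flat, Lucene-style document.
    record Document(List<Field> fields, List<SubDocument> subDocs) {}

    /**
     * Flattens a Document into the ordered field lists of one group.
     * Assumption: the parent Document's own fields come first and act as the
     * "prime" document, followed by one entry per subdocument.
     */
    static List<List<Field>> toGroup(Document doc) {
        List<List<Field>> group = new ArrayList<>();
        group.add(doc.fields());
        List<SubDocument> subs = doc.subDocs() == null ? List.of() : doc.subDocs();
        for (SubDocument sub : subs) {
            group.add(sub.fields());
        }
        return group;
    }

    public static void main(String[] args) {
        Document plain = new Document(List.of(new Field("title", "standalone")), null);
        Document withSubs = new Document(
            List.of(new Field("summary", "overview of the group")),
            List.of(new SubDocument(List.of(new Field("row", "a"))),
                    new SubDocument(List.of(new Field("row", "b")))));
        System.out.println(toGroup(plain).size());    // 1
        System.out.println(toGroup(withSubs).size()); // 3
    }
}
```

With null subDocs the group has a single entry, matching the plain Lucene-document case; with two subdocuments it has three entries, the first being the group-level prime document.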

What do you think?

The next big discussion is how to represent joins in the query.  :-)

Aaron


On Sat, Jan 26, 2013 at 6:54 PM, Tim Williams <williamstw@gmail.com> wrote:

> On Sat, Jan 26, 2013 at 1:35 PM, Aaron McCurry <amccurry@gmail.com> wrote:
> > Well, this is a good topic. It will be possible; however, there hasn't
> > been any formal implementation yet.
> >
> > Here are some thoughts from an API perspective.
> >
> > Currently a query is provided and the results are returned as a list of
> > TopFieldDocs. Within each TopFieldDoc there is a list of ScoreDocs, and
> > within each ScoreDoc there is a single document location represented by a
> > long.
> >
> > In the past, when performing a SuperQuery (a join, in Lucene terms), Blur
> > would respond with a single document location (docid) from the group of
> > documents: always the first document in the group.
> >
> > Example:
> >
> > logical group | docid | hit | responding docid
> > --------------|-------|-----|-----------------
> > 0             | 0     | -   | -
> > 0             | 1     | -   | -
> > 1             | 2     | -   | 2
> > 1             | 3     | x   | -
> > 1             | 4     | -   | -
> > 1             | 5     | x   | -
> > 2             | 6     | -   | -
> > 2             | 7     | -   | -
> > 3             | 8     | -   | -
> >
> > This is the "join": the hits within group 1 respond with the first
> > document id in the group, which is docid 2 (even though docids 3 and 5
> > were the documents that actually contained the hits).
> >
> > There have been many requests to change this behavior in 0.1 to something
> > like responding with docids 3 and 5 for the first hit.
> >
> > So I suppose we could change the ScoreDoc object to contain a list of
> > longs: all of the document locations (docids) from the group that were
> > involved in creating the hit.
>
> I think I have too many docs per document group for that to be useful.
> My scenario is: "run a 'docgroup' query (paged)" and display summary
> results; then, when a user asks for details, "snag the docs in a
> single 'docgroup' at a time". I think I'd value a real primedoc (a place
> to store an overview of the docgroup) more. What is the use case for
> returning it all at once?
>
> --tim
>
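The two hit behaviors in the quoted example (0.1.x responding with the group's first docid versus the requested list of every docid that actually matched) can be compared with a small Java sketch. It assumes Java 16+ records, groups laid out as contiguous runs of docids as in the quoted table, and a per-docid hit flag; the ScoreDoc shapes and method names are hypothetical illustrations, not Blur's or Lucene's real classes.

```java
import java.util.ArrayList;
import java.util.List;

class GroupHitSketch {
    // Hypothetical 0.1.x shape: one location (the group's first docid) per group hit.
    record SingleLocationScoreDoc(long location) {}
    // Hypothetical proposed shape: every docid in the group that actually matched.
    record MultiLocationScoreDoc(List<Long> locations) {}

    // One group that contained at least one matching document.
    record GroupHit(long firstDocId, List<Long> matchingDocIds) {}

    // groupOf[docid] = logical group (contiguous runs); hit[docid] = that doc matched.
    static List<GroupHit> groupHits(int[] groupOf, boolean[] hit) {
        List<GroupHit> hits = new ArrayList<>();
        int start = 0;
        while (start < groupOf.length) {
            // Walk to the end (exclusive) of this group's run, collecting matching docids.
            int end = start;
            List<Long> matching = new ArrayList<>();
            while (end < groupOf.length && groupOf[end] == groupOf[start]) {
                if (hit[end]) matching.add((long) end);
                end++;
            }
            if (!matching.isEmpty()) hits.add(new GroupHit(start, matching));
            start = end;
        }
        return hits;
    }

    // 0.1.x behavior: each group hit responds with the group's first docid.
    static List<SingleLocationScoreDoc> oldBehavior(int[] groupOf, boolean[] hit) {
        return groupHits(groupOf, hit).stream()
            .map(g -> new SingleLocationScoreDoc(g.firstDocId()))
            .toList();
    }

    // Proposed behavior: each group hit carries all docids that actually matched.
    static List<MultiLocationScoreDoc> proposedBehavior(int[] groupOf, boolean[] hit) {
        return groupHits(groupOf, hit).stream()
            .map(g -> new MultiLocationScoreDoc(g.matchingDocIds()))
            .toList();
    }

    public static void main(String[] args) {
        // The quoted example: docids 0..8 in groups 0,0,1,1,1,1,2,2,3 with hits on 3 and 5.
        int[] groupOf = {0, 0, 1, 1, 1, 1, 2, 2, 3};
        boolean[] hit = new boolean[groupOf.length];
        hit[3] = true;
        hit[5] = true;
        System.out.println(oldBehavior(groupOf, hit));      // [SingleLocationScoreDoc[location=2]]
        System.out.println(proposedBehavior(groupOf, hit)); // [MultiLocationScoreDoc[locations=[3, 5]]]
    }
}
```

Note that the proposed list grows with the number of matching documents in a group, which is the cost raised in the reply about large document groups.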
