incubator-blur-dev mailing list archives

From Aaron McCurry <amccu...@gmail.com>
Subject Re: the feature formerly known as superquery
Date Sun, 27 Jan 2013 14:37:04 GMT
I was thinking that the search would return the document location of the
prime doc, and a yet-to-be-developed highlighting call would be
responsible for giving you the documents within the group that were hits.
What do you think?

Aaron


On Sun, Jan 27, 2013 at 9:29 AM, Garrett Barton <garrett.barton@gmail.com> wrote:

> I like that idea better, Aaron. I was going to say I wanted the ability to
> pick which Document was the prime so I could make that one a nice summary
> doc of the whole logical group.  Is there a way to decide at query time
> whether to pull back the prime doc or the docs which were hit on within
> the logical group?  Or maybe both?
>
>
> On Sun, Jan 27, 2013 at 9:19 AM, Aaron McCurry <amccurry@gmail.com> wrote:
>
> > OK, after giving this some thought, I will offer up a solution that I
> > have discussed with others before.  It's sort of in between Lucene
> > documents and what was the previous Blur API (Rows and Records).
> >
> > What if we simply added a list of subdocuments to the document object
> > that behaved exactly like regular documents?  The struct would look like:
> >
> > Current:
> >
> > struct Document {
> >  list<Field> fields
> > }
> >
> > Proposed:
> >
> > struct SubDocument {
> >  list<Field> fields
> > }
> >
> > struct Document {
> >  list<Field> fields,
> >  list<SubDocument> subDocs
> > }
> >
> > Obviously subDocs could be null, and a Document would then behave more
> > or less like a Lucene document.  However, if you added sub docs to a
> > document you would have behavior closer to the Rows and Records of
> > 0.1.x, but you would also have your prime document idea.  The Document
> > struct itself could be used as a standalone document in the group to
> > store different information, thus giving you your prime document
> > behavior.
> >
> > What do you think?
> >
> > The next big discussion is how to represent joins in the query.  :-)
> >
> > Aaron
> >
> >
> > On Sat, Jan 26, 2013 at 6:54 PM, Tim Williams <williamstw@gmail.com> wrote:
> >
> > > On Sat, Jan 26, 2013 at 1:35 PM, Aaron McCurry <amccurry@gmail.com> wrote:
> > > > Well, this is a good topic. It will be possible; however, there
> > > > hasn't been any formal implementation yet.
> > > >
> > > > Here are some thoughts from an API perspective.
> > > >
> > > > Currently a query is provided and results are returned as a list of
> > > > TopFieldDocs.  Within each TopFieldDoc there is a list of ScoreDocs,
> > > > and within each ScoreDoc there is a single document location
> > > > represented by a long.
> > > >
> > > > In the past, when performing a SuperQuery (or Join, in Lucene terms),
> > > > Blur would actually respond with a single document location (docid)
> > > > from the group of documents.  It was always the first document in the
> > > > grouping of documents.
> > > >
> > > > Example:
> > > >
> > > > logical grouping | docid | hit | responding document id hit
> > > >
> > > > 0 | 0 | - | -
> > > > 0 | 1 | - | -
> > > > 1 | 2 | - | 2
> > > > 1 | 3 | x | -
> > > > 1 | 4 | - | -
> > > > 1 | 5 | x | -
> > > > 2 | 6 | - | -
> > > > 2 | 7 | - | -
> > > > 3 | 8 | - | -
> > > >
> > > > This is the "join", meaning the hits within group 1 would respond
> > > > with the first document id in the group, which is docid 2 (but take
> > > > note of how 3 and 5 were the documents that actually contained the
> > > > hit).
> > > >
> > > > There have been many requests to change this behavior in 0.1 to
> > > > something like: respond with 3,5 as the docids in the first hit.
> > > >
> > > > So I suppose we could change the ScoreDoc object to contain a list of
> > > > longs holding all of the document locations (docids) from the group
> > > > that were involved in creating the hit.
> > >
> > > I have too many docs per document group for that to be useful, I think.
> > > My scenario is: "run 'docgroup' query (paged)", displaying summary
> > > results, then, when a user asks for details, "snag the docs in a
> > > single 'docgroup' at a time".  I think I'd value a real primedoc - a
> > > place to store an overview of the docgroup - more.  I wonder what the
> > > usage is for returning it all at once?
> > >
> > > --tim
> > >
> >
>
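
The join behavior discussed in this thread can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Blur code: the names GroupHit and join_hits are hypothetical, the rows are copied from the example table above, and each group's documents are assumed to be stored contiguously. For every logical group with at least one hit, the sketch reports both the first docid in the group (the 0.1 behavior, i.e. the prime doc) and the docids that actually matched (the requested alternative).

```python
from dataclasses import dataclass
from itertools import groupby

# Rows from the example table in the thread: (logical group, docid, is_hit).
ROWS = [
    (0, 0, False), (0, 1, False),
    (1, 2, False), (1, 3, True), (1, 4, False), (1, 5, True),
    (2, 6, False), (2, 7, False),
    (3, 8, False),
]


@dataclass
class GroupHit:
    """Hypothetical result type: one entry per logical group that matched."""
    prime_docid: int   # first docid in the group (what 0.1 responded with)
    hit_docids: list   # docids in the group that actually contained the hit


def join_hits(rows):
    """Collapse per-document hits into one result per logical group.

    Assumes rows are ordered so that each group's documents are contiguous.
    """
    results = []
    for _, docs in groupby(rows, key=lambda row: row[0]):
        docs = list(docs)
        hits = [docid for _, docid, is_hit in docs if is_hit]
        if hits:
            # The first document of the group plays the prime doc role.
            results.append(GroupHit(prime_docid=docs[0][1], hit_docids=hits))
    return results


results = join_hits(ROWS)
for hit in results:
    print(hit.prime_docid, hit.hit_docids)
```

For the table above, only group 1 matches: the prime docid is 2 and the hit docids are 3 and 5. Returning the prime docid alone keeps each result small when groups are large, which is the concern Tim raises; the hit list could then be fetched separately, for example through the highlighting call Aaron suggests.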
