incubator-blur-dev mailing list archives

From Aaron McCurry <amccu...@gmail.com>
Subject Re: the feature formerly known as superquery
Date Sun, 27 Jan 2013 16:01:49 GMT
At this point highlighting (or whatever we want to call it) can be whatever we want it to be.
It might make sense to make it a part of the document retrieval to reduce round trips.

Aaron 

Sent from my iPhone

On Jan 27, 2013, at 10:53 AM, Garrett Barton <garrett.barton@gmail.com> wrote:

> That might work. My biggest problem with the old row query was figuring out
> where the hit was in the row. If this gives me entry into the row and a way
> to access the hit information I would be satisfied.
> On Jan 27, 2013 9:37 AM, "Aaron McCurry" <amccurry@gmail.com> wrote:
> 
>> I was thinking that the search would return the document location of the
>> prime doc.  And a yet to be developed highlighting call would be
>> responsible for giving you the documents within the group that were hits.
>> What do you think?
>> 
>> Aaron
>> 
>> 
>> On Sun, Jan 27, 2013 at 9:29 AM, Garrett Barton <garrett.barton@gmail.com> wrote:
>> 
>>> I like that idea better Aaron. I was going to say I wanted the ability to
>>> pick which Document was the prime so I could make that one a nice summary
>>> doc of the whole logical group.  Is there a way to be able to decide at
>>> query time to pull back the prime doc or the docs which were hit on within
>>> the logical group?  Or maybe both?
>>> 
>>> 
>>> On Sun, Jan 27, 2013 at 9:19 AM, Aaron McCurry <amccurry@gmail.com> wrote:
>>> 
>>>> Ok, after giving this some thought, I will offer up a solution that I have
>>>> discussed with others before.  It's sorta in-between Lucene documents and
>>>> what was the previous Blur API (Rows and Records).
>>>> 
>>>> We could simply add a list of subdocuments to the document object that
>>>> behave exactly like regular documents.  So the struct would look like:
>>>> Current:
>>>> 
>>>> struct Document {
>>>> list<Field> fields
>>>> }
>>>> 
>>>> Proposed:
>>>> 
>>>> struct SubDocument {
>>>> list<Field> fields
>>>> }
>>>> 
>>>> struct Document {
>>>> list<Field> fields,
>>>> list<SubDocument> subDocs
>>>> }
>>>> 
>>>> Obviously sub documents could be null and therefore a Document would behave
>>>> more or less like a Lucene document.  However if you added sub docs to a
>>>> document you would have a behavior closer to the Rows and Records of 0.1.x,
>>>> but you would also have your prime document idea.  The Document struct
>>>> could be used as a standalone document in the group to store different
>>>> information, thus giving you your prime document behavior.
>>>> 
>>>> What do you think?
>>>> 
>>>> The next big discussion is how to represent joins in the query.  :-)
>>>> 
>>>> Aaron
>>>> 
>>>> 
>>>> On Sat, Jan 26, 2013 at 6:54 PM, Tim Williams <williamstw@gmail.com> wrote:
>>>> 
>>>>> On Sat, Jan 26, 2013 at 1:35 PM, Aaron McCurry <amccurry@gmail.com> wrote:
>>>>>> Well, this is a good topic. It will be possible, however there hasn't been
>>>>>> any formal implementation yet.
>>>>>> 
>>>>>> Here are some thoughts from an API perspective.
>>>>>> 
>>>>>> Currently a query is provided and results are returned as a list of
>>>>>> TopFieldDocs.  Then within each TopFieldDoc there is a list of ScoreDocs;
>>>>>> within each ScoreDoc there is a single document location represented by a
>>>>>> long.
>>>>>> 
>>>>>> In the past when performing a SuperQuery, or Join in Lucene terms, Blur
>>>>>> would actually respond with a single document location (docid) from the
>>>>>> group of documents.  It was always the first document in the grouping of
>>>>>> documents.
>>>>>> 
>>>>>> Example:
>>>>>> 
>>>>>> logical grouping | docid | hit | responding document id hit
>>>>>> 
>>>>>> 0 | 0 | - | -
>>>>>> 0 | 1 | - | -
>>>>>> 1 | 2 | - | 2
>>>>>> 1 | 3 | x | -
>>>>>> 1 | 4 | - | -
>>>>>> 1 | 5 | x | -
>>>>>> 2 | 6 | - | -
>>>>>> 2 | 7 | - | -
>>>>>> 3 | 8 | - | -
>>>>>> 
>>>>>> This is the "join", meaning the hits within group 1 would respond with the
>>>>>> first document id in the group, which is docid 2 (but take note of how 3
>>>>>> and 5 were the documents that actually contained the hit).
>>>>>> 
>>>>>> There have been many requests to change this behavior in 0.1 to something
>>>>>> like, respond with 3,5 as the docids in the first hit.
>>>>>> 
>>>>>> So I suppose we could change the ScoreDoc object to contain a list of
>>>>>> longs, holding all of the document locations (docids) from the group
>>>>>> that were involved in creating the hit.
>>>>> 
>>>>> I have too many docs per document group for that to be useful I think.
>>>>> My scenario is.. "run 'docgroup' query(paged)"... displaying summary
>>>>> results... then, when a user asks for details, "snag the docs in a
>>>>> single 'docgroup' at a time".  I'd value a real primedoc - a place to
>>>>> store overview about the docgroup - more I think.  I wonder what the
>>>>> usage is for returning it all at once?
>>>>> 
>>>>> --tim
>>>>> 
>>>> 
>>> 
>> 
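
For anyone following the thread, the grouping table in the Jan 26 message can be sketched in a few lines of Python. This is only an illustration of the two behaviors being discussed, not Blur or Lucene code: `ROWS`, `group_docids`, `first_docid_response`, and `hit_docids_response` are hypothetical names, and the rows are copied from the table in the thread.

```python
from collections import OrderedDict

# (logical grouping, docid, is_hit) rows copied from the table in the thread.
ROWS = [
    (0, 0, False), (0, 1, False),
    (1, 2, False), (1, 3, True), (1, 4, False), (1, 5, True),
    (2, 6, False), (2, 7, False),
    (3, 8, False),
]


def group_docids(rows):
    """Map each logical grouping to its docids, in index order."""
    groups = OrderedDict()
    for group, docid, _ in rows:
        groups.setdefault(group, []).append(docid)
    return groups


def first_docid_response(rows):
    """Behavior described for 0.1: a group with any hit responds with its first docid."""
    groups = group_docids(rows)
    hit_groups = {group for group, _, is_hit in rows if is_hit}
    return [docids[0] for group, docids in groups.items() if group in hit_groups]


def hit_docids_response(rows):
    """Requested alternative: a group with any hit responds with the docids that matched."""
    hits = OrderedDict()
    for group, docid, is_hit in rows:
        if is_hit:
            hits.setdefault(group, []).append(docid)
    return list(hits.values())


print(first_docid_response(ROWS))  # [2]
print(hit_docids_response(ROWS))   # [[3, 5]]
```

The first function matches the "responding document id hit" column (docid 2 for group 1); the second matches the request to respond with 3,5 instead. Tim's concern about very large groups applies to the second form, since its response grows with the number of hits per group.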

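The proposed Document/SubDocument struct from the Jan 27 message can be mirrored in Python to show how a null sub document list would keep the plain-document behavior. This is a rough sketch under assumptions, not the Blur API: the thread never defines `Field`, so a name/value pair is assumed here, and `is_group` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Field:
    # Assumption: the thread does not define Field; a name/value pair is used here.
    name: str
    value: str


@dataclass
class SubDocument:
    fields: List[Field] = field(default_factory=list)


@dataclass
class Document:
    fields: List[Field] = field(default_factory=list)
    # None means "no sub documents": the Document acts like a plain Lucene-style document.
    sub_docs: Optional[List[SubDocument]] = None


def is_group(doc: Document) -> bool:
    """True when the document carries sub documents (the Rows-and-Records-like case)."""
    return bool(doc.sub_docs)


# A standalone document with no sub documents.
standalone = Document(fields=[Field("title", "a standalone document")])

# A prime document: its own fields hold the summary, sub documents hold the records.
prime = Document(
    fields=[Field("summary", "overview of the whole group")],
    sub_docs=[
        SubDocument([Field("body", "first record")]),
        SubDocument([Field("body", "second record")]),
    ],
)

print(is_group(standalone), is_group(prime))  # False True
```

Here the top-level `fields` of `prime` play the role of the prime document that Garrett and Tim asked for, while `sub_docs` carries the grouped records.
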