lucene-solr-dev mailing list archives

From "Chris A. Mattmann (JIRA)" <>
Subject [jira] Commented: (SOLR-1516) DocumentList and Document QueryResponseWriter
Date Tue, 17 Nov 2009 05:17:39 GMT


Chris A. Mattmann commented on SOLR-1516:

bq. This does not help the user of the API much because the real difficulty is in unmarshalling
various types of objects. This patch does nothing to read the stored fields from the Document.

I agree with your statement above regarding "the real difficulty". That's precisely what this
patch addresses. It deals with that real difficulty for users (of which there are plenty; please
see my comment above re: use cases, e.g., FGDC, RDF, etc.) who are mostly concerned with emitting
the resultant Documents from searches in a particular XML format, for format compatibility. This
patch isn't intended to do anything with the stored fields; that's left up to the user who extends
the abstract base classes by implementing #emitDoc or #emitDocList, where the user deals with Lucene
Documents. As I stated above numerous times, it took me quite a bit of printing out and deducing the
structure of the resultant SolrResponse to determine where in that list the Documents were stored
(and in fact they weren't stored there at all; it was just the IDs). This isn't really documented
anywhere per se (at least not that I could find in the online Javadocs or the Wiki).

bq. That is really difficult. A lot of components write their output in a very arbitrary Object
tree. The output is largely designed like a JSON object tree (with more primitives). The
producer decides what the tree contains. The good thing about this approach is that we don't
need to build custom classes for every type of output.

Why is this difficult? It would amount to components declaring what type of schema they return.
Untyped bags of objects, coupled with sparse documentation, aren't exactly the answer. I think
we both agree that there is a larger issue to look at regarding SolrResponse and
QueryResponseWriters. My point is that I don't think this issue is the right place to answer those
bigger-picture questions. I'd be happy to create further issues to discuss them.
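As a rough illustration of what "components declaring what type of schema they return" could look like, here is a hypothetical sketch; none of these types (`TypedComponentOutput`, `DocIdList`, `SearchOutput`, `TypedLookup`) exist in Solr. Each component exposes its result together with its declared Java type, so a writer can check the type up front instead of inspecting an untyped object tree.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical: a component declares the Java type of the result it produces.
interface TypedComponentOutput<T> {
    String name();
    Class<T> outputType();
    T result();
}

// Hypothetical result type for a search component: just the matching doc IDs.
class DocIdList {
    final List<Integer> ids;
    DocIdList(List<Integer> ids) { this.ids = new ArrayList<>(ids); }
}

class SearchOutput implements TypedComponentOutput<DocIdList> {
    private final DocIdList ids;
    SearchOutput(DocIdList ids) { this.ids = ids; }
    public String name() { return "response"; }
    public Class<DocIdList> outputType() { return DocIdList.class; }
    public DocIdList result() { return ids; }
}

class TypedLookup {
    // Returns the component's result as the requested type, or throws if the
    // declared type does not match, rather than guessing at the tree's layout.
    static <T> T resultAs(TypedComponentOutput<?> out, Class<T> wanted) {
        if (!wanted.isAssignableFrom(out.outputType())) {
            throw new IllegalArgumentException(
                out.name() + " declares " + out.outputType().getSimpleName());
        }
        return wanted.cast(out.result());
    }
}
```

The trade-off is the one raised in the quoted comment: every kind of output needs its own declared type, in exchange for a writer never having to reverse-engineer where results live.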

bq. There is no reason why a GenericResponseWriter can't do that. I am not happy about putting
these classes in and leading users to believe that this is all that they have to do.

How are we telling users that this is all they have to do? The patch specifically states (taken
from the included Javadoc):

bq. This {@link QueryResponseWriter} allows a user to implement the {@link #emitDoc(Document, Writer)}
function, which acts as a callback function to process one Lucene {@link Document} returned from the
SOLR Query at a time. Sub-classes should keep track of any global state, as this class does not provide
a means to access the entire set of returned {@link Document}s. If that functionality is required,
see {@link DocumentListResponseWriter}.
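The per-document callback described in this Javadoc is a template-method pattern. The following is a minimal sketch of that shape, not the patch itself: a hypothetical `SimpleDoc` (a field map) stands in for Lucene's `Document`, and a plain `write(List, Writer)` method stands in for Solr's `QueryResponseWriter` entry point. The example subclass keeps its own running count, as the Javadoc says subclasses must.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for org.apache.lucene.document.Document.
class SimpleDoc {
    final Map<String, String> fields = new LinkedHashMap<>();
    SimpleDoc put(String name, String value) { fields.put(name, value); return this; }
    String get(String name) { return fields.get(name); }
}

// Sketch of a per-document writer: the base class drives the loop and the
// subclass only formats one document per emitDoc call.
abstract class PerDocWriterSketch {
    protected void emitHeader(Writer out) throws IOException {}
    protected abstract void emitDoc(SimpleDoc doc, Writer out) throws IOException;
    protected void emitFooter(Writer out) throws IOException {}

    final void write(List<SimpleDoc> docs, Writer out) throws IOException {
        emitHeader(out);
        for (SimpleDoc doc : docs) {
            emitDoc(doc, out);   // one callback per document
        }
        emitFooter(out);
    }
}

// Example subclass: emits one <record> per document (no XML escaping, for
// brevity) and counts documents itself, since the base class never exposes
// the whole list.
class RecordWriter extends PerDocWriterSketch {
    private int count = 0;

    @Override protected void emitHeader(Writer out) throws IOException {
        out.write("<records>");
    }

    @Override protected void emitDoc(SimpleDoc doc, Writer out) throws IOException {
        count++;
        out.write("<record id=\"" + doc.get("id") + "\"/>");
    }

    @Override protected void emitFooter(Writer out) throws IOException {
        out.write("<count>" + count + "</count></records>");
    }
}
```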

bq. This {@link QueryResponseWriter} allows a user to implement the {@link #emitDocList(List, Writer)}
function, which acts as a callback function to process the entire {@link List} of Lucene
{@link Document}s returned from the SOLR Query at once. To process the {@link Document}s
one-at-a-time (to conserve resources, or to speed up the processing/etc.), see {@link DocumentResponseWriter}.
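The list-at-once callback suits output that needs an aggregate before any document is written. A minimal sketch under the same kind of assumptions: documents are hypothetical `Map<String, String>` field maps rather than Lucene `Document`s, `write` is a stand-in entry point, and the example subclass computes the range of a hypothetical numeric `lat` field across all results and writes it ahead of the records, something a one-document-at-a-time callback could only do by buffering.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.List;
import java.util.Map;

// Sketch of a list-at-once writer: the subclass receives every result in a
// single call, so it can compute aggregates before emitting anything.
abstract class DocListWriterSketch {
    protected abstract void emitDocList(List<Map<String, String>> docs, Writer out)
        throws IOException;

    final void write(List<Map<String, String>> docs, Writer out) throws IOException {
        emitDocList(docs, out);
    }
}

// Example subclass: writes the latitude range of all results (an aggregate)
// before the individual records. Assumes every doc has a numeric "lat" field;
// an empty list would print infinite bounds, which a real writer should handle.
class BoundsWriter extends DocListWriterSketch {
    @Override
    protected void emitDocList(List<Map<String, String>> docs, Writer out) throws IOException {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (Map<String, String> doc : docs) {
            double lat = Double.parseDouble(doc.get("lat"));
            min = Math.min(min, lat);
            max = Math.max(max, lat);
        }
        out.write("<bounds south=\"" + min + "\" north=\"" + max + "\"/>");
        for (Map<String, String> doc : docs) {
            out.write("<record id=\"" + doc.get("id") + "\"/>");
        }
    }
}
```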

I'm not sure I understand the concern about this ~250-line patch. The patch:

* adds functionality that would have simplified a number of use cases that I am leveraging
SOLR for in the space and earth science data community, where formats are critical and metadata
output is more important than the specific search meta-info (# hits, query time, start/end,
etc.). See the 3-4 examples I stated above.

* introduces nothing that breaks backwards compatibility

* includes javadoc on all public methods, as well as class-level javadoc

* should apply without trouble to the current SVN trunk

These have typically been the criteria for inclusion (modulo unit tests, which I'd be happy to
include if there is concern there). Are the criteria different here in SOLR?

> DocumentList and Document QueryResponseWriter
> ---------------------------------------------
>                 Key: SOLR-1516
>                 URL:
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>         Environment: My MacBook Pro laptop.
>            Reporter: Chris A. Mattmann
>            Assignee: Noble Paul
>            Priority: Minor
>             Fix For: 1.5
>         Attachments: SOLR-1516.Mattmann.101809.patch.txt
> I tried to implement a custom QueryResponseWriter the other day and was amazed at the
> level of unmarshalling and weeding through objects that was necessary just to format the output
> list of o.a.l.Document objects. As a user, I wanted to be able to implement either of 2 functions:
> * process a document at a time, and format it (for speed/efficiency)
> * process all the documents at once, and format them (in case an aggregate calculation is necessary for outputting)
> So, I've decided to contribute 2 simple classes that I think are sufficiently generic
> and reusable. The first is o.a.s.request.DocumentResponseWriter; it handles the first bullet
> above. The second is o.a.s.request.DocumentListResponseWriter. Both are abstract base classes
> and require the user to implement either an #emitDoc function (in the case of bullet 1) or
> an #emitDocList function (in the case of bullet 2). Both classes provide an #emitHeader and
> #emitFooter function pair that handles formatting and output before and after the Document list is processed.

