lucene-solr-dev mailing list archives

From "Chris A. Mattmann (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (SOLR-1516) DocumentList and Document QueryResponseWriter
Date Wed, 28 Oct 2009 02:27:59 GMT

    [ https://issues.apache.org/jira/browse/SOLR-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770756#action_12770756 ]

Chris A. Mattmann edited comment on SOLR-1516 at 10/28/09 2:27 AM:
-------------------------------------------------------------------

I haven't really heard any comments on this issue, and I've got the impression that not many
folks write these QueryResponseWriters. To me, writing one was invaluable. The use case was:

* I chose to make SOLR the gold source for search index data (I'm dealing with planetary
science and earth science data on 4-5 projects)
* I want to drive search but _also_ metadata output from SOLR (treating SOLR as a search web service,
with customizable output [2])
* the default SOLR XML and the 5-7 output formats didn't do it for me since I have some specialized
earth and planetary science use cases. E.g., on a few different projects, I need to be able
to:
   * output FGDC XML (yes it's a standard for earth science metadata, and also relevant for
the GeoSOLR stuff)
   * output custom RDF metadata 
   * output a particular style of JSON to plug in to some external web client, e.g., an auto-suggest
that requires its own JSON format, not SOLR's

  To illustrate why the 5-7 output formats didn't do it for me either, I'll use
an example. There may be the sense of, "well, why didn't I write some Java/Ruby/PHP/Python
client that called SOLR with one of its existing wt's and then output a custom format from
my favorite programming language (PL)?" The reasons are threefold:

1. SOLR advertises that the QueryResponseWriter interface is an official SOLR plugin and
interface, at least according to:
      * the Wiki documentation [1]
      * Chris Hostetter's ApacheCon08 slides, which show it as part of the core SOLR architecture in his
50K foot view diagram [2]
      * the published book on SOLR [3]

2. If SOLR is truly a search web service, and allows for changeable output formats (evidenced
by exposing the wt parameter), then why force people to use one of the existing wt's and then
ask them to transform (either via a PL, or via XSLT) instead of allowing them to natively
generate the specific output format type?

3. Why make o.a.s.request.QueryResponseWriter an interface and not a concrete class if it is never
intended to be implemented by others or, more importantly, if it is this non-intuitive to implement?

Besides 1-3, I have external COTS and OTS tools that cannot be changed and that expect
data to be loaded into them in a particular format. I'd like to plug them into SOLR, and
the easiest way for me to do that is a curl/wget type operation piped into
the COTS/OTS tool. wt's are the way to go for that.

So, given the above, when I went to write a "wt" I was surprised how hard it was for me to
understand the NamedList structure, which is just a bag of objects that you have to unpack
with unfriendly instanceof checks and recursive unmarshalling (walking the NamedList tree).
All I wanted for my wt was to be able to format the output either as a whole Document list
or on a doc-by-doc basis.
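
To give a feel for the kind of unpacking described above, here is a minimal sketch in plain Java. It does not use the real SOLR classes: SimpleNamedList is a hypothetical stand-in that only borrows the size/getName/getVal/add method names from NamedList (its add returns the list for chaining, which is a convenience of this sketch), and documents are modeled as plain Maps rather than whatever the real response holds. The point is only the shape of the code: a recursive walk with instanceof checks at every level.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NamedListWalk {

    // Hypothetical stand-in for a NamedList: an ordered list of (name, value)
    // pairs whose values can be anything, including nested lists.
    static class SimpleNamedList {
        private final List<String> names = new ArrayList<>();
        private final List<Object> vals = new ArrayList<>();

        SimpleNamedList add(String name, Object val) {
            names.add(name);
            vals.add(val);
            return this; // chaining is a convenience of this sketch
        }
        int size() { return names.size(); }
        String getName(int i) { return names.get(i); }
        Object getVal(int i) { return vals.get(i); }
    }

    // Recursively walk the tree and collect every document-like Map found.
    static void collectDocs(Object node, List<Map<?, ?>> out) {
        if (node instanceof SimpleNamedList) {
            SimpleNamedList nl = (SimpleNamedList) node;
            for (int i = 0; i < nl.size(); i++) {
                collectDocs(nl.getVal(i), out);
            }
        } else if (node instanceof List) {
            for (Object child : (List<?>) node) {
                collectDocs(child, out);
            }
        } else if (node instanceof Map) {
            out.add((Map<?, ?>) node);
        }
        // Any other value (String, Integer, ...) is not a document; skip it.
    }

    // Build a small response-shaped tree: a header plus a list of two docs.
    static SimpleNamedList sample() {
        Map<String, Object> doc1 = new LinkedHashMap<>();
        doc1.put("id", "mars-001");
        Map<String, Object> doc2 = new LinkedHashMap<>();
        doc2.put("id", "earth-002");
        List<Object> docList = new ArrayList<>();
        docList.add(doc1);
        docList.add(doc2);

        SimpleNamedList header = new SimpleNamedList().add("status", 0).add("QTime", 3);
        return new SimpleNamedList()
            .add("responseHeader", header)
            .add("response", docList);
    }

    public static void main(String[] args) {
        List<Map<?, ?>> docs = new ArrayList<>();
        collectDocs(sample(), docs);
        for (Map<?, ?> d : docs) {
            System.out.println(d.get("id"));
        }
    }
}
```

Every writer that only cares about the documents ends up repeating some version of collectDocs, which is the boilerplate I would like to factor out.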

Anyways just wanted to provide some further fodder and discussion for this issue. To me this
is important, and clearly, based on [1-3], QueryResponseWriters by definition seem to be a
big piece of the SOLR architecture.


Chris

---
[1] http://wiki.apache.org/solr/QueryResponseWriter
[2] http://people.apache.org/~hossman/apachecon2008us/btb/apache-solr-beyond-the-box.pdf 
[3] SOLR 1.4 Enterprise Search Server, Packt Publishing, 2009.



  
> DocumentList and Document QueryResponseWriter
> ---------------------------------------------
>
>                 Key: SOLR-1516
>                 URL: https://issues.apache.org/jira/browse/SOLR-1516
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>         Environment: My MacBook Pro laptop.
>            Reporter: Chris A. Mattmann
>            Priority: Minor
>             Fix For: 1.5
>
>         Attachments: SOLR-1516.Mattmann.101809.patch.txt
>
>
> I tried to implement a custom QueryResponseWriter the other day and was amazed at the
> level of unmarshalling and weeding through objects that was necessary just to format the output
> o.a.l.Document list. As a user, I wanted to be able to implement either of 2 functions:
> * process a document at a time, and format it (for speed/efficiency)
> * process all the documents at once, and format them (in case an aggregate calculation
> is necessary for outputting)
> So, I've decided to contribute 2 simple classes that I think are sufficiently generic
> and reusable. The first is o.a.s.request.DocumentResponseWriter -- it handles the first bullet
> above. The second is o.a.s.request.DocumentListResponseWriter -- it handles the second bullet. Both are abstract base classes
> and require the user to implement either an #emitDoc function (in the case of bullet 1), or
> an #emitDocList function (in the case of bullet 2). Both classes provide an #emitHeader and
> #emitFooter function set that handles formatting and output before and after the Document list is processed.
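
The two base classes described in the quoted issue can be sketched roughly as follows. This is not the code from the attached patch: only the emitDoc, emitDocList, emitHeader and emitFooter names come from the description. The class names, the method signatures, the use of java.io.Writer and the modeling of documents as Maps are all assumptions made so that the sketch compiles on its own without SOLR.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class DocWriterSketch {

    // Shared header/footer hooks (assumed signatures); both write nothing by default.
    static abstract class BaseWriter {
        protected void emitHeader(Writer out) throws IOException { }
        protected void emitFooter(Writer out) throws IOException { }
    }

    // Bullet 1: format one document at a time.
    static abstract class DocumentWriter extends BaseWriter {
        protected abstract void emitDoc(Writer out, Map<String, Object> doc) throws IOException;

        public final void write(Writer out, List<Map<String, Object>> docs) throws IOException {
            emitHeader(out);
            for (Map<String, Object> doc : docs) {
                emitDoc(out, doc);
            }
            emitFooter(out);
        }
    }

    // Bullet 2: format the whole list at once (allows aggregate calculations).
    static abstract class DocumentListWriter extends BaseWriter {
        protected abstract void emitDocList(Writer out, List<Map<String, Object>> docs) throws IOException;

        public final void write(Writer out, List<Map<String, Object>> docs) throws IOException {
            emitHeader(out);
            emitDocList(out, docs);
            emitFooter(out);
        }
    }

    // Example subclass: one line per document id, wrapped in a header and footer.
    static class IdLineWriter extends DocumentWriter {
        @Override protected void emitHeader(Writer out) throws IOException { out.write("BEGIN\n"); }
        @Override protected void emitDoc(Writer out, Map<String, Object> doc) throws IOException {
            out.write(doc.get("id") + "\n");
        }
        @Override protected void emitFooter(Writer out) throws IOException { out.write("END\n"); }
    }

    // Example subclass: an aggregate (the document count) needs the whole list.
    static class CountWriter extends DocumentListWriter {
        @Override protected void emitDocList(Writer out, List<Map<String, Object>> docs) throws IOException {
            out.write("count=" + docs.size() + "\n");
        }
    }

    // Run both example writers over two sample documents and return the output.
    static String demo() {
        List<Map<String, Object>> docs = new ArrayList<>();
        docs.add(Collections.<String, Object>singletonMap("id", "mars-001"));
        docs.add(Collections.<String, Object>singletonMap("id", "earth-002"));
        StringWriter out = new StringWriter();
        try {
            new IdLineWriter().write(out, docs);
            new CountWriter().write(out, docs);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

With this shape, a custom format only has to override the one or two emit methods it cares about. The traversal and the header/footer ordering stay in the base class.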

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

