lucene-solr-dev mailing list archives

From "Will Johnson" <wjohn...@GETCONNECTED.COM>
Subject RE: search components (plugins)
Date Mon, 11 Jun 2007 13:20:03 GMT
Some thoughts:

One of the most powerful and useful concepts that many of the other
engines (well, the good ones) use is the notion of processing pipelines.

For queries this means a series of stages that do things such as:

* faceting
* collapsing
* applying default values
* spell checking
* adding in promotions/boosted content
* applying relevancy logic
* more like this
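
As a rough illustration of the query-side idea, here is a minimal Java
sketch. It is not Solr's actual API: QueryContext, QueryStage,
applyDefaults, spellCheck and run are all names made up for the example,
and the "spell checker" is just a lookup table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class QueryPipelineSketch {
    /** Hypothetical per-request state handed from one stage to the next. */
    static final class QueryContext {
        final Map<String, String> params = new LinkedHashMap<>();
        final List<String> trace = new ArrayList<>(); // which stages ran, in order
    }

    /** One step in the pipeline; it may read or modify the context. */
    interface QueryStage {
        void process(QueryContext ctx);
    }

    /** Fills in any parameter the caller did not supply. */
    static QueryStage applyDefaults(Map<String, String> defaults) {
        return ctx -> {
            defaults.forEach(ctx.params::putIfAbsent);
            ctx.trace.add("defaults");
        };
    }

    /** Toy spell checker: replaces the whole query if a correction is known. */
    static QueryStage spellCheck(Map<String, String> corrections) {
        return ctx -> {
            String q = ctx.params.get("q");
            if (q != null && corrections.containsKey(q)) {
                ctx.params.put("q", corrections.get(q));
            }
            ctx.trace.add("spellcheck");
        };
    }

    /** Runs the stages strictly in list order. */
    static QueryContext run(List<QueryStage> stages, QueryContext ctx) {
        for (QueryStage stage : stages) {
            stage.process(ctx);
        }
        return ctx;
    }

    public static void main(String[] args) {
        QueryContext ctx = new QueryContext();
        ctx.params.put("q", "lucen");
        run(Arrays.asList(
                applyDefaults(Collections.singletonMap("rows", "10")),
                spellCheck(Collections.singletonMap("lucen", "lucene"))), ctx);
        System.out.println(ctx.params + " via " + ctx.trace);
    }
}
```

Faceting, collapsing, promotions and so on would be further QueryStage
implementations added to the same list.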

But it is also heavily used at indexing time.  The more complex engines
use these pipelines for all kinds of crazy stuff like converting
MS Office docs, OCR, speech to text, etc., which I think is what Nutch
does to some extent.  However, Solr could still use the same notion to
do lower-level operations like:

* applying synonyms
* removing/renaming fields
* translating XML formats (it would be nice to have any update handler
be able to apply an XSLT to incoming data)
* validating incoming data against some business logic

I think much of this is wrapped up in the field definitions at the
moment but it could be extended to be more document aware.
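
To make "document aware" a bit more concrete, here is a small sketch of
index-time stages that see the whole document rather than one field.
Everything in it is hypothetical (the class, the stage factories, the
field names); a document is just a map of field name to value.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

public class UpdatePipelineSketch {
    /** Renames a field if present; returns a modified copy of the document. */
    static UnaryOperator<Map<String, String>> renameField(String from, String to) {
        return doc -> {
            Map<String, String> out = new LinkedHashMap<>(doc);
            String value = out.remove(from);
            if (value != null) {
                out.put(to, value);
            }
            return out;
        };
    }

    /** Drops a field that should never reach the index. */
    static UnaryOperator<Map<String, String>> removeField(String name) {
        return doc -> {
            Map<String, String> out = new LinkedHashMap<>(doc);
            out.remove(name);
            return out;
        };
    }

    /** Stand-in for "business logic" validation: rejects docs lacking a field. */
    static UnaryOperator<Map<String, String>> requireField(String name) {
        return doc -> {
            String value = doc.get(name);
            if (value == null || value.isEmpty()) {
                throw new IllegalArgumentException("missing required field: " + name);
            }
            return doc;
        };
    }

    /** Feeds the document through each stage in order. */
    static Map<String, String> process(List<UnaryOperator<Map<String, String>>> stages,
                                       Map<String, String> doc) {
        Map<String, String> current = doc;
        for (UnaryOperator<Map<String, String>> stage : stages) {
            current = stage.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("headline", "Pipelines in search engines");
        doc.put("internal_id", "42");
        System.out.println(process(Arrays.asList(
                renameField("headline", "title"),
                removeField("internal_id"),
                requireField("title")), doc));
    }
}
```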

Anything that makes chaining of pre-built processing easier would be
nice.  In addition, if these stages are specified in solrconfig, then
decisions like 'do I want faceting before or after collapsing' become
simple cut/paste choices, not code changes.
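
A sketch of why the ordering matters and how it could be driven by
configuration: below, the list of stage names stands in for whatever
would be read from solrconfig (the registry, the stage names and the
toy document layout are all assumptions for the example). Swapping two
names changes whether facet counts are taken before or after collapsing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class ConfiguredPipelineSketch {
    /** Toy result set: each doc is {groupKey, category}. */
    static final class Ctx {
        List<String[]> docs = new ArrayList<>();
        final Map<String, Integer> facets = new TreeMap<>();
    }

    interface Stage {
        void process(Ctx ctx);
    }

    /** Hypothetical registry mapping configured names to pre-built stages. */
    static final Map<String, Stage> REGISTRY = new HashMap<>();
    static {
        // Count documents per category as they stand at this point in the chain.
        REGISTRY.put("facet", ctx -> {
            ctx.facets.clear();
            for (String[] doc : ctx.docs) {
                ctx.facets.merge(doc[1], 1, Integer::sum);
            }
        });
        // Keep only the first document seen for each group key.
        REGISTRY.put("collapse", ctx -> {
            Set<String> seen = new HashSet<>();
            List<String[]> kept = new ArrayList<>();
            for (String[] doc : ctx.docs) {
                if (seen.add(doc[0])) {
                    kept.add(doc);
                }
            }
            ctx.docs = kept;
        });
    }

    /** Builds and runs the chain purely from the configured stage names. */
    static Ctx run(List<String> stageNames, List<String[]> docs) {
        Ctx ctx = new Ctx();
        ctx.docs = new ArrayList<>(docs);
        for (String name : stageNames) {
            Stage stage = REGISTRY.get(name);
            if (stage == null) {
                throw new IllegalArgumentException("unknown stage: " + name);
            }
            stage.process(ctx);
        }
        return ctx;
    }

    public static void main(String[] args) {
        List<String[]> docs = Arrays.asList(
                new String[] {"g1", "books"},
                new String[] {"g1", "books"},
                new String[] {"g2", "music"});
        System.out.println(run(Arrays.asList("facet", "collapse"), docs).facets); // {books=2, music=1}
        System.out.println(run(Arrays.asList("collapse", "facet"), docs).facets); // {books=1, music=1}
    }
}
```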

Further, if the last processing step is 'index this doc' or 'search the
index', those should be easy to replace with 'send this doc to segment x'
or 'search all the sub-indexes' through simple XML config file changes,
assuming those stages exist (which again is how many of the other
engines do things).
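
For the indexing half of that, a sketch of a swappable final stage might
look like the following. The interface, both implementations, the config
names and the hash-based routing rule are all invented for illustration;
a real setup would pick its own way of assigning docs to segments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TerminalStageSketch {
    /** The last step of an indexing chain: whatever finally consumes the doc. */
    interface IndexStage {
        void accept(String docId);
    }

    /** 'index this doc': everything goes into one local index. */
    static final class LocalIndex implements IndexStage {
        final List<String> docs = new ArrayList<>();

        public void accept(String docId) {
            docs.add(docId);
        }
    }

    /** 'send this doc to segment x': route by a hash of the doc id (an assumption). */
    static final class SegmentRouter implements IndexStage {
        final List<List<String>> segments = new ArrayList<>();

        SegmentRouter(int segmentCount) {
            for (int i = 0; i < segmentCount; i++) {
                segments.add(new ArrayList<>());
            }
        }

        int segmentFor(String docId) {
            // floorMod keeps the result non-negative even for negative hash codes.
            return Math.floorMod(docId.hashCode(), segments.size());
        }

        public void accept(String docId) {
            segments.get(segmentFor(docId)).add(docId);
        }
    }

    /** The name stands in for a value read from a config file. */
    static IndexStage fromConfig(String name) {
        switch (name) {
            case "index":
                return new LocalIndex();
            case "route-to-segments":
                return new SegmentRouter(3);
            default:
                throw new IllegalArgumentException("unknown final stage: " + name);
        }
    }

    /** The rest of the chain does not care which final stage it was given. */
    static void indexAll(List<String> docIds, IndexStage last) {
        for (String id : docIds) {
            last.accept(id);
        }
    }

    public static void main(String[] args) {
        IndexStage last = fromConfig("route-to-segments");
        indexAll(Arrays.asList("doc-1", "doc-2", "doc-3", "doc-4"), last);
        System.out.println(((SegmentRouter) last).segments);
    }
}
```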

- will



-----Original Message-----
From: yseeley@gmail.com [mailto:yseeley@gmail.com] On Behalf Of Yonik
Seeley
Sent: Sunday, June 10, 2007 12:51 PM
To: solr-dev@lucene.apache.org
Subject: search components (plugins)

Some people have needed some custom query logic, and they had to
implement their own request handlers.  They still wanted all of the
other functionality (or almost all), so they were forced to copy the
standard request handler or dismax, or both. That's not the easiest
thing to maintain, and it could be more elegant.

Another layer of plugins sounded like overkill at first, but I'm
starting to rethink it, esp in the face of the expanding number of
different variations:
  - standard
  - dismax
  - more-like-this
  - field collapsing

Seems like we should be able to more easily mix and match, or add new
pieces, w/o having whole new request handlers.

Looking toward the future, and distributed search, this might be a
natural place to add hooks to implement that distributed logic.  This
would allow other people to efficiently support their custom
functionality in a distributed environment.

Thoughts?

-Yonik
