lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "DocumentProcessing" by JanHoydahl
Date Mon, 18 Apr 2011 18:43:41 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "DocumentProcessing" page has been changed by JanHoydahl.
The comment on this change is: Clarification.
http://wiki.apache.org/solr/DocumentProcessing?action=diff&rev1=12&rev2=13

--------------------------------------------------

  
  = Anti-patterns =
   * Do not over-architect, as Eclipse SMILA and others have done by going crazy with ESB
etc.
+  * Do not try to be a connector framework as well. Let ManifoldCF do that job. Focus on
the pipeline!
+  * Do not keep the source private (although Apache licensed), as DieselPoint did with
OpenPipeline. Create a community!
  
  = Proposed architecture =
  [[https://docs.google.com/drawings/edit?id=1rVsy-p7sexSw3wrald2_fHtkLk6opYp5qzllvOHOB8c&hl=en|Architecture diagram]]
@@ -66, +68 @@

  Glue code to hook the pipeline into Solr could be an UpdateRequestProcessor which can either
work in "local" mode, executing the pipeline locally in-thread, or in "distributed" mode which
would dispatch the batch to an available node in a document processing cluster.
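
The local/distributed switch described above can be sketched roughly as follows. This is only an illustration in plain Java, not Solr code: `Stage`, `BatchExecutor`, `LocalExecutor`, `DistributedExecutor`, `Transport` and `forMode` are hypothetical names, documents are modeled as simple field maps, and the network transport is left abstract. A real integration would presumably sit behind an UpdateRequestProcessor.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical sketch; none of these types are Solr classes. */
public class PipelineDispatchSketch {

    /** One processing stage; a document is modeled as a field map (assumption). */
    public interface Stage extends UnaryOperator<Map<String, Object>> {}

    /** Decides where a batch of documents gets processed. */
    public interface BatchExecutor {
        List<Map<String, Object>> execute(List<Map<String, Object>> batch);
    }

    /** Placeholder for whatever wire protocol would reach a processing node. */
    public interface Transport {
        List<Map<String, Object>> send(String node, List<Map<String, Object>> batch);
    }

    /** "Local" mode: run every stage in the calling thread. */
    public static final class LocalExecutor implements BatchExecutor {
        private final List<Stage> stages;

        public LocalExecutor(List<Stage> stages) {
            this.stages = new ArrayList<>(stages);
        }

        @Override
        public List<Map<String, Object>> execute(List<Map<String, Object>> batch) {
            List<Map<String, Object>> out = new ArrayList<>();
            for (Map<String, Object> doc : batch) {
                Map<String, Object> current = new HashMap<>(doc); // leave the input untouched
                for (Stage stage : stages) {
                    current = stage.apply(current);
                }
                out.add(current);
            }
            return out;
        }
    }

    /** "Distributed" mode: hand the whole batch to a processing node. */
    public static final class DistributedExecutor implements BatchExecutor {
        private final String node;
        private final Transport transport;

        public DistributedExecutor(String node, Transport transport) {
            this.node = node;
            this.transport = transport;
        }

        @Override
        public List<Map<String, Object>> execute(List<Map<String, Object>> batch) {
            return transport.send(node, batch);
        }
    }

    /** Picks an executor from a configuration value ("local" or "distributed"). */
    public static BatchExecutor forMode(String mode, List<Stage> stages,
                                        String node, Transport transport) {
        switch (mode) {
            case "local":
                return new LocalExecutor(stages);
            case "distributed":
                return new DistributedExecutor(node, transport);
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        // Example stage: lowercase the "title" field.
        Stage lowercaseTitle = doc -> {
            Map<String, Object> copy = new HashMap<>(doc);
            copy.put("title", String.valueOf(doc.get("title")).toLowerCase(Locale.ROOT));
            return copy;
        };
        Map<String, Object> doc = new HashMap<>();
        doc.put("title", "Hello SOLR");
        BatchExecutor local = forMode("local", Collections.singletonList(lowercaseTitle), null, null);
        System.out.println(local.execute(Collections.singletonList(doc)));
    }
}
```

Keeping both modes behind one interface means the glue code would not need to know where the stages actually run; only the configuration value changes.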
  
  I envision that the whole pipeline could (in addition to running standalone) be wrapped
in a Solr RequestHandler, i.e. a document-processing-only node would be an instance of Solr
with a new BinaryDocumentRequestHandler and no local index. When processing is finished,
the documents are routed to their final destination for indexing (perhaps using [[https://issues.apache.org/jira/browse/SOLR-2358|SOLR-2358]]).
+ 
+ The architecture diagram above shows the local and the fully distributed cases. Another
option would be to feed the set of pipeline nodes directly in round-robin fashion (with no
need for a BinaryDocumentRequestHandler) and let them do the distributed indexing as the
last UpdateProcessor.
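
Round-robin feeding over a fixed set of pipeline nodes could look something like the minimal sketch below. The class name `RoundRobinFeeder` and the idea of identifying nodes by plain strings are assumptions made for illustration; failure handling and node discovery are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: cycles through pipeline nodes in a fixed order. */
public class RoundRobinFeeder {
    private final List<String> nodes;
    private final AtomicInteger next = new AtomicInteger();

    public RoundRobinFeeder(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("need at least one pipeline node");
        }
        this.nodes = new ArrayList<>(nodes); // defensive copy
    }

    /** Returns the node that should receive the next batch; safe to call from several threads. */
    public String nextNode() {
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), nodes.size());
        return nodes.get(index);
    }
}
```

A feeder client would call `nextNode()` once per batch and send the batch to the returned node.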
  
  = Risks =
   * Automated distributed indexing [[https://issues.apache.org/jira/browse/SOLR-2358|SOLR-2358]]
needs to work with this
