commons-user mailing list archives

From Kris Nuttycombe <Kris.Nuttyco...@noaa.gov>
Subject Re: commons-pipeline status?
Date Mon, 07 Aug 2006 17:53:25 GMT
Hi, Steve,

I'm CC'ing this email to the commons user list so that if anyone else 
has similar questions, they can benefit as well.

Steve Christensen wrote:

>Hi Kris,
>
>Hope you are doing well. I've been looking at commons-pipeline over the
>weekend, and it looks very close to what we'd been thinking of in our
>high-level designs. Thank you very much for making it public.
>
>I've got a couple quick questions:
>
>1) Some of the stages in org...pipeline.stage have
>ConsumedTypes / ProducedTypes annotations, but not all of them. Some of
>the ones without annotations seem like they wouldn't need them (LogStage
>and RaiseEventStage), but some seem like they're missing
>(URLtoInputStreamStage and InputStreamLineBreakStage)
>  
>
The stages which are missing annotations are simply ones that I haven't 
gotten around to annotating yet. Also, the unit tests to ensure that the 
validation components are working correctly have yet to be written, but 
it is definitely on the near-term todo list to get all of the validation 
pieces set up.

>2) It looks like the pipeline holds information about branches but it's
>up to the Stage implementation to route things through the branch, is
>that correct? That is, if we want a pipeline with branches, we should
>have some sort of RouteStage that identifies the objects being fed to
>it, and calls emit(branch-key,object) to feed the  object to the correct
>branch
>  
>
That is correct. Most Stage implementations just wrap business 
logic from other classes, so in practice I frequently combine some 
initial processing with the routing in a single stage, but a stage 
that works purely as a router would be fine as well. The one straight 
"router" that I've written used Commons-Chain to make its routing 
decisions, and I found it pretty simple to work with.
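
For illustration, a pure routing stage along these lines might look 
roughly like the sketch below. It is self-contained and deliberately 
does not use the real commons-pipeline classes: Sink, ListSink and 
ExtensionRouter are hypothetical stand-ins, and the two emit methods 
only mimic the emit(branch-key, object) call described above. The 
branch keys "pdf" and "dat" are made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class RouterSketch {
    /** Hypothetical stand-in for whatever consumes objects downstream. */
    interface Sink {
        void feed(Object obj);
    }

    /** Sink that just remembers what it was fed, so routing can be observed. */
    static class ListSink implements Sink {
        final List<Object> received = new ArrayList<>();

        public void feed(Object obj) {
            received.add(obj);
        }
    }

    /** Hypothetical router: inspects each object and picks a branch by file extension. */
    static class ExtensionRouter {
        private final Sink mainBranch;
        private final Map<String, Sink> branches = new HashMap<>();

        ExtensionRouter(Sink mainBranch) {
            this.mainBranch = mainBranch;
        }

        void addBranch(String key, Sink sink) {
            branches.put(key, sink);
        }

        /** Pass the object on to the next stage of the main branch. */
        void emit(Object obj) {
            mainBranch.feed(obj);
        }

        /** Pass the object to the head of the named branch. */
        void emit(String branchKey, Object obj) {
            Sink sink = branches.get(branchKey);
            if (sink == null) {
                throw new IllegalStateException("Unknown branch: " + branchKey);
            }
            sink.feed(obj);
        }

        /** The routing decision itself: extension decides the branch. */
        void process(Object obj) {
            String name = String.valueOf(obj).toLowerCase(Locale.ROOT);
            if (name.endsWith(".pdf")) {
                emit("pdf", obj);
            } else if (name.endsWith(".dat")) {
                emit("dat", obj);
            } else {
                emit(obj); // anything unrecognized stays on the main branch
            }
        }
    }

    public static void main(String[] args) {
        ListSink mainBranch = new ListSink();
        ListSink pdfBranch = new ListSink();
        ExtensionRouter router = new ExtensionRouter(mainBranch);
        router.addBranch("pdf", pdfBranch);
        router.process("report.pdf");
        router.process("notes.txt");
        System.out.println("pdf branch: " + pdfBranch.received);
        System.out.println("main branch: " + mainBranch.received);
    }
}
```

In a real pipeline the decision logic in process() could just as well 
be delegated to another class (or to a Commons-Chain chain, as in the 
router mentioned above); the stage itself only needs to call emit with 
the right branch key.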

>3) Also w/ regard to pipelines/branches, is there a mechanism to merge
>the results of a branch back in to the main pipeline? 
>
>That is, we might have a pipeline that downloads files, identifies files
>by extension and routes them to the correct pipeline branch. Once all
>data has passed through all branches, there would be a stage that
>collected all the transformed output into a package for distribution to
>our customer-facing system.
>
>                       +--> PDF processing --------------+
>                      /                                   \
>Download --> Route --+--> Convert .DAT --+                 \
>  Files      Files    \     To XML       |                  \
>                       \                 |                   \
>                        \                |                    \
>                         +---------------+--> Convert XML -----+--> Merge Results
>                                              To Standard           into output 
>                                               XML                     package
>
>  
>
Yes and no; the way that this could be implemented using the current 
design would be to have a merge stage that would be registered as a 
StageEventListener, and to use events to pass the objects from other 
branches back to the main branch. I haven't thought much about how to do 
a genuine merge of multiple branches, but it seems like it would be easy 
to write a Stage implementation that used the Feeder from a specific 
StageDriver on your main branch. Configuring this setup in code would be 
straightforward; I'm not sure how one would do it using the Digester 
configuration setup.
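
To make the Feeder idea a bit more concrete, here is a rough, 
self-contained sketch of how the wiring could look in code. It does 
not use the actual commons-pipeline classes: the Feeder interface, 
MergeStage and BranchTail below are hypothetical stand-ins, and the 
"pdf"/"xml" labels are invented for the example. The assumption is 
that the last stage of each branch is handed the feeder of the merge 
stage on the main branch and feeds its results there.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class MergeSketch {
    /** Hypothetical stand-in for the entry point of a stage's input queue. */
    interface Feeder {
        void feed(Object obj);
    }

    /** Merge stage on the main branch: collects whatever the branches feed it. */
    static class MergeStage {
        // Branches may run on different threads, so the list is synchronized.
        private final List<Object> collected =
                Collections.synchronizedList(new ArrayList<>());

        /** The feeder that branch stages use to hand objects back to this stage. */
        final Feeder feeder = obj -> { collected.add(obj); };

        /** Called once all branches are done; returns a snapshot of the merged results. */
        List<Object> finish() {
            synchronized (collected) {
                return new ArrayList<>(collected);
            }
        }
    }

    /** Final stage of a branch: does its work, then feeds the main branch. */
    static class BranchTail {
        private final String label;
        private final Feeder mainBranch;

        BranchTail(String label, Feeder mainBranch) {
            this.label = label;
            this.mainBranch = mainBranch;
        }

        void process(Object obj) {
            // Placeholder for the real transformation done by this branch.
            Object result = label + ":" + obj;
            mainBranch.feed(result);
        }
    }

    public static void main(String[] args) {
        MergeStage merge = new MergeStage();
        BranchTail pdfTail = new BranchTail("pdf", merge.feeder);
        BranchTail xmlTail = new BranchTail("xml", merge.feeder);
        pdfTail.process("article-1");
        xmlTail.process("article-2");
        System.out.println("merged package: " + merge.finish());
    }
}
```

The sketch leaves out the part that needs the most care in practice: 
knowing when every branch has finished, so that the merge stage can 
safely build the output package.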

Hope this helps!

Kris

>
>  
>
>>Cool! Looks like you guys have been busy. 
>>
>>I think the single FAQ, and the page describing configuration, are what
>>I needed to push me in the right direction to start playing with things.
>>I'll let you know when I've got questions.
>>
>>Thanks,
>>Steve
>>
>>    
>>
>>>Here is the most current source distribution. Our group has a 
>>>clandestine copy of the project website with updated documentation at 
>>>http://gdsg.ngdc.noaa.gov/projects/commons-pipeline that will hopefully 
>>>go away if the patches get committed. Due to a Maven bug, the JavaDoc 
>>>link doesn't work properly but 
>>>http://gdsg.ngdc.noaa.gov/projects/commons-pipeline/apidocs/index-all.html 
>>>should have the updated javadocs.
>>>
>>>As usual on these projects, the documentation is a little thin but if 
>>>you have any questions about how to proceed, let me know! If you want to 
>>>set up a pipeline with a Digester configuration file, a simple example 
>>>is available in the test code in the file src/test/resources/test_conf.xml.
>>>
>>>Kris
>>>
>>>Steve Christensen wrote:
>>>
>>>      
>>>
>>>>Hi Kris,
>>>>
>>>>It's too bad that things are in limbo at the moment. I'd love to get a
>>>>look at the latest code. 
>>>>
>>>>Also, is there a mailing list or homepage/wiki for the project?
>>>>Specifically, I'm looking for a tutorial or set of examples that I could
>>>>use to put together a quick proof-of-concept for our architect. I'm
>>>>slowly going through the Javadoc and JUnit tests, but it's slow going.
>>>>
>>>>Thanks,
>>>>Steve
>>>>
>>>>
>>>>
>>>>        
>>>>
>>>>>What happens next will depend upon whether or not a committer is willing
>>>>>to take on and mentor the project. I have submitted a patch set to JIRA
>>>>>that can be used to bring the code base up to date with respect to 
>>>>>recent development that's been done, but if you want to take a look at
>>>>>the code sooner than that I'd be happy to just email you a source 
>>>>>distribution to get you started.
>>>>>
>>>>>Thanks for your interest!
>>>>>
>>>>>Kris
>>>>>
>>>>>Steve Christensen wrote:
>>>>>
>>>>>  
>>>>>
>>>>>          
>>>>>
>>>>>>Hi Kris,
>>>>>>
>>>>>>I'm interested in commons-pipeline. I work for a content aggregator -- we
>>>>>>do online distribution of medical journals/books/bibliographies.
>>>>>>
>>>>>>I think commons-pipeline could be a good fit for the backend of our
>>>>>>workflow system. We get data in many different formats, translate some
>>>>>>to XML, transform the XML to a standard form, then transform the
>>>>>>standard form to a couple different web-platform-specific formats.
>>>>>>
>>>>>>It doesn't seem like there's been much activity in the Sandbox since
>>>>>>last year. Has commons-pipeline moved to a new location? I see from the
>>>>>>mailing list that moving it to Incubator was discussed.
>>>>>>
>>>>>>Thanks,
>>>>>>Steve 

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-user-help@jakarta.apache.org

