Pipelines
Pipeline
When data processing needs several steps, one simple task is not enough. We can split the whole processing into many simple tasks that together work as one larger processing task. We then make that larger task a task as well, just with the type category "pipeline"; a simple task has the type category "normal".
The pipeline task receives the input for the whole processing and returns the output of the whole pipeline. The same applies to errors and progress updates, so we can track and manage a pipeline just like a normal task.
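A minimal sketch of this idea, in Python. The names Task, TaskCategory, TaskStatus and make_pipeline_task are hypothetical and not part of any described API; the point is only that one record type serves both type categories, and that the pipeline task carries the input, output, error and progress of the whole processing.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class TaskCategory(Enum):
    # Hypothetical names for the two type categories described in the text.
    NORMAL = "normal"
    PIPELINE = "pipeline"


class TaskStatus(Enum):
    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


@dataclass
class Task:
    # Hypothetical task record: the same shape is used for simple tasks
    # and for the pipeline task that wraps them.
    name: str
    category: TaskCategory = TaskCategory.NORMAL
    input: Any = None
    output: Any = None
    error: Optional[str] = None
    progress: float = 0.0  # fraction complete, 0.0 .. 1.0
    status: TaskStatus = TaskStatus.PENDING


def make_pipeline_task(name: str, pipeline_input: Any) -> Task:
    # The pipeline task receives the input for the whole processing.
    # Its output, error and progress are filled in later for the whole
    # pipeline, so it can be tracked like any normal task.
    return Task(name=name, category=TaskCategory.PIPELINE, input=pipeline_input)


if __name__ == "__main__":
    pipe = make_pipeline_task("resize-and-upload", {"path": "photo.jpg"})
    print(pipe.category.value, pipe.status.value)  # prints: pipeline pending
```

Because both categories share one shape, any tooling that lists, monitors or cancels tasks works on pipelines without special cases.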
Pipeline controller
We need a way to connect all those simple tasks, so that each one is started at the right moment and gets its input from other tasks. Usually we also need some logic that decides what to do next in each case.
The pipeline controller does all that. It receives events from the task controller and, using a special API, it:
- decides what to do after each event;
- starts each task when it is needed;
- connects the inputs and outputs of tasks;
- decides when the pipeline is finished.
The idea is to separate the pipeline orchestration logic from the normal task logic that does the real processing.
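A minimal sketch of such a controller, in Python. The special API itself is not described here, so everything below is an assumption: TaskEvent is a hypothetical event shape, Step a hypothetical step description, and start_task a hypothetical callback into the task controller. The sketch covers the four duties listed for the pipeline controller: reacting to each event, starting tasks once their inputs exist, wiring outputs to inputs, and deciding when the pipeline is finished.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Set


@dataclass
class TaskEvent:
    # Hypothetical event emitted by the task controller when a task ends.
    task_name: str
    kind: str  # "done" or "failed"
    output: Any = None
    error: Optional[str] = None


@dataclass
class Step:
    # Hypothetical step description: the task to start and the names whose
    # outputs feed its input ("input" is the pipeline's own input).
    name: str
    inputs: List[str]


class PipelineController:
    """Orchestration only: reacts to events and starts tasks, but never
    does the real processing itself."""

    def __init__(self, steps: List[Step], final_step: str,
                 start_task: Callable[[str, Dict[str, Any]], None]) -> None:
        self.steps = {s.name: s for s in steps}
        self.final_step = final_step
        self.start_task = start_task  # hypothetical hook into the task controller
        self.outputs: Dict[str, Any] = {}
        self.started: Set[str] = set()
        self.finished = False
        self.error: Optional[str] = None

    def start(self, pipeline_input: Any) -> None:
        self.outputs["input"] = pipeline_input
        self._start_ready_steps()

    def on_event(self, event: TaskEvent) -> None:
        # Decide what to do after each event from the task controller.
        if self.finished:
            return
        if event.kind == "failed":
            # One possible policy: fail the whole pipeline on the first error.
            self.error = event.error
            self.finished = True
            return
        self.outputs[event.task_name] = event.output
        self._start_ready_steps()
        # Decide when the pipeline is finished: every step has an output.
        if all(name in self.outputs for name in self.steps):
            self.finished = True

    def _start_ready_steps(self) -> None:
        # Start each step once all of its inputs exist, passing the outputs
        # of earlier tasks as its input.
        for step in self.steps.values():
            if step.name in self.started:
                continue
            if all(src in self.outputs for src in step.inputs):
                self.started.add(step.name)
                self.start_task(step.name, {src: self.outputs[src] for src in step.inputs})

    @property
    def output(self) -> Any:
        # The pipeline's output is the final step's output, once finished.
        if self.finished and self.error is None:
            return self.outputs.get(self.final_step)
        return None


if __name__ == "__main__":
    calls = []
    ctrl = PipelineController(
        steps=[Step("resize", ["input"]), Step("upload", ["resize"])],
        final_step="upload",
        start_task=lambda name, inputs: calls.append((name, inputs)),
    )
    ctrl.start("photo.jpg")  # starts "resize"
    ctrl.on_event(TaskEvent("resize", "done", output="small.jpg"))  # starts "upload"
    ctrl.on_event(TaskEvent("upload", "done", output="stored"))
    print(ctrl.finished, ctrl.output)  # prints: True stored
```

Failing the whole pipeline on the first error is only one possible policy. The benefit of the split is that such decisions live in the controller, while the tasks themselves stay ordinary processing tasks that know nothing about the pipeline around them.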