mdatapipe - multiprocess plugin-driven data pipeline engine
mdatapipe is a multiprocess, plugin-driven data pipeline engine that can be used for data collection, analysis, and reporting.
mdatapipe allows users to create data processing pipelines using simple declarative configuration files. Each step in a pipeline is handled by a plugin. Plugins are executed in isolated processes, and steps can be parallelized to maximize throughput on multi-core systems.
mdatapipe is in the planning phase of development. Currently the entire pipeline runs on a single system, but in the future it should be possible to run “distributed” pipelines using containers and cloud resources.
A data pipeline typically looks like this:
- collect datasource file:
    path: ~/IIS_Log
- parse line ms_iis_log:
- transport using influxdb:
    buffer_size: 1000
    dbname: mydb
    measurement: iis_log_time
    tag_set: [s-sitename, s-computername, cs-uri-stem]
    field_set: [time-taken]