mdatapipe - multiprocess plugin-driven data pipeline engine

mdatapipe is a multiprocess, plugin-driven data pipeline engine that can be used for data collection, analysis, and reporting.

mdatapipe allows users to create data processing pipelines using simple declarative configuration files. Each step in a pipeline is handled by a plugin, and each plugin is executed in an isolated process. Steps can be parallelized to maximize throughput on multi-core systems.

mdatapipe is in a planning phase of development. Currently the entire pipeline runs on a single system, but in the future it should be possible to run “distributed” pipelines using containers and cloud resources.

A data pipeline typically looks like this:

- collect datasource file:
    path: ~/IIS_Log

- parse line ms_iis_log:

- transport using influxdb:
    buffer_size: 1000
    dbname: mydb
    measurement: iis_log_time
    tag_set: [s-sitename, s-computername, cs-uri-stem]
    field_set: [time-taken]
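
To make the structure of the pipeline above more concrete, here is a minimal Python sketch of how such a configuration might look once loaded into memory: a list of steps, each a single-key mapping from a step header to its options. This is an illustration only, not mdatapipe's actual loader. The names PIPELINE, iter_steps and plugin_name are hypothetical, and the idea that the last word of a step header names the plugin is an assumption drawn from the example, not a documented rule.

```python
# Hypothetical in-memory form of the example pipeline, roughly what a YAML
# loader would produce. This is NOT mdatapipe's real internal representation.
PIPELINE = [
    {"collect datasource file": {"path": "~/IIS_Log"}},
    {"parse line ms_iis_log": None},  # a step with no options loads as None
    {
        "transport using influxdb": {
            "buffer_size": 1000,
            "dbname": "mydb",
            "measurement": "iis_log_time",
            "tag_set": ["s-sitename", "s-computername", "cs-uri-stem"],
            "field_set": ["time-taken"],
        }
    },
]


def iter_steps(pipeline):
    """Yield (header, options) pairs in pipeline order.

    Each step must be a mapping with exactly one key (the step header).
    Missing options are normalized to an empty dict.
    """
    for step in pipeline:
        if len(step) != 1:
            raise ValueError(f"expected exactly one header per step, got {step!r}")
        ((header, options),) = step.items()
        yield header, options or {}


def plugin_name(header):
    """Return the last word of a step header.

    Assumption: the last word identifies the plugin, as the example
    headers ("... file", "... ms_iis_log", "... influxdb") suggest.
    """
    return header.split()[-1]


if __name__ == "__main__":
    for header, options in iter_steps(PIPELINE):
        print(plugin_name(header), sorted(options))
```

Keeping each step as a one-key mapping preserves the order of the steps, which matters because data flows from the first step to the last.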