File Source is a source component designed to pick up files from a directory so that they can be processed in the pipeline.



Usually, the directory is a staging directory where files are placed to be processed using different strategies. The File Source component supports batching, sorting, file selection based on a file name pattern, file age or file size, and file post-processing actions such as renaming or removing. The File Source component is very useful for linking different pipelines or software systems through a standard file system.

The only required input is a base directory, which is polled according to the runtime policies applied to the component. Usually, this source component triggers the execution of the pipeline and is configured to run on a schedule or to poll every given number of milliseconds.

When the File Source component is activated, it polls the base directory for files. Then, the list of files is filtered by applying different criteria. For each file in the list, the contents are read as binary input and processed by the selected deserializer, which creates the output record. Finally, advanced processing options might affect the behavior of this component, for example by renaming the processed file with a suffix or removing it.
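The poll cycle described above can be sketched in Python as follows. This is a minimal, hypothetical illustration rather than the component's implementation: the function poll_once, its parameters, and the use of the file modification time as the file age are assumptions, and batching, the error suffix and error handling are left out.

```python
import fnmatch
import time
from pathlib import Path
from typing import Any, Callable, List, Optional


def poll_once(
    base_dir: Path,
    deserialize: Callable[[bytes], Any],
    pattern: str = "*",
    min_age_s: float = 0.005,
    sort_by: str = "NAME",
    suffix_on_success: Optional[str] = None,
) -> List[Any]:
    """Run one hypothetical poll cycle and return the deserialized outputs."""
    now = time.time()
    # 1. List the regular files in the base directory.
    files = [p for p in base_dir.iterdir() if p.is_file()]
    # 2. Filter by name pattern and minimum age (age taken from st_mtime here).
    files = [
        p
        for p in files
        if fnmatch.fnmatchcase(p.name, pattern) and now - p.stat().st_mtime >= min_age_s
    ]
    # 3. Sort by name (A to Z) or by age (oldest first).
    if sort_by == "AGE":
        files.sort(key=lambda p: p.stat().st_mtime)
    else:
        files.sort(key=lambda p: p.name)
    outputs = []
    for p in files:
        # 4. Read the content as binary and hand it to the deserializer.
        outputs.append(deserialize(p.read_bytes()))
        # 5. Post-process: append the success suffix, or remove the file.
        if suffix_on_success:
            p.rename(p.with_name(p.name + suffix_on_success))
        else:
            p.unlink()
    return outputs
```

Calling poll_once(base, lambda b: b.decode(), pattern="*.txt", suffix_on_success=".ok") would read every old enough .txt file in name order and leave it behind as name.txt.ok.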

Once a file is picked up from the source directory, the deserializer determines how the file content is managed.


Common properties that apply to all source and processor components of a pipeline.






Identifier (ID): Component unique identifier within the pipeline. Read only; only useful in advanced mode. Required: yes. Type: String. Default: Auto.


This identifier is generated automatically by the system and cannot be changed. It may be helpful for advanced pipeline configuration.

Description (description): A short description of the component, displayed inside the component representation, that provides additional information to understand the pipeline at a glance. Required: no. Type: String.

Extract customer id and loyalty number.

Short and sweet description.

Topic (topic): All Source and Processor components support a topic to tag the output records. If a record does not have a tag applied, this topic is applied automatically. Subsequent components may use the topic to route, group or classify records. Required: no. Type: String.


All output records will be tagged using "foo", unless they have already been tagged during the execution of the step.
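The tagging rule for the topic can be sketched as below. The Record type and the apply_default_topic function are hypothetical; the actual record model of the pipeline is not described here. The point is that the component topic only fills in a missing tag and never overwrites an existing one.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Record:
    # Hypothetical record model: a payload plus an optional topic tag.
    payload: Any
    topic: Optional[str] = None


def apply_default_topic(records: List[Record], topic: Optional[str]) -> List[Record]:
    """Tag untagged records with the component topic; keep existing tags."""
    if topic is None:
        return records
    for r in records:
        if r.topic is None:
            r.topic = topic
    return records
```

With topic "foo", an untagged record becomes "foo" while a record already tagged "bar" keeps "bar".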

Base directory (directory-base): Base directory to read files from. Required: yes. Type: Path.
Deserializer (deserializer): Deserializer used to process the input. Required: no. Type: see Deserializers.
File minimum age (file-age): Minimum age of the files to be retrieved. Required: no. Type: Duration. Default: 5ms.
File pattern (file-pattern): A pattern to match the files to be selected. It accepts Unix-style file patterns (wildcards). Required: no. Type: Unix File Pattern. Default: *.
File sorting (file-sort): Sorting criteria that decide the order in which files are picked up, either by name or by age. Required: no.

NAME: Alphabetical order, A to Z.

AGE: Age order, oldest files first.

File minimum size (min-length): The minimum size of the files to be retrieved. Required: no.


File maximum size (max-length): The maximum size of the files to be retrieved. Required: no.


File suffix on success (file-suffix-on-success): The suffix appended to the file name once the file is processed without any error. The suffix can be used to prevent the file from being selected again, so you must take care to clean up the directory eventually. If not set, the file is removed after processing.



File suffix on error (file-suffix-on-error): The suffix appended to the file name once the file is processed with any error. The suffix can be used to prevent the file from being selected again, so you must take care to clean up the directory eventually. If not set, the file is removed after processing. Required: no.


Delete directory if empty (directory-delete-empty): Remove the directory once it is empty (all files have been processed). Note that sufficient privileges are required. Required: no. Type: Boolean. Default: false.
Multiple output records (break-lists): If the deserializer result is a list of elements, break the result into multiple records (one record per list element). Required: no. Type: Boolean. Default: true.
Just read (just-read): Only read files; do not rename or delete them after reading. If other steps do not rename or delete them, they will be processed again.

Hint: the sleep time in the ddc scheduling policies for this step should be much longer than that of the processor or sink that marks files as processed.



Base directory


This directory is usually used to read PCAP files from.

File minimum age


Only pick up files that have not been accessed in the last 10 minutes.
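A minimum age filter of 10 minutes can be sketched as follows. As an assumption, the sketch measures age from the file modification time (st_mtime); which timestamp the component actually uses is not stated here. The old_enough function and its now parameter are hypothetical; now is injectable so the check is deterministic.

```python
import os
import time
from pathlib import Path
from typing import Iterable, List, Optional


def old_enough(paths: Iterable[Path], min_age_s: float, now: Optional[float] = None) -> List[Path]:
    """Keep files whose last modification is at least min_age_s seconds ago.

    Assumption: age is measured from st_mtime; the component may use another timestamp.
    """
    now = time.time() if now is None else now
    return [p for p in paths if now - os.stat(p).st_mtime >= min_age_s]
```

For the example above, the call would be old_enough(files, 10 * 60).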

File pattern


Select files with the .foo suffix.
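Unix-style file patterns behave like shell wildcards. As an assumption, the sketch below approximates the matching with Python's fnmatch.fnmatchcase; the component's exact pattern syntax may differ. It also shows why a success suffix keeps processed files out of the selection: c.foo.ok no longer matches *.foo.

```python
import fnmatch
from typing import Iterable, List


def select_by_pattern(names: Iterable[str], pattern: str = "*") -> List[str]:
    # fnmatchcase applies Unix shell-style wildcards (*, ?, [seq]) case-sensitively.
    return [n for n in names if fnmatch.fnmatchcase(n, pattern)]
```

select_by_pattern(["a.foo", "b.bar", "c.foo.ok"], "*.foo") keeps only a.foo.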

File sorting


Given the files (foo, bar, zid), sorting by name yields (bar, foo, zid).
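Both sorting criteria can be sketched with a sort key. The (name, modification time) entries and the sort_files function are hypothetical. Note that Python compares strings by code point, so uppercase names sort before lowercase ones; the component's exact collation is not documented here.

```python
from typing import List, Tuple

# Hypothetical file entries: (name, modification time in epoch seconds).
Entry = Tuple[str, float]


def sort_files(entries: List[Entry], criteria: str = "NAME") -> List[Entry]:
    if criteria == "NAME":
        # Alphabetical order, A to Z.
        return sorted(entries, key=lambda e: e[0])
    if criteria == "AGE":
        # Oldest first: a smaller timestamp means an older file.
        return sorted(entries, key=lambda e: e[1])
    raise ValueError(f"unknown sorting criteria: {criteria}")
```

With the files above, NAME gives (bar, foo, zid); if zid were the oldest and foo the newest, AGE would give (zid, bar, foo).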

File minimum size


Only select files of 5 kilobytes or larger (inclusive).

File maximum size


Only select files of 2.5 megabytes or smaller (inclusive).
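The inclusive size bounds from these two examples can be checked as below. The sketch assumes binary units (1 KB = 1024 bytes, 1 MB = 1024 KB), which the original does not specify; size_ok, MIN_LEN and MAX_LEN are hypothetical names.

```python
from typing import Optional


def size_ok(size: int, min_length: Optional[int] = None, max_length: Optional[int] = None) -> bool:
    """Inclusive bounds check; None means no limit on that side."""
    if min_length is not None and size < min_length:
        return False
    if max_length is not None and size > max_length:
        return False
    return True


KB = 1024  # assumption: binary units
MB = 1024 * KB
MIN_LEN = 5 * KB  # 5120 bytes
MAX_LEN = int(2.5 * MB)  # 2621440 bytes
```

A file of exactly 5120 or 2621440 bytes passes; one byte less than the minimum or more than the maximum does not.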

File suffix on success


Append the .ok suffix to the file once it's processed successfully.

File suffix on error


Append the .error suffix to the file if its processing yields any error.
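The suffix rules, including the fallback to deletion and the just-read option, can be sketched as follows. post_process and its parameters are hypothetical names chosen to mirror the file-suffix-on-success, file-suffix-on-error and just-read properties; this is an illustration, not the component's code.

```python
from pathlib import Path
from typing import Optional


def post_process(
    path: Path,
    succeeded: bool,
    suffix_on_success: Optional[str] = None,
    suffix_on_error: Optional[str] = None,
    just_read: bool = False,
) -> Optional[Path]:
    """Rename the file with the matching suffix, or delete it if no suffix is set.

    Returns the resulting path, or None when the file was deleted.
    """
    if just_read:
        # Leave the file untouched; it will be picked up again on the next poll.
        return path
    suffix = suffix_on_success if succeeded else suffix_on_error
    if suffix:
        target = path.with_name(path.name + suffix)
        path.rename(target)
        return target
    # No suffix configured for this outcome: remove the file.
    path.unlink()
    return None
```

With ".ok" and ".error" configured, a.foo becomes a.foo.ok on success and a.foo.error on failure.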

Delete directory if empty


Empty child directories will be removed from directory-base when polling.
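Removing empty child directories can be sketched with a bottom-up walk, so that nested empty directories are removed before their parents are checked. Following this example, the hypothetical remove_empty_dirs function keeps the base directory itself; whether the component also removes the base directory is not specified here.

```python
import os
from pathlib import Path


def remove_empty_dirs(base: Path) -> None:
    """Remove empty subdirectories of base, bottom-up; base itself is kept."""
    for root, _dirs, _files in os.walk(base, topdown=False):
        if Path(root) == base:
            continue
        # Re-list the directory: children may have just been removed.
        if not os.listdir(root):
            os.rmdir(root)
```

A tree with empty/nested and full/f.txt under the base directory is reduced to full/f.txt.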

Multiple output records


When this option is enabled and a CSV file with multiple lines is processed, one output record is generated for each CSV line; when it is disabled, all CSV lines are deserialized and wrapped inside a single output record.
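The effect of break-lists on a CSV file can be sketched as follows. deserialize_csv stands in for a CSV deserializer and to_records for the record-splitting step; both are hypothetical names, and the real deserializer's output shape may differ.

```python
import csv
import io
from typing import Any, List


def deserialize_csv(data: bytes) -> List[List[str]]:
    # Hypothetical CSV deserializer: one list element per CSV line.
    return list(csv.reader(io.StringIO(data.decode("utf-8"))))


def to_records(result: Any, break_lists: bool = True) -> List[Any]:
    """Emit one record per list element when break_lists is set; else one record."""
    if break_lists and isinstance(result, list):
        return list(result)
    return [result]
```

For a three-line CSV file, to_records(rows, True) yields three records, while to_records(rows, False) yields one record holding all three rows.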