File Chunk Source is a source component designed to read very large files efficiently, in chunks. It extends the File Source component with additional features for chunking.



The only expected input is a base directory, which is polled according to the runtime policies that apply to the component. Usually, this source component triggers the execution of the pipeline and is configured to run on a schedule or to poll at an interval given in milliseconds. The directory is expected to contain very large files; for smaller files, use File Source instead.

When the File Source component is activated, it polls the base directory for files. The list of files is then filtered by applying different criteria. For each file in the list, the contents are read as binary input and processed by the selected deserializer, which creates the output record. Finally, advanced processing options may affect the behavior of this component, for example by renaming the processed file with a suffix or by removing it.
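The polling flow described above can be illustrated with a minimal Python sketch. It is not the component's implementation: the function names select_files and finish_file are hypothetical, and it assumes that the file pattern behaves like a Unix shell pattern, that file age is measured from the last modification time, and that the size limits are expressed in bytes.

```python
import fnmatch
import time
from pathlib import Path
from typing import List, Optional


def select_files(base_dir: Path, pattern: str = "*", min_age_s: float = 0.0,
                 min_size: Optional[int] = None, max_size: Optional[int] = None,
                 sort_by: str = "NAME") -> List[Path]:
    """Return the files in base_dir that pass every filter, in pick-up order."""
    now = time.time()
    selected = []
    for path in base_dir.iterdir():
        if not path.is_file():
            continue
        stat = path.stat()
        # Assumption: the pattern is a shell-style pattern such as "*.csv".
        if not fnmatch.fnmatchcase(path.name, pattern):
            continue
        # Assumption: age is the time elapsed since the last modification.
        if now - stat.st_mtime < min_age_s:
            continue
        if min_size is not None and stat.st_size < min_size:
            continue
        if max_size is not None and stat.st_size > max_size:
            continue
        selected.append(path)
    if sort_by == "AGE":
        selected.sort(key=lambda p: p.stat().st_mtime)  # older files first
    else:
        selected.sort(key=lambda p: p.name)  # alphabetical, A first
    return selected


def finish_file(path: Path, suffix: Optional[str]) -> None:
    """Append the suffix to the processed file, or delete it if no suffix is set."""
    if suffix:
        path.rename(path.with_name(path.name + suffix))
    else:
        path.unlink()
```

A caller would run select_files on every poll, hand each returned file to the deserializer, and then call finish_file with the success or error suffix depending on the outcome.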

Once a file is picked up from the source directory, the deserializer determines how the file content is handled.


Common properties that apply to all source and processor components of a pipeline.






Identifier (ID): Unique identifier of the component within the pipeline. Read only; only useful in advanced mode. Required: Yes. Type: String. Default: Auto.


This identifier is generated automatically by the system and cannot be changed. It might be helpful for advanced pipeline configuration.

Description (description): A short description of the component, shown inside the component representation, that provides additional information to help understand the pipeline at a glance. Required: No. Type: String.

Extract customer id and loyalty number.

Short and sweet description.

Topic (topic): All Source and Processor components support a topic to tag the output records. If a record does not have a tag applied, this topic is applied automatically. The topic may be used by later components to route, group or classify records. Required: No. Type: String.


All output records will be tagged using "foo", unless they have already been tagged during the execution of the step.
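The topic rule can be sketched in a few lines of Python. The Record class and the apply_default_topic function are hypothetical names used only for illustration; the sketch assumes a record carries at most one topic tag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """Hypothetical output record: a payload plus an optional topic tag."""
    payload: object
    topic: Optional[str] = None


def apply_default_topic(record: Record, component_topic: Optional[str]) -> Record:
    """Tag the record with the component topic only if it has no topic yet."""
    if record.topic is None and component_topic is not None:
        record.topic = component_topic
    return record
```

With a component topic of "foo", an untagged record ends up tagged "foo", while a record that was already tagged during the step keeps its own tag.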

All properties defined for File Source are available.

Base directory (directory-base): Base directory to read files from. Required: Yes. Type: Path.
Deserializer (deserializer): Deserializer to process input. Required: Yes. Type: See Serializers. Default: Binary Deserializer.
File minimum age (file-age): Minimum age a file must have to be retrieved. Required: No. Type: Duration. Default: 5ms.
File pattern (file-pattern): A regular expression pattern that selected files must match. It accepts Unix-style regular expressions. Required: No. Type: Unix File Pattern. Default: *.
File sorting (file-sort): Sorting criteria that decide the order in which files are picked up, either by name or by age. Required: No. Accepted values:


NAME: Alphabetical order, A to Z, A first.

AGE: Age order, older files first.

File minimum size (min-length): The minimum size for files to be retrieved. Required: No.
File maximum size (max-length): The maximum size for files to be retrieved. Required: No.


File suffix on success (file-suffix-on-success): The file suffix that is appended once the file has been processed without any error. The suffix can be used to keep the file from being selected again, so you must take care to clean up the directory eventually. If not indicated, the file is removed after processing.



File suffix on error (file-suffix-on-error): The file suffix that is appended once the file has been processed with any error. The suffix can be used to keep the file from being selected again, so you must take care to clean up the directory eventually. If not indicated, the file is removed after processing. Required: No.


Delete directory if empty (directory-delete-empty): Remove the directory once it is empty, that is, once all files have been processed. Please note that sufficient privileges are required. Required: No. Type: Boolean. Default: false.
Multiple output records (break-lists): A deserializer returns a list of deserialized objects; by default, all objects are wrapped in a single output record. If you enable multiple output records, each deserialized item is wrapped in its own record. This is handy for parallel processing. Required: No. Type: Boolean. Default: false.
Read in chunks (read-in-chunks): Enable to read files in chunks and avoid memory issues. Required: No. Type: Boolean. Default: false.
Record separator (record-separator): Separator that identifies where one record in the file ends and the next one begins. Required if chunking is enabled.

Separator charset (separator-charset): The charset used for the separator. Required: No. Type: Charset. Default: UTF-8.
Chunk read size (chunk-read-size): Size in bytes of each chunk read from the files. Required: No. Type: Capacity. Default: 100MB.
Records size (record-size): Estimated record size in bytes. Used for better memory usage. Required: No. Type: Capacity. Default: 1KB.
Keep separator (keep-separator): Keeps the separator in the parsed records. Required: No. Type: Boolean.
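To show how the chunking properties fit together, here is a minimal Python sketch of separator-based chunked reading. It is an illustration under stated assumptions, not the component's actual algorithm: read_records is a hypothetical name, the default of 100MB is taken to mean 100 * 1024 * 1024 bytes, and the estimated record size (record-size) is not modeled at all.

```python
from typing import BinaryIO, Iterator


def read_records(stream: BinaryIO, separator: str = "\n",
                 separator_charset: str = "utf-8",
                 chunk_read_size: int = 100 * 1024 * 1024,
                 keep_separator: bool = False) -> Iterator[bytes]:
    """Yield records one at a time, reading at most chunk_read_size bytes per read."""
    sep = separator.encode(separator_charset)
    if not sep:
        raise ValueError("a non-empty separator is required for chunked reading")
    buffer = b""
    while True:
        chunk = stream.read(chunk_read_size)
        if not chunk:
            break
        buffer += chunk
        # Split off every complete record. The last piece may be a partial
        # record (or end in a partial separator), so it stays in the buffer
        # until the next chunk completes it.
        *complete, buffer = buffer.split(sep)
        for record in complete:
            yield record + sep if keep_separator else record
    if buffer:
        yield buffer  # final record, not terminated by a separator
```

Because the function is a generator, only the current chunk and any unfinished record are held in memory, which is the point of reading in chunks. For example, iterating over read_records(open(path, "rb"), separator="\n") hands each line to the caller without loading the whole file.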