The EXTRACTTEXT workflow application extracts text content from an input file (.pdf, .docx, or .txt) and returns the extracted text and its length. Optional parameters set a maximum file size, trim the extracted text to a maximum number of characters, and normalize line endings to Unix-style line breaks.
Note: EXTRACTTEXT is available as of WorkflowGen version 10.0.0 (v10 Preview 1).
Required parameters
| Parameter | Type | Direction | Description |
|---|---|---|---|
| FILE | FILE | IN | The file from which to extract the text (must be .pdf, .docx, or .txt) |
| TEXT | TEXT | OUT | The extracted (and possibly normalized/trimmed) text |
| LENGTH | NUMERIC | OUT | The length (number of characters) of the extracted text |
Optional parameters
| Parameter | Type | Direction | Description |
|---|---|---|---|
| MAX_FILE_SIZE | NUMERIC | IN | Maximum allowed file size in MB |
| TRIM_SIZE | NUMERIC | IN | Maximum number of characters to keep from the extracted text |
| NORMALIZE | TEXT | IN | Whether to normalize line endings. Possible values: Y, N, true, false |
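
The following Python sketch illustrates how the optional parameters could combine to produce the TEXT and LENGTH outputs. It is not WorkflowGen's implementation: the function names (`parse_flag`, `exceeds_max_size`, `post_process`) are hypothetical, and several behaviors are assumptions made only for illustration, namely that 1 MB means 1024 × 1024 bytes, that NORMALIZE values are matched case-insensitively, and that normalization is applied before trimming.

```python
from typing import Optional, Tuple


def parse_flag(value: str) -> bool:
    """Interpret the documented NORMALIZE values: Y, N, true, false.

    Assumption: matching is case-insensitive and ignores surrounding spaces.
    """
    v = value.strip().lower()
    if v in ("y", "true"):
        return True
    if v in ("n", "false"):
        return False
    raise ValueError(f"Unsupported NORMALIZE value: {value!r}")


def exceeds_max_size(size_bytes: int, max_file_size_mb: Optional[float]) -> bool:
    """Return True if the file is larger than MAX_FILE_SIZE.

    Assumption: 1 MB is treated as 1024 * 1024 bytes.
    """
    if max_file_size_mb is None:
        return False  # no limit supplied
    return size_bytes > max_file_size_mb * 1024 * 1024


def post_process(
    raw_text: str,
    trim_size: Optional[int] = None,
    normalize: str = "N",
) -> Tuple[str, int]:
    """Return (TEXT, LENGTH) for already-extracted text.

    Assumption: line endings are normalized before the text is trimmed.
    """
    text = raw_text
    if parse_flag(normalize):
        # Convert Windows (\r\n) and old Mac (\r) line endings to Unix (\n).
        text = text.replace("\r\n", "\n").replace("\r", "\n")
    if trim_size is not None:
        # Keep at most TRIM_SIZE characters.
        text = text[: int(trim_size)]
    return text, len(text)


if __name__ == "__main__":
    text, length = post_process("line 1\r\nline 2\r\n", trim_size=10, normalize="Y")
    print(repr(text), length)
```

In this sketch LENGTH is computed after trimming and normalization, which matches the description of TEXT as the "possibly normalized/trimmed" text; confirm the actual ordering against your WorkflowGen version before relying on it.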