Package scriptella.driver.text

Text Driver for Scriptella.


It allows querying a text file with regular expressions. The text driver can also be used as a lightweight replacement for Velocity to produce simple output with property substitution.

The text driver does not depend on additional libraries and is generally faster than the CSV or Velocity drivers.

Note: The driver doesn't use SQL syntax.

General information

Driver class: scriptella.driver.text.Driver
URL: Text file URL. URIs are resolved relative to the script file directory. If url has no value, input is read from the console and output is printed to the console (System.out).
Runtime dependencies: None

Driver Specific Properties

Name: encoding
Description: Specifies the charset encoding of text files.
Required: No, the system default encoding is used.

Name: eol
Description: End-of-line suffix. Only valid for <script> elements.
Required: No, the default value is \n.

Name: trim
Description: A value of true specifies that leading and trailing whitespace in text file lines is omitted.
Required: No, the default value is true.

Name: flush
Description: A value of true specifies that the output should be flushed immediately when the <script> element completes.
Required: No, the default value is false.

Name: skip_lines
Description: The number of lines to skip before reading starts.
Required: No, the default value is 0 (no lines are skipped).

Name: null_string
Description: Specifies the string token that represents the Java null literal.
When querying a text file, a regex group equal to null_string is returned as Java null.
When outputting content, if null_string is specified, all missing variables and all variables with a null value are substituted with null_string.
Specify an empty string (null_string=) to convert automatically between nulls in memory and empty strings in files. For example, with the query regex (\d*),(\d*),(\d*), the input line 1,,5 is parsed into a set of 3 variables with the values {"1", null, "5"}, as opposed to the default behaviour {"1", "", "5"}.
Required: No. By default strings are preserved: empty strings are not converted to nulls, and null variable references are not expanded in the output, i.e. ${nullvalue} is written as-is.
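
The null_string rule for queries can be illustrated with plain java.util.regex. The following is only a sketch, not the driver's implementation: the class NullStringParseSketch and its parse method are hypothetical names, and the regex is assumed to be written with a capture group around each \d* so that three values are produced.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Scriptella: mimics the documented null_string rule for queries. */
final class NullStringParseSketch {

    /**
     * Returns groups 1..N of the first match found in the line, or null if nothing matches.
     * A group whose text equals nullString is returned as Java null.
     * Passing nullString == null keeps every matched string as-is (the documented default).
     */
    static List<String> parse(Pattern regex, String line, String nullString) {
        Matcher m = regex.matcher(line);
        if (!m.find()) {
            return null;
        }
        List<String> values = new ArrayList<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            String group = m.group(i);
            values.add(group != null && group.equals(nullString) ? null : group);
        }
        return values;
    }

    public static void main(String[] args) {
        Pattern regex = Pattern.compile("(\\d*),(\\d*),(\\d*)");
        System.out.println(parse(regex, "1,,5", ""));   // as if null_string= were set
        System.out.println(parse(regex, "1,,5", null)); // as if null_string were not set
    }
}
```

With an empty null_string the middle value comes back as null; without it the empty string is kept.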

Query Syntax

The text driver uses regular expression syntax to query text files. The file is read line by line from the location specified by the URL connection property, and each line is matched against the regex pattern.

If a line, or a part of it, matches the pattern, the match produces a virtual row in the result set. The column names in the virtual result set correspond to the indices of the matched regex groups. For example, the query foo(.*) matches the line foobar, and the produced row contains two columns (groups): 0 = foobar, 1 = bar. These columns can be referenced in child script or query elements by a numeric name or by a string name of the form columnN.
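
The mapping from regex groups to columns can be sketched with java.util.regex. This is an illustration only: VirtualRowSketch and toRow are hypothetical names, and the map simply registers each group under both its numeric name and its columnN name as described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Scriptella: turns one regex match into a "virtual row". */
final class VirtualRowSketch {

    /**
     * Returns column name -> value for the first match found in the line.
     * Each group i is stored under "i" and under "column" + i.
     * The map is empty when no part of the line matches.
     */
    static Map<String, String> toRow(String regex, String line) {
        Map<String, String> row = new LinkedHashMap<>();
        Matcher m = Pattern.compile(regex).matcher(line);
        if (m.find()) { // find() accepts a match on any part of the line
            for (int i = 0; i <= m.groupCount(); i++) {
                row.put(String.valueOf(i), m.group(i));
                row.put("column" + i, m.group(i));
            }
        }
        return row;
    }

    public static void main(String[] args) {
        System.out.println(toRow("foo(.*)", "foobar"));
    }
}
```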

It is also possible to specify more than one regular expression to match file content. Specify each regular expression on a separate line to combine them using an OR condition.

The text driver uses the java.util.regex implementation for pattern matching. See the java.util.regex.Pattern Javadoc for the supported syntax.

Additional notes:


Example:
<query>
  ^ERROR: (.*)
  WARNING: (.*Failed.*)
  ([\d]+) errors?
</query>
    
This query consists of 3 regular expressions:
  1. selects lines starting with the ERROR: prefix
  2. selects WARNING lines containing the Failed substring
  3. selects lines containing a number of errors, e.g. "Found 5 errors".
The query selects any line satisfying at least one of these 3 regular expressions. Suppose the input file has the following content:
Log file started...
INFO: INIT
WARNING: CPU is slow
WARNING: Failed to increase heap size
ERROR: Process interrupted
Operation completed with 1 error.
As a result of query execution, the following set of rows is produced:
0                                        1
WARNING: Failed to increase heap size    Failed to increase heap size
ERROR: Process interrupted               Process interrupted
1 error                                  1
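
The OR semantics of a multi-line query can be reproduced outside Scriptella with a short Java sketch. OrQuerySketch and query are hypothetical names. One assumption goes beyond the text above: when several expressions match the same line, this sketch emits a single row from the first matching expression.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Scriptella: matches each line against several regexes (OR). */
final class OrQuerySketch {

    /** Returns one row (groups 0..N) per line that matches at least one of the expressions. */
    static List<String[]> query(List<String> regexes, List<String> lines) {
        List<Pattern> patterns = new ArrayList<>();
        for (String regex : regexes) {
            patterns.add(Pattern.compile(regex));
        }
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            for (Pattern pattern : patterns) {
                Matcher m = pattern.matcher(line);
                if (m.find()) {
                    String[] row = new String[m.groupCount() + 1];
                    for (int i = 0; i < row.length; i++) {
                        row[i] = m.group(i);
                    }
                    rows.add(row);
                    break; // assumption: the first matching expression wins for this line
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> regexes = Arrays.asList(
                "^ERROR: (.*)", "WARNING: (.*Failed.*)", "([\\d]+) errors?");
        List<String> lines = Arrays.asList(
                "Log file started...",
                "INFO: INIT",
                "WARNING: CPU is slow",
                "WARNING: Failed to increase heap size",
                "ERROR: Process interrupted",
                "Operation completed with 1 error.");
        for (String[] row : query(regexes, lines)) {
            System.out.println(row[0] + " | " + row[1]);
        }
    }
}
```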

Script Syntax

The <script> element content is read line by line. For each line, properties are expanded and the output is sent to the file specified by the url connection attribute.

Additional notes:


Example:
<script>
    Inserted a record with ID=$id. Table=${table}
</script>
    
For id=1 and table=system this script produces the following output:
Inserted a record with ID=1. Table=system
    

Properties substitution

In text script and query elements, the ${property} or $property syntax is used for property/variable substitution.

NOTE:

By default, NULL variables and expressions are preserved. Use the null_string connection property to specify a string token for nulls. For example, setting null_string to an empty string in the connection properties section enables parsing empty strings as nulls:
<connection driver="text" url="report.csv">
    null_string=
</connection>
The Scriptella property substitution engine cannot distinguish a null value from an unused variable or from an incidental use of the $var syntax. Such references are therefore preserved until the user explicitly specifies a value for null_string.
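
The preserve-unless-null_string behaviour can be sketched in a few lines of Java. This is a simplified illustration and not Scriptella's substitution engine: SubstitutionSketch and expand are hypothetical names, and only plain variable names are handled inside $name and ${name}.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Scriptella: expands $name and ${name} references in a line. */
final class SubstitutionSketch {

    // Group 1 captures the name in ${name}, group 2 the name in $name.
    private static final Pattern REF = Pattern.compile("\\$(?:\\{([^}]+)\\}|(\\w+))");

    /**
     * Replaces each reference with the variable's value.
     * Missing or null variables are replaced with nullString when it is non-null,
     * otherwise the reference is left in the output unchanged.
     */
    static String expand(String line, Map<String, ?> vars, String nullString) {
        Matcher m = REF.matcher(line);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1) != null ? m.group(1) : m.group(2);
            Object value = vars.get(name);
            String text;
            if (value != null) {
                text = value.toString();
            } else if (nullString != null) {
                text = nullString;
            } else {
                text = m.group(); // preserve the reference as written
            }
            m.appendReplacement(out, Matcher.quoteReplacement(text));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> vars = new HashMap<>();
        vars.put("id", 1);
        vars.put("table", "system");
        System.out.println(expand("Inserted a record with ID=$id. Table=${table}", vars, null));
        System.out.println(expand("Value=${nullvalue}", vars, null)); // reference preserved
        System.out.println(expand("Value=${nullvalue}", vars, ""));   // replaced with empty null_string
    }
}
```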

Examples

<connection id="in" driver="text" url="data.csv">
</connection>
<connection id="out" driver="text" url="report.csv">
</connection>

<script connection-id="out">
    ID;Priority;Summary;Status
</script>

<query connection-id="in">
    <script connection-id="out">
        $rownum;$column0;$column1;$column2
    </script>
</query>

This example copies rows from the data.csv file to report.csv and adds an ID column filled from $rownum. The resulting file is semicolon-separated.

Declarative formatting/parsing rules for property substitution

Starting from version 1.1, Scriptella supports configurable rules for formatting and parsing property values. The rules are specified as connection parameters prefixed with the "format." string.
Example of defining a format for a numeric property with 2 digits after a decimal point:
format.someColumn.type=number
format.someColumn.pattern=000.00
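
To see what a pattern such as 000.00 produces, it can be tried directly in Java. This sketch assumes that number patterns follow java.text.DecimalFormat syntax, which the text above does not state. NumberFormatSketch is a hypothetical name, and the US locale is fixed only to make the decimal separator predictable.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParseException;
import java.util.Locale;

/** Hypothetical helper, not part of Scriptella: formats and parses numbers with a DecimalFormat pattern. */
final class NumberFormatSketch {

    // Assumption: the pattern is interpreted as a java.text.DecimalFormat pattern.
    private static DecimalFormat newFormat(String pattern) {
        return new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(Locale.US));
    }

    /** Formats the value using the pattern, e.g. at least 3 integer digits and exactly 2 decimals for 000.00. */
    static String format(String pattern, double value) {
        return newFormat(pattern).format(value);
    }

    /** Parses text written with the pattern back into a double. */
    static double parse(String pattern, String text) {
        try {
            return newFormat(pattern).parse(text).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse: " + text, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(format("000.00", 3.14159));
        System.out.println(format("000.00", 1234.5));
        System.out.println(parse("000.00", "012.50"));
    }
}
```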

Each property has the following formatting/parsing options (see PropertyFormat class for implementation details):

Default values for formatting and parsing rules

It is possible to provide default values for most of the properties. The Currency example demonstrates the usage of formatting rules.

Copyright © 2006-2012 The Scriptella Project Team.