For more information, or to download FIXtail, see FIXtail's project page on SourceForge.

FIXtail

FIXtail is a modified version of GNU tail, with extra features targeted primarily (though not exclusively) at bandwidth and latency analysis of Financial Information eXchange (FIX) log files. In addition to all the usual features of GNU tail, like following multiple files by name or file descriptor, FIXtail supports filtering by timestamp, by certain FIX field values or by regular expression, and can generate periodic reports of bytes, bits, or lines per second over specified time intervals.

Filtering by timestamp

The --start, --end and --last options allow you to specify a range of lines for display or sampling by timestamp:

--start=[yyyymmdd-]HH:MM:SS, --end=[yyyymmdd-]HH:MM:SS

--start and --end specify the range as absolute GMT times: --start is the inclusive start of the range, and --end is the exclusive end. The date may be omitted, in which case it defaults to the current GMT date.
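As an illustration of the argument format and the half-open range, here is a minimal Python sketch. It is not FIXtail's own code: the names parse_bound and in_range are hypothetical, and the sketch assumes the date is present exactly when the argument contains a '-'.

```python
from datetime import date, datetime, timezone

def parse_bound(spec, today=None):
    """Parse '[yyyymmdd-]HH:MM:SS' into a naive datetime understood as GMT."""
    if "-" in spec:
        return datetime.strptime(spec, "%Y%m%d-%H:%M:%S")
    # No date given: fall back to the current GMT date (or an injected one).
    if today is None:
        today = datetime.now(timezone.utc).date()
    clock = datetime.strptime(spec, "%H:%M:%S").time()
    return datetime.combine(today, clock)

def in_range(stamp, start, end):
    """The start bound is inclusive, the end bound is exclusive."""
    return start <= stamp < end
```

With start = parse_bound("20080505-12:00:00") and end = parse_bound("20080505-13:00:00"), a line stamped exactly 12:00:00 is inside the range, and a line stamped exactly 13:00:00 is outside it.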

--last=NUM{s,m,h}

--last specifies the start time as a certain number of seconds, minutes or hours before the current time, in the format 2h, 10m or 30s. Mixed intervals are not currently supported, so an interval of, for instance, an hour and thirty minutes must be specified as 90m.
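The interval syntax can be sketched in a few lines of Python. The function name parse_interval is hypothetical, and the sketch assumes that a unit suffix is always required; whether FIXtail accepts a bare number is not stated here.

```python
import re

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}

def parse_interval(spec):
    """Convert an interval such as '30s', '10m' or '2h' to seconds."""
    match = re.fullmatch(r"(\d+)([smh])", spec)
    if match is None:
        # Mixed intervals such as '1h30m' are rejected, as described above.
        raise ValueError("unsupported interval: %r" % spec)
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]
```

An hour and thirty minutes is therefore written as 90m, which this sketch converts to 5400 seconds.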

When examining an existing file (as opposed to input from a pipe or socket), FIXtail will find its starting point by performing a binary search through the file for the first line with a timestamp matching the provided --start or --last specification. For large files (as FIX log files are expected to be), this can be much less expensive than a linear scan.

[NOTE: All of this, of course, assumes that all lines in the file have timestamps, and that lines are in non-decreasing timestamp order. If either of these conditions does not hold, FIXtail may become confused in its binary search.]
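The idea behind the binary search can be sketched in Python as follows. This is an illustration under stated assumptions, not FIXtail's implementation: it assumes every line starts with a 19-character timestamp in the default ISO format, and the names line_start, line_time and find_start are hypothetical.

```python
import io
from datetime import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"  # assumed: the default ISO format
STAMP_LEN = 19                     # length of a timestamp in that format

def line_time(line):
    return datetime.strptime(line[:STAMP_LEN].decode("ascii"), TIME_FORMAT)

def line_start(f, pos, chunk=4096):
    """Offset of the first byte of the line that contains byte `pos`."""
    while pos > 0:
        begin = max(0, pos - chunk)
        f.seek(begin)
        idx = f.read(pos - begin).rfind(b"\n")
        if idx >= 0:
            return begin + idx + 1
        pos = begin
    return 0

def find_start(f, start):
    """Offset of the first line whose timestamp is >= start (EOF if none)."""
    f.seek(0, io.SEEK_END)
    lo, hi = 0, f.tell()
    # Invariant: every line before `lo` is too early; every line from `hi`
    # onwards is late enough. Both offsets are always line boundaries.
    while lo < hi:
        begin = line_start(f, (lo + hi) // 2)
        f.seek(begin)
        line = f.readline()
        if line_time(line) >= start:
            hi = begin
        else:
            lo = begin + len(line)
    return lo
```

Each probe reads only one line plus a short backwards scan, so the number of reads grows with the logarithm of the file size. The sketch relies on the same two conditions as the note above: every line has a timestamp, and timestamps never decrease.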

Timestamp formats

For all of this to work, of course, FIXtail must be able to determine the timestamp of a line. FIXtail uses strptime(3) to parse the timestamp from a line; the input time format for this purpose is controlled by the following options:

--iso[=C]

The --iso option specifies that the line's timestamp is at the start of line in ISO8601 time format, equivalent to the strptime(3) format '%Y-%m-%dT%H:%M:%S'. This is the default if no other time-format-related options are specified. The date/time separator character can be specified as an optional argument to the --iso option; the default is 'T'. For instance, to match the timestamp '2008-05-05 12:34:56', you would specify --iso=' '.

--fix

In addition to its other effects, the --fix option specifies that a FIX message's SendingTime field (52) should be used as its timestamp.

--compact

The --compact option specifies that the line's timestamp is at the start of line in yyyymmddHHMMSS format, equivalent to the strptime(3) format '%Y%m%d%H%M%S'.
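For readers who want to check a format against a sample line, the two built-in formats can be reproduced with Python's datetime.strptime, which accepts the same conversion specifiers used here. The function name line_timestamp and its mode argument are hypothetical; the fixed widths (19 and 14 characters) are how this sketch isolates the timestamp and need not match FIXtail's own approach.

```python
from datetime import datetime

def line_timestamp(line, mode="iso", sep="T"):
    """Parse the timestamp at the start of `line`."""
    if mode == "iso":
        # A literal '%' separator must be doubled inside a format string.
        fmt, width = "%Y-%m-%d" + sep.replace("%", "%%") + "%H:%M:%S", 19
    elif mode == "compact":
        fmt, width = "%Y%m%d%H%M%S", 14
    else:
        raise ValueError("unknown mode: %r" % mode)
    return datetime.strptime(line[:width], fmt)
```

For example, line_timestamp("2008-05-05 12:34:56 some text", sep=" ") and line_timestamp("20080505123456 some text", mode="compact") both yield 12:34:56 on 5 May 2008.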

--input-timefmt=FMT

The --input-timefmt option allows you to specify your own strptime(3) format for a line's timestamp, which is presumed to be at the start of the line.

--local-timestamp

The --local-timestamp option specifies that the line's timestamp, and the arguments to the --start and --end options, if any, should be interpreted according to the prevailing local timezone. If this option is not given, GMT will be used for both.

[NOTE: If the line's timestamp does not include year information (notable examples of this include some of the more interesting log files in /var/log on Linux systems), strptime(3) may be unable to correctly determine whether Daylight Saving time is in effect, and so may parse timestamps incorrectly. FIXtail attempts to do the right thing for timestamps near the current time.]

All timestamps are presumed to be in GMT, unless the --local-timestamp option is specified.

FIX-related options

The --fix option specifies that input files are in FIX format (hence the name). In addition to controlling timestamp parsing as described above, --fix enables other options:

--sender=ID, --target=ID

The --sender and --target options allow you to filter output by the specified SenderCompID (FIX 49=) and TargetCompID (FIX 56=) values, respectively.
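The filtering amounts to splitting a message into tag=value pairs and comparing tags 49 and 56. A minimal Python sketch follows; fix_fields and matches are hypothetical names, the sample message is made up, and keeping only the first occurrence of a repeated tag is an assumption of the sketch.

```python
SOH = "\x01"

def fix_fields(message, sep=SOH):
    """Split a FIX message into a {tag: value} dict (first occurrence wins)."""
    fields = {}
    for item in message.split(sep):
        tag, equals, value = item.partition("=")
        if equals:
            fields.setdefault(tag, value)
    return fields

def matches(message, sender=None, target=None, sep=SOH):
    """True if SenderCompID (49) and TargetCompID (56) match the filters."""
    fields = fix_fields(message, sep)
    return ((sender is None or fields.get("49") == sender) and
            (target is None or fields.get("56") == target))
```

The same field lookup also yields SendingTime: fix_fields(message).get("52").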

--fix-ifs=C

--fix-ifs specifies the separator character used between FIX fields in the input files. This influences field parsing for the --sender and --target options and for timestamp parsing. The default value is the standard ASCII 001 (SOH). Alternate values can be specified as a single character, an octal escape \OOO, or a control character specified as a '^' character followed by a letter; for instance, the options --fix-ifs=\001 and --fix-ifs=^A are both equivalent to the default.
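The three accepted notations can be decoded as in the following Python sketch. The name parse_separator is hypothetical, and the handling of edge cases (lowercase letters after '^', escapes with fewer than three octal digits) is an assumption rather than documented behavior.

```python
def parse_separator(spec):
    """Decode a separator given as a single char, '\\OOO' or '^X'."""
    if len(spec) == 1:
        return spec
    if spec.startswith("\\") and 2 <= len(spec) <= 4:
        # Octal escape; int() raises ValueError on non-octal digits.
        return chr(int(spec[1:], 8))
    if len(spec) == 2 and spec[0] == "^" and spec[1].isascii() and spec[1].isalpha():
        # Caret notation: ^A is 0x01, ^B is 0x02, and so on.
        return chr(ord(spec[1].upper()) - 64)
    raise ValueError("unrecognized separator: %r" % spec)
```

Both parse_separator("\\001") and parse_separator("^A") decode to the SOH character, matching the default.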

--fix-ofs=C

--fix-ofs specifies the separator character used between FIX fields in the output. This defaults to a space character, for easier human consumption; alternative values can be specified in the formats recognized by --fix-ifs. If the output is being fed into another program expecting FIX-format input, for instance, you will probably need to specify --fix-ofs=^A.

Sampling

FIXtail's raison d'être is its sampling facilities. Instead of outputting the contents of its input file(s), FIXtail can, at specified time intervals, report bits, bytes, and lines/messages per second, historically (for existing log files) and in real time (via tail's -f/-F options). This functionality is controlled by the following options:

--sample=NUM{s,m,h}

--sample specifies the sample interval (hence the name), as a number of seconds, minutes or hours, in the same format as the --last option. The input will be divided by timestamp into samples of this length, and an informational line will be output for each sample, according to the --output-format setting.

--output-format=FMT (or -o FMT)

--output-format specifies the contents of the information line printed for each sample. The provided string can contain the following printf(3)-like format sequences:
Fmt seq   Acts like   Outputs
%t, %tE   %s          The timestamp of the end of the sample interval, in the input time format.
%tS       %s          The timestamp of the start of the sample interval, in the input time format.
%tM       %s          The timestamp of the midpoint of the sample interval, in the input time format.
%T, %TE   %s          The timestamp of the end of the sample interval, in the output time format.
%TS       %s          The timestamp of the start of the sample interval, in the output time format.
%TM       %s          The timestamp of the midpoint of the sample interval, in the output time format.
%k        %f          Kilobits per second for the current sample.
%M        %f          Messages (or lines) per second for the current sample.
%B        %f          Bytes per second for the current sample.
%b        %f          Bits per second for the current sample.
%n        %ld         Total number of messages (lines) in the current sample.
%c        %ld         Total number of bytes in the current sample.
%f        %s          File name for the current sample.

Notes:

The default output format is "%T\t%.2k\t%.2M\n" if a single input is provided, or "%T\t%.2k\t%.2M\t%f\n" if multiple inputs are provided.

--output-timefmt=FMT

--output-timefmt specifies the output time format, used by the %T, %TS, %TM and %TE format sequences in the output format. The provided format should be a strftime(3) format string; the default is "%Y-%m-%dT%H:%M:%S".

--print-empty

By default, FIXtail will not print an information line for a sample interval during which no data was read from the corresponding input. The --print-empty option overrides this behavior. This is useful, for example, when feeding the output of FIXtail to a plotting program.
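To make the sampling arithmetic concrete, here is a small Python sketch that groups (timestamp, byte count) pairs into intervals and renders them in the default single-input format. It rests on several assumptions that are not stated above: intervals are aligned to multiples of the interval length, a kilobit is 1000 bits, and empty intervals are only filled in between the first and last non-empty ones. The names sample and render are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone

def sample(lines, interval, print_empty=False):
    """lines: iterable of (epoch_seconds, payload_bytes) pairs.

    Returns a list of (interval_end, kilobits_per_sec, messages_per_sec).
    """
    buckets = defaultdict(lambda: [0, 0])
    for stamp, nbytes in lines:
        start = stamp - stamp % interval
        buckets[start][0] += nbytes
        buckets[start][1] += 1
    if not buckets:
        return []
    starts = sorted(buckets)
    if print_empty:
        starts = range(starts[0], starts[-1] + interval, interval)
    out = []
    for start in starts:
        nbytes, count = buckets.get(start, (0, 0))
        out.append((start + interval,
                    nbytes * 8 / 1000 / interval,
                    count / interval))
    return out

def render(samples, timefmt="%Y-%m-%dT%H:%M:%S"):
    """Render samples like the default format "%T\\t%.2k\\t%.2M\\n"."""
    return "".join(
        "%s\t%.2f\t%.2f\n" % (
            datetime.fromtimestamp(end, timezone.utc).strftime(timefmt), k, m)
        for end, k, m in samples)
```

With print_empty=True, an interval without data appears as a line with zero rates, which keeps the time axis evenly spaced for a plotting program.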

Latency/jitter analysis

With some additional information, FIXtail can extract and report latency information from FIX data. To achieve this, the FIX log data fed to FIXtail must have the following properties:

With QuickFIX/J, for instance, these can be achieved with the FileIncludeTimeStampForMessages and FileIncludeMilliseconds logging options.

Once this information is in place, you can activate latency monitoring with the --latency option:

--latency[=FMT]

Activates latency monitoring. The optional FMT argument is the strptime(3) format that should be used to parse the timestamp at the beginning of the line (see note below). If not specified, the format defaults to "%Y%m%d-%H:%M:%S".

When the --latency option is in use, the following additional format sequences are available for use in --output-format:
Fmt seq   Acts like   Outputs
%Lm       %f          Average latency in the current sample, in milliseconds.
%Ln       %ld         Minimum observed latency in the current sample, in milliseconds.
%Lx       %ld         Maximum observed latency in the current sample, in milliseconds.
%Lj       %ld         Observed latency "jitter" in the current sample (defined as the difference between maximum and minimum observed latencies), in milliseconds.
%Ld       %f          Standard deviation of latency observations in the current sample, in milliseconds.
%LM       %ld         Median (or 50th percentile) of latency observations in the current sample, in milliseconds.
%LN       %ld         5th percentile of latency observations in the current sample, in milliseconds.
%LX       %ld         95th percentile of latency observations in the current sample, in milliseconds.

A typical usage of FIXtail for latency analysis, then, might look like:

  fixtail -F --fix --latency --last=10m --sample=10s -o "%TE\t%n\t%Ln\t%Lx\t%.2Lm\t%.2Ld\n" fix-messages.log

which prints for each sample the message count, minimum and maximum observed latency, and mean and standard deviation to two decimal places.
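The per-sample statistics behind these format sequences can be sketched in Python as follows. The sketch takes a list of latency observations in milliseconds as given; how each observation is derived from a log line is outside its scope. The name latency_summary is hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which variant FIXtail reports.

```python
import statistics

def latency_summary(latencies_ms):
    """Summarize one sample's latency observations (milliseconds)."""
    low, high = min(latencies_ms), max(latencies_ms)
    return {
        "n": len(latencies_ms),                   # %n
        "Ln": low,                                # minimum
        "Lx": high,                               # maximum
        "Lj": high - low,                         # jitter = max - min
        "Lm": statistics.mean(latencies_ms),      # average
        "Ld": statistics.pstdev(latencies_ms),    # assumed: population stdev
    }
```

For the observations [2, 4, 4, 4, 5, 5, 7, 9] this gives a minimum of 2, a maximum of 9, a jitter of 7, a mean of 5 and a standard deviation of 2.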

Notes:

Latency-related options

Some options are relevant only to latency measurement:

--sample-adjust=NUM

Adds the specified number of milliseconds to each latency observation, and subtracts the same amount from the computed mean, minimum, maximum and percentiles. This is something of a kludge to compensate for clock differences between sender and receiver that can result in negative latency observations, which the internal percentile handling code cannot process. While this operation is mathematically an identity and will not change the reported mean, maximum or minimum (which may be negative in the intended use case), the reported percentiles may change slightly as an artifact of the internal code.
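The shift-and-unshift step can be illustrated with a short Python sketch. The name adjusted_stats is hypothetical, and the sketch covers only mean, minimum and maximum, for which the operation is an exact identity.

```python
import statistics

def adjusted_stats(latencies_ms, adjust_ms):
    """Shift observations up by adjust_ms, summarize, then shift back."""
    shifted = [value + adjust_ms for value in latencies_ms]
    if min(shifted) < 0:
        raise ValueError("adjustment too small: negative values remain")
    return (statistics.mean(shifted) - adjust_ms,
            min(shifted) - adjust_ms,
            max(shifted) - adjust_ms)
```

For observations [-3, 1, 5] and an adjustment of 10, the intermediate values [7, 11, 15] are all non-negative, and the reported mean, minimum and maximum come back as 1, -3 and 5.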

Miscellaneous options

In addition to the above options, and all the other options recognized by GNU tail, FIXtail recognizes the following options:

--line-buffering (or -L)

If specified, FIXtail buffers its input by line, not printing a line to output until it has been fully read.

[NOTE: You will usually not need to specify this option, as all of the FIXtail-specific options related to FIX formatting, timestamp matching, filtering or sampling have the effect of activating line buffering.]

--match-regex=REGEX (or -M REGEX)

If specified, only input lines that match the provided regex(7)-style regular expression will be output or included in sample tracking.

--payload-regex=REGEX

A regex(7)-style regular expression matching the contiguous subsequence of each input line that should be counted as observed bandwidth for sampling purposes. If the regular expression includes any parenthesized subexpressions, the portion of the line that matches the first subexpression will be counted; otherwise, the portion of the line that matches the full pattern will be counted. The default pattern is "8=FIX.*" if --fix is specified (matching from the start of the FIX message to end of line), or ".*" if not (matching the entire line).
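The counting rule (first subexpression if present, otherwise the whole match) can be sketched in Python. Note that Python's re module is not the POSIX regex(7) engine, so this only illustrates the rule; the name payload_bytes is hypothetical, and counting zero bytes for a line that does not match is an assumption.

```python
import re

def payload_bytes(line, pattern):
    """Number of bytes of `line` counted as bandwidth for `pattern`."""
    match = re.search(pattern, line)
    if match is None:
        return 0  # assumed: a non-matching line contributes nothing
    # Use the first parenthesized group if the pattern has one.
    group = 1 if match.re.groups >= 1 else 0
    return len(match.group(group) or b"")
```

With the pattern b"8=FIX.*", a timestamp prefix in front of the message is excluded from the count, while b".*" counts the entire line.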

--tcp-nodelay

If specified, the TCP_NODELAY flag will be set on the output file descriptor. This is useful, for instance, when output is going to a socket.


Aside: Latency percentiles and histograms

FIXtail approximates percentiles of latency observations by maintaining a logarithmically scaled histogram of latency observations internally. To explain how this works, we'll examine the default settings, with a resolution of three significant bits:

With the histogram in place, and a count of all observations, percentiles can be approximated by finding the appropriate bucket and linearly interpolating over the bucket's range of values as necessary.

The default resolution of the histogram, as noted above, is three significant bits. This can be adjusted with the --sig-bits=N option, trading increased accuracy of percentiles against increased memory space and time needed to maintain and process the histogram data. Allowed values of N are 1 through 7, inclusive; if not specified, N defaults to 3.
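One common way to build such a histogram is sketched below in Python. The bucket layout is an assumption made for illustration: each non-negative integer is bucketed by keeping only its N most significant bits, so values below 2^N get exact buckets and larger values share progressively wider ones. FIXtail's actual layout may differ. The class name LogHistogram is hypothetical.

```python
from collections import Counter

class LogHistogram:
    """Histogram keyed by the top `sig_bits` bits of each observation."""

    def __init__(self, sig_bits=3):
        if not 1 <= sig_bits <= 7:
            raise ValueError("sig_bits must be between 1 and 7")
        self.sig_bits = sig_bits
        self.counts = Counter()
        self.total = 0

    def bucket(self, value):
        """Return (lower_bound, width) of the bucket containing value."""
        shift = max(0, value.bit_length() - self.sig_bits)
        return (value >> shift) << shift, 1 << shift

    def add(self, value):
        if value < 0:
            raise ValueError("negative observations are not supported")
        self.counts[self.bucket(value)[0]] += 1
        self.total += 1

    def percentile(self, p):
        """Approximate the p-th percentile by interpolating in a bucket."""
        rank = p / 100 * self.total
        seen = 0
        for lower in sorted(self.counts):
            count = self.counts[lower]
            if seen + count >= rank:
                width = self.bucket(lower)[1]
                return lower + width * (rank - seen) / count
            seen += count
        raise ValueError("empty histogram")
```

With three significant bits, an observation of 100 falls in the bucket covering 96 through 111. In this layout each additional significant bit halves the bucket width for large values and roughly doubles the number of buckets per power of two, which is the accuracy-versus-memory trade described above.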