### The TUBE algorithm: Discovering trends in time series for the early detection of fuel leaks from underground storage tanks

The paper uses the United States Environmental Protection Agency's (EPA) Statistical Inventory Reconciliation (SIR) method, a software-based leak detection method for underground gasoline tanks, as the baseline for acceptable true positive (not less than 95%) and false positive (not more than 5%) rates. Apart from the SIR method, the TUBE algorithm uses a two-step data filtering mechanism, trend detection over the time series using tubes, and an assessment of the detected trends to generate a final result.

## SIR method

The theoretical fuel volume at the end of a time frame is $V_{book}$:
$V_{book} = V_{open} - V_{sales} + V_{delivery}$
The variance is the difference between the actual ($V_{close}$) and theoretical ($V_{book}$) fuel volumes:
$Var = V_{close} - V_{book}$

Here:

• $V_{book}$ - theoretical fuel volume at the end of a single time frame
• $V_{open}$ - starting fuel volume in the tank
• $V_{sales}$ - sold fuel volume
• $V_{delivery}$ - fuel volume that was received into the tank

Cumulative variance (CV) sums the variance over multiple time intervals $T_i$, which reveals overall trends in how the state of the fuel tank changed over time.
$CV = \sum_{i=1}^{n} Var(T_i)$
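The two formulas above can be sketched in a few lines of Python. The readings are made-up numbers for illustration only, and the function names are my own rather than anything from the paper.

```python
from itertools import accumulate

def variance(v_open, v_sales, v_delivery, v_close):
    """Var = V_close - V_book, with V_book = V_open - V_sales + V_delivery."""
    v_book = v_open - v_sales + v_delivery
    return v_close - v_book

def cumulative_variance(frames):
    """Running CV after each time frame; the last item is CV over all frames."""
    return list(accumulate(variance(*frame) for frame in frames))

# Hypothetical readings: (V_open, V_sales, V_delivery, V_close) per time frame.
frames = [
    (1000, 200, 0, 798),   # V_book = 800, Var = -2
    (798, 300, 500, 995),  # V_book = 998, Var = -3
    (995, 100, 0, 894),    # V_book = 895, Var = -1
]
cv = cumulative_variance(frames)
print(cv)  # [-2, -5, -6]
```

A steadily decreasing CV like this one means that, frame after frame, the tank holds less fuel than the books say it should.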

## Data filtering

Filtering deals with the noise, spikes, and other significant factors that decrease the quality of the data. For this, several techniques are applied to a windowed data point (that is, the point together with the values of some of its neighbors):

• Median
• Augmented median (appending various values, like average)
• Linear regression (using least squares method)
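As a rough illustration of the windowed idea, here is a minimal sketch of the plain median filter only. The window size and the way the window shrinks at the edges are my assumptions, not settings taken from the paper, and the augmented variant is not covered.

```python
from statistics import median

def median_filter(values, half_window=1):
    """Replace each point by the median of itself and its neighbors.

    The window is truncated at both ends of the series (an assumed edge policy).
    """
    n = len(values)
    filtered = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        filtered.append(median(values[lo:hi]))
    return filtered

# Hypothetical CV series with a single spike (40) at index 2.
noisy_cv = [0, -1, 40, -3, -4, -5]
print(median_filter(noisy_cv))  # [-0.5, 0, -1, -3, -4, -4.5]
```

The spike disappears because a single extreme value can never be the median of a three-point window.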

## TUBE algorithm

Apart from the aforementioned SIR and filtering methods, the TUBE algorithm also incorporates trend detection and trend interpretation. Trend interpretation is simply a comparison of a tube's slope to a manually set threshold.
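A minimal sketch of such a comparison is shown below. The sign convention is my assumption: since $Var = V_{close} - V_{book}$ is negative when fuel is missing, a leak should show up as a downward CV trend. The threshold value and the labels are hypothetical.

```python
def interpret_trend(slope, threshold):
    """Flag a tube whose slope is steeper downward than the threshold allows.

    Assumes a leak appears as a negative CV slope; threshold must be >= 0.
    """
    return "suspected leak" if slope < -threshold else "no leak detected"

print(interpret_trend(-0.8, threshold=0.5))  # suspected leak
print(interpret_trend(-0.1, threshold=0.5))  # no leak detected
```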

### Trend detection

Trends are represented by tubes, which are sections of the series with a specific outlier tolerance. Graphically, a tube is a pair of functions that bound the filtered CV function from above and below, relative to the X axis.

The upper and lower tube bounds are defined as follows:
$tube_{upper}(x) = tube_{s}(x) + tol$
$tube_{lower}(x) = tube_{s}(x) - tol$

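$tol$ is the tube tolerance, calculated as the median of three values, which keeps the scaled deviation between the minimal and maximal width:
$tol = median\{tol_{factor} \cdot tube_{dev},\ tol_{min},\ tol_{max}\}$
Here: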

• $tol_{min}$ - minimal width of the tube
• $tol_{max}$ - maximal width of the tube
• $tol_{factor}$ - scale parameter
• $tube_{dev}$ - average absolute difference between $k$ measurements $y_i$ and the tube's center function $tube_s$:
$\frac{1}{k} \sum_{i=1}^{k} |y_i - tube_s (x_i)|$
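The tolerance calculation can be sketched as follows. I read the tolerance as the median of the three values $tol_{factor} \cdot tube_{dev}$, $tol_{min}$ and $tol_{max}$, which clamps the scaled deviation between the minimal and maximal tube width; that reading and all parameter values below are assumptions for illustration.

```python
from statistics import median

def tube_deviation(ys, center_values):
    """Average absolute difference between measurements and tube_s(x_i)."""
    return sum(abs(y - c) for y, c in zip(ys, center_values)) / len(ys)

def tube_tolerance(tube_dev, tol_factor, tol_min, tol_max):
    """Median of three values: the scaled deviation, limited to [tol_min, tol_max]."""
    return median([tol_factor * tube_dev, tol_min, tol_max])

# Hypothetical measurements and center-line values for k = 4 points.
ys = [1.0, 2.5, 2.0, 4.5]
center = [1.0, 2.0, 3.0, 4.0]
dev = tube_deviation(ys, center)  # (0 + 0.5 + 1.0 + 0.5) / 4 = 0.5

print(tube_tolerance(dev, tol_factor=3, tol_min=1, tol_max=5))   # 1.5
print(tube_tolerance(dev, tol_factor=20, tol_min=1, tol_max=5))  # 5 (capped at tol_max)
print(tube_tolerance(dev, tol_factor=1, tol_min=1, tol_max=5))   # 1 (raised to tol_min)
```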

A data item $y_i$ is added to the current tube if and only if
$y_i \in \langle tube_{lower}(x_i), tube_{upper}(x_i) \rangle$
Otherwise, a new tube is created with that data item as its first. Each tube has a minimum length $k$: its first $k$ data items form the initial base of the tube.
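Putting the pieces together, the segmentation loop might look like the sketch below. Several details are my assumptions and are not confirmed by the paper: $tube_s$ is taken to be the least-squares line through the first $k$ points of a tube, the tolerance is the median of $tol_{factor} \cdot tube_{dev}$, $tol_{min}$ and $tol_{max}$ and stays fixed once the base is formed, and trailing points that cannot form a full base are left unassigned.

```python
from statistics import median

def fit_line(xs, ys):
    """Least-squares line y = a*x + b (needs at least two distinct x values)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx

def build_tubes(xs, ys, k, tol_factor, tol_min, tol_max):
    """Split the series into tubes; each tube covers indices [start, end)."""
    tubes, start, n = [], 0, len(xs)
    while n - start >= k:
        # The first k points form the base: fit tube_s and derive the tolerance.
        bx, by = xs[start:start + k], ys[start:start + k]
        a, b = fit_line(bx, by)
        dev = sum(abs(y - (a * x + b)) for x, y in zip(bx, by)) / k
        tol = median([tol_factor * dev, tol_min, tol_max])
        # Extend the tube while points stay within [tube_s - tol, tube_s + tol].
        end = start + k
        while end < n and abs(ys[end] - (a * xs[end] + b)) <= tol:
            end += 1
        tubes.append({"start": start, "end": end, "a": a, "b": b, "tol": tol})
        start = end  # the rejected point becomes the first item of the next tube
    return tubes

# Hypothetical series: a rising trend followed by a falling one.
xs = list(range(8))
ys = [0, 1, 2, 3, 10, 9, 8, 7]
tubes = build_tubes(xs, ys, k=3, tol_factor=2, tol_min=0.5, tol_max=2)
for t in tubes:
    print(t["start"], t["end"], t["a"], t["b"], t["tol"])
```

On this toy series the jump at index 4 falls outside the first tube, so a second tube starts there with the opposite slope.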

In the graph above, tubes are drawn as rectangles bounded by the last data point they contain (the original paper draws them connected). Data points are drawn as dots, which have no area, so some may appear to lie outside a tube when they are actually on its border.

Below the graph, upon hovering over a specific tube, information about the tube's number, calculated tolerance, and slope parameters ($a$ and $b$) is displayed.

## Review

1. Even though the Augmented Median Filter is mentioned constantly, it is unclear which values are augmented into the set.
2. The exact requirements of the basic TUBE algorithm are not clearly stated.

## References

1. M. Gorawski, A. Gorawska, K. Pasterak, The TUBE algorithm: Discovering trends in time series for the early detection of fuel leaks from underground storage tanks, Expert Systems with Applications, Volume 90, Pages 356-373, ISSN 0957-4174, http://dx.doi.org/10.1016/j.eswa.2017.08.016.

## Code source

The interactive graph was generated with the help of d3.js v4. The sources for this article and the code can be found here: gist.github.com/test_tubes.js