### The TUBE algorithm: Discovering trends in time series for the early detection of fuel leaks from underground storage tanks

The paper uses the United States Environmental Protection Agency's (EPA) software-based Statistical Inventory Reconciliation (SIR)
method for detecting leaks in underground gasoline tanks as a baseline for acceptable true positive (not less than 95%) and
false positive (not more than 5%) rates. On top of the SIR method, the TUBE algorithm uses
a two-step data filtering mechanism, trend detection over time series using tubes, and an assessment of the detected trends
to generate a final result.

## SIR method

The theoretical fuel volume at the end of a time frame is $V_{book}$:

$V_{book} = V_{open} - V_{sales} + V_{delivery}$

The difference between the actual ($V_{close}$) and the theoretical ($V_{book}$) fuel volume is the variance:

$Var = V_{close} - V_{book}$

Here:

- $V_{book}$ - theoretical fuel volume at the end of a single time frame
- $V_{open}$ - starting fuel volume in the tank
- $V_{sales}$ - sold fuel volume
- $V_{delivery}$ - fuel volume that was received into the tank
- $V_{close}$ - actual fuel volume at the end of the time frame

Cumulative variance (CV) sums the variance over multiple time frames $T_1, \dots, T_n$. It makes it possible to detect overall trends and to tell how the state of the fuel tank changed over time.

$CV = \sum_{i=1}^{n} Var(T_i)$
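The SIR formulas above can be sketched in a few lines of Python. The `Frame` record, its field names, and the sample numbers are assumptions made for illustration only; the paper does not prescribe a data layout.

```python
from dataclasses import dataclass
from itertools import accumulate


@dataclass
class Frame:
    """One time frame of tank readings (hypothetical layout, any volume unit)."""
    v_open: float      # volume at the start of the frame
    v_sales: float     # volume sold during the frame
    v_delivery: float  # volume delivered during the frame
    v_close: float     # actual volume at the end of the frame


def variance(frame: Frame) -> float:
    """Var = V_close - V_book, with V_book = V_open - V_sales + V_delivery."""
    v_book = frame.v_open - frame.v_sales + frame.v_delivery
    return frame.v_close - v_book


def cumulative_variance(frames: list[Frame]) -> list[float]:
    """Running sum of per-frame variances; the last element is CV over all frames."""
    return list(accumulate(variance(f) for f in frames))


# Made-up readings for two consecutive time frames.
frames = [
    Frame(v_open=1000.0, v_sales=200.0, v_delivery=0.0, v_close=798.0),
    Frame(v_open=798.0, v_sales=100.0, v_delivery=500.0, v_close=1195.0),
]
cv = cumulative_variance(frames)
```

In this made-up data the actual volume keeps falling short of the book volume, so the running CV decreases from frame to frame.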

## Data filtering

Filtering deals with several significant factors, such as noise and spikes, that decrease the quality of the data. To this end, several techniques are applied to a windowed data point (that is, the values of some of its neighbors are considered as well):

- Median
- Augmented median (appending various values, like average)
- Linear regression (using least squares method)
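As a rough illustration of windowed filtering, the sketch below implements a median filter and a least-squares regression filter. The window radius, the clipping of windows at the series edges, and the function names are assumptions, not the paper's configuration. The augmented median is left out because the values to append are not specified here.

```python
from statistics import median


def median_filter(values, radius=1):
    """Replace each point with the median of its window (clipped at the edges)."""
    return [
        median(values[max(0, i - radius): i + radius + 1])
        for i in range(len(values))
    ]


def regression_filter(values, radius=1):
    """Replace each point with the value of a least-squares line fitted to its window."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - radius)
        ys = values[lo: i + radius + 1]
        xs = range(lo, lo + len(ys))
        n = len(ys)
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        slope = sxy / sxx if sxx else 0.0  # a one-point window has no slope
        out.append(mean_y + slope * (i - mean_x))
    return out


noisy = [1.0, 2.0, 50.0, 4.0, 5.0]  # 50.0 is an artificial spike
smoothed = median_filter(noisy)
```

With a radius of 1 the median filter removes the single spike, while the regression filter leaves a series that is already linear unchanged.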

## TUBE algorithm

Apart from the aforementioned SIR and filtering methods, the TUBE algorithm also incorporates trend detection and trend interpretation. Trend interpretation is just a comparison of the tube slope to a manually set threshold.
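The threshold comparison can be written as a one-line check. The function name and the sign convention are assumptions: since $Var = V_{close} - V_{book}$ is negative when fuel is missing, this sketch treats a sufficiently steep negative slope as suspicious.

```python
def is_suspicious(slope: float, threshold: float) -> bool:
    """True when CV falls faster than `threshold` volume units per time step.

    Assumption: a leak shows up as a negative slope of the tube.
    """
    return slope < -abs(threshold)


flags = [is_suspicious(s, threshold=0.5) for s in (-0.8, -0.1, 0.9)]
```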

### Trend detection

Trends are represented by tubes, which are sections with a specific outlier tolerance. Graphically, a tube is a pair of functions that delimit the filtered CV function from above and below, relative to the X axis.

The upper and lower tube bounds are defined as follows:

$tube_{upper}(x) = tube_{s}(x) + tol$

$tube_{lower}(x) = tube_{s}(x) - tol$

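$tol$ is the tube tolerance, calculated as the median of three values, that is, $tol_{factor} \cdot tube_{dev}$ limited to the range between $tol_{min}$ and $tol_{max}$:

$tol = median\{tol_{min},\ tol_{factor} \cdot tube_{dev},\ tol_{max}\}$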

Here:

- $tol_{min}$ - minimal width of the tube
- $tol_{max}$ - maximal width of the tube
- $tol_{factor}$ - scale parameter
- $tube_{dev}$ - average absolute deviation of the $k$ measurements from $tube_s$:

$\frac{1}{k} \sum_{i=1}^{k} |y_i - tube_s (x_i)|$
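A minimal sketch of the tolerance calculation follows. It reads the tolerance as the median of the three values $tol_{min}$, $tol_{factor} \cdot tube_{dev}$ and $tol_{max}$, which is an interpretation of the formula. The function names and the sample line `2 * x + 1` are hypothetical.

```python
from statistics import median


def tube_deviation(xs, ys, tube_s):
    """Average absolute distance between the measurements and the tube line."""
    return sum(abs(y - tube_s(x)) for x, y in zip(xs, ys)) / len(ys)


def tolerance(tube_dev, tol_factor, tol_min, tol_max):
    """Median of three values: tol_factor * tube_dev kept within [tol_min, tol_max]."""
    return median([tol_min, tol_factor * tube_dev, tol_max])


def line(x):
    """Hypothetical tube line, used only for this example."""
    return 2 * x + 1


dev = tube_deviation([0, 1, 2, 3], [1.5, 2.5, 5.0, 8.0], line)
tol = tolerance(dev, tol_factor=2.0, tol_min=0.2, tol_max=5.0)
```

Taking the median of three values is a compact way to clamp: if the scaled deviation drops below `tol_min` or rises above `tol_max`, the corresponding bound is returned instead.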

A data item $y_i$ is added to the current tube iff

$y_i \in \langle tube_{lower}(x_i), tube_{upper}(x_i) \rangle$

otherwise a new tube is created with that data item as its first element. Each tube has a minimum length $k$:
its first $k$ data items form the initial base of the tube.
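The tube construction described above could look roughly like the sketch below. Several details are assumptions rather than the paper's specification: $tube_s$ is taken to be a least-squares line $a x + b$ through the first $k$ points, the line and tolerance stay fixed once the base is complete, the base points are accepted unconditionally, and the bounds are treated as inclusive. All names are hypothetical.

```python
from statistics import median


def fit_line(points):
    """Least-squares line through (x, y) points; returns (a, b) for y = a * x + b."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx if sxx else 0.0
    return a, mean_y - a * mean_x


def build_tubes(points, k, tol_factor, tol_min, tol_max):
    """Split (x, y) points into tubes; each tube is a dict with points, a, b, tol."""
    tubes, current = [], None
    for x, y in points:
        if current is not None and len(current["points"]) >= k:
            centre = current["a"] * x + current["b"]
            if abs(y - centre) <= current["tol"]:
                current["points"].append((x, y))
                continue
            current = None  # the point falls outside the tube, so close it
        if current is None:
            current = {"points": [], "a": 0.0, "b": y, "tol": tol_min}
            tubes.append(current)
        current["points"].append((x, y))
        if len(current["points"]) == k:  # base complete: fix line and tolerance
            a, b = fit_line(current["points"])
            dev = sum(abs(py - (a * px + b)) for px, py in current["points"]) / k
            current.update(a=a, b=b, tol=median([tol_min, tol_factor * dev, tol_max]))
    return tubes


# Made-up filtered CV values: flat at first, then falling by 5 units per step.
points = [(0, 0.0), (1, 0.0), (2, 0.0), (3, 0.0),
          (4, -5.0), (5, -10.0), (6, -15.0), (7, -20.0)]
tubes = build_tubes(points, k=3, tol_factor=2.0, tol_min=0.5, tol_max=3.0)
```

On this made-up series the sketch produces two tubes: a flat one for the first four points and a falling one that starts at the first point outside the flat tube's bounds.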

In the graph above, tubes are drawn as rectangles bounded by the last data point they contain (the original paper draws them connected). Data points are dots with no area, so some may appear to lie outside a tube when they are actually on its border.

Hovering over a specific tube displays, below the graph, the tube's number, calculated tolerance, and slope parameters ($a$ and $b$).

## Review

- Even though the Augmented Median Filter is mentioned constantly, it is unclear what exactly is augmented into the set.
- It is unclear what exactly the requirements of the basic TUBE algorithm are.

## References

- M. Gorawski, A. Gorawska, K. Pasterak, The TUBE algorithm: Discovering trends in time series for the early detection of fuel leaks from underground storage tanks, Expert Systems with Applications, Volume 90, Pages 356-373, ISSN 0957-4174, http://dx.doi.org/10.1016/j.eswa.2017.08.016. (ScienceDirect)

## Code source

The interactive graph was generated with d3.js v4. Sources for this article and the code can be found here: gist.github.com/test_tubes.js
