BrianHicks / elm-trend / Trend.Linear

Calculate trends for linear data (that is, data with one dependent and one independent variable whose relationship can be described as y = mx + b)

The easiest way to determine if a relationship is linear is to plot your values. If your data form a rough line, we're in business. But if your plot shows a curve or a random point cloud, then don't trust the results you get from these functions. (n.b. check out terezka/elm-plot, which makes this very easy!)

Some kinds of data which fit these criteria:


type Trend kind

A trend generated from your data. This contains various things you may want, like lines. Generate these with quick and robust. You will have different options for interpretation depending on which method you choose to calculate.

Using Trend Lines


type alias Line =
{ slope : Basics.Float
, intercept : Basics.Float 
}

The result of a trend calculation. Use this to make predictions using predictY and predictX.

line : Trend a -> Line

Retrieve the calculated trend line.

predictY : Line -> Basics.Float -> Basics.Float

Given an x, predict y.

predictY { slope = 1, intercept = 0 } 1
    --> 1

predictY { slope = -1, intercept = 0 } 5.5 |> Ok
    --> Ok -5.5

predictX : Line -> Basics.Float -> Basics.Float

Given a y, predict x.

predictX { slope = 1, intercept = 0 } 1
    --> 1

predictX { slope = -1, intercept = 0 } 5.5 |> Ok
    --> Ok -5.5
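
In practice you will usually get a Line from a calculated trend rather than writing one by hand. Here is a minimal sketch of that pipeline using only functions documented on this page (quick, line, and predictY). The module name Forecast and the function name forecast are made up for illustration.

```elm
module Forecast exposing (forecast)

import Trend.Linear exposing (line, predictY, quick)
import Trend.Math


-- Fit a quick trend to the points, then predict y at the given x.
-- `forecast` is a hypothetical helper, not part of elm-trend.
forecast : Float -> List ( Float, Float ) -> Result Trend.Math.Error Float
forecast x points =
    quick points
        |> Result.map line
        |> Result.map (\trendLine -> predictY trendLine x)
```

With the points [ (1, 1), (2, 2), (3, 3), (4, 4) ], quick reports a slope of 1 and an intercept of 0, so forecast 5 on that list should give Ok 5. Swap quick for robust if your data may contain outliers.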

Creating Trends


type alias Point =
( Basics.Float, Basics.Float )

A single 2-dimensional point (x, y).

Quick Fit


type Quick

A trend calculated from quick.

quick : List Point -> Result Trend.Math.Error (Trend Quick)

Plot a line through a series of points (x, y):

quick [ (1, 1), (2, 2), (3, 3), (4, 4) ]
    |> Result.map line
    --> Ok { slope = 1, intercept = 0 }

This is the fastest of the functions in this module, but it's also the most susceptible to being thrown off by outliers. Let's look at that line again, but with an outlier:

quick [ (1, 1), (2, 2), (3, 3), (4, 4), (5, -5) ]
    |> Result.map line
    --> Ok { slope = -0.9999999999999999, intercept = 3.9999999999999996 }

We went from a perfect fit to a horrible one! And the more outliers you have, the worse the fit you'll get. You can get one measure of goodness by sending the result of this function to goodnessOfFit.

Under the covers, this is an ordinary least squares regression.
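
To make "ordinary least squares" a little more concrete, here is a sketch of the textbook formula: the slope is the sum of (x - meanX) * (y - meanY) divided by the sum of (x - meanX)^2, and the intercept is meanY - slope * meanX. This is not necessarily how quick is implemented internally (it may arrange the arithmetic differently, which would change the last few floating point digits). The names OlsSketch and olsLine are made up for illustration, and the sketch returns a Maybe instead of the library's Trend.Math.Error.

```elm
module OlsSketch exposing (olsLine)


-- Textbook ordinary least squares for a single independent variable.
-- Returns Nothing when there are fewer than two points or when every
-- x is the same (a vertical line has no finite slope).
olsLine : List ( Float, Float ) -> Maybe { slope : Float, intercept : Float }
olsLine points =
    let
        n =
            toFloat (List.length points)

        meanX =
            List.sum (List.map Tuple.first points) / n

        meanY =
            List.sum (List.map Tuple.second points) / n

        -- How x and y vary together around their means.
        sxy =
            List.sum (List.map (\( x, y ) -> (x - meanX) * (y - meanY)) points)

        -- How x varies around its own mean.
        sxx =
            List.sum (List.map (\( x, _ ) -> (x - meanX) ^ 2) points)
    in
    if List.length points < 2 || sxx == 0 then
        Nothing

    else
        let
            slope =
                sxy / sxx
        in
        Just { slope = slope, intercept = meanY - slope * meanX }
```

Because every point contributes its full squared distance to these sums, a single far-away point like (5, -5) can drag the whole line toward it. That is the outlier sensitivity described above.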

goodnessOfFit : Trend Quick -> Basics.Float

Get the goodness of fit for a quick trend. This is a percent, represented as a floating point number between 0 and 1. A higher number generally indicates a better fit, but it doesn't know anything about what your data means. This means that you have to use some judgement in interpreting it!

quick [ (1, 1), (2, 2), (3, 3), (4, 4) ]
    |> Result.map goodnessOfFit
    --> Ok 1

And again with that outlier from quick:

quick [ (1, 1), (2, 2), (3, 3), (4, 4), (5, -5) ]
    |> Result.map goodnessOfFit
    --> Ok 0.19999999999999996

This calculation is only valid for quick trends, since it measures how well a fit has minimized the square sum of error. That means it's only really useful for ordinary least squares, not the Theil-Sen estimator we use for robust.
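
One common measure that fits this description is the coefficient of determination (R squared): 1 minus the ratio of the squared error of the fitted line to the squared error of always guessing the mean y. Whether goodnessOfFit computes exactly this is an assumption on my part, although it is consistent with the example values above. The names GoodnessSketch and rSquared are made up for illustration.

```elm
module GoodnessSketch exposing (rSquared)


-- Coefficient of determination for a line against a set of points.
-- Note: this yields NaN for an empty list or when every y is identical,
-- because the total sum of squares is then zero.
rSquared : { slope : Float, intercept : Float } -> List ( Float, Float ) -> Float
rSquared fit points =
    let
        ys =
            List.map Tuple.second points

        meanY =
            List.sum ys / toFloat (List.length ys)

        -- Squared error of the fitted line at each point.
        residualSum =
            points
                |> List.map (\( x, y ) -> (y - (fit.slope * x + fit.intercept)) ^ 2)
                |> List.sum

        -- Squared error of simply guessing the mean for every point.
        totalSum =
            ys
                |> List.map (\y -> (y - meanY) ^ 2)
                |> List.sum
    in
    1 - residualSum / totalSum
```

For the outlier data and the line { slope = -1, intercept = 4 }, the residual sum is 40 and the total sum is 50, which gives roughly 0.2. In other words, the line explains only about a fifth of the variation in y.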

Maintainer's note: this will evaluate the fit for the original data. If you need to evaluate goodness of fit for new data given an existing Trend, we'll need to expose a new function. I don't have a concrete use case for this, so the function does not exist yet. I want to make this library work for you, so please open an issue if you find yourself in this situation!

Robust Fit


type Robust

A trend calculated from robust.

robust : List Point -> Result Trend.Math.Error (Trend Robust)

When your data has outliers, you'll want to use a robust estimator instead of the quick estimator. This is much slower (it runs roughly in O(n^2) time), but will still give good results in the face of corrupted data. Specifically, it will still work if up to ~29.3% of your data consists of outliers. Again, the easiest way to check this is to visualize it. We can provide automated checks, but humans are still the best at saying "hmm, something's funny here..."

For good data, we have the same results as quick:

robust [ (1, 1), (2, 2), (3, 3), (4, 4) ]
    |> Result.map line
    --> Ok { slope = 1, intercept = 0 }

But when we have outliers, we still get a good result:

robust [ (1, 1), (2, 2), (3, 3), (4, 4), (5, -5) ]
    |> Result.map line
    --> Ok { slope = 1, intercept = 0 }

Under the covers, this is a Theil-Sen estimator (which is pretty cool, and it's easy to get an intuitive grasp of what's going on; check it out!)
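
For a rough intuition, here is a sketch of the basic Theil-Sen idea: take the slope between every pair of points, use the median of those slopes, then use the median of y - slope * x as the intercept. This is a simplified illustration, not the library's implementation (robust may pick the intercept differently and reports errors through Trend.Math.Error rather than Maybe). The names TheilSenSketch, theilSen, median, and pairwiseSlopes are made up for illustration.

```elm
module TheilSenSketch exposing (theilSen)


-- Median of a list: the middle value, or the mean of the two middle
-- values when the list has an even number of elements.
median : List Float -> Maybe Float
median values =
    let
        sorted =
            List.sort values

        count =
            List.length sorted

        middle =
            List.drop ((count - 1) // 2) sorted
    in
    if count == 0 then
        Nothing

    else if modBy 2 count == 1 then
        List.head middle

    else
        case middle of
            a :: b :: _ ->
                Just ((a + b) / 2)

            _ ->
                Nothing


-- Slope between every pair of points, skipping pairs that share an x.
-- There are n * (n - 1) / 2 pairs, hence the roughly O(n^2) running time.
pairwiseSlopes : List ( Float, Float ) -> List Float
pairwiseSlopes points =
    case points of
        [] ->
            []

        ( x1, y1 ) :: rest ->
            List.filterMap
                (\( x2, y2 ) ->
                    if x2 == x1 then
                        Nothing

                    else
                        Just ((y2 - y1) / (x2 - x1))
                )
                rest
                ++ pairwiseSlopes rest


-- Median slope, then median intercept given that slope.
theilSen : List ( Float, Float ) -> Maybe { slope : Float, intercept : Float }
theilSen points =
    median (pairwiseSlopes points)
        |> Maybe.andThen
            (\slope ->
                points
                    |> List.map (\( x, y ) -> y - slope * x)
                    |> median
                    |> Maybe.map (\intercept -> { slope = slope, intercept = intercept })
            )
```

With the outlier data above, six of the ten pairwise slopes are exactly 1 and only the four pairs involving (5, -5) are off, so the median slope stays at 1 and the median intercept stays at 0. The outlier gets outvoted instead of dragging the line along with it.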

confidenceInterval : Trend Robust -> ( Line, Line )

Calculate a confidence interval from a robust set of data. Consult Wikipedia for a thorough understanding of what this may mean for your data set. This function gives a 95% confidence interval.
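
Here is a minimal sketch of one way to use the two lines: predict y at the same x with each of them to get a range instead of a single number. The module name IntervalSketch and the function name predictionRange are made up for illustration, and this page does not say which line of the pair is the lower or upper bound, so the sketch makes no assumption about their order.

```elm
module IntervalSketch exposing (predictionRange)

import Trend.Linear exposing (confidenceInterval, predictY, robust)
import Trend.Math


-- Predict y at x using both lines of the confidence interval.
-- `predictionRange` is a hypothetical helper, not part of elm-trend.
predictionRange : Float -> List ( Float, Float ) -> Result Trend.Math.Error ( Float, Float )
predictionRange x points =
    robust points
        |> Result.map confidenceInterval
        |> Result.map (\( a, b ) -> ( predictY a x, predictY b x ))
```

If you need a guaranteed ( low, high ) ordering, sort the two predicted values yourself.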

Maintainer's note: We ought to be able to generate a confidence interval for quick trends too, but I'm not confident enough in my math skills to do it correctly. Help wanted here! If you know how to do that calculation, let's work together and add it.