Calculate trends for linear data (that is, data with one dependent
and one independent variable whose relationship can be described as
y = mx + b).
The easiest way to determine if a relationship is linear is to plot
your values. If your data form a rough line, we're in business. But if
your plot shows a curve or a random point cloud, then don't trust the
results you get from these functions. (n.b. check out
terezka/elm-plot, which makes this very easy!)
Some kinds of data which fit these criteria:
A trend generated from your data. This contains various things you
may want, like lines. Generate these with quick and robust. You will
have different options for interpretation depending on which method
you choose to calculate.
{ slope : Basics.Float
, intercept : Basics.Float
}
The result of a trend prediction. Use this to make predictions
using predictY.
line : Trend a -> Line
Retrieve the calculated trend line.
predictY : Line -> Basics.Float -> Basics.Float
Given an x, predict y.
predictY { slope = 1, intercept = 0 } 1
--> 1
predictY { slope = -1, intercept = 0 } 5.5 |> Ok
--> Ok -5.5
predictX : Line -> Basics.Float -> Basics.Float
Given a y, predict x.
predictX { slope = 1, intercept = 0 } 1
--> 1
predictX { slope = -1, intercept = 0 } 5.5 |> Ok
--> Ok -5.5
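To make the arithmetic behind these two functions concrete, here is a minimal sketch assuming they simply evaluate and invert y = mx + b. The names yAt and xAt are hypothetical stand-ins, not part of this module, and the real implementations may differ (for example in how they treat a slope of 0).

```elm
module LineSketch exposing (Line, xAt, yAt)

-- A line in slope-intercept form, mirroring the record shown above.


type alias Line =
    { slope : Float
    , intercept : Float
    }



-- Hypothetical stand-in for predicting y: y = slope * x + intercept.


yAt : Line -> Float -> Float
yAt { slope, intercept } x =
    slope * x + intercept



-- Hypothetical stand-in for predicting x: solve y = slope * x + intercept for x.
-- With a slope of 0 the division yields Infinity or NaN instead of a usable x.


xAt : Line -> Float -> Float
xAt { slope, intercept } y =
    (y - intercept) / slope
```

A horizontal line has no single x for a given y, so it's worth checking the slope before trusting an inverted prediction.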
( Basics.Float, Basics.Float )
A single 2-dimensional point (x, y).
A trend calculated from quick.
quick : List Point -> Result Trend.Math.Error (Trend Quick)
Plot a line through a series of points (x, y):
quick [ (1, 1), (2, 2), (3, 3), (4, 4) ]
    |> Result.map line
--> Ok { slope = 1, intercept = 0 }
This is the fastest of the functions in this module, but it's also the most susceptible to being thrown off by outliers. Let's look at that line again, but with an outlier:
quick [ (1, 1), (2, 2), (3, 3), (4, 4), (5, -5) ]
    |> Result.map line
--> Ok { slope = -0.9999999999999999, intercept = 3.9999999999999996 }
We went from a perfect fit to a horrible one! And the more outliers
you have, the worse the fit you'll get. You can get one measure of
goodness by sending the result of this function to goodnessOfFit.
Under the covers, this is an ordinary least squares regression.
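For the curious, here is a minimal sketch of a textbook ordinary least squares fit. It is not this module's implementation: olsLine is a hypothetical name, it returns a Maybe instead of a Result with Trend.Math.Error, and its floating point rounding may differ slightly from what quick reports.

```elm
module OlsSketch exposing (olsLine)

-- Textbook ordinary least squares:
--   slope     = sum ((x - meanX) * (y - meanY)) / sum ((x - meanX) ^ 2)
--   intercept = meanY - slope * meanX


olsLine : List ( Float, Float ) -> Maybe { slope : Float, intercept : Float }
olsLine points =
    let
        n =
            toFloat (List.length points)

        meanX =
            List.sum (List.map Tuple.first points) / n

        meanY =
            List.sum (List.map Tuple.second points) / n

        -- How x and y vary together around their means.
        sxy =
            List.sum (List.map (\( x, y ) -> (x - meanX) * (y - meanY)) points)

        -- How much x varies around its own mean.
        sxx =
            List.sum (List.map (\( x, _ ) -> (x - meanX) ^ 2) points)
    in
    if List.length points < 2 || sxx == 0 then
        -- Fewer than two points, or every x identical: no unique line.
        Nothing

    else
        let
            slope =
                sxy / sxx
        in
        Just { slope = slope, intercept = meanY - slope * meanX }
```

Every point contributes to both sums in proportion to its distance from the means, which is why a single far-away outlier can drag the whole line with it.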
goodnessOfFit : Trend Quick -> Basics.Float
Get the goodness of fit for a quick trend. This is a proportion, represented as a floating point number between 0 and 1. A higher number generally indicates a better fit, but it doesn't know anything about what your data means. This means that you have to use some judgement in interpreting it!
quick [ (1, 1), (2, 2), (3, 3), (4, 4) ]
    |> Result.map goodnessOfFit
--> Ok 1
And again with that outlier from quick:
quick [ (1, 1), (2, 2), (3, 3), (4, 4), (5, -5) ]
    |> Result.map goodnessOfFit
--> Ok 0.19999999999999996
This calculation is only valid for quick trends, since it measures
how well a fit has minimized the square sum of error. That means it's
only really useful for ordinary least squares, not the Theil-Sen
estimator we use for robust.
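The numbers in the examples above are consistent with the coefficient of determination (R squared), so here is a minimal sketch under the assumption that this is the measure being reported. rSquared is a hypothetical name, not part of this module, and it takes the line and the points explicitly instead of a Trend.

```elm
module RSquaredSketch exposing (rSquared)

-- Coefficient of determination: 1 - (residual sum of squares / total sum of squares).
-- Assumption: this is what "goodness of fit" refers to here.
-- If every y is identical the total is 0 and the result is NaN or an infinity.


rSquared : { slope : Float, intercept : Float } -> List ( Float, Float ) -> Float
rSquared { slope, intercept } points =
    let
        ys =
            List.map Tuple.second points

        meanY =
            List.sum ys / toFloat (List.length ys)

        -- Squared error left over after using the line to predict each y.
        residualSumOfSquares =
            List.sum (List.map (\( x, y ) -> (y - (slope * x + intercept)) ^ 2) points)

        -- Squared error of the naive prediction "always guess the mean".
        totalSumOfSquares =
            List.sum (List.map (\y -> (y - meanY) ^ 2) ys)
    in
    1 - residualSumOfSquares / totalSumOfSquares
```

Read this way, a value of 1 means the line explains all of the variation in y, and a value near 0 means it does little better than guessing the mean every time.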
Maintainer's note: this will evaluate the fit for the original
data. If you need to evaluate goodness of fit for new data given an
existing Trend, we'll need to expose a new function. I don't have a
concrete use case for this, so the function does not exist yet. I want
to make this library work for you, so please open an issue if you find
yourself in this situation!
A trend calculated from robust.
robust : List Point -> Result Trend.Math.Error (Trend Robust)
When your data has outliers, you'll want to use a robust estimator
instead of the quick estimator. This is much slower (it runs roughly
in O(n^2) time), but will still give good results in the face of
corrupted data. Specifically, it will still work if up to ~29.3% of
your data consists of outliers. Again, the easiest way to check this
is to visualize it. We can provide automated checks, but humans are
still the best at saying "hmm, something's funny here..."
For good data, we have the same results as quick:
robust [ (1, 1), (2, 2), (3, 3), (4, 4) ]
    |> Result.map line
--> Ok { slope = 1, intercept = 0 }
But when we have outliers, we still get a good result:
robust [ (1, 1), (2, 2), (3, 3), (4, 4), (5, -5) ]
    |> Result.map line
--> Ok { slope = 1, intercept = 0 }
Under the covers, this is a Theil-Sen estimator (which is pretty cool, and it's easy to get an intuitive grasp of what's going on; check it out!)
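To give a feel for that intuition, here is a minimal sketch of the classic Theil-Sen recipe: take the slope between every pair of points, keep the median slope, then take the median of the intercepts that slope implies. theilSenLine, pairwiseSlopes and median are hypothetical names, the Maybe return type is a simplification, and this module's actual implementation (especially how it picks the intercept) may differ.

```elm
module TheilSenSketch exposing (theilSenLine)

-- Median of a list: the middle value, or the mean of the two middle values.


median : List Float -> Maybe Float
median values =
    let
        sorted =
            List.sort values

        count =
            List.length sorted

        middle =
            List.drop ((count - 1) // 2) sorted
    in
    if count == 0 then
        Nothing

    else if modBy 2 count == 1 then
        List.head middle

    else
        case middle of
            a :: b :: _ ->
                Just ((a + b) / 2)

            _ ->
                Nothing



-- Slope between every pair of points, skipping pairs that share an x.
-- This is the O(n^2) part.


pairwiseSlopes : List ( Float, Float ) -> List Float
pairwiseSlopes points =
    case points of
        [] ->
            []

        ( x1, y1 ) :: rest ->
            List.filterMap
                (\( x2, y2 ) ->
                    if x2 == x1 then
                        Nothing

                    else
                        Just ((y2 - y1) / (x2 - x1))
                )
                rest
                ++ pairwiseSlopes rest



-- Median slope, then the median of (y - slope * x) as the intercept.


theilSenLine : List ( Float, Float ) -> Maybe { slope : Float, intercept : Float }
theilSenLine points =
    median (pairwiseSlopes points)
        |> Maybe.andThen
            (\slope ->
                points
                    |> List.map (\( x, y ) -> y - slope * x)
                    |> median
                    |> Maybe.map (\intercept -> { slope = slope, intercept = intercept })
            )
```

Because medians ignore how extreme the extreme values are, the four wild slopes contributed by (5, -5) in the example above are simply outvoted by the six slopes of exactly 1 from the well-behaved points.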
confidenceInterval : Trend Robust -> ( Line, Line )
Calculate a confidence interval from a robust set of data. Consult Wikipedia for a thorough understanding of what this may mean for your data set. This function gives a 95% confidence interval.
Maintainer's note: We ought to be able to generate a confidence interval for quick trends too, but I'm not confident enough in my math skills to do it correctly. Help wanted here! If you know how to do that calculation, let's work together and add it.