This module contains functions specifically designed to work with large data sets.
type alias Aggregation a key =
{ key : a -> key
, filter : a -> Basics.Bool
, operator : Operator a
}
Type that represents an aggregation on a type a with a key of type key. It encapsulates the following information:
key is a function that gets the key of each a.
filter is a function used to filter items out before the aggregation. Set it to always True to skip filtering.
operator is the aggregation operation to apply (count, sum, average, ...).
groupBy : (a -> key) -> List a -> AssocList.Dict key (List a)
Group a list of items into a dictionary. Grouping is done using a function that returns a key for each item. The resulting dictionary uses those keys as its keys, and each value is the list of items that share that key. Usage:
testDataSet =
[ TestInput1 "k1_1" "k2_1" 1
, TestInput1 "k1_1" "k2_1" 2
, TestInput1 "k1_1" "k2_2" 3
, TestInput1 "k1_1" "k2_2" 4
, TestInput1 "k1_2" "k2_1" 5
, TestInput1 "k1_2" "k2_1" 6
, TestInput1 "k1_2" "k2_2" 7
, TestInput1 "k1_2" "k2_2" 8
]
testDataSet
|> groupBy .key1
{- == Dict.fromList
[ ( "k1_1"
, [ TestInput1 "k1_1" "k2_1" 1
, TestInput1 "k1_1" "k2_1" 2
, TestInput1 "k1_1" "k2_2" 3
, TestInput1 "k1_1" "k2_2" 4
]
)
, ( "k1_2"
, [ TestInput1 "k1_2" "k2_1" 5
, TestInput1 "k1_2" "k2_1" 6
, TestInput1 "k1_2" "k2_2" 7
, TestInput1 "k1_2" "k2_2" 8
]
)
]
-}
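A minimal sketch of the grouping behavior, assuming elm/core's Dict in place of AssocList.Dict. Because elm/core's Dict requires comparable keys and orders its entries by key, this approximates the SDK rather than reproducing its implementation. The TestInput1 record shape is likewise an assumption inferred from the .key1, .key2 and .value accessors used in these examples.

```elm
module GroupBySketch exposing (TestInput1, groupBy, testDataSet)

-- Sketch only: elm/core Dict stands in for AssocList.Dict, so keys must be
-- comparable and Dict.toList returns entries sorted by key.

import Dict exposing (Dict)


-- Assumed shape of the rows used in the examples above.
type alias TestInput1 =
    { key1 : String
    , key2 : String
    , value : Float
    }


testDataSet : List TestInput1
testDataSet =
    [ TestInput1 "k1_1" "k2_1" 1
    , TestInput1 "k1_1" "k2_1" 2
    , TestInput1 "k1_1" "k2_2" 3
    , TestInput1 "k1_1" "k2_2" 4
    , TestInput1 "k1_2" "k2_1" 5
    , TestInput1 "k1_2" "k2_1" 6
    , TestInput1 "k1_2" "k2_2" 7
    , TestInput1 "k1_2" "k2_2" 8
    ]


-- Fold from the right and prepend, so each group keeps the input order.
groupBy : (a -> comparable) -> List a -> Dict comparable (List a)
groupBy getKey items =
    List.foldr
        (\item groups ->
            Dict.update (getKey item)
                (\existing -> Just (item :: Maybe.withDefault [] existing))
                groups
        )
        Dict.empty
        items
```

Folding from the right means the first item of each group is prepended last, which is why the groups come out in their original order.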
aggregate : (key -> Aggregator a Morphir.SDK.Key.Key0 -> b) -> AssocList.Dict key (List a) -> List b
Aggregates a dictionary that contains lists of items as values into a list that contains exactly one item per key.
The first argument is a function that takes a key and an aggregator and returns a single item of the resulting list. The aggregator is a function that takes one of the aggregation functions in this module (count, sumOf, minimumOf, ...) and returns the aggregated value for the list of items stored under that key in the input dictionary.
grouped =
Dict.fromList
[ ( "k1_1"
, [ TestInput1 "k1_1" "k2_1" 1
, TestInput1 "k1_1" "k2_1" 2
, TestInput1 "k1_1" "k2_2" 3
, TestInput1 "k1_1" "k2_2" 4
]
)
, ( "k1_2"
, [ TestInput1 "k1_2" "k2_1" 5
, TestInput1 "k1_2" "k2_1" 6
, TestInput1 "k1_2" "k2_2" 7
, TestInput1 "k1_2" "k2_2" 8
]
)
]
grouped
|> aggregate
(\key inputs ->
{ key = key
, count = inputs (count |> withFilter (\a -> a.value < 7))
, sum = inputs (sumOf .value)
, max = inputs (maximumOf .value)
, min = inputs (minimumOf .value)
}
)
{- ==
[ { key = "k1_1", count = 4, sum = 10, max = 4, min = 1 }
, { key = "k1_2", count = 2, sum = 26, max = 8, min = 5 }
]
-}
This function is designed to be used in combination with groupBy.
testDataSet =
[ TestInput1 "k1_1" "k2_1" 1
, TestInput1 "k1_1" "k2_1" 2
, TestInput1 "k1_1" "k2_2" 3
, TestInput1 "k1_1" "k2_2" 4
, TestInput1 "k1_2" "k2_1" 5
, TestInput1 "k1_2" "k2_1" 6
, TestInput1 "k1_2" "k2_2" 7
, TestInput1 "k1_2" "k2_2" 8
]
testDataSet
|> groupBy .key1
|> aggregate
(\key inputs ->
{ key = key
, count = inputs (count |> withFilter (\a -> a.value < 7))
, sum = inputs (sumOf .value)
, max = inputs (maximumOf .value)
, min = inputs (minimumOf .value)
}
)
{- ==
[ { key = "k1_1", count = 4, sum = 10, max = 4, min = 1 }
, { key = "k1_2", count = 2, sum = 26, max = 8, min = 5 }
]
-}
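The following sketch shows one way the aggregator callback can work, under simplifying assumptions: a hypothetical Operator type with only four cases, an Aggregation record without the key field (the data is already grouped), elm/core's Dict in place of AssocList.Dict, and a minimum or maximum of an empty group defaulting to 0. None of these names or choices are taken from the SDK.

```elm
module AggregateSketch exposing (Aggregation, Operator(..), aggregate, apply, count, maximumOf, minimumOf, sumOf, withFilter)

-- Hypothetical, simplified model of an aggregation. The key field is
-- omitted because aggregate works on data that is already grouped.

import Dict exposing (Dict)


type Operator a
    = Count
    | Sum (a -> Float)
    | Minimum (a -> Float)
    | Maximum (a -> Float)


type alias Aggregation a =
    { filter : a -> Bool
    , operator : Operator a
    }


count : Aggregation a
count =
    { filter = always True, operator = Count }


sumOf : (a -> Float) -> Aggregation a
sumOf getValue =
    { filter = always True, operator = Sum getValue }


minimumOf : (a -> Float) -> Aggregation a
minimumOf getValue =
    { filter = always True, operator = Minimum getValue }


maximumOf : (a -> Float) -> Aggregation a
maximumOf getValue =
    { filter = always True, operator = Maximum getValue }


withFilter : (a -> Bool) -> Aggregation a -> Aggregation a
withFilter predicate agg =
    { agg | filter = predicate }


-- Evaluate one aggregation over the rows of a single group.
-- Empty groups give 0 for minimum and maximum (an assumption of this sketch).
apply : Aggregation a -> List a -> Float
apply agg rows =
    let
        kept =
            List.filter agg.filter rows
    in
    case agg.operator of
        Count ->
            toFloat (List.length kept)

        Sum getValue ->
            List.sum (List.map getValue kept)

        Minimum getValue ->
            List.minimum (List.map getValue kept) |> Maybe.withDefault 0

        Maximum getValue ->
            List.maximum (List.map getValue kept) |> Maybe.withDefault 0


-- One output item per key. The callback receives the key and an
-- "aggregator" that evaluates any aggregation over that key's rows.
aggregate : (comparable -> (Aggregation a -> Float) -> b) -> Dict comparable (List a) -> List b
aggregate toItem groups =
    Dict.toList groups
        |> List.map (\( key, rows ) -> toItem key (\agg -> apply agg rows))
```

Passing a closure over each group's rows is what lets the callback request several different aggregations for the same key.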
aggregateMap : Aggregation a key1 -> (Basics.Float -> a -> b) -> List a -> List b
Map function that provides an aggregated value to the mapping function. The first argument is an Aggregation: a record that holds a function defining the aggregation key, a predicate that filters certain rows out of the aggregation, and the aggregation operation to apply. The aggregated value passed for each row is computed over all rows that share its key. Usage:
testDataSet =
[ TestInput1 "k1_1" "k2_1" 1
, TestInput1 "k1_1" "k2_1" 2
, TestInput1 "k1_1" "k2_2" 3
, TestInput1 "k1_1" "k2_2" 4
, TestInput1 "k1_2" "k2_1" 5
, TestInput1 "k1_2" "k2_1" 6
, TestInput1 "k1_2" "k2_2" 7
, TestInput1 "k1_2" "k2_2" 8
]
testDataSet
|> aggregateMap
(sumOf .value |> byKey .key1)
(\totalValue input ->
( input, totalValue / input.value )
)
{- ==
[ ( TestInput1 "k1_1" "k2_1" 1, 10 / 1 )
, ( TestInput1 "k1_1" "k2_1" 2, 10 / 2 )
, ( TestInput1 "k1_1" "k2_2" 3, 10 / 3 )
, ( TestInput1 "k1_1" "k2_2" 4, 10 / 4 )
, ( TestInput1 "k1_2" "k2_1" 5, 26 / 5 )
, ( TestInput1 "k1_2" "k2_1" 6, 26 / 6 )
, ( TestInput1 "k1_2" "k2_2" 7, 26 / 7 )
, ( TestInput1 "k1_2" "k2_2" 8, 26 / 8 )
]
-}
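A minimal two-pass sketch of aggregateMap specialized to a sum, assuming elm/core's Dict and the same assumed TestInput1 record. The helper name aggregateMapSum is hypothetical: the first pass totals the values per key, and the second maps every row together with its key's total, so the output keeps the input order and length.

```elm
module AggregateMapSketch exposing (TestInput1, aggregateMapSum, testDataSet)

import Dict exposing (Dict)


-- Assumed shape of the rows used in the examples above.
type alias TestInput1 =
    { key1 : String
    , key2 : String
    , value : Float
    }


testDataSet : List TestInput1
testDataSet =
    [ TestInput1 "k1_1" "k2_1" 1
    , TestInput1 "k1_1" "k2_1" 2
    , TestInput1 "k1_1" "k2_2" 3
    , TestInput1 "k1_1" "k2_2" 4
    , TestInput1 "k1_2" "k2_1" 5
    , TestInput1 "k1_2" "k2_1" 6
    , TestInput1 "k1_2" "k2_2" 7
    , TestInput1 "k1_2" "k2_2" 8
    ]


-- Hypothetical helper, roughly aggregateMap (sumOf getValue |> byKey getKey).
aggregateMapSum : (a -> comparable) -> (a -> Float) -> (Float -> a -> b) -> List a -> List b
aggregateMapSum getKey getValue f rows =
    let
        -- Pass 1: running total per key.
        totals =
            List.foldl
                (\row acc ->
                    Dict.update (getKey row)
                        (\total -> Just (getValue row + Maybe.withDefault 0 total))
                        acc
                )
                Dict.empty
                rows
    in
    -- Pass 2: hand each row its key's total, preserving input order.
    List.map
        (\row -> f (Dict.get (getKey row) totals |> Maybe.withDefault 0) row)
        rows
```

For example, aggregateMapSum .key1 .value (\total input -> ( input, total / input.value )) testDataSet pairs each row with the same ratios as the expected output above.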
aggregateMap2 : Aggregation a key1 -> Aggregation a key2 -> (Basics.Float -> Basics.Float -> a -> b) -> List a -> List b
Map function that provides two aggregated values to the mapping function. The first two arguments are Aggregation records; each holds a function that defines its aggregation key, a predicate that filters certain rows out of the aggregation, and the aggregation operation to apply. Each aggregation is evaluated independently, so the two can use different keys. Usage:
testDataSet =
[ TestInput1 "k1_1" "k2_1" 1
, TestInput1 "k1_1" "k2_1" 2
, TestInput1 "k1_1" "k2_2" 3
, TestInput1 "k1_1" "k2_2" 4
, TestInput1 "k1_2" "k2_1" 5
, TestInput1 "k1_2" "k2_1" 6
, TestInput1 "k1_2" "k2_2" 7
, TestInput1 "k1_2" "k2_2" 8
]
testDataSet
|> aggregateMap2
(sumOf .value |> byKey .key1)
(maximumOf .value |> byKey .key2)
(\totalValue maxValue input ->
( input, totalValue * maxValue / input.value )
)
{- ==
[ ( TestInput1 "k1_1" "k2_1" 1, 10 * 6 / 1 )
, ( TestInput1 "k1_1" "k2_1" 2, 10 * 6 / 2 )
, ( TestInput1 "k1_1" "k2_2" 3, 10 * 8 / 3 )
, ( TestInput1 "k1_1" "k2_2" 4, 10 * 8 / 4 )
, ( TestInput1 "k1_2" "k2_1" 5, 26 * 6 / 5 )
, ( TestInput1 "k1_2" "k2_1" 6, 26 * 6 / 6 )
, ( TestInput1 "k1_2" "k2_2" 7, 26 * 8 / 7 )
, ( TestInput1 "k1_2" "k2_2" 8, 26 * 8 / 8 )
]
-}
aggregateMap3 : Aggregation a key1 -> Aggregation a key2 -> Aggregation a key3 -> (Basics.Float -> Basics.Float -> Basics.Float -> a -> b) -> List a -> List b
Map function that provides three aggregated values to the mapping function. The first three arguments are Aggregation records; each holds a function that defines its aggregation key, a predicate that filters certain rows out of the aggregation, and the aggregation operation to apply. Each aggregation is evaluated independently, so they can use different keys. Usage:
testDataSet =
[ TestInput1 "k1_1" "k2_1" 1
, TestInput1 "k1_1" "k2_1" 2
, TestInput1 "k1_1" "k2_2" 3
, TestInput1 "k1_1" "k2_2" 4
, TestInput1 "k1_2" "k2_1" 5
, TestInput1 "k1_2" "k2_1" 6
, TestInput1 "k1_2" "k2_2" 7
, TestInput1 "k1_2" "k2_2" 8
]
testDataSet
|> aggregateMap3
(sumOf .value |> byKey .key1)
(maximumOf .value |> byKey .key2)
(minimumOf .value |> byKey (key2 .key1 .key2))
(\totalValue maxValue minValue input ->
( input, totalValue * maxValue / input.value + minValue )
)
{- ==
[ ( TestInput1 "k1_1" "k2_1" 1, 10 * 6 / 1 + 1 )
, ( TestInput1 "k1_1" "k2_1" 2, 10 * 6 / 2 + 1 )
, ( TestInput1 "k1_1" "k2_2" 3, 10 * 8 / 3 + 3 )
, ( TestInput1 "k1_1" "k2_2" 4, 10 * 8 / 4 + 3 )
, ( TestInput1 "k1_2" "k2_1" 5, 26 * 6 / 5 + 5 )
, ( TestInput1 "k1_2" "k2_1" 6, 26 * 6 / 6 + 5 )
, ( TestInput1 "k1_2" "k2_2" 7, 26 * 8 / 7 + 7 )
, ( TestInput1 "k1_2" "k2_2" 8, 26 * 8 / 8 + 7 )
]
-}
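The third aggregation above groups by a pair of keys. In plain Elm the same idea can be sketched with a tuple key, since a tuple of comparable values is itself comparable. The helper minimumByKey is hypothetical, and the tuple only stands in for key2 .key1 .key2; it does not reproduce Morphir.SDK.Key.

```elm
module CompositeKeySketch exposing (TestInput1, minimumByKey, testDataSet)

import Dict exposing (Dict)


-- Assumed shape of the rows used in the examples above.
type alias TestInput1 =
    { key1 : String
    , key2 : String
    , value : Float
    }


testDataSet : List TestInput1
testDataSet =
    [ TestInput1 "k1_1" "k2_1" 1
    , TestInput1 "k1_1" "k2_1" 2
    , TestInput1 "k1_1" "k2_2" 3
    , TestInput1 "k1_1" "k2_2" 4
    , TestInput1 "k1_2" "k2_1" 5
    , TestInput1 "k1_2" "k2_1" 6
    , TestInput1 "k1_2" "k2_2" 7
    , TestInput1 "k1_2" "k2_2" 8
    ]


-- Hypothetical helper: the minimum of getValue for each distinct key.
minimumByKey : (a -> comparable) -> (a -> Float) -> List a -> Dict comparable Float
minimumByKey getKey getValue rows =
    List.foldl
        (\row minimums ->
            Dict.update (getKey row)
                (\current ->
                    case current of
                        Nothing ->
                            Just (getValue row)

                        Just smallest ->
                            Just (min smallest (getValue row))
                )
                minimums
        )
        Dict.empty
        rows
```

Calling minimumByKey (\row -> ( row.key1, row.key2 )) .value testDataSet yields the four minimums 1, 3, 5 and 7 that appear as the last term in the expected output above.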
aggregateMap4 : Aggregation a key1 -> Aggregation a key2 -> Aggregation a key3 -> Aggregation a key4 -> (Basics.Float -> Basics.Float -> Basics.Float -> Basics.Float -> a -> b) -> List a -> List b
Map function that provides four aggregated values to the mapping function. The first four arguments are Aggregation records; each holds a function that defines its aggregation key, a predicate that filters certain rows out of the aggregation, and the aggregation operation to apply. Each aggregation is evaluated independently, so they can use different keys. Usage:
testDataSet =
[ TestInput1 "k1_1" "k2_1" 1
, TestInput1 "k1_1" "k2_1" 2
, TestInput1 "k1_1" "k2_2" 3
, TestInput1 "k1_1" "k2_2" 4
, TestInput1 "k1_2" "k2_1" 5
, TestInput1 "k1_2" "k2_1" 6
, TestInput1 "k1_2" "k2_2" 7
, TestInput1 "k1_2" "k2_2" 8
]
testDataSet
|> aggregateMap4
(sumOf .value |> byKey .key1)
(maximumOf .value |> byKey .key2)
(minimumOf .value |> byKey (key2 .key1 .key2))
(averageOf .value |> byKey (key2 .key1 .key2))
(\totalValue maxValue minValue average input ->
( input, totalValue * maxValue / input.value + minValue + average )
)
{- ==
[ ( TestInput1 "k1_1" "k2_1" 1, 10 * 6 / 1 + 1 + 1.5 )
, ( TestInput1 "k1_1" "k2_1" 2, 10 * 6 / 2 + 1 + 1.5 )
, ( TestInput1 "k1_1" "k2_2" 3, 10 * 8 / 3 + 3 + 3.5 )
, ( TestInput1 "k1_1" "k2_2" 4, 10 * 8 / 4 + 3 + 3.5 )
, ( TestInput1 "k1_2" "k2_1" 5, 26 * 6 / 5 + 5 + 5.5 )
, ( TestInput1 "k1_2" "k2_1" 6, 26 * 6 / 6 + 5 + 5.5 )
, ( TestInput1 "k1_2" "k2_2" 7, 26 * 8 / 7 + 7 + 7.5 )
, ( TestInput1 "k1_2" "k2_2" 8, 26 * 8 / 8 + 7 + 7.5 )
]
-}
count : Aggregation a Morphir.SDK.Key.Key0
Count the number of rows in a group.
sumOf : (a -> Basics.Float) -> Aggregation a Morphir.SDK.Key.Key0
Apply a function to each row that returns a numeric value and return the sum of the values.
minimumOf : (a -> Basics.Float) -> Aggregation a Morphir.SDK.Key.Key0
Apply a function to each row that returns a numeric value and return the minimum of the values.
maximumOf : (a -> Basics.Float) -> Aggregation a Morphir.SDK.Key.Key0
Apply a function to each row that returns a numeric value and return the maximum of the values.
averageOf : (a -> Basics.Float) -> Aggregation a Morphir.SDK.Key.Key0
Apply a function to each row that returns a numeric value and return the average of the values.
weightedAverageOf : (a -> Basics.Float) -> (a -> Basics.Float) -> Aggregation a Morphir.SDK.Key.Key0
Apply two functions that return numeric values to each row and return the weighted average of the values, using the first function to get the weights.
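As a sketch of the arithmetic behind averageOf and weightedAverageOf, the plain-list functions below compute the mean and the weighted mean, sum(weight * value) / sum(weight), taking the weight from the first function as described above. Unlike the SDK functions they take the rows directly instead of returning an Aggregation, and returning Nothing for empty input or a zero total weight is an assumption of this sketch.

```elm
module AverageSketch exposing (averageOf, weightedAverageOf)


{-| Mean of getValue over the rows; Nothing for an empty list
(an assumption, chosen here to avoid dividing by zero).
-}
averageOf : (a -> Float) -> List a -> Maybe Float
averageOf getValue rows =
    if List.isEmpty rows then
        Nothing

    else
        Just (List.sum (List.map getValue rows) / toFloat (List.length rows))


{-| Weighted mean: sum of (weight * value) divided by the sum of weights.
The first function supplies the weights, the second the values.
-}
weightedAverageOf : (a -> Float) -> (a -> Float) -> List a -> Maybe Float
weightedAverageOf getWeight getValue rows =
    let
        totalWeight =
            List.sum (List.map getWeight rows)
    in
    if totalWeight == 0 then
        Nothing

    else
        Just (List.sum (List.map (\row -> getWeight row * getValue row) rows) / totalWeight)
```

For instance, weights 1 and 3 applied to values 10 and 20 give (1 * 10 + 3 * 20) / 4 = 17.5, while the unweighted average would be 15.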
byKey : (a -> key) -> Aggregation a Morphir.SDK.Key.Key0 -> Aggregation a key
Changes the key of an aggregation. Usage:
count
|> byKey .key1
== { key = .key1
, filter = always True
, operator = Count
}
withFilter : (a -> Basics.Bool) -> Aggregation a key -> Aggregation a key
Adds a filter to an aggregation. Usage:
count
|> withFilter (\a -> a.value < 0)
== { key = key0
, filter = \a -> a.value < 0
, operator = Count
}
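The record updates shown for byKey and withFilter can be sketched in plain Elm as follows. This is a reduced model, not the SDK: the unit type () stands in for Morphir.SDK.Key.Key0, Operator has a single Count case, and countByKey is a hypothetical evaluator that drops filtered-out rows before grouping.

```elm
module ByKeySketch exposing (Aggregation, Operator(..), byKey, count, countByKey, withFilter)

import Dict exposing (Dict)


-- Hypothetical, reduced operator type: only counting is modeled here.
type Operator
    = Count


type alias Aggregation a key =
    { key : a -> key
    , filter : a -> Bool
    , operator : Operator
    }


-- () plays the role of Morphir.SDK.Key.Key0: every row gets the same key.
count : Aggregation a ()
count =
    { key = always (), filter = always True, operator = Count }


-- A new record is built because Elm's record update syntax cannot change
-- the type of the key field.
byKey : (a -> key) -> Aggregation a () -> Aggregation a key
byKey getKey agg =
    { key = getKey, filter = agg.filter, operator = agg.operator }


withFilter : (a -> Bool) -> Aggregation a key -> Aggregation a key
withFilter predicate agg =
    { agg | filter = predicate }


-- Evaluate a Count aggregation: filter first, then count rows per key.
countByKey : Aggregation a comparable -> List a -> Dict comparable Int
countByKey agg rows =
    rows
        |> List.filter agg.filter
        |> List.foldl
            (\row counts ->
                Dict.update (agg.key row)
                    (\n -> Just (1 + Maybe.withDefault 0 n))
                    counts
            )
            Dict.empty
```

Because withFilter is polymorphic in the key while byKey expects the key-less form, withFilter can be applied before byKey in a pipeline, as in count |> withFilter (\v -> v < 7) |> byKey someKeyFunction.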
constructAggregationCall : Morphir.IR.Value.TypedValue -> Morphir.IR.Value.TypedValue -> Morphir.IR.Value.TypedValue -> Result ConstructAggregationError AggregationCall
constructAggregationCall transforms a Morphir.SDK.Aggregate groupBy and aggregate call into a single data structure.
An AggregationCall represents a call to Morphir.SDK.Aggregate.aggregate
Its values are:
An AggregateValue represents a single aggregation used within the overall Aggregation call
Its values are:
A ConstructAggregationError represents the ways constructAggregationCall may fail
Its values are:
aggregate, 'key' was used to create multiple columns of the field(s) the data is grouped by, which can't be represented in Spark.