CannerFlow implements the approx_distinct() function using the HyperLogLog data structure.
CannerFlow implements HyperLogLog data sketches as a set of 32-bit buckets which store a maximum hash. They can be stored sparsely (as a map from bucket ID to bucket), or densely (as a contiguous memory block). The HyperLogLog data structure starts as the sparse representation, switching to dense when it is more efficient. The P4HyperLogLog structure is initialized densely and remains dense for its lifetime.
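The sparse-to-dense switch described above can be illustrated with a minimal Python sketch. It is not CannerFlow's implementation: the precision P, the switch threshold SPARSE_LIMIT, the blake2b-based hash, and the class and method names are assumptions, and registers are plain Python ints rather than 32-bit buckets. Each register keeps the largest leading-zero rank seen for its bucket, and the estimate uses the standard HyperLogLog formula with linear counting for small inputs.

```python
import hashlib
import math

P = 10                  # hypothetical precision: the sketch has 2**P buckets
M = 1 << P              # number of buckets
SPARSE_LIMIT = M // 4   # hypothetical point at which the dense form becomes cheaper


def hash64(value):
    """Stable 64-bit hash of a value (blake2b is an assumption; any good hash works)."""
    digest = hashlib.blake2b(str(value).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class HyperLogLog:
    def __init__(self):
        self.sparse = {}    # bucket id -> register value, used while few buckets are set
        self.dense = None   # list of M registers once the sketch has switched to dense

    def add(self, value):
        h = hash64(value)
        bucket = h >> (64 - P)                    # top P bits select the bucket
        rest = h & ((1 << (64 - P)) - 1)          # remaining 64 - P bits
        rank = (64 - P) - rest.bit_length() + 1   # position of the leftmost 1-bit
        self._update(bucket, rank)

    def _update(self, bucket, rank):
        if self.dense is not None:
            self.dense[bucket] = max(self.dense[bucket], rank)
            return
        if rank > self.sparse.get(bucket, 0):
            self.sparse[bucket] = rank
        if len(self.sparse) > SPARSE_LIMIT:
            self._to_dense()                      # one-way switch, never back to sparse

    def _to_dense(self):
        self.dense = [0] * M
        for bucket, rank in self.sparse.items():
            self.dense[bucket] = rank
        self.sparse = {}

    def registers(self):
        """All M registers, materializing zeros for unset buckets in sparse mode."""
        if self.dense is not None:
            return self.dense
        regs = [0] * M
        for bucket, rank in self.sparse.items():
            regs[bucket] = rank
        return regs

    def cardinality(self):
        regs = self.registers()
        alpha = 0.7213 / (1 + 1.079 / M)
        raw = alpha * M * M / sum(2.0 ** -r for r in regs)
        zeros = regs.count(0)
        if raw <= 2.5 * M and zeros:
            return round(M * math.log(M / zeros))   # linear counting for small inputs
        return round(raw)


small = HyperLogLog()
for user_id in range(100):
    small.add(user_id)

big = HyperLogLog()
for user_id in range(10_000):
    big.add(user_id)

print(small.dense is None, small.cardinality())   # still sparse, estimate near 100
print(big.dense is None, big.cardinality())       # dense, estimate near 10,000
```

A small input stays in the sparse dictionary, while adding many distinct values touches enough buckets to trigger the one-way switch to the dense list, mirroring how the HyperLogLog type starts sparse and becomes dense when that is more efficient.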
HyperLogLog implicitly casts to P4HyperLogLog, and one can also cast explicitly with cast(hll AS P4HyperLogLog).
Data sketches can be serialized to and deserialized from varbinary, which allows them to be stored for later use. Combined with the ability to merge multiple sketches, this makes it possible to compute approx_distinct() over each partition of a query and then over the entire query at very little additional cost.
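To show why storing and merging sketches is cheap, the following Python sketch stores dense sketches as bytes and merges them by taking the bucket-wise maximum. It is an illustrative stand-in, not CannerFlow's varbinary format: the byte layout, the precision P, and the helper names make_sketch, serialize, deserialize, and merge_sketches are hypothetical (merge_sketches is not the SQL merge() function).

```python
import hashlib

P = 10        # hypothetical precision
M = 1 << P    # number of buckets


def make_sketch(values):
    """Dense HyperLogLog registers for an iterable of values (no sparse mode here)."""
    regs = bytearray(M)
    for v in values:
        h = int.from_bytes(hashlib.blake2b(str(v).encode(), digest_size=8).digest(), "big")
        bucket = h >> (64 - P)
        rank = (64 - P) - (h & ((1 << (64 - P)) - 1)).bit_length() + 1
        regs[bucket] = max(regs[bucket], rank)
    return regs


def serialize(regs):
    """Hypothetical layout: one header byte holding P, then one byte per register."""
    return bytes([P]) + bytes(regs)


def deserialize(blob):
    if len(blob) != M + 1 or blob[0] != P:
        raise ValueError("unexpected sketch format")
    return bytearray(blob[1:])


def merge_sketches(*sketches):
    """Union of sketches: the bucket-wise maximum of their registers."""
    out = bytearray(M)
    for regs in sketches:
        for i, r in enumerate(regs):
            if r > out[i]:
                out[i] = r
    return out


# Three overlapping partitions of one query's input.
partitions = [range(0, 4000), range(3000, 7000), range(6000, 10000)]
stored = [serialize(make_sketch(part)) for part in partitions]  # e.g. kept as varbinary
combined = merge_sketches(*(deserialize(blob) for blob in stored))
print(combined == make_sketch(range(10000)))  # True
```

Because the maximum is associative, commutative, and idempotent, the merged result is identical to a sketch built over all the data, regardless of how the rows were partitioned or whether the partitions overlap.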
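For example, calculating the HyperLogLog for daily unique users allows weekly or monthly unique users to be calculated incrementally by merging the daily sketches. This is similar to computing weekly revenue by summing daily revenue, except that the sketches are merged rather than their estimates added, since a user active on several days would otherwise be counted once per day. Uses of approx_distinct() with GROUPING SETS can be converted to use HyperLogLog.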
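The daily-to-weekly pattern can be illustrated with a small Python sketch, again an illustration rather than CannerFlow's code. Seven hypothetical daily sketches are merged bucket-wise into a weekly one; the user IDs, the precision P = 12, and the helper names are assumptions. Returning users hash to the same buckets every day, so the merged sketch equals one built over the whole week, whereas adding the seven daily estimates counts each returning user up to seven times.

```python
import hashlib
import math

P = 12        # hypothetical precision
M = 1 << P    # number of buckets


def make_sketch(values):
    """Dense HyperLogLog registers for an iterable of values."""
    regs = [0] * M
    for v in values:
        h = int.from_bytes(hashlib.blake2b(str(v).encode(), digest_size=8).digest(), "big")
        bucket = h >> (64 - P)
        rank = (64 - P) - (h & ((1 << (64 - P)) - 1)).bit_length() + 1
        regs[bucket] = max(regs[bucket], rank)
    return regs


def merge_sketches(sketches):
    """Bucket-wise maximum, i.e. the sketch of the union of the inputs."""
    return [max(column) for column in zip(*sketches)]


def estimate(regs):
    """Standard HyperLogLog estimate with linear counting for small inputs."""
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in regs)
    zeros = regs.count(0)
    if raw <= 2.5 * M and zeros:
        return M * math.log(M / zeros)
    return raw


# Hypothetical traffic: 600 users visit every day, plus 400 new users each day.
daily_users = [
    list(range(600)) + list(range(600 + 400 * day, 1000 + 400 * day)) for day in range(7)
]
daily_sketches = [make_sketch(users) for users in daily_users]
weekly_sketch = merge_sketches(daily_sketches)

# True weekly distinct count is 600 + 7 * 400 = 3400.
print(round(sum(estimate(s) for s in daily_sketches)))  # about 7000: repeat visitors counted daily
print(round(estimate(weekly_sketch)))                   # about 3400
```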
approx_set(x) -> HyperLogLog
Returns the HyperLogLog sketch of the input data set of x. This data sketch can be stored and used later by calling cardinality().
cardinality(hll) -> bigint
Performs approx_distinct() on the data summarized by the HyperLogLog data sketch hll.
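For background, the raw estimator from the original HyperLogLog paper (Flajolet et al.) is shown below, where m is the number of buckets, M_j is the value held in bucket j, and n is the true number of distinct values. Whether CannerFlow's cardinality() applies further small-range or bias corrections is not stated here, so treat this as the general technique rather than its exact formula. The relative standard error shrinks with the square root of the bucket count; for example, m = 2048 gives roughly 2.3%.

```latex
E = \alpha_m \, m^2 \left( \sum_{j=1}^{m} 2^{-M_j} \right)^{-1},
\qquad
\alpha_m \approx \frac{0.7213}{1 + 1.079/m} \quad (m \ge 128),
\qquad
\frac{\sigma(E)}{n} \approx \frac{1.04}{\sqrt{m}}
```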
empty_approx_set() -> HyperLogLog
Returns an empty HyperLogLog.
merge(HyperLogLog) -> HyperLogLog
Returns the HyperLogLog of the aggregate union of the individual HyperLogLog sketches in the input.