##### technology

# Area Under a Snowflake

Ever want to calculate the area under a curve in SQL? I know, I know.. who hasn’t right?

It’s pretty straightforward to approximate the area under a curve in SQL by computing a Riemann sum using the trapezoidal rule.

First, we’ll make up some coordinates that represent the curve we want to integrate. Let’s find the area under the square root function from 0 to 1. Just because I want to use

in this post (even though I can’t figure out how to make it display inline), we are approximating this integral:

. Boom.

Anyway, back to getting the coordinates, we will use 101 evenly spaced points on the curve:

WITH coordinates AS ( SELECT (ROW_NUMBER() OVER (ORDER BY 1) - 1) / 100 AS x, SQRT((ROW_NUMBER() OVER (ORDER BY 1) - 1) / 100) AS y FROM TABLE(GENERATOR(rowCount => 101)) ),

Next, we compute the width of the bottom side of the trapezoid and the difference in height between the left and right sides.

deltas AS ( SELECT x, y, LEAD(x) OVER (ORDER BY x) - x AS delta_x, LEAD(y) OVER (ORDER BY x) - y AS delta_y FROM coordinates ),

Now we can compute the area of each trapezoid in the Riemann sum.

partial_areas AS ( SELECT *, (delta_x * y) + (0.5 * delta_x * delta_y) AS partial_area FROM deltas )

Here’s a look at what we have so far:

SELECT * FROM deltas

X | Y | DELTA_X | DELTA_Y | PARTIAL_AREA |
---|---|---|---|---|

0.000 | 0 | 0.010 | 0.1 | 0.0005 |

0.010 | 0.1 | 0.010 | 0.04142135624 | 0.001207106781 |

0.020 | 0.1414213562 | 0.010 | 0.03178372452 | 0.001573132185 |

0.030 | 0.1732050808 | 0.010 | 0.02679491924 | 0.001866025404 |

– | – | – | – | – |

0.980 | 0.9899494937 | 0.010 | 0.005037943445 | 0.009924684654 |

0.990 | 0.9949874371 | 0.010 | 0.005012562893 | 0.009974937186 |

1.000 | 1 | NULL | NULL | NULL |

To get the final answer, we can just sum the area of all the small trapezoids.

SELECT SUM(partial_area) AS area FROM partial_areas

If we evaluate the integral above, we get exactly 2/3. With this method, we get an answer of 0.6664629471, which is pretty close, close enough for what I’m using it for, and may or not be close enough for your needs.

Here’s all the SQL together:

WITH coordinates AS ( SELECT (ROW_NUMBER() OVER (ORDER BY 1) - 1) / 100 AS x, SQRT((ROW_NUMBER() OVER (ORDER BY 1) - 1) / 100) AS y FROM TABLE(GENERATOR(rowCount => 101)) ), deltas AS ( SELECT x, y, LEAD(x) OVER (ORDER BY x) - x AS delta_x, LEAD(y) OVER (ORDER BY x) - y AS delta_y FROM coordinates ), partial_areas AS ( SELECT *, (delta_x * y) + (0.5 * delta_x * delta_y) AS partial_area FROM deltas ) SELECT SUM(partial_area) AS area FROM partial_areas ;

## Discussion