Harnessing Polars and Geopandas to Generate Hundreds of thousands of Transects in Seconds | by Tony Albanese | Jan, 2025

The Drawback:

I wanted to generate a transect each 10 ft throughout each avenue in New York Metropolis, and I needed to do it in a New York Minute. The Generate Transects Alongside Strains instrument constructed into ArcGIS Professional was going to take a number of days to run, so I needed to give you one thing at the least barely sooner.

To resolve this downside, I turned to 2 of my favourite bears, Geopandas and Polars, to make use of spatial algorithms and a splash of linear algebra to make 4.2 million transects in round 60 seconds.

Why Transects? On this circumstance, I might go on to make use of these transects to separate up roadway traces into equal intervals for storm injury evaluation. It’s simpler to separate a line with one other line than it’s to separate a line with a degree, since you may assure they may intersect and also you don’t have to fret about floating level imprecision. However, transects are super-useful for plenty of software, comparable to sampling river depth profiles and creating an everyday grid from a curvilinear line.

The Course of:

First, I generated factors alongside every line at 10 ft intervals utilizing Geopandas built-in interpolate perform. As soon as I had these factors, I swapped them over to Polars, the place I used to be capable of make the most of Polars’ built-in parallelism and intuitive syntax to shortly convert these factors into (roughly) perpendicular transects. Lastly, I put the transects right into a GeoDataFrame for additional spatial enjoyable.

It might sound annoying to travel between Geopandas and Polars, however every has its personal benefits — GeoPandas has spatial operations, and Polars is depraved quick and I strongly choose the syntax. The Polars .to_pandas() and .from_pandas() strategies work so effectively that I barely discover the swap. Whereas placing a panda bear and a polar bear into the identical zoo enclosure could have disasterous penalties, on this circumstance they can play collectively fairly properly.

Necessities: This code was developed in a mamba surroundings with the next packages and has not been examined with different variations

geopandas==1.0.1
pandas==2.2.3
polars==1.18.0
Shapely==2.0.6

The info used on this article comes from NYC Open Knowledge, and is on the market for obtain right here. On this instance, I’m utilizing the December 24, 2024 launch of the NYC Avenue Centerlines.

Step 1: Generate factors alongside traces each 10 ft with Geopandas

With a purpose to create transects alongside every line, we first want to seek out the centerpoint of every transect. Since we would like a transect each 10 ft, we’ll seize a degree each 10 ft alongside each linestring within the dataset. We’re additionally seize an extra level barely offset from every transect level; this can permit us to findthe orientation wanted to make the transect perpendicular to the unique line. Earlier than we get to all that, we have to load the information.

import geopandas as gpd

ROADS_LOC = r"NYC Avenue CenterlineNYC_Roads.shp"
ROADS_GDF = (
gpd.read_file(ROADS_LOC)
.reset_index(names="unique_id")
.to_crs("epsg:6539")
)

Along with loading the information, we additionally assign a unique_id to every street based mostly on its index, and make sure the GeoDataFrame is in a projected coordinate system with ft as its base unit — since these street traces are in New York Metropolis, we use New York — Lengthy Island State Airplane because the coordinate reference system. The unique_id column will permit us to hyperlink our generated transects again to the unique street section.

As soon as the information has been loaded, we have to pattern alongside each line at 10 ft intervals. We are able to accomplish this utilizing the GeoPandas interpolate perform, as beneath:

import pandas as pd
from typing import Tuple, Union

def interpolate_points(
gdf: gpd.GeoDataFrame, distance: Union[float, int], is_transect: bool
) -> Tuple[gpd.GeoDataFrame, gpd.GeoDataFrame]:
gdf = gdf.loc[gdf.length > distance]

new_points = gdf.interpolate(distance)

new_gdf = gpd.GeoDataFrame(
geometry=new_points,
information={
"unique_id": gdf["unique_id"],
"distance": distance,
"is_transect": is_transect,
},
crs=gdf.crs,
)

return gdf, new_gdf

def get_road_points(roads: gpd.GeoDataFrame, segment_length: int)
-> gpd.GeoDataFrame:
working_roads = roads.copy()
road_points = []
for segment_distance in vary(
segment_length, int(roads.size.max()) + 1, segment_length
):
working_roads, new_gdf = interpolate_points(
working_roads, segment_distance - 1, False
)
road_points.append(new_gdf)

working_roads, new_gdf = interpolate_points(
working_roads, segment_distance, True
)
road_points.append(new_gdf)

return pd.concat(road_points, ignore_index=True)

The perform get_road_points() loops by way of all of the section distances within the roads GeoDataFrame, given a section size s. Each iteration i, we choose solely roads that are longer than s * i. We pattern a degree alongside every line at s * i distance. These factors would be the centerpoints of our new transects. Since we wish to orient our new transects perpendicular to the road from which they spawned, we additionally pattern a degree at distance (s * i) -1. This offers us the approximate orientation of the road, which we’ll use in a while when developing the transects. As this technique outputs a GeoSeries, we construct a brand new GeoDataFrame, protecting the space of the purpose alongside the road, whether or not the purpose will likely be used as a transect, and our beforehand created unique_id column. Lastly, we concatenate all these GeoDataframes collectively, giving us one thing that appears like this:

Factors sampled for transect technology alongside NYC roadway traces

Lets strive it out with our roads dataset and see if it really works:

import matplotlib.pyplot as plt

points_10ft = get_road_points(ROADS_GDF, 10)
fig, ax = plt.subplots(figsize=(10, 12))

ROADS_GDF.loc[ROADS_GDF['unique_id'] == 100].plot(ax=ax, coloration='blue')
points_10ft.loc[(points_10ft['unique_id'] == 100) &
(points_10ft['is_transect'])].plot(ax=ax, coloration='pink')

Factors sampled each 10 ft alongside Beaver St.

Step 2: Use Polars and rotation matrices to show these factors into transects

Now that we now have the factors we want, we are able to use Polar’s from_pandas() perform to transform our GeoDataFrame to Polars. Earlier than we do this, we wish to save the data from the geometry column — our x and y places — and drop the geometry column.

import polars as pl 

def roads_to_polars(road_points: gpd.GeoDataFrame) -> pl.DataFrame:
road_points.loc[:, "x"] = road_points.geometry.x
road_points.loc[:, "y"] = road_points.geometry.y
pl_points = pl.from_pandas(road_points.drop("geometry", axis=1))
return pl_points

Lastly, we now have every part we have to assemble our transects. In essence, all we now have to do is locate the angle of every transect, assemble a transect of the requested size, and rotate and translate the transect so it’s correctly positioned in house. Which will sound like a doozy, but when we break it down into steps we’re not doing something overly difficult.

  1. Align every transect level with the purpose previous to it. This can permit us to orient every transect perpendicular to the unique line by pivoting across the prior level. In truth, lets name the prior level the ‘pivot level’ to make this extra clear.
  2. Translate the transect level/pivot level pair in order that the pivot level is at 0, 0. By doing this, we are able to simply rotate the transect level in order that it sits at y = 0.
  3. Rotate the transect level in order that it sits at y = 0. By doing this, the mathematics to assemble new factors which is able to kind our transect turns into easy addition.
  4. Assemble new factors above and beneath the rotated transect level in order that we now have a brand new line with distance equal to line size.
  5. Rotate the brand new factors again to the unique orientation relative to the pivot level.
  6. Translate the brand new factors in order that the pivot level returns to its authentic place.
Graphical illustration of transect-construction algorithm

Lets check out the code to do that. On my outdated laptop computer this code takes about 3 seconds to generate 4.2 million transects.


def make_lines(
pl_points: pl.DataFrame, line_length: Union[float, int])
-> pl.DataFrame:
return (
pl_points.type("unique_id", "distance")
# Pair every transect level with its previous pivot level
# over() teams by distinctive id, and shift(1) strikes
# every column down by 1 row
.with_columns(
origin_x=pl.col("x").shift(1).over(pl.col("unique_id")),
origin_y=pl.col("y").shift(1).over(pl.col("unique_id")),
)
# Take away pivot level rows
.filter(pl.col("is_transect"))
# Translate every level so the pivot is now (0, 0)
.with_columns(
translated_x=pl.col("x") - pl.col("origin_x"),
translated_y=pl.col("y") - pl.col("origin_y"),
)
# Calculate angle theta between translated transect level and x-axis
.with_columns(theta=pl.arctan2(
pl.col("translated_y"), pl.col("translated_x")
)
)
# Calculate phrases in rotation matrix
.with_columns(
theta_cos=pl.col("theta").cos(),
theta_sin=pl.col("theta").sin()
)
# Add y-coordinates above and beneath x axis so we now have a line of desired
# size. Since we all know that x = 1 (the space between the transect
# and pivot factors), we do not have to explicitly set that time period.
.with_columns(
new_y1=-line_length / 2,
new_y2=line_length / 2,
)
# Apply rotation matrix to factors (1, new_y1) and (1, new_y2)
# and translate again to the unique location
.with_columns(
new_x1=pl.col("theta_cos")
- (pl.col("new_y1") * pl.col("theta_sin"))
+ pl.col("origin_x"),
new_y1=pl.col("theta_sin")
+ (pl.col("new_y1") * pl.col("theta_cos"))
+ pl.col("origin_y"),
new_x2=pl.col("theta_cos")
- (pl.col("new_y2") * pl.col("theta_sin"))
+ pl.col("origin_x"),
new_y2=pl.col("theta_sin")
+ (pl.col("new_y2") * pl.col("theta_cos"))
+ pl.col("origin_y"),
)
# Assemble a Effectively-Recognized Textual content illustration of the transects
# for simple conversion to GeoPandas
.with_columns(
wkt=pl.concat_str(
pl.lit("LINESTRING ("),
pl.col("new_x1"),
pl.col("new_y1"),
pl.lit(","),
pl.col("new_x2"),
pl.col("new_y2"),
pl.lit(")"),
separator=" ",
)
)
)

How are we really engaging in the rotation?

We rotate the transect level concerning the origin utilizing a rotation matrix. This can be a 2×2 matrix which, when utilized to some extent in a Second-plane, rotates the purpose θ levels across the origin (0, 0).

A counter-clockwise rotation matrix, courtesy of https://en.wikipedia.org/wiki/Rotation_matrix

To make use of it, all we have to do is calculate θ, then multiply the purpose (x, y) and the rotation matrix. This offers us the system new_x = x*cos(θ)-y*sin(θ), new_y = x*sin(θ) + y*cos(θ). We calculate the end result utilizing this code:

.with_columns(
new_x1=pl.col("theta_cos")
- (pl.col("new_y1") * pl.col("theta_sin"))
+ pl.col("origin_x"),
new_y1=pl.col("theta_sin")
+ (pl.col("new_y1") * pl.col("theta_cos"))
+ pl.col("origin_y"),
new_x2=pl.col("theta_cos")
- (pl.col("new_y2") * pl.col("theta_sin"))
+ pl.col("origin_x"),
new_y2=pl.col("theta_sin")
+ (pl.col("new_y2") * pl.col("theta_cos"))
+ pl.col("origin_y"),
)

We’re capable of simplify the calculation, since we all know that x is at all times 1 for our factors (x, y). As an alternative of calculating x*cos(θ)-y*sin(θ) to seek out the x worth of a rotated level, we are able to simply do cos(θ)-y*sin(θ), saving us slightly little bit of computation. After all, this received’t work in case your pivot level and transect level will not be 1 unit aside, however for our functions it’s simply superb. As soon as we now have the rotation accomplished, all we now have to do is translate the newly rotated factors by the space between the pivot level and (0,0), and increase, we’re finished. If you’re eager about studying extra about linear algebra and affine transformations, I extremely reccomend the 3Blue1Brown sequence Essence of Linear Algebra.

Utilizing the above capabilities, we are able to take our GeoDataFrame, convert it to Polars, and calculate new transects like so:

points_10ft_pl = roads_to_polars(points_10ft)
transects_10ft = make_lines(points_10ft_pl, 10)
transects_10ft.choose(['unique_id', 'wkt'])
Polars DataFrame with WKT Linestrings for our newly generated transects

Step 3: Convert the transects again to Geopandas

Lastly! We’ve got the geometric coordinates of the transects we want, however we would like them in a GeoDataFrame so we are able to return to the consolation of GIS operations. Since we constructed our new transects as WKT Linestrings on the finish of the make_lines() perform, the code to take action is straightforward:

transects_gdf = gpd.GeoDataFrame(
crs=ROADS_GDF.crs,
information={"unique_id": transects_10ft.choose("unique_id").to_series()},
geometry=gpd.GeoSeries.from_wkt(
transects_10ft.choose("wkt").to_series().to_list()
),
)

And thats it! We now have a GeoDataFrame with 4.2 million transects, every linked to their authentic roadway utilizing the unique_id column.

Lets try our new geometry:

ROAD_ID = 98237
fig, ax = plt.subplots(figsize=(10, 12))
ROADS_GDF.loc[ROADS_GDF['unique_id'] == ROAD_ID].plot(ax=ax, coloration='blue')
transects_gdf.loc[transects_gdf['unique_id'] == ROAD_ID].plot(ax=ax, coloration='inexperienced')
ax.set_aspect("equal")
Transects spaced 10ft aside alongside a pedestrian path
Transects alongside Rockaway Seaside Blvd (unique_id = 99199)

Transects spaced 10ft aside alongside a pedestrian pathLooks good! We’re shedding slightly little bit of precision with the WKT conversion, so there’s in all probability a greater approach to do this, however it works effectively sufficient for me.

Placing all of it Collectively

To your comfort, I’ve mixed the code to generate the transects beneath. This can solely work with information projected to a planar coordinate system, and if you wish to use a pivot level that’s additional than 1 unit from a transect level, you’ll need to change the code.

from typing import Tuple, Union

import geopandas as gpd
import pandas as pd
import polars as pl

def interpolate_points(
gdf: gpd.GeoDataFrame, distance: Union[float, int], is_transect: bool
) -> Tuple[gpd.GeoDataFrame, gpd.GeoDataFrame]:
gdf = gdf.loc[gdf.length > distance]

new_points = gdf.interpolate(distance)

new_gdf = gpd.GeoDataFrame(
geometry=new_points,
information={
"unique_id": gdf["unique_id"],
"distance": distance,
"is_transect": is_transect,
},
crs=gdf.crs,
)

return gdf, new_gdf

def get_road_points(roads: gpd.GeoDataFrame, segment_length: int) -> gpd.GeoDataFrame:
working_roads = roads.copy()
road_points = []
for segment_distance in vary(
segment_length, int(roads.size.max()) + 1, segment_length
):
working_roads, new_gdf = interpolate_points(
working_roads, segment_distance - 1, False
)
road_points.append(new_gdf)

working_roads, new_gdf = interpolate_points(
working_roads, segment_distance, True
)
road_points.append(new_gdf)

return pd.concat(road_points, ignore_index=True)

def roads_to_polars(road_points: gpd.GeoDataFrame) -> pl.DataFrame:
road_points.loc[:, "x"] = road_points.geometry.x
road_points.loc[:, "y"] = road_points.geometry.y
pl_points = pl.from_pandas(road_points.drop("geometry", axis=1))
return pl_points

def make_lines(pl_points: pl.DataFrame, line_length: Union[float, int]) -> pl.DataFrame:
return (
pl_points.type("unique_id", "distance")
.with_columns(
origin_x=pl.col("x").shift(1).over(pl.col("unique_id")),
origin_y=pl.col("y").shift(1).over(pl.col("unique_id")),
)
.filter(pl.col("is_transect"))
.with_columns(
translated_x=pl.col("x") - pl.col("origin_x"),
translated_y=pl.col("y") - pl.col("origin_y"),
)
.with_columns(theta=pl.arctan2(pl.col("translated_y"), pl.col("translated_x")))
.with_columns(theta_cos=pl.col("theta").cos(), theta_sin=pl.col("theta").sin())
.with_columns(
new_y1=-line_length / 2,
new_y2=line_length / 2,
)
.with_columns(
new_x1=pl.col("theta_cos")
- (pl.col("new_y1") * pl.col("theta_sin"))
+ pl.col("origin_x"),
new_y1=pl.col("theta_sin")
+ (pl.col("new_y1") * pl.col("theta_cos"))
+ pl.col("origin_y"),
new_x2=pl.col("theta_cos")
- (pl.col("new_y2") * pl.col("theta_sin"))
+ pl.col("origin_x"),
new_y2=pl.col("theta_sin")
+ (pl.col("new_y2") * pl.col("theta_cos"))
+ pl.col("origin_y"),
)
.with_columns(
wkt=pl.concat_str(
pl.lit("LINESTRING ("),
pl.col("new_x1"),
pl.col("new_y1"),
pl.lit(","),
pl.col("new_x2"),
pl.col("new_y2"),
pl.lit(")"),
separator=" ",
)
)
)

ROADS_LOC = r"NYC Avenue CenterlineNYC_Roads.shp"
ROADS_GDF = gpd.read_file(ROADS_LOC).reset_index(names="unique_id").to_crs("epsg:6539")

points_10ft = get_road_points(roads=ROADS_GDF, segment_length=10)
points_10ft_pl = roads_to_polars(road_points=points_10ft)
transects_10ft = make_lines(pl_points=points_10ft_pl, line_length=10)
transects_gdf = gpd.GeoDataFrame(
crs=ROADS_GDF.crs,
information={"unique_id": transects_10ft.choose("unique_id").to_series()},
geometry=gpd.GeoSeries.from_wkt(transects_10ft.choose("wkt").to_series().to_list()),
)

Thanks for studying! There are in all probability much more methods this code could possibly be optimized or made extra versatile, however it solved an enormous downside for me and reduce down compute time from days to seconds. With slightly little bit of principle and understanding when to make use of what instrument, issues that beforehand appear insurmountable change into shockingly simple. It solely took me about half-hour and slightly wikipedia linear algebra referesher to write down the code for this publish, saving me tons of time and, extra importantly, defending me from having to elucidate to a consumer why a undertaking was pointlessly delayed. Let me know if this was useful to you!

Bartel-Pritchard Sq./Prospect Park West with transects spaced each 10ft. I later used the unique_id discipline to make sure roads had been solely break up utilizing transects belonging to that street.

Until in any other case famous, all photos are by the creator.