5 Key Lessons for Google Earth Engine Beginners | by Daniel Pazmiño Vernaza | Jan, 2025

Hands-On Insights from a Python API user

Land cover map for the Paute water basin in Ecuador for the year 2020. Image created using Google Earth Engine Python API and Geemap. Data source: Friedl, M., Sulla-Menashe, D. (2022); Lehner, B., Grill G. (2013) and Lehner, B., Verdin, K., Jarvis, A. (2008).

As a climate scientist, I count Google Earth Engine (GEE) among the most powerful tools in my toolkit. No more downloading heavy satellite images to my computer.

GEE’s main API is JavaScript, although Python users can also access a powerful API to perform similar tasks. Unfortunately, there are fewer materials for learning GEE with Python.

However, I love Python. Since I learned that GEE has a Python API, I have imagined a world of possibilities combining GEE’s powerful cloud-processing capabilities with Python frameworks.

The five lessons come from my most recent project, which involved analyzing water balance and drought in a water basin in Ecuador. Nevertheless, the tips, code snippets and examples could apply to any project.

The story presents each lesson following the sequence of any data analysis project: data preparation (and planning), analysis, and visualization.

It is also worth mentioning that I provide some general advice independent of the language you use.

This article for GEE beginners assumes an understanding of Python and some geospatial concepts.

If you know Python but are new to GEE (like me some time ago), you should know that GEE has optimized functions for processing satellite images. We won’t delve into the details of these functions here; you should check the official documentation.

However, my advice is to check first whether GEE can perform the analysis you want to conduct. When I first started using GEE, I used it as a catalogue for finding data, relying only on its basic functions. I would then write Python code for most of the analyses. While this approach can work, it often leads to significant challenges. I’ll discuss these challenges in later lessons.

Don’t limit yourself to learning only the basic GEE functions. If you know Python (or coding in general), the learning curve for these functions is not very steep. Try to use them as much as possible; it’s worth it in terms of efficiency.

A final note: GEE functions even support machine learning tasks. These GEE functions are easy to implement and can help you solve many problems. Only when you can’t solve your problem with these functions should you consider writing Python code from scratch.

As an example for this lesson, consider the implementation of a clustering algorithm.

Example code with GEE functions

# Assumes `import ee`, an initialized session, and that clustering_image
# (an ee.Image) and galapagos_aoi (an ee.Geometry) are already defined

# Sample the image to create input for clustering
sample_points = clustering_image.sample(
    region=galapagos_aoi,
    scale=30,  # Scale in meters
    numPixels=5000,  # Number of points to sample
    geometries=False  # Don't include geometry to save memory
)

# Apply k-means clustering (unsupervised)
clusterer = ee.Clusterer.wekaKMeans(5).train(sample_points)

# Cluster the image
result = clustering_image.cluster(clusterer)

Example code with Python

import numpy as np
from osgeo import gdal, gdal_array

# Tell GDAL to throw Python exceptions and register all drivers
gdal.UseExceptions()
gdal.AllRegister()

# Open the .tiff file
img_ds = gdal.Open('Sentinel-2_L2A_Galapagos.tiff', gdal.GA_ReadOnly)
if img_ds is None:
    raise FileNotFoundError("The specified file could not be opened.")

# Prepare an empty array to store the image data for all bands
img = np.zeros(
    (img_ds.RasterYSize, img_ds.RasterXSize, img_ds.RasterCount),
    dtype=gdal_array.GDALTypeCodeToNumericTypeCode(img_ds.GetRasterBand(1).DataType),
)

# Read each band into the corresponding slice of the array
for b in range(img_ds.RasterCount):
    img[:, :, b] = img_ds.GetRasterBand(b + 1).ReadAsArray()

print("Shape of the image with all bands:", img.shape)  # (height, width, num_bands)

# Reshape for processing
new_shape = (img.shape[0] * img.shape[1], img.shape[2])  # (num_pixels, num_bands)
X = img.reshape(new_shape)

print("Shape of reshaped data for all bands:", X.shape)  # (num_pixels, num_bands)

The first block of code is not only shorter, but it will also handle large satellite datasets more efficiently because GEE functions are designed to scale across the cloud.

While GEE’s functions are powerful, understanding the limitations of cloud processing is crucial when scaling up your project.

Access to free cloud computing resources to process satellite images is a blessing. However, it’s not surprising that GEE imposes limits to ensure fair resource distribution. If you plan to use it for a non-commercial large-scale project (e.g. researching deforestation in the Amazon region) and intend to stay within the free-tier limits, you should plan accordingly. My general guidelines are:
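The Python block above stops at the reshaped matrix X, so the clustering step itself would still have to be written or imported. As a rough illustration of what that step involves, here is a minimal k-means sketch (Lloyd's algorithm) using only the standard library. The function name `kmeans`, the toy pixel values, and the choice to seed the centroids with the first k points are all assumptions made for this example; it is not the Weka implementation that GEE calls.

```python
import math


def kmeans(points, k, iterations=10):
    """Cluster equal-length tuples with Lloyd's algorithm.

    The first k points are used as initial centroids (a simplification
    chosen here so the result is deterministic).
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids


# Hypothetical toy data: each row is a pixel, each column a band value
X_toy = [(0.10, 0.20), (0.90, 0.80), (0.12, 0.18),
         (0.88, 0.82), (0.11, 0.22), (0.91, 0.79)]
labels, centroids = kmeans(X_toy, k=2)
print(labels)
```

Even this toy version loops over every pixel in every iteration on a single machine, which is the kind of work the GEE version hands over to the cloud.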

  • Limit the sizes of your regions, divide them, and work in batches. I didn’t need to do this in my project because I was working with a single small water basin. However, if your project involves large geographical areas, this would be the first logical step.
  • Optimize your scripts by prioritizing GEE functions (see Lesson 1).
  • Choose datasets that let you optimize computing power. For example, in my last project, I used the Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS). The original dataset has a daily temporal resolution. However, it offers an alternative version called “PENTAD”, which provides data every five days. Each value corresponds to the sum of precipitation for those five days. Using this dataset allowed me to save computing power by processing the compacted version without sacrificing the quality of my results.
  • Examine the description of your dataset, as it might reveal scaling factors that could save computing power. For instance, in my water balance project, I used Moderate Resolution Imaging Spectroradiometer (MODIS) data, specifically the MOD16 dataset, which is a readily available evapotranspiration (ET) product. According to the documentation, I could multiply my results by a scaling factor of 0.1. Scaling factors help reduce storage requirements by adjusting the data type.
  • If worst comes to worst, be prepared to compromise. Reduce the resolution of the analyses if the standards of the study allow it. For example, the “reduceRegion” GEE function lets you summarize the values of a region (sum, mean, etc.). It has a parameter called “scale” which lets you change the scale of the analysis. For instance, if your satellite data has a resolution of 10 m and GEE can’t process your analysis, you can adjust the scale parameter to a lower resolution (e.g. 50 m).
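The dataset guidelines above come down to simple arithmetic, which the following sketch tries to make concrete in plain Python. It does not call GEE; the function names `to_pentads` and `apply_scale_factor` and all sample values are hypothetical. It also assumes fixed five-day blocks, whereas in the real CHIRPS pentad product the last pentad of each month covers whatever days remain.

```python
def to_pentads(daily_mm):
    """Sum a daily precipitation series into consecutive 5-day totals."""
    return [sum(daily_mm[i:i + 5]) for i in range(0, len(daily_mm), 5)]


def apply_scale_factor(stored_values, scale_factor=0.1):
    """Convert compactly stored integers into physical values."""
    return [v * scale_factor for v in stored_values]


# Hypothetical daily precipitation in mm for ten days
daily = [0.0, 2.5, 10.0, 0.0, 1.5, 4.0, 0.0, 0.0, 3.0, 1.0]
pentads = to_pentads(daily)

# Five times fewer records, same total precipitation
print(len(daily), "daily values ->", len(pentads), "pentad values")
print("Totals match:", abs(sum(daily) - sum(pentads)) < 1e-9)

# Hypothetical ET values stored as integers, rescaled by 0.1
stored_et = [123, 87, 240]
et = apply_scale_factor(stored_et)
print("Rescaled ET values:", et)
```

The pentad series keeps the precipitation total while cutting the number of records by a factor of five, and the scale factor shows how a dataset can store one decimal place of precision in an integer type.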

As an example from my water balance and drought project, consider the following block of code:

# Reduce the collection to a single image (mean MSI over the time period)
MSI_mean = MSI_collection.select('MSI').mean().clip(pauteBasin)

# Use reduceRegion to calculate the min and max
stats = MSI_mean.reduceRegion(
    reducer=ee.Reducer.minMax(),  # Reducer to get min and max
    geometry=pauteBasin,  # Specify the ROI
    scale=500,  # Scale in meters
    maxPixels=1e9  # Maximum number of pixels to process
)

# Get the results as a dictionary
min_max = stats.getInfo()

# Print the min and max values
print('Min and Max values:', min_max)

In my project, I used a Sentinel-2 satellite image to calculate a moisture soil index (MSI). Then, I applied the “reduceRegion” GEE function, which calculates a summary of values in a region (mean, sum, etc.).

In my case, I needed to find the maximum and minimum MSI values to check if my results made sense. The following plot shows the MSI values spatially distributed in my study region.

Monthly mean of moisture soil index values for the Paute basin (Ecuador) for the period 2010–2020. Image created using Google Earth Engine Python API and Geemap. Data source: European Space Agency (2025); Lehner, B., Grill G. (2013) and Lehner, B., Verdin, K., Jarvis, A. (2008).

The original image has a 10 m resolution. GEE struggled to process the data. Therefore, I used the scale parameter and lowered the resolution to 500 m. After changing this parameter, GEE was able to process the data.

I’m obsessed with data quality. As a result, I use data but rarely trust it without verification. I like to invest time in ensuring the data is ready for analysis. However, don’t let image corrections paralyze your progress.

My tendency to invest too much time in image corrections stems from learning remote sensing and image corrections “the old way”. By this, I mean using software that assists in applying atmospheric and geometric corrections to images.

Nowadays, scientific agencies supporting satellite missions can deliver images with a high level of preprocessing. In fact, a great feature of GEE is its catalogue, which makes it easy to find analysis-ready products.

Preprocessing is the most time-consuming task in any data science project. Therefore, it must be appropriately planned and managed.

The best approach before starting a project is to establish data quality standards. Based on your standards, allocate enough time to find the best product (which GEE facilitates) and apply only the necessary corrections (e.g. cloud masking).

If you love programming in Python (like me), you might often find yourself coding everything from scratch.

As a PhD student (starting with coding), I wrote a script to perform a t-test over a study region. Later, I discovered a Python library that performed the same task. When I compared my script’s results with those from the library, the results were correct. However, using the library from the start could have saved me time.

I’m sharing this lesson to help you avoid these silly mistakes with GEE. I’ll mention two examples from my water balance project.

Example 1

To calculate the water balance in my basin, I needed ET data. ET is not an observed variable (like precipitation); it must be calculated.

The ET calculation is not trivial. You can look up the equations in textbooks and implement them in Python. However, some researchers have published papers related to this calculation and shared their results with the community.

This is where GEE comes in. The GEE catalogue not only provides observed data (as I initially thought) but also many derived products or modelled datasets (e.g. reanalysis data, land cover, vegetation indices, etc.). Guess what? I found a ready-to-use global ET dataset in the GEE catalogue. A lifesaver!

Example 2

I also consider myself a Geographic Information System (GIS) professional. Over the years, I have acquired a substantial amount of GIS data for my work, such as water basin boundaries in shapefile format.

In my water balance project, my intuition was to import my water basin boundary shapefile into my GEE project. From there, I transformed the file into a GeoPandas object and continued my analysis.

In this case, I wasn’t as lucky as in Example 1. I lost precious time trying to work with this GeoPandas object, which I couldn’t integrate well with GEE. Ultimately, this approach didn’t make sense. GEE does have in its catalogue a product for water basin boundaries that is easy to handle.

Thus, a key takeaway is to maintain your workflow within GEE whenever possible.

As mentioned at the beginning of this article, integrating GEE with Python libraries can be incredibly powerful.

However, even for simple analyses and plots, the integration doesn’t seem straightforward.

This is where Geemap comes in. Geemap is a Python package designed for interactive geospatial analysis and visualization with GEE.

Additionally, I found that it can assist with creating static plots in Python. I made plots using GEE and Geemap in my water balance and drought project. The images included in this story used these tools.

GEE is a powerful tool. However, as a beginner, pitfalls are inevitable. This article provides tips and tricks to help you start on the right foot with the GEE Python API.

European Space Agency (2025). Harmonized Sentinel-2 MSI: MultiSpectral Instrument, Level-2A.

Friedl, M., Sulla-Menashe, D. (2022). MODIS/Terra+Aqua Land Cover Type Yearly L3 Global 500m SIN Grid V061 [Data set]. NASA EOSDIS Land Processes Distributed Active Archive Center. Accessed 2025-01-15 from https://doi.org/10.5067/MODIS/MCD12Q1.061

Lehner, B., Verdin, K., Jarvis, A. (2008). New global hydrography derived from spaceborne elevation data. Eos, Transactions, AGU, 89(10): 93–94.

Lehner, B., Grill G. (2013). Global river hydrography and network routing: baseline data and new approaches to study the world’s large river systems. Hydrological Processes, 27(15): 2171–2186. Data is available at www.hydrosheds.org