Eradicating Spikes from Raman Spectra with Python: A Step-by-Step Information | by Nicolas Coca, PhD | Nov, 2024

Discover, take away and change spikes with interpolated values

This tutorial is a part of a rising collection on Information Science for Raman Spectroscopy with Python printed in in direction of information science. It’s primarily based on this publication within the journal Analytica Chimica Acta. By following alongside, you’ll add a worthwhile instrument to your information evaluation toolkit — an efficient methodology for cleansing up Raman spectra that’s already utilized in printed analysis.

Despiking Graphene’s Raman spectrum. Picture by creator.

Spike removing is an important a part of Raman information preprocessing. Spikes, attributable to cosmic rays impacting the detector, seem as intense, slim peaks that may distort the evaluation. These bursts of vitality hit the charge-coupled machine (CCD) digital camera, creating sharp, high-intensity peaks that, if left uncorrected, can intervene with additional processing steps like normalization, spectral search, or multivariate information evaluation. Cleansing these artifacts is subsequently a precedence. This tutorial will cowl a sensible algorithm for eradicating spikes from Raman spectra. Utilizing Python, we’ll stroll by way of a user-friendly, customizable strategy for spike detection and correction to maintain Raman information correct and dependable.

Determine 1 exhibits an instance of a graphene Raman spectrum the place a spike is current. Graphene’s distinctive bodily properties — reminiscent of its electrical and thermal conductivity — have made it a extremely studied materials. Its Raman spectrum comprises peaks that replicate structural traits, revealing details about doping, pressure, and grain boundaries. Subsequently, Raman spectroscopy is a extensively used approach to characterize graphene. Nevertheless, to profit from this instrument, spikes have to be beforehand eliminated.

Determine 1. Raman spectrum from graphene with a spike. The code to generate this determine is proven beneath. Picture by creator.
import numpy as np
# Load information straight right into a numpy array
information = np.loadtxt(spiked_spectrum.asc, delimiter=',', skiprows=1)

# Extract Raman shift from the primary column (index)
ramanshift = information[:, 0]

# Extract depth from the second column (index 1in Python)
depth = information[:, 1]

# Plot the info
import matplotlib.pyplot as plt
fig = plt.determine(figsize = (5,3))
plt.plot(ramanshift, depth)
plt.xlabel('Raman shift (cm$^{-1}$)')
plt.ylabel('Depth (a.u.)')
plt.present()

The spike removing algorithm introduced right here consists of 4 major steps:

1. Peak discovering
2. Spike detection
3. Spike flagging
4. Spectrum correction

Let’s check out the completely different steps with Python code snippets:

1. Peak discovering: First, the algorithm identifies vital peaks by checking for native maxima with a minimal prominence threshold. Including a prominence threshold helps to exclude small noise-generated peaks, as we don’t goal to right all of the noise. See the next determine for comparability.

from scipy.sign import find_peaks
# Discover the peaks within the spectrum (with and with out prominence threshold)
peaks_wo_p, _ = find_peaks(depth) # Peaks discovered with no prominence threshold
peaks_w_p, _ = find_peaks(depth, prominence = 20) # Peaks discovered with no prominence threshold

fig, ax = plt.subplots(1, 2, figsize = (10,3))
ax[0].plot(ramanshift, depth, zorder=0, label='Uncooked spectrum')
ax[0].scatter(ramanshift[peaks_wo_p], depth[peaks_wo_p], marker ='.', coloration = 'crimson',label='Discovered peaks')
ax[1].plot(ramanshift, depth, zorder=0, label='Uncooked spectrum')
ax[1].scatter(ramanshift[peaks_w_p], depth[peaks_w_p], marker ='.', coloration = 'crimson',label='Discovered peaks')
plt.present()

Step 1: Discover the peaks in a spectrum. Picture by creator.

2. Spike detection: Then, spikes are flagged primarily based on their attribute slim widths. This level may assist in the automation of huge spectral datasets. If we all know the width of the Raman bands current in our spectra, we are able to select a threshold beneath such a price. For instance, with our system decision, we don’t count on to have graphene Raman bands with widths beneath 10 cm-1.

from scipy.sign import peak_widths
widths = peak_widths(depth, peaks_w_p)[0]

fig, ax = plt.subplots(figsize = (5,3))
ax.plot(ramanshift, depth, zorder=0, label='Uncooked spectrum')
ax2 = ax.twinx()
ax2.scatter(ramanshift[peaks_w_p], widths, marker ='+', coloration = 'crimson',label='Peak widths')
plt.present()

Step 2: Discovering the widths of the peaks. Picture by creator.

3. Spike flagging Subsequent, any information factors affected by spikes are flagged utilizing a spread calculated from the height’s prominence, successfully isolating corrupted pixels. In different phrases, we choose the window that have to be corrected

# Let's set the parameters:
width_param_rel = 0.8
width_threshold = 10 # Estimation of the width of the narrowest Raman band

# Calculation of the vary the place the spectral factors are asumed to be corrupted
widths_ext_a = peak_widths(depth, peaks_w_p, rel_height=width_param_rel)[2]
widths_ext_b = peak_widths(depth, peaks_w_p, rel_height=width_param_rel)[3]

# Create a vector the place spikes shall be flag: no spike = 0, spike = 1.
spikes = np.zeros(len(depth))

# Flagging the realm beforehand outlined if the height is taken into account a spike (width beneath width_threshold)
for a, width, ext_a, ext_b in zip(vary(len(widths)), widths, widths_ext_a, widths_ext_b):
if width < width_threshold:
spikes[int(ext_a) - 1: int(ext_b) + 2] = 1

fig = plt.determine(figsize = (5,3))
plt.plot(ramanshift, depth, zorder=0,label='Uncooked spectrum')
a=1
plt.scatter(ramanshift[int(widths_ext_a[a])-1 : int(widths_ext_b[a])+1],
depth[int(widths_ext_a[a])-1 : int(widths_ext_b[a])+1],
coloration ='crimson', label = 'corrupted factors')
plt.axvline(x = ramanshift[int(widths_ext_a[a]) -1], linestyle = '--', coloration = 'crimson')
plt.axvline(x = ramanshift[int(widths_ext_b[a]) + 1], linestyle = '--', coloration = 'crimson')
plt.present()

Step 3: Flagging the corrupted bins. Picture by creator.

4. Spectrum correction Lastly, these factors are corrected by way of interpolation of close by values, preserving the spectrum’s integrity for subsequent analyses.

from scipy import interpolate
# Let's set the parameter:
moving_average_window = 10

intensity_out = depth.copy()

# Interpolation of corrupted factors
for i, spike in enumerate(spikes):
if spike != 0: # If we've got an spike in place i
window = np.arange(i - moving_average_window, i + moving_average_window + 1) # we choose 2 ma + 1 factors round our spike
window_exclude_spikes = window[spikes[window] == 0] # From such interval, we select those which aren't spikes
interpolator = interpolate.interp1d(window_exclude_spikes, depth[window_exclude_spikes], type='linear') # We use the not corrupted factors across the spike to calculate the interpolation
intensity_out[i] = interpolator(i) # The corrupted level is exchanged by the interpolated worth.

fig = plt.determine(figsize = (5,3))
plt.plot(ramanshift, depth, zorder=0, coloration ='crimson',label='Uncooked spectrum')
plt.plot(ramanshift, intensity_out, zorder=0, label='Corrected spectrum')
plt.present()

Step 4: Spectrum correction. Picture by creator.

All these snippets will be summarized in a single perform. This perform is designed to be customizable primarily based in your particular information wants, with parameters for adjusting prominence and width:

import numpy as np
from scipy.sign import find_peaks, peak_widths, peak_prominences
from scipy import interpolate

def spike_removal(y,
width_threshold,
prominence_threshold=None,
moving_average_window=10,
width_param_rel=0.8,
interp_type='linear'):
"""
Detects and replaces spikes within the enter spectrum with interpolated values. Algorithm first
printed by N. Coca-Lopez in Analytica Chimica Acta. https://doi.org/10.1016/j.aca.2024.342312

Parameters:
y (numpy.ndarray): Enter spectrum depth.
width_threshold (float): Threshold for peak width.
prominence_threshold (float): Threshold for peak prominence.
moving_average_window (int): Variety of factors in transferring common window.
width_param_rel (float): Relative peak parameter for peak width.
tipo: sort of interpolation (linear, quadratic, cubic)

Returns:
numpy.ndarray: Sign with spikes changed by interpolated values.
"""

# First, we discover all peaks displaying a prominence above prominence_threshold on the spectra
peaks, _ = find_peaks(y, prominence=prominence_threshold)

# Create a vector the place spikes shall be flag: no spike = 0, spike = 1.
spikes = np.zeros(len(y))

# Calculation of the widths of the discovered peaks
widths = peak_widths(y, peaks)[0]

# Calculation of the vary the place the spectral factors are asumed to be corrupted
widths_ext_a = peak_widths(y, peaks, rel_height=width_param_rel)[2]
widths_ext_b = peak_widths(y, peaks, rel_height=width_param_rel)[3]

# Flagging the realm beforehand outlined if the height is taken into account a spike (width beneath width_threshold)
for a, width, ext_a, ext_b in zip(vary(len(widths)), widths, widths_ext_a, widths_ext_b):
if width < width_threshold:
spikes[int(ext_a) - 1: int(ext_b) + 2] = 1

y_out = y.copy()

# Interpolation of corrupted factors
for i, spike in enumerate(spikes):
if spike != 0: # If we've got an spike in place i
window = np.arange(i - moving_average_window, i + moving_average_window + 1) # we choose 2 ma + 1 factors round our spike
window_exclude_spikes = window[spikes[window] == 0] # From such interval, we select those which aren't spikes
interpolator = interpolate.interp1d(window_exclude_spikes, y[window_exclude_spikes], type=interp_type) # We use the not corrupted factors across the spike to calculate the interpolation
y_out[i] = interpolator(i) # The corrupted level is exchanged by the interpolated worth.

return y_out

The perform with the algorithm can then be utilized to the spiked graphene spectrum as follows:

intensity_despiked = spike_removal(depth, 
width_threshold = 3,
prominence_threshold = 20,
moving_average_window=10,
width_param_rel=0.8,
interp_type='linear')

fig, ax = plt.subplots(1, 2, figsize = (2*5,3))
ax[0].plot(ramanshift, depth, label = 'spike', coloration ='crimson', linewidth = 0.9)
ax[0].plot(ramanshift, intensity_despiked)
ax[1].plot(ramanshift, intensity_despiked)
plt.present()

Instance of spike removing in a Raman spectrum. Picture by creator.

With this spike removing strategy, you may guarantee your Raman spectra are clear and dependable, minimizing artefacts with out shedding important spectral particulars. The strategy is good for automation, particularly if the anticipated minimal peak width is understood, making it extremely adaptable for large-scale spectral datasets and high-throughput evaluation

I hope you loved this tutorial. Be at liberty to drop any questions or share your personal Raman information challenges within the feedback — I’d love to listen to how this algorithm helps in your tasks!

Able to strive it out? Obtain the Jupyter Pocket book right here. And when you discovered this handy, please, bear in mind to quote the authentic work, that will assist me so much! 🙂