Before starting any deep learning project with MIDI files, make sure you know the difference between MIDI scores and MIDI performances!
This article is for people planning or starting to work with MIDI files. This format is widely used in the music community, and it caught the attention of computer music researchers thanks to the availability of datasets.
However, different types of information can be encoded in MIDI files. In particular, there is a big difference between MIDI scores and MIDI performances. Not being aware of this results in time wasted on a useless task or in an incorrect choice of training data and approaches.
I will provide a basic introduction to the two formats and give hands-on examples of how to start working with them in Python.
What’s MIDI?
MIDI was introduced as a real-time communication protocol between synthesizers. The main idea is to send a message every time a note is pressed on a MIDI keyboard (note on) and another message when the note is released (note off). The synthesizer on the receiving end then knows what sound to produce.
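To make the note-on/note-off idea concrete, here is a minimal sketch of the raw bytes of these two channel messages as defined by the MIDI 1.0 specification: a status byte (0x90 for note on, 0x80 for note off, with the channel number in the lower four bits) followed by two data bytes, the pitch and the velocity. The helper names `note_on` and `note_off` are hypothetical; in practice you would normally let a MIDI library build these messages.

```python
def note_on(pitch: int, velocity: int, channel: int = 0) -> bytes:
    """Build a raw 3-byte MIDI note-on message (hypothetical helper)."""
    # Data bytes must fit in 7 bits (0-127), channels in 4 bits (0-15).
    if not (0 <= pitch <= 127 and 0 <= velocity <= 127 and 0 <= channel <= 15):
        raise ValueError("pitch/velocity must be 0-127, channel 0-15")
    # Status byte: upper nibble 0x9 = note on, lower nibble = channel.
    return bytes([0x90 | channel, pitch, velocity])


def note_off(pitch: int, velocity: int = 0, channel: int = 0) -> bytes:
    """Build a raw 3-byte MIDI note-off message (hypothetical helper)."""
    if not (0 <= pitch <= 127 and 0 <= velocity <= 127 and 0 <= channel <= 15):
        raise ValueError("pitch/velocity must be 0-127, channel 0-15")
    # Status byte: upper nibble 0x8 = note off, lower nibble = channel.
    return bytes([0x80 | channel, pitch, velocity])


# Pressing and releasing middle C (pitch 60) on channel 0.
print(note_on(60, 100).hex())  # 903c64
print(note_off(60).hex())      # 803c00
```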
Welcome to MIDI files!
If we collect and save all these messages (making sure to add their time position), we get a MIDI file that we can use to reproduce a piece. Besides note-on and note-off, many other kinds of messages exist, for example to specify pedal information or other controllers.
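In a Standard MIDI File, the time position of each message is stored as a delta time: the number of ticks elapsed since the previous message in the same track. The sketch below uses a hypothetical recording of (absolute_tick, message) pairs, with plain tuples standing in for real message objects, to show how collected messages could be converted to that delta form and back.

```python
from itertools import accumulate

# Hypothetical recording: (absolute time in ticks, message description).
recorded = [
    (0, ("note_on", 60, 100)),
    (480, ("note_off", 60, 0)),
    (480, ("note_on", 64, 90)),
    (960, ("note_off", 64, 0)),
]


def to_delta(events):
    """Convert (absolute_tick, msg) pairs to (delta_tick, msg) pairs."""
    # sorted() is stable, so messages at the same tick keep their order.
    events = sorted(events, key=lambda e: e[0])
    deltas, previous = [], 0
    for tick, msg in events:
        deltas.append((tick - previous, msg))
        previous = tick
    return deltas


def to_absolute(delta_events):
    """Inverse operation: accumulate delta ticks back into absolute ticks."""
    ticks = accumulate(d for d, _ in delta_events)
    return [(t, msg) for t, (_, msg) in zip(ticks, delta_events)]


delta = to_delta(recorded)
print([d for d, _ in delta])  # [0, 480, 0, 480]
```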
We can visualize this information with a pianoroll.
Beware: this is not a MIDI file, but only one possible representation of its content! Some software (in this example, Reaper) adds a small piano keyboard next to the pianoroll to make it easier to interpret visually.
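A pianoroll is essentially a 2D matrix with one row per MIDI pitch and one column per time frame. Here is a minimal NumPy sketch, assuming notes given as (onset_sec, duration_sec, pitch, velocity) tuples and an arbitrary resolution of 100 frames per second; the function name and the frame rate are illustrative choices, not a standard.

```python
import numpy as np


def pianoroll(notes, fps=100):
    """Return a (128, n_frames) array; each cell holds the velocity of the sounding note."""
    end = max(onset + dur for onset, dur, _, _ in notes)
    roll = np.zeros((128, int(np.ceil(end * fps))), dtype=np.int16)
    for onset, dur, pitch, velocity in notes:
        start = int(round(onset * fps))
        stop = int(round((onset + dur) * fps))
        # Keep at least one frame so that very short notes remain visible.
        roll[pitch, start:max(stop, start + 1)] = velocity
    return roll


# Three hypothetical notes: a C major arpeggio.
roll = pianoroll([(0.0, 0.5, 60, 80), (0.5, 0.5, 64, 70), (1.0, 1.0, 67, 90)])
print(roll.shape)  # (128, 200)
```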
How is a MIDI file created?
A MIDI file can be created in two main ways: 1) by playing a MIDI instrument, 2) by manually writing into a sequencer (Reaper, Cubase, GarageBand, Logic) or a music notation editor (for example MuseScore).
Each way of producing MIDI files also yields a different kind of file:
- playing a MIDI instrument → MIDI performance
- manually writing the notes (sequencer or musical score) → MIDI score
We'll now dive into each type, and then summarize their differences.
Before starting, a disclaimer: I will not focus specifically on how the information is encoded, but on what information can be extracted from the file. For example, when I say "time is represented in seconds", it means that we can get seconds, even though the encoding itself is more complex.
MIDI performances
We can find four kinds of information in a MIDI performance:
- When the note starts: note onset
- When the note ends: note offset (or note duration, computed as offset - onset)
- Which note was played: note pitch
- How "strongly" the key was pressed: note velocity
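These four attributes are not stored as such in the file: they have to be reconstructed by pairing each note-on with the following note-off of the same pitch. Here is a minimal sketch, assuming messages already decoded into (time_sec, type, pitch, velocity) tuples (a hypothetical representation). It also treats a note-on with velocity 0 as a note-off, which the MIDI specification allows. Libraries such as Partitura, used later in this article, handle this pairing when loading a file.

```python
from collections import defaultdict, deque


def pair_notes(messages):
    """Turn (time_sec, type, pitch, velocity) messages into note dicts."""
    active = defaultdict(deque)  # pitch -> queue of (onset, velocity) still sounding
    notes = []
    for time, kind, pitch, velocity in sorted(messages, key=lambda m: m[0]):
        if kind == "note_on" and velocity > 0:
            active[pitch].append((time, velocity))
        elif kind in ("note_off", "note_on") and active[pitch]:
            # note_off, or note_on with velocity 0: close the oldest open note (FIFO).
            onset, vel = active[pitch].popleft()
            notes.append({"onset": onset, "offset": time, "duration": time - onset,
                          "pitch": pitch, "velocity": vel})
    return sorted(notes, key=lambda n: n["onset"])


msgs = [(1.03, "note_on", 77, 69), (1.13, "note_on", 77, 0),
        (1.03, "note_on", 49, 53), (1.24, "note_off", 49, 0)]
for note in pair_notes(msgs):
    print(note)
```

Closing the oldest open note of a given pitch (first in, first out) is one design choice among several; it only matters when the same key is re-pressed before its previous note-off.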
Note onset and offset (and duration) are represented in seconds, corresponding to the times at which the notes were pressed and released by the person playing the MIDI instrument.
Note pitch is encoded with an integer from 0 (lowest) to 127 (highest); note that more notes can be represented than those a piano can play: the piano range corresponds to 21–108.
Note velocity is also encoded with an integer from 0 (silence) to 127 (maximum intensity).
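To relate MIDI pitch numbers to note names, here is a small sketch. It assumes the common convention in which pitch 60 is middle C, written C4 (some software labels it C3 instead), and it always uses sharps, since a MIDI pitch number carries no spelling information. The names `pitch_name` and `PIANO_RANGE` are hypothetical.

```python
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
PIANO_RANGE = range(21, 109)  # A0 (21) to C8 (108), inclusive


def pitch_name(pitch: int) -> str:
    """Convert a MIDI pitch (0-127) to a name such as 'C4' (60 = C4 convention)."""
    if not 0 <= pitch <= 127:
        raise ValueError("MIDI pitch must be in 0-127")
    octave = pitch // 12 - 1
    return f"{NAMES[pitch % 12]}{octave}"


for p in (0, 21, 60, 108, 127):
    print(p, pitch_name(p), "on piano" if p in PIANO_RANGE else "not on piano")
```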
The vast majority of MIDI performances are piano performances, because most MIDI instruments are MIDI keyboards. Other MIDI instruments (for example MIDI saxophones, MIDI drums, and MIDI pickups for guitar) exist, but they are not as common.
The largest dataset of human MIDI performances (classical piano music) is the Maestro dataset by Google Magenta.
The main property of MIDI performances
A fundamental characteristic of MIDI performances is that there are never notes with exactly the same onset or duration (this is possible in theory but extremely unlikely in practice).
Indeed, even if they really try, players won't be able to press two (or more) notes at exactly the same time, since there is a limit to the precision humans can achieve. The same is true for note durations. Moreover, this isn't even a priority for most musicians, since timing deviations can help produce a more expressive or groovy feeling. Finally, consecutive notes will usually be separated by a short silence or partially overlap.
For this reason, MIDI performances are sometimes also called unquantized MIDI. Temporal positions are spread on a continuous time scale and are not quantized to discrete positions (for digital encoding reasons, the scale is technically discrete, but so fine that we can consider it continuous).
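To see what "quantized" means in practice, here is a sketch that snaps continuous onsets to a discrete grid. It assumes, hypothetically, a constant tempo of 120 BPM and a sixteenth-note grid, and the onsets are made up. Real performances change tempo, so naive snapping like this is only an illustration, not a usable transcription method.

```python
import numpy as np


def quantize(onsets_sec, bpm=120.0, subdivisions_per_beat=4):
    """Snap onsets (in seconds) to the nearest grid point, assuming a constant tempo."""
    step = 60.0 / bpm / subdivisions_per_beat  # grid spacing in seconds
    return np.round(np.asarray(onsets_sec) / step) * step


played = np.array([0.012, 0.131, 0.249, 0.377, 0.503])  # hypothetical human onsets
grid = quantize(played)
print(grid)           # positions on the 0.125 s grid
print(played - grid)  # the small timing deviations that quantization discards
```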
Hands-on example
Let us look at a MIDI performance. We will use the ASAP dataset, available on GitHub.
In your favorite terminal (I'm using PowerShell on Windows), go to a convenient location and clone the repository.
git clone https://github.com/fosfrancesco/asap-dataset
We will also use the Python library Partitura to open the MIDI files, so install it in your Python environment.
pip install partitura
Now that everything is set up, let's open the MIDI file and print the first 10 notes. Since this is a MIDI performance, we will use the load_performance_midi function.
from pathlib import Path
import partitura as pt
# set the path to the ASAP dataset (change it to your local path!)
asap_basepath = Path('../asap-dataset/')
# select a performance, here we use Bach Prelude BWV 848 in C#
performance_path = Path("Bach/Prelude/bwv_848/Denisova06M.mid")
print("Loading midi file: ", asap_basepath/performance_path)
# load the performance
performance = pt.load_performance_midi(asap_basepath/performance_path)
# extract the note array (a NumPy structured array, one row per note)
note_array = performance.note_array()
# print the dtype of the note array (helpful to know how to interpret it)
print("Numpy dtype:")
print(note_array.dtype)
# print the first 10 notes in the note array
print("First 10 notes:")
print(note_array[:10])
The output of this Python program should look like this:
Numpy dtype:
[('onset_sec', '<f4'), ('duration_sec', '<f4'), ('onset_tick', '<i4'), ('duration_tick', '<i4'), ('pitch', '<i4'), ('velocity', '<i4'), ('track', '<i4'), ('channel', '<i4'), ('id', '<U256')]
First 10 notes:
[(1.0286459, 0.21354167, 790, 164, 49, 53, 0, 0, 'n0')
(1.03125 , 0.09765625, 792, 75, 77, 69, 0, 0, 'n1')
(1.1302084, 0.046875 , 868, 36, 73, 64, 0, 0, 'n2')
(1.21875 , 0.07942709, 936, 61, 68, 66, 0, 0, 'n3')
(1.3541666, 0.04166667, 1040, 32, 73, 34, 0, 0, 'n4')
(1.4361979, 0.0390625 , 1103, 30, 61, 62, 0, 0, 'n5')
(1.4361979, 0.04296875, 1103, 33, 77, 48, 0, 0, 'n6')
(1.5143229, 0.07421875, 1163, 57, 73, 69, 0, 0, 'n7')
(1.6380209, 0.06380209, 1258, 49, 78, 75, 0, 0, 'n8')
(1.6393229, 0.21484375, 1259, 165, 51, 54, 0, 0, 'n9')]
You can see that we have the onsets and durations in seconds, the pitch, and the velocity. The other fields are not very relevant for MIDI performances.
Onsets and durations are also represented in ticks. This is closer to the way this information is actually encoded in a MIDI file: a very short temporal duration (= 1 tick) is chosen, and all temporal information is encoded as a multiple of this duration. When you deal with music performances, you can usually ignore this information and directly use the values in seconds.
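For a file with a single constant tempo, ticks and seconds are related by the resolution (ticks per quarter note) and the tempo (microseconds per quarter note, 500000 = 120 BPM by default when a file contains no tempo message). The output above is consistent with 768 ticks per second (790 / 768 ≈ 1.0286 s); one resolution that would produce this is 384 ticks per quarter at 120 BPM, but that value is an assumption, not read from the file. The function names below are hypothetical.

```python
def tick2second(tick: int, ticks_per_beat: int, tempo_us_per_beat: int = 500_000) -> float:
    """Convert ticks to seconds, assuming one constant tempo for the whole file."""
    return tick * tempo_us_per_beat / (1_000_000 * ticks_per_beat)


def second2tick(seconds: float, ticks_per_beat: int, tempo_us_per_beat: int = 500_000) -> int:
    """Inverse conversion, rounded to the nearest whole tick."""
    return round(seconds * 1_000_000 * ticks_per_beat / tempo_us_per_beat)


# Hypothetical resolution: 384 ticks per quarter at 120 BPM = 768 ticks per second.
print(tick2second(790, ticks_per_beat=384))
print(second2tick(1.0286459, ticks_per_beat=384))
```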
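You can verify that notes with exactly the same onset or the same duration are very rare. They are not impossible, though: notes n5 and n6 above share the same onset (tick 1103), most likely because ticks have a finite resolution, so two almost simultaneous key presses can end up on the same tick.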
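A quick way to check this is to count repeated values in a column of the note array. In this sketch the helper name `shared_values` is hypothetical, and the input is the onset_tick column copied from the output above; on the loaded file you would pass a field such as note_array["onset_sec"] or note_array["duration_sec"] instead.

```python
import numpy as np


def shared_values(values):
    """Return {value: count} for every value that occurs more than once."""
    uniq, counts = np.unique(np.asarray(values), return_counts=True)
    repeated = counts > 1
    return dict(zip(uniq[repeated].tolist(), counts[repeated].tolist()))


# onset_tick column of the first 10 performance notes printed above
onset_ticks = [790, 792, 868, 936, 1040, 1103, 1103, 1163, 1258, 1259]
print(shared_values(onset_ticks))  # {1103: 2}
```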
MIDI scores
MIDI scores use a much richer set of MIDI messages to encode information such as time signature, key signature, and bar and beat positions.
For this reason, they resemble musical scores (sheet music), even though they still lack some vital information, for example pitch spelling, ties, dots, rests, beams, etc.
The temporal information is not encoded in seconds but in more musically abstract units, like quarter notes.
The main property of MIDI scores
A fundamental characteristic of MIDI scores is that all note onsets are aligned to a quantized grid, defined first by bar positions and then by recursive integer divisions (mainly by 2 and 3, although other divisions such as 5, 7, 11, etc. are used for tuplets).
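A minimal way to test whether an onset lies on such a grid is to find the smallest subdivision of the quarter note of which it is a multiple. This sketch works relative to quarter notes rather than bars, and uses Python's fractions module; the helper name, the limit of 48 subdivisions per quarter, and the tolerance are arbitrary assumptions.

```python
from fractions import Fraction


def grid_subdivision(onset_quarter, max_denominator=48, tol=1e-6):
    """Return the smallest d such that the onset is a multiple of 1/d quarter,
    or None if no grid with up to max_denominator divisions fits."""
    frac = Fraction(onset_quarter).limit_denominator(max_denominator)
    if abs(float(frac) - onset_quarter) > tol:
        return None  # off-grid, as expected for performance onsets
    return frac.denominator


for onset in (2.0, 0.25, 1 / 3, 0.2, 0.123456):
    print(onset, grid_subdivision(onset))
```

Here 0.25 maps to 4 (sixteenth notes), 1/3 to 3 (eighth-note triplets), and 0.2 to 5 (quintuplets).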
Hands-on example
We are now going to look at the score of Bach Prelude BWV 848 in C#, which is the piece whose performance we loaded before. Partitura has a dedicated load_score_midi function.
from pathlib import Path
import partitura as pt
# set the path to the ASAP dataset (change it to your local path!)
asap_basepath = Path('../asap-dataset/')
# select a score, here we use Bach Prelude BWV 848 in C#
score_path = Path("Bach/Prelude/bwv_848/midi_score.mid")
print("Loading midi file: ", asap_basepath/score_path)
# load the score
score = pt.load_score_midi(asap_basepath/score_path)
# extract the note array (a NumPy structured array, one row per note)
note_array = score.note_array()
# print the dtype of the note array (helpful to know how to interpret it)
print("Numpy dtype:")
print(note_array.dtype)
# print the first 10 notes in the note array
print("First 10 notes:")
print(note_array[:10])
The output of this Python program should look like this:
Numpy dtype:
[('onset_beat', '<f4'), ('duration_beat', '<f4'), ('onset_quarter', '<f4'), ('duration_quarter', '<f4'), ('onset_div', '<i4'), ('duration_div', '<i4'), ('pitch', '<i4'), ('voice', '<i4'), ('id', '<U256'), ('divs_pq', '<i4')]
First 10 notes:
[(0. , 1.9958333 , 0. , 0.99791664, 0, 479, 49, 1, 'P01_n425', 480)
(0. , 0.49583334, 0. , 0.24791667, 0, 119, 77, 1, 'P00_n0', 480)
(0.5, 0.49583334, 0.25, 0.24791667, 120, 119, 73, 1, 'P00_n1', 480)
(1. , 0.49583334, 0.5 , 0.24791667, 240, 119, 68, 1, 'P00_n2', 480)
(1.5, 0.49583334, 0.75, 0.24791667, 360, 119, 73, 1, 'P00_n3', 480)
(2. , 0.99583334, 1. , 0.49791667, 480, 239, 61, 1, 'P01_n426', 480)
(2. , 0.49583334, 1. , 0.24791667, 480, 119, 77, 1, 'P00_n4', 480)
(2.5, 0.49583334, 1.25, 0.24791667, 600, 119, 73, 1, 'P00_n5', 480)
(3. , 1.9958333 , 1.5 , 0.99791664, 720, 479, 51, 1, 'P01_n427', 480)
(3. , 0.49583334, 1.5 , 0.24791667, 720, 119, 78, 1, 'P00_n6', 480)]
You can see that the note onsets all fall exactly on a grid. If we consider onset_quarter (the third column), we can see that sixteenth notes fall every 0.25 quarters, as expected.
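The onset_quarter column can be recomputed from the integer onset_div column and divs_pq, the number of divisions per quarter note (480 in this file). A minimal NumPy sketch, using values copied from the first five rows of the output above:

```python
import numpy as np

# onset_div and duration_div copied from the first 5 score notes above
onset_div = np.array([0, 0, 120, 240, 360])
duration_div = np.array([479, 119, 119, 119, 119])
divs_pq = 480  # divisions per quarter note in this file

onset_quarter = onset_div / divs_pq        # 0, 0, 0.25, 0.5, 0.75
duration_quarter = duration_div / divs_pq  # 119 / 480 is about 0.2479, not 0.25
# Every onset is an exact multiple of a sixteenth note (divs_pq // 4 divisions).
print(np.all(onset_div % (divs_pq // 4) == 0))  # True
```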
The duration is a bit more problematic. For example, in this score, a sixteenth note should have a duration_quarter of 0.25. However, we can see from the Python output that the duration is actually 0.24791667. What happened is that MuseScore, which was used to generate this MIDI file, slightly shortened every note. Why? Just to make the audio rendition of this MIDI file sound a bit better. And it does, but at the cost of causing many problems for people using these files for computer music research. Similar problems also exist in widely used datasets, such as the Lakh MIDI Dataset.
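If you need nominal durations back, one possible heuristic (my assumption, not something MuseScore or Partitura documents) is to round each duration_quarter to the nearest multiple of a fine grid unit, for example 1/12 of a quarter, which covers both sixteenth notes and eighth-note triplets. It only works because the shortening is much smaller than half a grid unit; it would break down for grace notes or very short notes.

```python
import numpy as np


def snap_durations(duration_quarter, grid=1 / 12):
    """Round durations (in quarters) to the nearest multiple of `grid`.

    Heuristic sketch: assumes the exporter shortened each note by much less
    than half a grid unit, and never snaps a note to zero length.
    """
    snapped = np.round(np.asarray(duration_quarter) / grid) * grid
    return np.maximum(snapped, grid)


# duration_quarter values copied from the score output above
durations = [0.99791664, 0.24791667, 0.49791667]
print(snap_durations(durations))
```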
Given the differences between MIDI scores and MIDI performances that we have seen, here are some general guidelines that can help you set up your deep learning system correctly.
Prefer MIDI scores for music generation systems, since quantized note positions can be represented with a fairly small vocabulary, and other simplifications are possible, like considering only monophonic melodies.
Use MIDI performances for systems that target the way humans play and perceive music, for example beat tracking systems, tempo estimators, and emotion recognition systems (focusing on expressive playing).
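As a taste of why such systems need performance data, here is a deliberately naive tempo estimator: it converts the median interval between consecutive beat onsets into beats per minute. It assumes the onsets of notes falling on beats are already known (which is itself the hard part that beat trackers solve), ignores tempo changes, and uses made-up onsets; the function name is hypothetical.

```python
import numpy as np


def naive_tempo(beat_onsets_sec):
    """Estimate a global tempo (BPM) from the onsets of notes known to fall on beats.

    Naive sketch: uses the median inter-onset interval, so it ignores tempo
    changes and assumes the beat positions are already known.
    """
    iois = np.diff(np.sort(np.asarray(beat_onsets_sec, dtype=float)))
    return 60.0 / np.median(iois)


beats = [1.03, 1.52, 2.01, 2.53, 3.00, 3.49]  # hypothetical performed beat onsets
print(round(naive_tempo(beats), 1))
```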
Use both kinds of data for tasks like score following (input: performance, output: score) and expressive performance generation (input: score, output: performance).
Additional issues
I have presented the main differences between MIDI scores and MIDI performances. However, as often happens, things can be more complex.
For example, some datasets, like the AMAPS dataset, are originally MIDI scores, but the authors introduced timing changes at every note to simulate the timing deviations of real human players (note that this only happens between notes at different time positions; all notes in a chord are still perfectly simultaneous).
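The idea of perturbing every time position while keeping chords simultaneous can be sketched as follows: draw one random offset per distinct onset and apply it to all notes sharing that onset. The Gaussian jitter with a 10 ms standard deviation and the function name are arbitrary assumptions for illustration, not the procedure actually used for AMAPS. With larger jitter, the order of close onsets could also change.

```python
import numpy as np


def humanize_onsets(onsets_sec, sigma=0.01, seed=0):
    """Add one random timing offset per distinct onset time.

    Notes sharing an onset (a chord) receive the same offset, so they stay
    simultaneous. Gaussian jitter (sigma in seconds) is an assumption.
    """
    onsets = np.asarray(onsets_sec, dtype=float)
    rng = np.random.default_rng(seed)
    unique_onsets, inverse = np.unique(onsets, return_inverse=True)
    offsets = rng.normal(0.0, sigma, size=len(unique_onsets))
    return onsets + offsets[inverse]


# Score onsets rendered in seconds at a constant tempo; chords at 0.0 and 1.0.
score_onsets = [0.0, 0.0, 0.25, 0.5, 0.75, 1.0, 1.0]
print(humanize_onsets(score_onsets))
```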
Moreover, some MIDI exports, like the one from MuseScore, also try to make the MIDI score more similar to a MIDI performance: by changing the tempo indication when the piece changes tempo, by inserting a very small silence between consecutive notes (we saw this in the example before), and by playing grace notes as very short notes slightly before the onset of the reference note.
Indeed, grace notes are a very annoying problem in MIDI scores. Their duration is unspecified in musical terms; we only know, generically, that they should be "short". And in the score, their onset is the same as that of the reference note, but this would sound very weird if we listened to an audio rendition of the MIDI file. Should we then shorten the previous note, or the next note, to make room for the grace note?
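Both options can be sketched in a few lines. Notes are hypothetical dicts with onset and duration in quarters. The default grace duration (a 64th note, 0.0625 quarters) and the two strategies are assumptions, since notation specifies neither; the function name is also hypothetical.

```python
def place_grace(prev_note, main_note, grace_pitch, grace_dur=0.0625, steal_from="previous"):
    """Return (prev_note, grace_note, main_note) with room made for a grace note.

    Notes are dicts with 'onset' and 'duration' in quarters; the inputs are
    copied, not modified.
    """
    prev, main = dict(prev_note), dict(main_note)
    if steal_from == "previous":
        # Grace note sounds before the beat; the previous note is shortened.
        grace_onset = main["onset"] - grace_dur
        prev["duration"] = min(prev["duration"], grace_onset - prev["onset"])
    elif steal_from == "next":
        # Grace note sounds on the beat; the main note starts later and is shortened.
        grace_onset = main["onset"]
        main["onset"] += grace_dur
        main["duration"] -= grace_dur
    else:
        raise ValueError("steal_from must be 'previous' or 'next'")
    grace = {"pitch": grace_pitch, "onset": grace_onset, "duration": grace_dur}
    return prev, grace, main


prev = {"pitch": 60, "onset": 0.0, "duration": 1.0}
main = {"pitch": 62, "onset": 1.0, "duration": 1.0}
print(place_grace(prev, main, grace_pitch=64))
print(place_grace(prev, main, grace_pitch=64, steal_from="next"))
```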
Other ornaments are also problematic, since there are no unique rules on how to play them: for example, how many notes should a trill contain? Should a mordent start from the main note or from the upper note?
MIDI files are great because they explicitly provide the pitch, onset, and duration of every note. This means, for example, that compared to audio files, models targeting MIDI data can be smaller and can be trained on smaller datasets.
This comes at a cost: MIDI files, and symbolically encoded music in general, are complex formats to use, since they encode so many kinds of information in so many different ways.
To properly use MIDI files as training data, it is important to be aware of the kind of data they encode. I hope this article gave you a good starting point to learn more about this topic!
[All figures are from the author.]