Chapter 3. Fourier Analysis and Experiments
Section 3.7.3. Example: STFT, Filter, Inverse STFT

Section 3.7.4. Phase Spectrogram of STFT
Spectrogram 1

Spectrogram 2

Chapter 4. Azimuth Discrimination and Resynthesis (ADRess)
Section 4.7. Experiments
The input signal is a four-note melody with four synthetic clarinet sources. Each note is one second long.
Left Channel, Stereo Mix, Right Channel
Truth and Recovered: Soprano, Alto, Tenor, Bass
Spectrograms
The input signal is a four-note melody with four synthetic clarinet sources. Each note is one second long.
Left Channel, Stereo Mix, Right Channel
Truth and Recovered: Melody, Alto, Bass
Spectrograms
Extra. Alienesque background noise
Alienesque noises can be heard more clearly when extracting an azimuth subspace that contains no dominant source. The same noise is still present in normal extractions when the chosen subspace is too wide.
Chapter 5. Evaluation
Source Separation Demo on the URMP Dataset
The first 15 seconds of four pieces from the URMP dataset are presented here. All demos feature strings, which allows the separation quality to be compared across pieces.
Piece 1 (Two Instruments)
Piece 12 (Three Instruments)
Piece 24 (Four Instruments)
Piece 39 (Five Instruments)
Transcription Demo on the URMP Dataset
The separated tracks were then individually transcribed using AnthemScore. In the following examples, tracks with time alignment issues are marked with an asterisk (*) in the score. These tracks have been manually aligned so that the transcription sounds more uniform.
Piece 1 (Two Instruments)
Piece 12 (Three Instruments)
Piece 24 (Four Instruments)
Piece 39 (Five Instruments)
Section 5.4. Qualitative Results and Discussion
This section of the report points out some quality issues in the separation, the URMP dataset, and the transcriptions. Selected excerpts of the examples mentioned there are presented here.
Subspace width
The timbre components (harmonic overtones) of the other parts have seeped into the extraction of the second flute in Piece 30 at 50 seconds because the subspace width H is too wide.
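The effect of the subspace width can be illustrated with a deliberately simplified toy model. The sketch below is not the implementation used in the report: it assumes real-valued bin amplitudes, one dominant source per frequency bin, and hypothetical pan ratios, azimuth resolution `BETA`, and bin frequencies. Each bin is assigned to the azimuth index where `|left - g * right|` is smallest (its null), and a bin is kept when that null lies within `H / 2` of the target index `d`.

```python
BETA = 100  # azimuth resolution (assumed value for this sketch)


def null_index(left, right, beta=BETA):
    """Return the index i in 0..beta that minimises |left - (i / beta) * right|."""
    return min(range(beta + 1), key=lambda i: abs(left - (i / beta) * right))


def extract(bins, d, H, beta=BETA):
    """Return the sorted bin frequencies whose null lies within H / 2 of index d."""
    return sorted(
        freq
        for freq, (left, right) in bins.items()
        if abs(null_index(left, right, beta) - d) <= H / 2
    )


def panned(amplitude, ratio):
    """Hypothetical helper: (left, right) amplitudes for a given left/right ratio."""
    return (ratio * amplitude, amplitude)


# Toy mixture: a "flute" at pan ratio 0.50 and another part at pan ratio 0.60.
bins = {
    440: panned(1.0, 0.50),   # flute fundamental
    880: panned(0.5, 0.50),   # flute overtone
    660: panned(0.8, 0.60),   # other part, fundamental
    1320: panned(0.4, 0.60),  # other part, overtone
}

narrow = extract(bins, d=50, H=4)   # only bins with nulls in 48..52
wide = extract(bins, d=50, H=24)    # bins with nulls in 38..62

print("narrow:", narrow)
print("wide:  ", wide)
```

With the narrow width only the flute's bins are kept, whereas the wide setting also admits the neighbouring part's fundamental and overtone, which is the kind of leakage described above.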
Same note from different instruments
When two instruments play the same note (not necessarily in the same octave), their harmonic overtones overlap. The same frequency bands therefore carry energy at different azimuth positions, which causes the null to shift between them. This happens most noticeably in Piece 40 at 20 seconds: the oboe plays a C4 quietly, whereas the bassoon plays a C3 loudly. As a result, this passage in the recovered bassoon sounds somewhat muted because of the extra energy in its harmonic overtone C4, while the same melody in the extracted oboe track sounds as quiet as the background noise. Immediately afterwards, at 21 seconds, all parts except the oboe are silent, so the oboe is extracted clearly and contrasts sharply with the previously muffled sound.
Oboe, Bassoon
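The overlap between a C3 and a C4 can be checked with a few lines of arithmetic. The sketch below assumes equal temperament with A4 = 440 Hz, the MIDI convention in which note 60 is C4, and ideal harmonic spectra (exact integer multiples of the fundamental); real instruments deviate slightly from this model.

```python
A4_HZ = 440.0  # assumed tuning reference


def midi_to_hz(midi_note):
    """Equal-tempered frequency of a MIDI note number (69 = A4)."""
    return A4_HZ * 2.0 ** ((midi_note - 69) / 12)


def harmonics(f0, count):
    """First `count` harmonics of an ideal harmonic tone with fundamental f0."""
    return [f0 * k for k in range(1, count + 1)]


C3, C4 = 48, 60  # MIDI note numbers

bassoon = harmonics(midi_to_hz(C3), 8)  # C3 and its first overtones
oboe = harmonics(midi_to_hz(C4), 4)     # C4 and its first overtones

# An oboe harmonic is "shared" if it coincides with some bassoon harmonic.
shared = [f for f in oboe if any(abs(f - g) < 1e-6 for g in bassoon)]

print("oboe harmonics (Hz):  ", [round(f, 2) for f in oboe])
print("shared with bassoon:  ", [round(f, 2) for f in shared])
```

Under these assumptions every harmonic of the C4 coincides with an even harmonic of the C3, so none of the oboe's partials occupies a frequency band of its own.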
Trills
Trills in the dataset are explicitly written out as sixteenth-note triplets (for example, in Piece 4).
Expressive Timings
Expressive timings were performed but not notated in the score, for example the ritardando ending in Piece 10 and the gradual accelerando in Piece 26.
Piece 10, Piece 26
Performance Errors
The recordings contain performance errors, such as the trombone in Piece 33 at 27 seconds.