DAFxTRa08 Time-Scaling Evaluation

If you are interested in participating or in getting involved in the process of defining the time-scaling evaluation procedure, please join the DAFxTRa mailing list for an open discussion and introduce yourself in DafxtraPeople. Results of the discussion and organizational details of the evaluation will be posted here.

Evaluation Procedure

Transformation ratios

The time-scale transformation ratio indicates the change in the signal's duration (when the ratio is constant): a ratio of 200% doubles the duration and a ratio of 50% halves it. The tests should be performed for several time-scaling ratios. In some people's experience, the most common ratios fall in the range from 70 to 130%. However, larger modifications may also be desired, so we propose evaluating ratios of 50, 70, 95, 105, 130 and 200%. The ratios closest to 100% are useful for comparing generic techniques with approaches designed specifically for small modification ratios, such as cinema/video frame-rate conversion (24 to 25 fps and vice versa), as in the Genesis Harmos system.
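As a quick check of the arithmetic behind these ratios, the sketch below computes output durations for the proposed ratios and the ratio needed for the 24/25 fps case. It assumes ratios are percentages of the original duration and that frame-rate conversion means playing frames at the target rate; the function names are illustrative only.

```python
# Proposed time-scaling ratios, as percentages of the original duration.
RATIOS_PERCENT = [50, 70, 95, 105, 130, 200]


def output_duration(input_seconds: float, ratio_percent: float) -> float:
    """Duration of the time-scaled signal for a constant ratio."""
    return input_seconds * ratio_percent / 100.0


def frame_rate_ratio(source_fps: float, target_fps: float) -> float:
    """Ratio (percent) that keeps audio in sync when material shot at
    source_fps is played back at target_fps (hypothetical helper)."""
    return 100.0 * source_fps / target_fps


if __name__ == "__main__":
    for r in RATIOS_PERCENT:
        print(f"{r:>4}%: 10.0 s -> {output_duration(10.0, r):.1f} s")
    print(f"24 -> 25 fps: {frame_rate_ratio(24, 25):.2f}%")  # 96.00%
    print(f"25 -> 24 fps: {frame_rate_ratio(25, 24):.2f}%")  # 104.17%
```

Both frame-rate cases (96% and about 104.17%) fall between the proposed 95% and 105% ratios.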

Relevant Aspects to consider

Below is Jordi Bonada's proposed list of relevant aspects to consider when evaluating audio time-scaling modifications:

  • Presence of echoes: typical of time-domain techniques when a transient is repeated.
  • Transient smearing: typical of frequency-domain techniques when a transient is not detected or transient phase-locking is not applied.
  • Phasiness: typical of frequency-domain techniques where spectral peak region phase-locking is not applied.
  • Presence of pre-echoes: typical of frequency-domain techniques that apply phase-locking at transients. It is heard just before the transient.
  • Partial discontinuity: typical of time-domain methods applied to inharmonic or polyphonic signals, where a small time frame cannot explain all partials (i.e. quasi-stationary sinusoids) present in the signal.
  • Timbre distortion: artifact typical of frequency-domain techniques that don’t preserve harmonic phase relationships. Typical examples: trumpet with strong amplitude modulations, low pitch male voices, etc.
  • Timbre coloration: typical of frequency-domain techniques applied to noisy signals or ensembles. Typical examples: waterfalls, choirs, string ensembles.
  • Aural image preservation: the auditory scene should not be changed by the time-scaling algorithm.
  • Rhythmic preservation: rhythm irregularities are typical of time-domain approaches and, more generally, of techniques that apply dynamic time-scaling ratios, adapted to reduce the modification of transient segments.
  • Flutter: typical of time-domain techniques applied to polyphonic sounds. Typical example: string ensembles sound amplitude-modulated.

Yizhar Lavner also proposes considering the following aspect:

  • Preserving the quality (pitch, timbre) of a chorus singing in the background of a singer (or of other multipitch signals)

Objective tests

Ideally, as suggested by Julius Smith, MATLAB code should be written and published to compute objective measures related to the aspects above. These measures should probably be tied to specific synthetic signals generated for each aspect.
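The page calls for MATLAB; purely as an illustration, here is a NumPy sketch of the kind of synthetic signals that could be paired with each aspect. The signal choices, the 44.1 kHz rate and the function names are assumptions, not an agreed test set.

```python
import numpy as np

FS = 44100  # sample rate in Hz (assumption, not fixed by the proposal)


def _time(dur, fs):
    """Sample times in seconds for a signal of `dur` seconds."""
    return np.arange(int(round(dur * fs))) / fs


def harmonic_tone(f0, n_harmonics, dur, fs=FS, seed=0):
    """Stationary harmonic tone with fixed random harmonic phases.
    Intended for phasiness and timbre-distortion measures."""
    if n_harmonics * f0 >= fs / 2:
        raise ValueError("highest harmonic must lie below Nyquist")
    rng = np.random.default_rng(seed)
    t = _time(dur, fs)
    phases = rng.uniform(0.0, 2 * np.pi, n_harmonics)
    x = sum(np.cos(2 * np.pi * h * f0 * t + phases[h - 1]) / h
            for h in range(1, n_harmonics + 1))
    return x / np.max(np.abs(x))  # normalise to peak 1


def click_train(period, dur, fs=FS):
    """Unit impulses every `period` seconds. Intended for echoes,
    transient smearing, pre-echoes and rhythm preservation."""
    x = np.zeros(int(round(dur * fs)))
    x[::int(round(period * fs))] = 1.0
    return x


def inharmonic_mix(freqs, dur, fs=FS):
    """Equal-amplitude sum of unrelated sinusoids. Intended for partial
    discontinuity and flutter in time-domain methods."""
    t = _time(dur, fs)
    return sum(np.sin(2 * np.pi * f * t) for f in freqs) / len(freqs)


def white_noise(dur, fs=FS, seed=0):
    """Gaussian white noise scaled to peak 1. Intended for timbre coloration."""
    x = np.random.default_rng(seed).standard_normal(int(round(dur * fs)))
    return x / np.max(np.abs(x))
```

Fixed seeds keep the signals reproducible, so every participant would process identical input files.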

According to Axel Roebel, some of the described artifacts, such as phasiness (including timbre distortion), should be fairly straightforward to measure. He points out that the main issue is that the synthetic signals should be as complex as possible, so that the results measured on them remain relevant. For anything related to a harmonic signal model, he is confident that a reasonable measure can be found. Timbre coloration, however, seems impossible to measure, and transient smearing may be difficult. A critical point is the perceptual relevance of what is measured.
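For a harmonic signal with known fundamental f0, one possible measure (a sketch, not a measure agreed on the list) tracks the relative phase psi_h = phi_h - h * phi_1 of each harmonic across frames. With preserved harmonic phase relationships psi_h stays constant; phasiness and timbre distortion make it drift. The known-f0 assumption, frame sizes and function names are all hypothetical.

```python
import numpy as np


def harmonic_phases(x, f0, n_harmonics, fs, frame=2048, hop=512):
    """Phase of each harmonic h*f0 in each frame, obtained by correlating a
    Hann-windowed frame with a complex exponential at exactly h*f0.
    Returns an array of shape (n_frames, n_harmonics)."""
    w = np.hanning(frame)
    n = np.arange(frame)
    h = np.arange(1, n_harmonics + 1)
    # One windowed complex exponential per harmonic (columns).
    kernels = np.exp(-2j * np.pi * np.outer(n, h) * f0 / fs) * w[:, None]
    starts = range(0, len(x) - frame + 1, hop)
    return np.array([np.angle(x[s:s + frame] @ kernels) for s in starts])


def relative_phase_spread(x, f0, n_harmonics, fs, **kw):
    """Mean circular variance (0 = constant, 1 = uniformly spread) of the
    relative phases psi_h = phi_h - h * phi_1 for harmonics 2..n_harmonics."""
    phi = harmonic_phases(x, f0, n_harmonics, fs, **kw)
    h = np.arange(2, n_harmonics + 1)
    psi = phi[:, 1:] - h * phi[:, :1]  # frame-position term cancels here
    resultant = np.abs(np.mean(np.exp(1j * psi), axis=0))
    return float(np.mean(1.0 - resultant))
```

The frame-start phase term 2*pi*h*f0*t0 appears in phi_h and h times in phi_1, so it cancels in psi_h; only genuine changes in harmonic phase relationships remain. Comparing the value for the original and the time-scaled signal gives a single number per algorithm.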

Yizhar Lavner proposes some ideas:

  • One possible test is to time-scale the 200% signal back to the original duration with a 50% ratio, using a single predefined algorithm, and then to compute an objective measure of the similarity between this round-trip signal (time-scaled first by a factor of 2 and then by a factor of 0.5) and the original signal. This could be done for all the time-scaled signals produced by the algorithms participating in the contest.
  • It may be possible to record a performer playing a piece at two tempos (for example, at a normal tempo and then at a faster or slower one) and to compare how well the algorithms time-scale the modified recordings back to the normal tempo. This might be difficult, since the performer cannot keep a perfectly constant tempo.
  • An objective criterion for transient smearing could be based on the energy envelope within a predefined frequency band, compared to that of the original. The comparison could use a similarity measure (for example, the cross-correlation coefficient) between the original envelope and the resampled envelope of the time-scaled signal, where the resampling ensures that both envelopes have the same length.

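The first and third ideas above can be sketched with NumPy as follows. The log-spectral distance for the round-trip comparison is a swapped-in choice (the proposal does not name a measure), and the band, frame and hop values are assumptions. Ratios are given as factors (2.0 for 200%).

```python
import numpy as np


def stft_power(x, frame=1024, hop=256):
    """Power spectrogram with a Hann window, shape (n_frames, frame//2 + 1)."""
    w = np.hanning(frame)
    starts = range(0, len(x) - frame + 1, hop)
    return np.array([np.abs(np.fft.rfft(w * x[s:s + frame])) ** 2 for s in starts])


def log_spectral_distance(ref, test, frame=1024, hop=256, eps=1e-12):
    """Mean per-frame log-spectral distance (dB) between the original and the
    round-trip (x2 then x0.5) signal; both are truncated to the shorter one."""
    n = min(len(ref), len(test))
    p = stft_power(ref[:n], frame, hop)
    q = stft_power(test[:n], frame, hop)
    d = 10 * np.log10(p + eps) - 10 * np.log10(q + eps)
    return float(np.mean(np.sqrt(np.mean(d ** 2, axis=1))))


def band_envelope(x, fs, band, frame=1024, hop=256):
    """Energy per frame inside band = (f_lo, f_hi) Hz, plus frame-centre times."""
    p = stft_power(x, frame, hop)
    freqs = np.fft.rfftfreq(frame, 1 / fs)
    mask = (freqs >= band[0]) & (freqs < band[1])
    times = (np.arange(p.shape[0]) * hop + frame / 2) / fs
    return p[:, mask].sum(axis=1), times


def envelope_similarity(orig, scaled, ratio, fs, band=(200.0, 4000.0), **kw):
    """Correlation coefficient between the original band envelope and the
    time-scaled envelope resampled onto the original frame times."""
    e_o, t_o = band_envelope(orig, fs, band, **kw)
    e_s, t_s = band_envelope(scaled, fs, band, **kw)
    e_s = np.interp(t_o, t_s / ratio, e_s)  # map scaled time back by 1/ratio
    return float(np.corrcoef(e_o, e_s)[0, 1])
```

Resampling on the time axis (frame centres divided by the ratio), rather than by frame index, keeps the two envelopes aligned. Note that a transient-preserving algorithm keeps transients short in absolute time, so after resampling they look sharper than in the original; that choice of normalisation affects how the correlation should be interpreted.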
Subjective tests

As several people on the list have pointed out, part of the evaluation's goal should be to compare objective and subjective results. Therefore, several subjective tests should be defined. Most, if not all, of the aspects above should be included, along with additional ones such as “fidelity to the original signal”.
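One simple way to compare objective and subjective results, offered only as an assumption about how the comparison might be done, is a Spearman rank correlation between each algorithm's objective score and its mean listening-test score. The helper names below are illustrative.

```python
import numpy as np


def ranks(values):
    """Ranks 1..n of `values`, averaging the ranks of tied values."""
    v = np.asarray(values, dtype=float)
    order = np.argsort(v, kind="mergesort")
    r = np.empty(len(v))
    r[order] = np.arange(1, len(v) + 1)
    for val in np.unique(v):  # give tied values their average rank
        tie = v == val
        r[tie] = r[tie].mean()
    return r


def spearman(objective, subjective):
    """Spearman rank correlation between per-algorithm objective scores and
    mean subjective scores (one entry per algorithm in each list)."""
    return float(np.corrcoef(ranks(objective), ranks(subjective))[0, 1])
```

For distance-like measures (lower is better), a perceptually relevant measure would show a correlation close to -1 with quality ratings.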

For the subjective tests, Yizhar Lavner suggests that at least two tests would be appropriate for the contest:

  1. An "ABX test", in which two time-scaled signals are compared to the original signal, and the listener judges which of them is more faithful to the original and has fewer artifacts.
  2. A "MUSHRA"-like test, in which all the time-scaled signals are played and listeners can hear each of them repeatedly. Listeners judge the quality according to different criteria, such as fidelity to the original and the presence of echoes and other artifacts, and grade each signal on a continuous scale from 0 to 100 (possibly using sliders). Listeners can compare the time-scaled signals to the original (known) signal. Some distractors in which certain artifacts are clearly audible should also be included.

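As a sketch of how the two tests might be analysed (the statistics are an assumption, not part of the proposal): in the first test, each judgment can be treated as a forced choice between two algorithms and checked with an exact binomial test; in the MUSHRA-like test, each condition can be summarised by its mean grade and an approximate 95% confidence interval (normal approximation).

```python
from math import comb, sqrt
from statistics import mean, stdev


def binomial_two_sided_p(k, n):
    """Exact two-sided p-value for k of n judgments preferring algorithm A
    when both are equally good (p = 0.5): sums the probabilities of all
    outcomes no more likely than the observed one."""
    pk = comb(n, k)
    tail = sum(comb(n, i) for i in range(n + 1) if comb(n, i) <= pk)
    return min(1.0, tail / 2 ** n)


def mushra_summary(scores):
    """Mean and approximate 95% confidence half-width (1.96 * standard error)
    of the 0-100 grades given to one condition."""
    m = mean(scores)
    half = 1.96 * stdev(scores) / sqrt(len(scores))
    return m, half
```

For example, 10 listeners out of 10 preferring the same algorithm gives p = 2/1024, about 0.002, while a 5/5 split gives p = 1.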
Audio material

The audio material to be used in the evaluation should be defined by the people on the DAFxTRa mailing list. It should be taken from the freesound website and contain two types of signals:

  • a set of synthetic signals associated with the MATLAB scripts (or generated by them) and uploaded to freesound.
  • a set of signals obtained from recordings or musical productions, available in freesound, covering both solo and polyphonic cases.


People who have shown interest in participating (in alphabetical order):

  • Jordi Bonada
  • Yizhar Lavner
  • Axel Roebel

If you plan to participate please contact jordi.bonada@iua.upf.edu

This wiki page is maintained by Jordi Bonada