Content-Based Audio Retrieval (libfmp.c7)

The FMP notebooks provide detailed textbook-like explanations of central techniques and algorithms implemented in libfmp. The part of FMP related to this module is available at the following URL:

https://www.audiolabs-erlangen.de/resources/MIR/FMP/C7/C7.html

libfmp.c7.c7s1_audio_id.compute_constellation_map(Y, dist_freq=7, dist_time=7, thresh=0.01)

Compute constellation map (implementation using image processing)

Notebook: C7/C7S1_AudioIdentification.ipynb

Parameters
  • Y (np.ndarray) – Spectrogram (magnitude)

  • dist_freq (int) – Neighborhood parameter for frequency direction (kappa) (Default value = 7)

  • dist_time (int) – Neighborhood parameter for time direction (tau) (Default value = 7)

  • thresh (float) – Threshold parameter for minimal peak magnitude (Default value = 0.01)

Returns

Cmap (np.ndarray) – Boolean mask for peak structure (same size as Y)
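
The image-processing variant can be illustrated with a 2D maximum filter. The following is a minimal NumPy sketch written for this page, not the libfmp source: it emulates the filter with numpy.lib.stride_tricks.sliding_window_view (NumPy 1.20 or later), and the name constellation_map_filter is hypothetical. A cell is kept if it equals the maximum of its (2*dist_freq+1) x (2*dist_time+1) neighborhood and exceeds thresh.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def constellation_map_filter(Y, dist_freq=7, dist_time=7, thresh=0.01):
    """Hypothetical helper: peak picking via a 2D maximum filter."""
    # Pad with -inf so that border cells are compared only with real neighbors
    Y_pad = np.pad(Y.astype(float), ((dist_freq, dist_freq), (dist_time, dist_time)),
                   mode='constant', constant_values=-np.inf)
    # One window per original cell: shape (n_freq, n_time, 2*dist_freq+1, 2*dist_time+1)
    windows = sliding_window_view(Y_pad, (2 * dist_freq + 1, 2 * dist_time + 1))
    local_max = windows.max(axis=(2, 3))
    return (Y == local_max) & (Y > thresh)

# Toy spectrogram with two isolated peaks and one weaker neighbor
Y = np.zeros((20, 30))
Y[5, 10] = 1.0
Y[6, 11] = 0.5   # lies inside the neighborhood of (5, 10), so it is suppressed
Y[15, 25] = 0.8
Cmap = constellation_map_filter(Y)
print(np.argwhere(Cmap).tolist())  # [[5, 10], [15, 25]]
```

The window view materializes no copy, but max over it still touches every neighborhood, so for long spectrograms a dedicated filter routine is usually faster.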

libfmp.c7.c7s1_audio_id.compute_constellation_map_naive(Y, dist_freq=7, dist_time=7, thresh=0.01)

Compute constellation map (naive implementation)

Notebook: C7/C7S1_AudioIdentification.ipynb

Parameters
  • Y (np.ndarray) – Spectrogram (magnitude)

  • dist_freq (int) – Neighborhood parameter for frequency direction (kappa) (Default value = 7)

  • dist_time (int) – Neighborhood parameter for time direction (tau) (Default value = 7)

  • thresh (float) – Threshold parameter for minimal peak magnitude (Default value = 0.01)

Returns

Cmap (np.ndarray) – Boolean mask for peak structure (same size as Y)
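
The naive rule can be written directly as two nested loops. This is a minimal sketch under the assumption that Y is a nonnegative magnitude spectrogram (frequency x time); constellation_map_naive is a hypothetical name, not the libfmp code. Neighborhoods are clipped at the matrix borders.

```python
import numpy as np

def constellation_map_naive(Y, dist_freq=7, dist_time=7, thresh=0.01):
    """Mark (f, t) if Y[f, t] > thresh and it is the maximum of its clipped neighborhood."""
    n_freq, n_time = Y.shape
    Cmap = np.zeros(Y.shape, dtype=bool)
    for f in range(n_freq):
        for t in range(n_time):
            f1, f2 = max(f - dist_freq, 0), min(f + dist_freq + 1, n_freq)
            t1, t2 = max(t - dist_time, 0), min(t + dist_time + 1, n_time)
            if Y[f, t] > thresh and Y[f, t] == Y[f1:f2, t1:t2].max():
                Cmap[f, t] = True
    return Cmap

Y = np.array([[0.0, 0.2, 0.0, 0.0],
              [0.1, 0.9, 0.3, 0.0],
              [0.0, 0.2, 0.0, 0.7]])
Cmap = constellation_map_naive(Y, dist_freq=1, dist_time=1, thresh=0.05)
print(np.argwhere(Cmap).tolist())  # [[1, 1], [2, 3]]
```

The cost grows with the neighborhood size for every cell, which is why a filter-based variant is preferable for real spectrograms.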

libfmp.c7.c7s1_audio_id.compute_matching_function(C_D, C_Q, tol_freq=1, tol_time=1)

Computes matching function for constellation maps

Notebook: C7/C7S1_AudioIdentification.ipynb

Parameters
  • C_D (np.ndarray) – Binary matrix used as database document

  • C_Q (np.ndarray) – Binary matrix used as query document

  • tol_freq (int) – Tolerance in frequency direction (vertical) (Default value = 1)

  • tol_time (int) – Tolerance in time direction (horizontal) (Default value = 1)

Returns
  • Delta (np.ndarray) – Matching function

  • shift_max (int) – Optimal shift position maximizing Delta
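
The matching function can be sketched by shifting the query along the database and counting coinciding peaks at each shift. In this minimal sketch (hypothetical names, not the libfmp source) the query is dilated once with a (2*tol_freq+1) x (2*tol_time+1) box, and Delta[m] counts the database peaks in the window starting at frame m that fall on the dilated query. As assumed here, Delta has the length of the database and stays zero for shifts where the query would not fit.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def dilate(C, tol_freq, tol_time):
    """Binary dilation with a (2*tol_freq+1) x (2*tol_time+1) box."""
    C_pad = np.pad(C.astype(bool), ((tol_freq, tol_freq), (tol_time, tol_time)))
    return sliding_window_view(C_pad, (2 * tol_freq + 1, 2 * tol_time + 1)).any(axis=(2, 3))

def matching_function(C_D, C_Q, tol_freq=1, tol_time=1):
    L, N = C_D.shape[1], C_Q.shape[1]
    assert N <= L, "query must not be longer than the database document"
    Q_dil = dilate(C_Q, tol_freq, tol_time)   # dilate the query once
    Delta = np.zeros(L)
    for m in range(L - N + 1):
        # database peaks in the window starting at m that lie near a query peak
        Delta[m] = np.sum(Q_dil & C_D[:, m:m + N].astype(bool))
    return Delta, int(np.argmax(Delta))

# Database with five peaks; the query is an excerpt starting at frame 12
C_D = np.zeros((10, 40), dtype=bool)
for f, t in [(2, 13), (5, 15), (7, 20), (1, 30), (8, 35)]:
    C_D[f, t] = True
C_Q = C_D[:, 12:22].copy()

Delta_exact, shift_exact = matching_function(C_D, C_Q, tol_freq=0, tol_time=0)
Delta_tol, shift_tol = matching_function(C_D, C_Q)   # tolerance of one bin / one frame
print(shift_exact, Delta_exact[12])   # 12 3.0
print(Delta_tol[11:14])                # [3. 3. 3.]
```

With a tolerance, the peak of Delta widens over neighboring shifts, and np.argmax then returns the first of the tied positions (here 11).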

libfmp.c7.c7s1_audio_id.match_binary_matrices_tol(C_ref, C_est, tol_freq=0, tol_time=0)

Compare binary matrices with tolerance

Note: The tolerance parameters should be smaller than the minimum distance of peaks (1-entries in C_ref and C_est) to obtain meaningful TP, FN, and FP values.

Notebook: C7/C7S1_AudioIdentification.ipynb

Parameters
  • C_ref (np.ndarray) – Binary matrix used as reference

  • C_est (np.ndarray) – Binary matrix used as estimation

  • tol_freq (int) – Tolerance in frequency direction (vertical) (Default value = 0)

  • tol_time (int) – Tolerance in time direction (horizontal) (Default value = 0)

Returns
  • TP (int) – True positives

  • FN (int) – False negatives

  • FP (int) – False positives

  • C_AND (np.ndarray) – Boolean mask of AND of C_ref and C_est (with tolerance)
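
A tolerance-aware comparison can be sketched by dilating the estimate before intersecting it with the reference. This is a minimal illustration with hypothetical names (dilate, match_with_tolerance), not the libfmp implementation: TP counts reference peaks covered by the dilated estimate, FN = (reference peaks) - TP, and FP = (estimated peaks) - TP.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def dilate(C, tol_freq, tol_time):
    """Binary dilation with a (2*tol_freq+1) x (2*tol_time+1) box."""
    C_pad = np.pad(C.astype(bool), ((tol_freq, tol_freq), (tol_time, tol_time)))
    return sliding_window_view(C_pad, (2 * tol_freq + 1, 2 * tol_time + 1)).any(axis=(2, 3))

def match_with_tolerance(C_ref, C_est, tol_freq=0, tol_time=0):
    assert C_ref.shape == C_est.shape, "dimensions need to agree"
    C_AND = dilate(C_est, tol_freq, tol_time) & C_ref.astype(bool)
    TP = int(C_AND.sum())
    FN = int(C_ref.sum()) - TP
    FP = int(C_est.sum()) - TP
    return TP, FN, FP, C_AND

C_ref = np.zeros((5, 5), dtype=bool)
C_ref[1, 1] = C_ref[3, 3] = True
C_est = np.zeros((5, 5), dtype=bool)
C_est[1, 2] = True    # one frame off from the reference peak (1, 1)
C_est[0, 4] = True    # no reference peak nearby
print(match_with_tolerance(C_ref, C_est)[:3])               # (0, 2, 2)
print(match_with_tolerance(C_ref, C_est, tol_time=1)[:3])   # (1, 1, 1)
```

Because TP counts covered reference peaks, a single estimated peak within tolerance of several reference peaks can push FP below zero in this sketch, which illustrates why the tolerances should stay below the minimum peak distance.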

libfmp.c7.c7s1_audio_id.plot_constellation_map(Cmap, Y=None, xlim=None, ylim=None, title='', xlabel='Time (sample)', ylabel='Frequency (bins)', s=5, color='r', marker='o', figsize=(7, 3), dpi=72)

Plot constellation map

Notebook: C7/C7S1_AudioIdentification.ipynb

Parameters
  • Cmap – Constellation map given as boolean mask for peak structure

  • Y – Spectrogram representation (Default value = None)

  • xlim – Limits for x-axis (Default value = None)

  • ylim – Limits for y-axis (Default value = None)

  • title – Title for plot (Default value = ‘’)

  • xlabel – Label for x-axis (Default value = ‘Time (sample)’)

  • ylabel – Label for y-axis (Default value = ‘Frequency (bins)’)

  • s – Size of dots in scatter plot (Default value = 5)

  • color – Color used for scatter plot (Default value = ‘r’)

  • marker – Marker for peaks (Default value = ‘o’)

  • figsize – Width, height in inches (Default value = (7, 3))

  • dpi – Dots per inch (Default value = 72)

Returns
  • fig – The created matplotlib figure

  • ax – The axes used

  • im – The image plot

libfmp.c7.c7s2_audio_matching.compute_accumulated_cost_matrix_subsequence_dtw(C)

Given the cost matrix, compute the accumulated cost matrix for subsequence dynamic time warping with step sizes {(1, 0), (0, 1), (1, 1)}

Notebook: C7/C7S2_SubsequenceDTW.ipynb

Parameters

C (np.ndarray) – Cost matrix

Returns

D (np.ndarray) – Accumulated cost matrix
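
The recursion behind subsequence DTW with step sizes {(1, 0), (0, 1), (1, 1)} can be sketched as follows (hypothetical function name; rows index the query, columns the database). The only difference from classical DTW is that the first row is not accumulated, so an alignment may start at any database frame; the last row then serves as the matching function.

```python
import numpy as np

def acc_cost_subsequence_dtw(C):
    """Accumulated cost matrix for subsequence DTW with steps (1,0), (0,1), (1,1)."""
    N, M = C.shape
    D = np.zeros((N, M))
    D[:, 0] = np.cumsum(C[:, 0])
    D[0, :] = C[0, :]          # free start anywhere in the database
    for n in range(1, N):
        for m in range(1, M):
            D[n, m] = C[n, m] + min(D[n - 1, m], D[n, m - 1], D[n - 1, m - 1])
    return D

C = np.array([[2.0, 0.0, 1.0, 3.0],
              [1.0, 2.0, 0.0, 2.0]])
D = acc_cost_subsequence_dtw(C)
# The minimum of the last row marks the end frame of the best-matching subsequence
b_star = int(np.argmin(D[-1, :]))   # 2
```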

libfmp.c7.c7s2_audio_matching.compute_accumulated_cost_matrix_subsequence_dtw_21(C)

Given the cost matrix, compute the accumulated cost matrix for subsequence dynamic time warping with step sizes {(1, 1), (2, 1), (1, 2)}

Notebook: C7/C7S2_SubsequenceDTW.ipynb

Parameters

C (np.ndarray) – Cost matrix

Returns

D (np.ndarray) – Accumulated cost matrix
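
For the step sizes {(1, 1), (2, 1), (1, 2)}, every step advances in both the query and the database direction, which bounds the warping slope between 1/2 and 2. A minimal sketch (hypothetical name; unreachable cells are set to infinity, an assumption of this illustration):

```python
import numpy as np

def acc_cost_subsequence_dtw_21(C):
    """Accumulated cost for subsequence DTW with steps (1,1), (2,1), (1,2)."""
    N, M = C.shape
    D = np.full((N, M), np.inf)
    D[0, :] = C[0, :]                               # free start anywhere in the database
    for n in range(1, N):
        for m in range(1, M):
            best = D[n - 1, m - 1]                  # step (1, 1)
            if n >= 2:
                best = min(best, D[n - 2, m - 1])   # step (2, 1): skips a query frame
            if m >= 2:
                best = min(best, D[n - 1, m - 2])   # step (1, 2): skips a database frame
            D[n, m] = C[n, m] + best
    return D

D = acc_cost_subsequence_dtw_21(np.ones((3, 4)))
print(D[-1, :])   # [inf  2.  2.  2.]
```

The first database column is unreachable from the second query frame on, and the (2, 1) step lets the three-frame query reach cost 2 by skipping one query frame.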

libfmp.c7.c7s2_audio_matching.compute_cens_from_chromagram(C, Fs=1, ell=41, d=10, quant=True)

Compute CENS features from chromagram

Notebook: C7/C7S2_CENS.ipynb

Parameters
  • C (np.ndarray) – Input chromagram

  • Fs (scalar) – Feature rate of chromagram (Default value = 1)

  • ell (int) – Smoothing length (Default value = 41)

  • d (int) – Downsampling factor (Default value = 10)

  • quant (bool) – Apply quantization (Default value = True)

Returns
  • C_CENS (np.ndarray) – CENS features

  • Fs_CENS (scalar) – Feature rate of CENS features
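
The CENS pipeline (l1 normalization, logarithmic quantization, temporal smoothing, downsampling, l2 normalization) can be sketched in NumPy as below. This is an illustration, not the libfmp code: it assumes the standard CENS thresholds 0.05, 0.1, 0.2, 0.4, uses a symmetric Hann window from np.hanning, and the names normalize_columns and cens_sketch are hypothetical.

```python
import numpy as np

def normalize_columns(X, p, eps=1e-4):
    """Normalize columns to unit l_p norm; near-zero columns become a uniform unit vector."""
    K = X.shape[0]
    norms = np.linalg.norm(X, ord=p, axis=0)
    X_norm = X / np.maximum(norms, eps)
    X_norm[:, norms < eps] = (np.ones(K) / np.linalg.norm(np.ones(K), ord=p))[:, None]
    return X_norm

def cens_sketch(C, Fs=1, ell=41, d=10, quant=True):
    C_norm = normalize_columns(C, 1)                          # 1) l1 normalization per frame
    if quant:
        # 2) quantization: <0.05 -> 0, [0.05,0.1) -> 1, [0.1,0.2) -> 2, [0.2,0.4) -> 3, >=0.4 -> 4
        C_norm = np.digitize(C_norm, [0.05, 0.1, 0.2, 0.4]).astype(float)
    w = np.hanning(ell)                                        # 3) Hann smoothing along time
    start, T = (ell - 1) // 2, C.shape[1]
    C_smooth = np.array([np.convolve(row, w)[start:start + T] for row in C_norm])
    C_down = C_smooth[:, ::d]                                  # 4) downsampling by d
    return normalize_columns(C_down, 2), Fs / d                # 5) l2 normalization

rng = np.random.default_rng(0)
chroma = rng.random((12, 200))          # stand-in chromagram at a feature rate of 10 Hz
C_CENS, Fs_CENS = cens_sketch(chroma, Fs=10, ell=41, d=10)
print(C_CENS.shape, Fs_CENS)            # (12, 20) 1.0
```

The "full" convolution is cropped to the centered part so that the output keeps the input length even when ell exceeds the number of frames.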

libfmp.c7.c7s2_audio_matching.compute_cens_from_file(fn_wav, Fs=22050, N=4410, H=2205, ell=21, d=5)

Compute CENS features from file

Notebook: C7/C7S2_AudioMatching.ipynb

Parameters
  • fn_wav (str) – Filename of wav file

  • Fs (scalar) – Sampling rate of wav file (Default value = 22050)

  • N (int) – Window size for STFT (Default value = 4410)

  • H (int) – Hop size for STFT (Default value = 2205)

  • ell (int) – Smoothing length (Default value = 21)

  • d (int) – Downsampling factor (Default value = 5)

Returns
  • X_CENS (np.ndarray) – CENS features

  • L (int) – Length of CENS feature sequence

  • Fs_CENS (scalar) – Feature rate of CENS features

  • x_duration (float) – Duration (seconds) of wav file

libfmp.c7.c7s2_audio_matching.compute_matching_function_dtw(X, Y, stepsize=2)

Compute DTW-based matching function

Notebook: C7/C7S2_AudioMatching.ipynb

Parameters
  • X (np.ndarray) – Query feature sequence (given as K x N matrix)

  • Y (np.ndarray) – Database feature sequence (given as K x M matrix)

  • stepsize (int) – Parameter for step size condition (1 or 2) (Default value = 2)

Returns
  • Delta (np.ndarray) – DTW-based matching function

  • C (np.ndarray) – Cost matrix

  • D (np.ndarray) – Accumulated cost matrix

libfmp.c7.c7s2_audio_matching.compute_matching_function_dtw_ti(X, Y, cyc=np.arange(12), stepsize=2)

Compute transposition-invariant matching function

Notebook: C7/C7S2_AudioMatching.ipynb

Parameters
  • X (np.ndarray) – Query feature sequence (given as K x N matrix)

  • Y (np.ndarray) – Database feature sequence (given as K x M matrix)

  • cyc (np.ndarray) – Set of cyclic shift indices to be considered (Default value = np.arange(12))

  • stepsize (int) – Parameter for step size condition (1 or 2) (Default value = 2)

Returns
  • Delta_TI (np.ndarray) – Transposition-invariant matching function

  • Delta_ind (np.ndarray) – Cost-minimizing indices

  • Delta_cyc (np.ndarray) – Array containing all matching functions
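
Transposition invariance can be sketched by cyclically shifting the query's chroma rows, computing one matching function per shift, and taking the pointwise minimum. The sketch below makes several assumptions that are not taken from libfmp: the cost is 1 - <x, y> on l2-normalized features, subsequence DTW uses the steps (1, 0), (0, 1), (1, 1), the matching function is divided by the query length, and Delta_ind holds shift values from cyc. All function names are hypothetical.

```python
import numpy as np

def dtw_matching_function(X, Y):
    """Subsequence-DTW matching function with cost 1 - <x, y> and steps (1,0), (0,1), (1,1)."""
    C = 1 - X.T @ Y
    N, M = C.shape
    D = np.zeros((N, M))
    D[:, 0] = np.cumsum(C[:, 0])
    D[0, :] = C[0, :]
    for n in range(1, N):
        for m in range(1, M):
            D[n, m] = C[n, m] + min(D[n - 1, m], D[n, m - 1], D[n - 1, m - 1])
    return D[-1, :] / N   # normalized by the query length (a choice of this sketch)

def ti_matching_function(X, Y, cyc=np.arange(12)):
    # One matching function per cyclic chroma shift of the query
    Delta_cyc = np.vstack([dtw_matching_function(np.roll(X, k, axis=0), Y) for k in cyc])
    Delta_ind = np.asarray(cyc)[np.argmin(Delta_cyc, axis=0)]   # best shift per position
    Delta_TI = Delta_cyc.min(axis=0)
    return Delta_TI, Delta_ind, Delta_cyc

def unit_columns(A):
    return A / np.linalg.norm(A, axis=0)

rng = np.random.default_rng(1)
X = unit_columns(rng.random((12, 8)))                     # query
Y = np.hstack([unit_columns(rng.random((12, 20))),
               np.roll(X, 3, axis=0),                      # query transposed by 3 semitones
               unit_columns(rng.random((12, 15)))])
Delta_TI, Delta_ind, Delta_cyc = ti_matching_function(X, Y)
m_end = int(np.argmin(Delta_TI))
print(m_end, Delta_ind[m_end])    # 27 3
```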

libfmp.c7.c7s2_audio_matching.compute_optimal_warping_path_subsequence_dtw(D, m=-1)

Given an accumulated cost matrix, compute the warping path for subsequence dynamic time warping with step sizes {(1, 0), (0, 1), (1, 1)}

Notebook: C7/C7S2_SubsequenceDTW.ipynb

Parameters
  • D (np.ndarray) – Accumulated cost matrix

  • m (int) – Index to start backtracking; if set to -1, the optimal m is used (Default value = -1)

Returns

P (np.ndarray) – Optimal warping path (array of index pairs)
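
Backtracking for subsequence DTW starts at the last query frame (at the minimizing database position unless m is given) and repeatedly moves to the cheapest admissible predecessor until the first query frame is reached. A minimal sketch with a hypothetical name; ties are broken in favor of the diagonal step, an assumption of this illustration.

```python
import numpy as np

def warping_path_subsequence_dtw(D, m=-1):
    """Backtrack an optimal subsequence-DTW path (steps (1,0), (0,1), (1,1))."""
    n = D.shape[0] - 1
    if m < 0:
        m = int(np.argmin(D[n, :]))      # best end position in the database
    P = [(n, m)]
    while n > 0:                          # stop once the first query frame is reached
        if m == 0:
            n, m = n - 1, 0
        else:
            candidates = [(n - 1, m - 1), (n - 1, m), (n, m - 1)]
            n, m = min(candidates, key=lambda cell: D[cell])
        P.append((n, m))
    return np.array(P[::-1])

# Accumulated cost matrix of the toy cost matrix C = [[2, 0, 1, 3], [1, 2, 0, 2]]
D = np.array([[2.0, 0.0, 1.0, 3.0],
              [3.0, 2.0, 0.0, 2.0]])
P = warping_path_subsequence_dtw(D)
print(P.tolist())   # [[0, 1], [1, 2]]
```

The database segment of the match is given by the second column of P, here frames 1 to 2.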

libfmp.c7.c7s2_audio_matching.compute_optimal_warping_path_subsequence_dtw_21(D, m=-1)

Given an accumulated cost matrix, compute the warping path for subsequence dynamic time warping with step sizes {(1, 1), (2, 1), (1, 2)}

Notebook: C7/C7S2_SubsequenceDTW.ipynb

Parameters
  • D (np.ndarray) – Accumulated cost matrix

  • m (int) – Index to start backtracking; if set to -1, the optimal m is used (Default value = -1)

Returns

P (np.ndarray) – Optimal warping path (array of index pairs)

libfmp.c7.c7s2_audio_matching.cost_matrix_dot(X, Y)

Computes cost matrix via dot product

Notebook: C7/C7S2_DiagonalMatching.ipynb

Parameters
  • X (np.ndarray) – First sequence (K x N matrix)

  • Y (np.ndarray) – Second sequence (K x M matrix)

Returns

C (np.ndarray) – Cost matrix
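
A dot-product cost matrix is commonly defined as C[n, m] = 1 - <x_n, y_m>, which lies in [0, 1] for l2-normalized, nonnegative features such as CENS. The sketch below assumes this definition (hypothetical name) and yields zero cost exactly where the feature vectors coincide.

```python
import numpy as np

def cost_matrix_dot(X, Y):
    """C[n, m] = 1 - <X[:, n], Y[:, m]> for feature matrices of shape K x N and K x M."""
    return 1 - X.T @ Y

X = np.eye(3)[:, :2]    # two unit vectors (K = 3, N = 2)
Y = np.eye(3)           # three unit vectors (K = 3, M = 3)
C = cost_matrix_dot(X, Y)
print(C.shape)          # (2, 3)
```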

libfmp.c7.c7s2_audio_matching.matches_diag(pos, Delta_N)

Derives matches from positions in the case of diagonal matching

Notebook: C7/C7S2_DiagonalMatching.ipynb

Parameters
  • pos (np.ndarray or list) – Starting positions of matches

  • Delta_N (int or np.ndarray or list) – Length of match (a single number or a list of same length as Delta)

Returns

matches (np.ndarray) – Array containing matches (start, end)
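
For diagonal matching, a match starting at position s covers the frames s, ..., s + length - 1, where the length is either fixed or looked up per position (as with Delta_N from multiple-query matching). A minimal sketch with a hypothetical name:

```python
import numpy as np

def matches_from_positions(pos, Delta_N):
    """Turn match start positions into (start, end) pairs for diagonal matching."""
    matches = np.zeros((len(pos), 2), dtype=int)
    for k, s in enumerate(pos):
        length = Delta_N if np.isscalar(Delta_N) else Delta_N[s]   # per-position length
        matches[k] = (s, s + length - 1)
    return matches

print(matches_from_positions([3, 10], 4).tolist())        # [[3, 6], [10, 13]]
Delta_N = np.full(12, 4)
Delta_N[10] = 6            # e.g. a rescaled query of length 6 fits best at position 10
print(matches_from_positions([3, 10], Delta_N).tolist())  # [[3, 6], [10, 15]]
```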

libfmp.c7.c7s2_audio_matching.matches_dtw(pos, D, stepsize=2)

Derives matches from positions for DTW-based strategy

Notebook: C7/C7S2_AudioMatching.ipynb

Parameters
  • pos (np.ndarray) – End positions of matches

  • D (np.ndarray) – Accumulated cost matrix

  • stepsize (int) – Parameter for step size condition (1 or 2) (Default value = 2)

Returns

matches (np.ndarray) – Array containing matches (start, end)

libfmp.c7.c7s2_audio_matching.matching_function_diag(C, cyclic=False)

Computes diagonal matching function

Notebook: C7/C7S2_DiagonalMatching.ipynb

Parameters
  • C (np.ndarray) – Cost matrix

  • cyclic (bool) – If “True” then matching is done cyclically (Default value = False)

Returns

Delta (np.ndarray) – Matching function
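
Diagonal matching averages the cost along the diagonal that starts at (0, m). The following minimal sketch (hypothetical name) realizes this by cyclically shifting the n-th row by n; without cyclic matching, positions whose diagonal would leave the matrix are set to infinity.

```python
import numpy as np

def matching_function_diag(C, cyclic=False):
    """Delta[m] = mean of C[n, m + n] over n (indices wrap around if cyclic)."""
    N, M = C.shape
    assert N <= M, "query must not be longer than the database"
    Delta = C[0, :].astype(float)
    for n in range(1, N):
        Delta = Delta + np.roll(C[n, :], -n)   # entry m now holds C[n, (m + n) mod M]
    Delta = Delta / N
    if not cyclic:
        Delta[M - N + 1:] = np.inf              # diagonals that would leave the matrix
    return Delta

C = np.array([[0.0, 1.0, 1.0, 1.0],
              [1.0, 0.0, 1.0, 1.0]])
Delta = matching_function_diag(C)
Delta_cyclic = matching_function_diag(C, cyclic=True)
```

The perfect diagonal starting at (0, 0) yields Delta[0] = 0, while the last position is inf in the non-cyclic case and 1 in the cyclic case.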

libfmp.c7.c7s2_audio_matching.matching_function_diag_multiple(X, Y, tempo_rel_set=[1], cyclic=False)

Computes diagonal matching function using multiple query strategy

Notebook: C7/C7S2_DiagonalMatching.ipynb

Parameters
  • X (np.ndarray) – First sequence (K x N matrix)

  • Y (np.ndarray) – Second sequence (K x M matrix)

  • tempo_rel_set (np.ndarray) – Set of relative tempo values (scaling) (Default value = [1])

  • cyclic (bool) – If “True” then matching is done cyclically (Default value = False)

Returns
  • Delta_min (np.ndarray) – Matching function (obtained by minimizing over several matching functions)

  • Delta_N (np.ndarray) – Query length of best match for each time position

  • Delta_scale (np.ndarray) – Set of matching functions (for each of the scaled versions of the query)

libfmp.c7.c7s2_audio_matching.mininma_from_matching_function(Delta, rho=2, tau=0.2, num=None)

Derives local minima positions of matching function in an iterative fashion

Notebook: C7/C7S2_DiagonalMatching.ipynb

Parameters
  • Delta (np.ndarray) – Matching function

  • rho (int) – Parameter to exclude neighborhood of a matching position for subsequent matches (Default value = 2)

  • tau (float) – Threshold for maximum Delta value allowed for matches (Default value = 0.2)

  • num (int) – Maximum number of matches (Default value = None)

Returns

pos (np.ndarray) – Array of local minima
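
The iterative procedure can be sketched as follows: take the global minimum of Delta, record its position, block its neighborhood, and repeat while values below tau remain and fewer than num matches have been found. This sketch (hypothetical name) blocks rho frames on each side, m - rho to m + rho; the exact boundary handling in libfmp may differ.

```python
import numpy as np

def minima_from_matching_function(Delta, rho=2, tau=0.2, num=None):
    """Iteratively pick the global minimum, then block rho frames on each side."""
    Delta_tmp = np.array(Delta, dtype=float)
    M = len(Delta_tmp)
    num = M if num is None else num
    pos = []
    while len(pos) < num and np.any(Delta_tmp < tau):
        m = int(np.argmin(Delta_tmp))
        pos.append(m)
        Delta_tmp[max(0, m - rho):min(M, m + rho + 1)] = np.inf
    return np.array(pos, dtype=int)

Delta = [0.5, 0.1, 0.15, 0.6, 0.05, 0.3, 0.19, 0.9]
print(minima_from_matching_function(Delta))   # [4 1]
```

The values 0.15 and 0.19 fall below tau but are suppressed because they lie within rho frames of an earlier match.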

libfmp.c7.c7s2_audio_matching.plot_matches(ax, matches, Delta, Fs=1, alpha=0.2, color='r', s_marker='o', t_marker='')

Plots matches into existing axis

Notebook: C7/C7S2_DiagonalMatching.ipynb

Parameters
  • ax – Axis

  • matches – Array of matches (start, end)

  • Delta – Matching function

  • Fs – Feature rate (Default value = 1)

  • alpha – Transparency parameter for match visualization (Default value = 0.2)

  • color – Color used to indicate matches (Default value = ‘r’)

  • s_marker – Marker used to indicate start of matches (Default value = ‘o’)

  • t_marker – Marker used to indicate end of matches (Default value = ‘’)

libfmp.c7.c7s2_audio_matching.quantize_matrix(C, quant_fct=None)

Quantize matrix values in a logarithmic manner (as done for CENS features)

Notebook: C7/C7S2_CENS.ipynb

Parameters
  • C (np.ndarray) – Input matrix

  • quant_fct (list) – List specifying the quantization function (Default value = None)

Returns

C_quant (np.ndarray) – Output matrix
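
A quantization function can be represented as a list of intervals with target values. The sketch below assumes a format of (lower bound, upper bound, target value) triples and the logarithmic CENS thresholds 0.05, 0.1, 0.2, 0.4; both QUANT_FCT and quantize are hypothetical names, not the libfmp defaults. The last interval is open-ended so that a value of exactly 1 is also covered.

```python
import numpy as np

# Assumed format: (lower bound, upper bound, target value), lower bound inclusive
QUANT_FCT = [(0.0, 0.05, 0), (0.05, 0.1, 1), (0.1, 0.2, 2), (0.2, 0.4, 3), (0.4, np.inf, 4)]

def quantize(C, quant_fct=None):
    quant_fct = QUANT_FCT if quant_fct is None else quant_fct
    C_quant = np.zeros(np.shape(C))
    for low, high, value in quant_fct:
        C_quant[(C >= low) & (C < high)] = value
    return C_quant

C = np.array([[0.01, 0.07, 0.15, 0.3, 1.0]])
print(quantize(C))   # [[0. 1. 2. 3. 4.]]
```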

libfmp.c7.c7s2_audio_matching.scale_tempo_sequence(X, factor=1)

Scales a sequence (given as feature matrix) along time (second dimension)

Notebook: C7/C7S2_DiagonalMatching.ipynb

Parameters
  • X (np.ndarray) – Feature sequences (given as K x N matrix)

  • factor (float) – Scaling factor (resulting in length “round(factor * N)”) (Default value = 1)

Returns
  • X_new (np.ndarray) – Scaled feature sequence

  • N_new (int) – Length of scaled feature sequence
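
Tempo scaling along the time axis can be sketched with linear interpolation: both the original and the new frame positions are mapped to [0, 1], and every feature row is interpolated at the new positions. This uses np.interp and a hypothetical name; the interpolation scheme is an assumption of the sketch.

```python
import numpy as np

def scale_tempo(X, factor=1.0):
    """Resample a K x N feature sequence to round(factor * N) frames by linear interpolation."""
    K, N = X.shape
    N_new = int(round(factor * N))
    t = np.linspace(0, 1, N)
    t_new = np.linspace(0, 1, N_new)
    X_new = np.vstack([np.interp(t_new, t, row) for row in X])
    return X_new, N_new

X = np.array([[0.0, 1.0, 2.0, 3.0]])
X_fast, N_fast = scale_tempo(X, 0.5)    # N_new = round(0.5 * 4) = 2
X_slow, N_slow = scale_tempo(X, 1.75)   # N_new = round(1.75 * 4) = 7
```

Note that Python's round uses round-half-to-even, so a product such as 2.5 yields 2 frames.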

libfmp.c7.c7s3_version_id.compute_accumulated_score_matrix_common_subsequence(S)

Given the score matrix, compute the accumulated score matrix for common subsequence matching with step sizes {(1, 0), (0, 1), (1, 1)}

Notebook: C7/C7S3_CommonSubsequence.ipynb

Parameters

S (np.ndarray) – Score matrix

Returns

D (np.ndarray) – Accumulated score matrix
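
Common subsequence matching accumulates scores in the style of local alignment: each cell adds its score to the best predecessor under the steps (1, 0), (0, 1), (1, 1), and the result is clipped at zero so that a path may start anywhere. The sketch below assumes exactly this recursion and uses a hypothetical name.

```python
import numpy as np

def acc_score_common_subsequence(S):
    """D[n, m] = max(0, S[n, m] + max(D[n-1, m-1], D[n-1, m], D[n, m-1]))."""
    N, M = S.shape
    D = np.zeros((N + 1, M + 1))     # zero boundary row and column
    for n in range(1, N + 1):
        for m in range(1, M + 1):
            D[n, m] = max(0.0, S[n - 1, m - 1] + max(D[n - 1, m - 1], D[n - 1, m], D[n, m - 1]))
    return D[1:, 1:]

S = np.array([[ 1.0, -2.0, -2.0],
              [-2.0,  1.0, -2.0],
              [-2.0, -2.0,  1.0]])
D = acc_score_common_subsequence(S)
n, m = np.unravel_index(np.argmax(D), D.shape)   # end cell of the best common subsequence
print(D.max(), (int(n), int(m)))                   # 3.0 (2, 2)
```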

libfmp.c7.c7s3_version_id.compute_optimal_path_common_subsequence(D, cellmax=True, n=0, m=0)

Given an accumulated score matrix, compute the score-maximizing path for common subsequence matching with step sizes {(1, 0), (0, 1), (1, 1)}

Notebook: C7/C7S3_CommonSubsequence.ipynb

Parameters
  • D (np.ndarray) – Accumulated score matrix

  • cellmax (bool) – If “True”, score-maximizing cell will be computed (Default value = True)

  • n (int) – Index (first axis) of cell for backtracking start; only used when cellmax=False (Default value = 0)

  • m (int) – Index (second axis) of cell for backtracking start; only used when cellmax=False (Default value = 0)

Returns

P (np.ndarray) – Score-maximizing path (array of index pairs)

libfmp.c7.c7s3_version_id.compute_partial_matching(S)

Given the score matrix, compute the accumulated score matrix for partial matching

Notebook: C7/C7S3_CommonSubsequence.ipynb

Parameters

S (np.ndarray) – Score matrix

Returns
  • D (np.ndarray) – Accumulated score matrix

  • P (np.ndarray) – Partial match (array of index pairs)
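
Partial matching can be sketched, under the assumption of the recursion D[n, m] = max(D[n, m-1], D[n-1, m], D[n-1, m-1] + S[n, m]), as a longest-common-subsequence-style computation: horizontal and vertical steps skip frames at no cost, and only diagonal steps contribute matched cells. The name partial_matching is hypothetical, and returning D without its boundary row and column is a choice of this sketch.

```python
import numpy as np

def partial_matching(S):
    """Accumulated score matrix and backtracked partial match for score matrix S."""
    N, M = S.shape
    D = np.zeros((N + 1, M + 1))     # extra zero row/column as boundary
    for n in range(1, N + 1):
        for m in range(1, M + 1):
            D[n, m] = max(D[n, m - 1], D[n - 1, m], D[n - 1, m - 1] + S[n - 1, m - 1])
    # Backtracking: only diagonal steps add a matched cell to the path
    P, n, m = [], N, M
    while n > 0 and m > 0:
        if D[n, m] == D[n, m - 1]:
            m -= 1
        elif D[n, m] == D[n - 1, m]:
            n -= 1
        else:
            P.append((n - 1, m - 1))
            n, m = n - 1, m - 1
    return D[1:, 1:], np.array(P[::-1])

S = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
D, P = partial_matching(S)
print(P.tolist())   # [[0, 0], [1, 2]]  (database frame 1 is skipped)
```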

libfmp.c7.c7s3_version_id.compute_prf_metrics(I, score, I_Q)

Compute precision, recall, F-measures and other evaluation metrics for document-level retrieval

Notebook: C7/C7S3_Evaluation.ipynb

Parameters
  • I (np.ndarray) – Array of items

  • score (np.ndarray) – Array containing the score values of the items

  • I_Q (np.ndarray) – Array of relevant (positive) items

Returns
  • P_Q (np.ndarray) – Precision values sorted by rank

  • R_Q (np.ndarray) – Recall values sorted by rank

  • F_Q (np.ndarray) – F-measures sorted by rank

  • BEP (float) – Break-even point

  • F_max (float) – Maximal F-measure

  • P_average (float) – Average precision

  • X_Q (np.ndarray) – Relevance function

  • rank (np.ndarray) – Array of rank values

  • I_sorted (np.ndarray) – Array of items sorted by rank

  • rank_sorted (np.ndarray) – Array of rank values sorted by rank
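
The rank-based metrics can be sketched as follows: sort the items by descending score, mark relevant items to obtain the relevance function, and compute precision and recall at every rank. The break-even point is the precision at rank |I_Q| (where precision equals recall), and average precision is the mean of the precision values at the ranks of relevant items. This is a minimal sketch with a hypothetical name, returning only a subset of the outputs listed above; ties in the score are broken stably, an assumption of this illustration.

```python
import numpy as np

def prf_metrics(I, score, I_Q):
    I, score = np.asarray(I), np.asarray(score)
    K = len(I)
    order = np.argsort(-score, kind='stable')       # rank 1 = highest score
    I_sorted = I[order]
    X_Q = np.isin(I_sorted, I_Q)                    # relevance function over ranks
    hits = np.cumsum(X_Q)
    P_Q = hits / np.arange(1, K + 1)                # precision at each rank
    R_Q = hits / len(I_Q)                           # recall at each rank
    denom = P_Q + R_Q
    F_Q = np.divide(2 * P_Q * R_Q, denom, out=np.zeros(K), where=denom > 0)
    BEP = P_Q[len(I_Q) - 1]                         # at rank |I_Q|, precision equals recall
    F_max = F_Q.max()
    P_average = np.sum(P_Q[X_Q]) / len(I_Q)         # average precision
    return P_Q, R_Q, F_Q, BEP, F_max, P_average, X_Q, I_sorted

I = np.arange(1, 7)
score = [0.9, 0.2, 0.8, 0.1, 0.7, 0.3]
I_Q = [1, 3, 6]
P_Q, R_Q, F_Q, BEP, F_max, P_average, X_Q, I_sorted = prf_metrics(I, score, I_Q)
print(round(BEP, 3), round(F_max, 3), round(P_average, 3))   # 0.667 0.857 0.917
```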

libfmp.c7.c7s3_version_id.compute_sm_from_wav(x1, x2, Fs, N=4410, H=2205, ell=21, d=5, L_smooth=12, tempo_rel_set=np.array([0.66, 0.81, 1, 1.22, 1.5]), shift_set=np.array([0]), strategy='relative', scale=True, thresh=0.15, penalty=-2.0, binarize=False)

Compute a similarity matrix (SM)

Notebook: C7/C7S3_VersionIdentification.ipynb

Parameters
  • x1 (np.ndarray) – First signal

  • x2 (np.ndarray) – Second signal

  • Fs (scalar) – Sampling rate of WAV files

  • N (int) – Window size for computing STFT-based chroma features (Default value = 4410)

  • H (int) – Hop size for computing STFT-based chroma features (Default value = 2205)

  • ell (int) – Smoothing length for computing CENS features (Default value = 21)

  • d (int) – Downsampling factor for computing CENS features (Default value = 5)

  • L_smooth (int) – Length of filter for enhancing SM (Default value = 12)

  • tempo_rel_set (np.ndarray) – Set of relative tempo values for enhancing SM (Default value = np.array([0.66, 0.81, 1, 1.22, 1.5]))

  • shift_set (np.ndarray) – Set of shift indices for enhancing SM (Default value = np.array([0]))

  • strategy (str) – Strategy for thresholding SM (‘absolute’, ‘relative’, ‘local’) (Default value = ‘relative’)

  • scale (bool) – If True, positive values are scaled to the range [0,1] when thresholding SM (Default value = True)

  • thresh (float) – Threshold (meaning depends on strategy) (Default value = 0.15)

  • penalty (float) – Value assigned to entries below the threshold (Default value = -2.0)

  • binarize (bool) – Binarizes final matrix (positive: 1; otherwise: 0) (Default value = False)

Returns
  • X (np.ndarray) – CENS feature sequence for first signal

  • Y (np.ndarray) – CENS feature sequence for second signal

  • Fs_feature (scalar) – Feature rate

  • S_thresh (np.ndarray) – Similarity matrix

  • I (np.ndarray) – Index matrix

libfmp.c7.c7s3_version_id.get_induced_segments(P)

Given a path, compute the induced segments

Notebook: C7/C7S3_CommonSubsequence.ipynb

Parameters

P (np.ndarray) – Path (list of index pairs)

Returns
  • seg_X (np.ndarray) – Induced segment of first sequence

  • seg_Y (np.ndarray) – Induced segment of second sequence
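
The segments induced by a path are simply the contiguous index ranges between the path's first and last cell along each axis. A minimal sketch with a hypothetical name:

```python
import numpy as np

def induced_segments(P):
    """Index ranges covered by a path P (array of (n, m) pairs) in both sequences."""
    P = np.asarray(P)
    seg_X = np.arange(P[0, 0], P[-1, 0] + 1)
    seg_Y = np.arange(P[0, 1], P[-1, 1] + 1)
    return seg_X, seg_Y

seg_X, seg_Y = induced_segments([[2, 5], [3, 6], [3, 7], [4, 8]])
print(seg_X.tolist(), seg_Y.tolist())   # [2, 3, 4] [5, 6, 7, 8]
```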