Content-Based Audio Retrieval (libfmp.c7)
The FMP notebooks provide detailed textbook-like explanations of the central techniques and algorithms implemented in libfmp. The part of FMP related to this module is available at the following URL:
https://www.audiolabs-erlangen.de/resources/MIR/FMP/C7/C7.html
- libfmp.c7.c7s1_audio_id.compute_constellation_map(Y, dist_freq=7, dist_time=7, thresh=0.01)
Compute constellation map (implementation using image processing)
Notebook: C7/C7S1_AudioIdentification.ipynb
- Parameters
Y (np.ndarray) – Spectrogram (magnitude)
dist_freq (int) – Neighborhood parameter for frequency direction (kappa) (Default value = 7)
dist_time (int) – Neighborhood parameter for time direction (tau) (Default value = 7)
thresh (float) – Threshold parameter for minimal peak magnitude (Default value = 0.01)
- Returns
Cmap (np.ndarray) – Boolean mask for peak structure (same size as Y)
- libfmp.c7.c7s1_audio_id.compute_constellation_map_naive(Y, dist_freq=7, dist_time=7, thresh=0.01)
Compute constellation map (naive implementation)
Notebook: C7/C7S1_AudioIdentification.ipynb
- Parameters
Y (np.ndarray) – Spectrogram (magnitude)
dist_freq (int) – Neighborhood parameter for frequency direction (kappa) (Default value = 7)
dist_time (int) – Neighborhood parameter for time direction (tau) (Default value = 7)
thresh (float) – Threshold parameter for minimal peak magnitude (Default value = 0.01)
- Returns
Cmap (np.ndarray) – Boolean mask for peak structure (same size as Y)
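The peak-picking idea behind both constellation-map functions can be sketched in plain NumPy. This is an illustrative sketch, not libfmp's code: the name `constellation_map_sketch` and the toy spectrogram are assumptions. It treats a cell as a peak when it exceeds `thresh` and equals the maximum of its neighborhood, which extends `dist_freq` bins and `dist_time` frames in each direction.

```python
import numpy as np

def constellation_map_sketch(Y, dist_freq=1, dist_time=1, thresh=0.01):
    """Hypothetical helper: mark cells that are neighborhood maxima above thresh."""
    K, N = Y.shape
    Cmap = np.zeros((K, N), dtype=bool)
    for k in range(K):
        f1, f2 = max(k - dist_freq, 0), min(k + dist_freq + 1, K)
        for n in range(N):
            t1, t2 = max(n - dist_time, 0), min(n + dist_time + 1, N)
            is_max = Y[k, n] == np.max(Y[f1:f2, t1:t2])
            Cmap[k, n] = bool(Y[k, n] > thresh and is_max)
    return Cmap

# Toy magnitude spectrogram with two isolated peaks and one dominated cell
Y = np.zeros((5, 6))
Y[1, 1] = 1.0
Y[3, 4] = 0.5
Y[3, 3] = 0.2   # neighbor of the stronger cell (3, 4), so not a peak
Cmap = constellation_map_sketch(Y)
peaks = [(int(k), int(n)) for k, n in zip(*np.nonzero(Cmap))]
print(peaks)    # [(1, 1), (3, 4)]
```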
- libfmp.c7.c7s1_audio_id.compute_matching_function(C_D, C_Q, tol_freq=1, tol_time=1)
Computes matching function for constellation maps
Notebook: C7/C7S1_AudioIdentification.ipynb
- Parameters
C_D (np.ndarray) – Binary matrix used as database document
C_Q (np.ndarray) – Binary matrix used as query document
tol_freq (int) – Tolerance in frequency direction (vertical) (Default value = 1)
tol_time (int) – Tolerance in time direction (horizontal) (Default value = 1)
- Returns
Delta (np.ndarray) – Matching function
shift_max (int) – Optimal shift position maximizing Delta
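As a rough illustration of the shift-and-count idea, the following sketch slides the query map over the database map and counts coinciding peaks for each shift. It is a simplified assumption-laden version: `matching_function_sketch` is a hypothetical name, and it only counts exact coincidences (that is, it behaves like zero tolerance in both directions).

```python
import numpy as np

def matching_function_sketch(C_D, C_Q):
    """Hypothetical helper: count coinciding peaks for every shift of the query."""
    L, N = C_D.shape[1], C_Q.shape[1]
    Delta = np.zeros(L, dtype=int)
    for m in range(L - N + 1):
        Delta[m] = np.sum(np.logical_and(C_D[:, m:m + N], C_Q))
    shift_max = int(np.argmax(Delta))
    return Delta, shift_max

# Database map with four peaks; the query is an excerpt starting at frame 3
C_D = np.zeros((4, 10), dtype=bool)
C_D[[0, 2, 3, 1], [1, 4, 5, 8]] = True
C_Q = C_D[:, 3:7].copy()
Delta, shift_max = matching_function_sketch(C_D, C_Q)
print(shift_max, Delta[shift_max])   # 3 2
```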
- libfmp.c7.c7s1_audio_id.match_binary_matrices_tol(C_ref, C_est, tol_freq=0, tol_time=0)
Compare binary matrices with tolerance
Note: The tolerance parameters should be smaller than the minimum distance of peaks (1-entries in C_ref and C_est) to obtain meaningful TP, FN, FP values.
Notebook: C7/C7S1_AudioIdentification.ipynb
- Parameters
C_ref (np.ndarray) – Binary matrix used as reference
C_est (np.ndarray) – Binary matrix used as estimation
tol_freq (int) – Tolerance in frequency direction (vertical) (Default value = 0)
tol_time (int) – Tolerance in time direction (horizontal) (Default value = 0)
- Returns
TP (int) – True positives
FN (int) – False negatives
FP (int) – False positives
C_AND (np.ndarray) – Boolean mask of AND of C_ref and C_est (with tolerance)
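One way to realize a tolerance-based comparison is to dilate the estimate by the tolerance window and then intersect it with the reference. The sketch below follows that idea in plain NumPy; the helper names `dilate` and `match_with_tolerance_sketch` are hypothetical, and the counting rule (FN and FP derived from TP) is an assumption that matches the return values listed above.

```python
import numpy as np

def dilate(C, tol_freq, tol_time):
    """Hypothetical helper: OR together shifted copies of C within the tolerance window."""
    C = np.asarray(C, dtype=bool)
    K, N = C.shape
    padded = np.pad(C, ((tol_freq, tol_freq), (tol_time, tol_time)), mode="constant")
    out = np.zeros_like(C)
    for df in range(2 * tol_freq + 1):
        for dt in range(2 * tol_time + 1):
            out |= padded[df:df + K, dt:dt + N]
    return out

def match_with_tolerance_sketch(C_ref, C_est, tol_freq=0, tol_time=0):
    C_AND = np.logical_and(dilate(C_est, tol_freq, tol_time), C_ref)
    TP = int(C_AND.sum())
    FN = int(np.sum(C_ref)) - TP
    FP = int(np.sum(C_est)) - TP
    return TP, FN, FP, C_AND

C_ref = np.zeros((5, 8), dtype=bool)
C_est = np.zeros((5, 8), dtype=bool)
C_ref[1, 1] = C_ref[3, 4] = True
C_est[1, 2] = C_est[0, 6] = True          # first peak is off by one frame
strict = match_with_tolerance_sketch(C_ref, C_est)[:3]
relaxed = match_with_tolerance_sketch(C_ref, C_est, tol_time=1)[:3]
print(strict, relaxed)                    # (0, 2, 2) (1, 1, 1)
```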
- libfmp.c7.c7s1_audio_id.plot_constellation_map(Cmap, Y=None, xlim=None, ylim=None, title='', xlabel='Time (sample)', ylabel='Frequency (bins)', s=5, color='r', marker='o', figsize=(7, 3), dpi=72)
Plot constellation map
Notebook: C7/C7S1_AudioIdentification.ipynb
- Parameters
Cmap – Constellation map given as boolean mask for peak structure
Y – Spectrogram representation (Default value = None)
xlim – Limits for x-axis (Default value = None)
ylim – Limits for y-axis (Default value = None)
title – Title for plot (Default value = ‘’)
xlabel – Label for x-axis (Default value = ‘Time (sample)’)
ylabel – Label for y-axis (Default value = ‘Frequency (bins)’)
s – Size of dots in scatter plot (Default value = 5)
color – Color used for scatter plot (Default value = ‘r’)
marker – Marker for peaks (Default value = ‘o’)
figsize – Width, height in inches (Default value = (7, 3))
dpi – Dots per inch (Default value = 72)
- Returns
fig – The created matplotlib figure
ax – The used axes.
im – The image plot
- libfmp.c7.c7s2_audio_matching.compute_accumulated_cost_matrix_subsequence_dtw(C)
Given the cost matrix, compute the accumulated cost matrix for subsequence dynamic time warping with step sizes {(1, 0), (0, 1), (1, 1)}
Notebook: C7/C7S2_SubsequenceDTW.ipynb
- Parameters
C (np.ndarray) – Cost matrix
- Returns
D (np.ndarray) – Accumulated cost matrix
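The recursion for subsequence DTW with step sizes {(1, 0), (0, 1), (1, 1)} can be sketched as follows. The function name `accumulated_cost_subsequence_sketch` and the toy sequences are assumptions; the key difference to classical DTW is that the first row is copied from the cost matrix, so a match may start anywhere in the second sequence.

```python
import numpy as np

def accumulated_cost_subsequence_sketch(C):
    """Hypothetical helper: accumulated cost with a free starting column."""
    N, M = C.shape
    D = np.zeros((N, M))
    D[:, 0] = np.cumsum(C[:, 0])   # first column: only vertical steps possible
    D[0, :] = C[0, :]              # first row: no accumulation (free start)
    for n in range(1, N):
        for m in range(1, M):
            D[n, m] = C[n, m] + min(D[n - 1, m], D[n, m - 1], D[n - 1, m - 1])
    return D

X = np.array([1, 2])               # query
Y = np.array([5, 1, 2, 7])         # database sequence
C = np.abs(X[:, None] - Y[None, :]).astype(float)
D = accumulated_cost_subsequence_sketch(C)
print(D[-1])                       # [7. 1. 0. 5.]
```

The minimum of the last row (here at index 2) marks the end of the best-matching subsequence.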
- libfmp.c7.c7s2_audio_matching.compute_accumulated_cost_matrix_subsequence_dtw_21(C)
Given the cost matrix, compute the accumulated cost matrix for subsequence dynamic time warping with step sizes {(1, 1), (2, 1), (1, 2)}
Notebook: C7/C7S2_SubsequenceDTW.ipynb
- Parameters
C (np.ndarray) – Cost matrix
- Returns
D (np.ndarray) – Accumulated cost matrix
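For the step sizes {(1, 1), (2, 1), (1, 2)}, a common trick is to pad the accumulated matrix with two rows and columns of infinity so that the recursion needs no special cases at the border. The sketch below assumes this padding approach; the name `accumulated_cost_subsequence_21_sketch` is hypothetical.

```python
import numpy as np

def accumulated_cost_subsequence_21_sketch(C):
    """Hypothetical helper: subsequence DTW with steps (1, 1), (2, 1), (1, 2)."""
    N, M = C.shape
    D = np.full((N + 2, M + 2), np.inf)   # two padding rows/columns of infinity
    D[2, 2:] = C[0, :]                    # free start in the first row
    for n in range(1, N):
        for m in range(M):
            D[n + 2, m + 2] = C[n, m] + min(D[n + 1, m + 1],   # step (1, 1)
                                            D[n, m + 1],       # step (2, 1)
                                            D[n + 1, m])       # step (1, 2)
    return D[2:, 2:]

C = np.array([[4, 0, 1, 6],
              [3, 1, 0, 5]], dtype=float)
D = accumulated_cost_subsequence_21_sketch(C)
print(D[-1])                              # [inf  5.  0.  5.]
```

Cells that cannot be reached with the allowed steps keep the value infinity, which is why the first entry of the last row is `inf` in this example.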
- libfmp.c7.c7s2_audio_matching.compute_cens_from_chromagram(C, Fs=1, ell=41, d=10, quant=True)
Compute CENS features from chromagram
Notebook: C7/C7S2_CENS.ipynb
- Parameters
C (np.ndarray) – Input chromagram
Fs (scalar) – Feature rate of chromagram (Default value = 1)
ell (int) – Smoothing length (Default value = 41)
d (int) – Downsampling factor (Default value = 10)
quant (bool) – Apply quantization (Default value = True)
- Returns
C_CENS (np.ndarray) – CENS features
Fs_CENS (scalar) – Feature rate of CENS features
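The usual CENS pipeline (L1 normalization, logarithmic quantization, temporal smoothing, downsampling, L2 normalization) can be sketched in NumPy as follows. This is a sketch under stated assumptions rather than libfmp's implementation: the name `cens_sketch`, the quantization thresholds, the Hann smoothing window, and the small constant guarding against division by zero are all choices made for illustration. The sequence length is assumed to be at least `ell`.

```python
import numpy as np

# Assumed quantization steps: (lower bound, upper bound, assigned value)
QUANT_STEPS = [(0.0, 0.05, 0), (0.05, 0.1, 1), (0.1, 0.2, 2), (0.2, 0.4, 3), (0.4, 1.0, 4)]

def cens_sketch(C, Fs=1, ell=41, d=10):
    """Hypothetical helper: chromagram -> CENS-like features."""
    # 1) L1-normalize each column
    C_norm = C / np.maximum(np.sum(np.abs(C), axis=0, keepdims=True), 1e-12)
    # 2) Quantize the normalized values
    C_quant = np.zeros_like(C_norm)
    for low, high, val in QUANT_STEPS:
        C_quant[(low <= C_norm) & (C_norm < high)] = val
    # 3) Smooth each chroma band over time with a normalized Hann window
    win = np.hanning(ell)
    win = win / win.sum()
    C_smooth = np.array([np.convolve(row, win, mode="same") for row in C_quant])
    # 4) Downsample by factor d, 5) L2-normalize each column
    C_down = C_smooth[:, ::d]
    C_cens = C_down / np.maximum(np.linalg.norm(C_down, axis=0, keepdims=True), 1e-12)
    return C_cens, Fs / d

rng = np.random.default_rng(0)
C = rng.random((12, 50))                      # toy chromagram at 10 Hz
C_CENS, Fs_CENS = cens_sketch(C, Fs=10, ell=5, d=2)
print(C_CENS.shape, Fs_CENS)                  # (12, 25) 5.0
```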
- libfmp.c7.c7s2_audio_matching.compute_cens_from_file(fn_wav, Fs=22050, N=4410, H=2205, ell=21, d=5)
Compute CENS features from file
Notebook: C7/C7S2_AudioMatching.ipynb
- Parameters
fn_wav (str) – Filename of wav file
Fs (scalar) – Sampling rate of wav file (Default value = 22050)
N (int) – Window size for STFT (Default value = 4410)
H (int) – Hop size for STFT (Default value = 2205)
ell (int) – Smoothing length (Default value = 21)
d (int) – Downsampling factor (Default value = 5)
- Returns
X_CENS (np.ndarray) – CENS features
L (int) – Length of CENS feature sequence
Fs_CENS (scalar) – Feature rate of CENS features
x_duration (float) – Duration (seconds) of wav file
- libfmp.c7.c7s2_audio_matching.compute_matching_function_dtw(X, Y, stepsize=2)
Compute DTW-based matching function
Notebook: C7/C7S2_AudioMatching.ipynb
- Parameters
X (np.ndarray) – Query feature sequence (given as K x N matrix)
Y (np.ndarray) – Database feature sequence (given as K x M matrix)
stepsize (int) – Parameter for step size condition (1 or 2) (Default value = 2)
- Returns
Delta (np.ndarray) – DTW-based matching function
C (np.ndarray) – Cost matrix
D (np.ndarray) – Accumulated cost matrix
- libfmp.c7.c7s2_audio_matching.compute_matching_function_dtw_ti(X, Y, cyc=np.arange(12), stepsize=2)
Compute transposition-invariant matching function
Notebook: C7/C7S2_AudioMatching.ipynb
- Parameters
X (np.ndarray) – Query feature sequence (given as K x N matrix)
Y (np.ndarray) – Database feature sequence (given as K x M matrix)
cyc (np.ndarray) – Set of cyclic shift indices to be considered (Default value = np.arange(12))
stepsize (int) – Parameter for step size condition (1 or 2) (Default value = 2)
- Returns
Delta_TI (np.ndarray) – Transposition-invariant matching function
Delta_ind (np.ndarray) – Cost-minimizing indices
Delta_cyc (np.ndarray) – Array containing all matching functions
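Transposition invariance can be obtained by cyclically shifting the query chroma vectors, computing one matching function per shift, and taking the pointwise minimum. The sketch below illustrates this with a simplified DTW matching function. Several things are assumptions: the names `dtw_matching_function` and `matching_function_ti_sketch`, the cost `1 - <x, y>`, the basic step sizes {(1, 0), (0, 1), (1, 1)} (the signature above defaults to `stepsize=2`), and the convention that `Delta_ind` holds indices into `cyc`.

```python
import numpy as np

def dtw_matching_function(X, Y):
    """Hypothetical helper: last row of the subsequence-DTW matrix, divided by N."""
    C = 1 - X.T @ Y
    N, M = C.shape
    D = np.zeros((N, M))
    D[:, 0] = np.cumsum(C[:, 0])
    D[0, :] = C[0, :]
    for n in range(1, N):
        for m in range(1, M):
            D[n, m] = C[n, m] + min(D[n - 1, m], D[n, m - 1], D[n - 1, m - 1])
    return D[-1, :] / N

def matching_function_ti_sketch(X, Y, cyc=np.arange(12)):
    # One matching function per cyclic shift of the query's chroma axis
    Delta_cyc = np.array([dtw_matching_function(np.roll(X, k, axis=0), Y) for k in cyc])
    Delta_ind = np.argmin(Delta_cyc, axis=0)   # index into cyc of the best shift
    Delta_TI = np.min(Delta_cyc, axis=0)
    return Delta_TI, Delta_ind, Delta_cyc

# One-hot chroma sequences: the query is the pattern (0, 4, 7) transposed up by 2
Y = np.eye(12)[:, [0, 4, 7, 0, 4, 7, 2, 5]]
X = np.eye(12)[:, [2, 6, 9]]
Delta_TI, Delta_ind, Delta_cyc = matching_function_ti_sketch(X, Y)
print(Delta_TI[2], Delta_ind[2])               # 0.0 10
```

A shift of 10 semitones upward is equivalent to 2 semitones downward, which undoes the transposition of the query.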
- libfmp.c7.c7s2_audio_matching.compute_optimal_warping_path_subsequence_dtw(D, m=-1)
Given an accumulated cost matrix, compute the warping path for subsequence dynamic time warping with step sizes {(1, 0), (0, 1), (1, 1)}
Notebook: C7/C7S2_SubsequenceDTW.ipynb
- Parameters
D (np.ndarray) – Accumulated cost matrix
m (int) – Index to start backtracking; if set to -1, optimal m is used (Default value = -1)
- Returns
P (np.ndarray) – Optimal warping path (array of index pairs)
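Backtracking through an accumulated cost matrix can be sketched as follows. The function name `warping_path_subsequence_sketch` is hypothetical, and ties between predecessors are broken in favor of the diagonal step, which is an arbitrary choice. Backtracking stops as soon as the first row is reached, since a subsequence match may start at any column.

```python
import numpy as np

def warping_path_subsequence_sketch(D, m=-1):
    """Hypothetical helper: backtrack with steps (1, 0), (0, 1), (1, 1)."""
    N, M = D.shape
    n = N - 1
    if m < 0:
        m = int(np.argmin(D[N - 1, :]))   # cost-minimizing end position
    P = [(n, m)]
    while n > 0:
        if m == 0:
            cell = (n - 1, 0)
        else:
            candidates = [(n - 1, m - 1), (n - 1, m), (n, m - 1)]
            cell = min(candidates, key=lambda c: D[c])
        P.append(cell)
        n, m = cell
    P.reverse()
    return np.array(P)

# Accumulated cost matrix for query [1, 2] against [5, 1, 2, 7] (absolute difference)
D = np.array([[4, 0, 1, 6],
              [7, 1, 0, 5]], dtype=float)
P = warping_path_subsequence_sketch(D)
print(P.tolist())                         # [[0, 1], [1, 2]]
```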
- libfmp.c7.c7s2_audio_matching.compute_optimal_warping_path_subsequence_dtw_21(D, m=-1)
Given an accumulated cost matrix, compute the warping path for subsequence dynamic time warping with step sizes {(1, 1), (2, 1), (1, 2)}
Notebook: C7/C7S2_SubsequenceDTW.ipynb
- Parameters
D (np.ndarray) – Accumulated cost matrix
m (int) – Index to start backtracking; if set to -1, optimal m is used (Default value = -1)
- Returns
P (np.ndarray) – Optimal warping path (array of index pairs)
- libfmp.c7.c7s2_audio_matching.cost_matrix_dot(X, Y)
Computes cost matrix via dot product
Notebook: C7/C7S2_DiagonalMatching.ipynb
- Parameters
X (np.ndarray) – First sequence (K x N matrix)
Y (np.ndarray) – Second sequence (K x M matrix)
- Returns
C (np.ndarray) – Cost matrix
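For feature vectors with unit Euclidean norm, a dot-product-based cost is commonly defined as one minus the inner product, so identical vectors have cost 0 and orthogonal ones have cost 1. The sketch below assumes this definition; the toy matrices are made up for illustration.

```python
import numpy as np

# Columns are unit-norm feature vectors (K = 2)
X = np.array([[1.0, 0.0],
              [0.0, 1.0]])             # K x N with N = 2
Y = np.array([[1.0, 0.0, 0.6],
              [0.0, 1.0, 0.8]])        # K x M with M = 3

C = 1 - X.T @ Y                        # N x M cost matrix (assumed definition)
print(np.round(C, 2))
```

Entry `C[n, m]` compares column `n` of `X` with column `m` of `Y`, so the result has shape N x M.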
- libfmp.c7.c7s2_audio_matching.matches_diag(pos, Delta_N)
Derives matches from positions in the case of diagonal matching
Notebook: C7/C7S2_DiagonalMatching.ipynb
- Parameters
pos (np.ndarray or list) – Starting positions of matches
Delta_N (int or np.ndarray or list) – Length of match (a single number or a list of same length as Delta)
- Returns
matches (np.ndarray) – Array containing matches (start, end)
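In diagonal matching, a match that starts at position `s` and has length `N` covers the frames `s` to `s + N - 1`. The following sketch turns start positions into (start, end) pairs. The name `matches_diag_sketch` is hypothetical, and when `Delta_N` is an array the sketch assumes it is indexed by position (one length per entry of the matching function).

```python
import numpy as np

def matches_diag_sketch(pos, Delta_N):
    """Hypothetical helper: (start, end) pairs from start positions and match lengths."""
    pos = np.asarray(pos, dtype=int)
    if np.isscalar(Delta_N):
        length = np.full(len(pos), Delta_N)
    else:
        length = np.asarray(Delta_N)[pos]     # assumed: per-position match length
    return np.stack([pos, pos + length - 1], axis=1)

fixed = matches_diag_sketch([1, 5], 3)
varying = matches_diag_sketch([1, 5], np.array([3, 3, 3, 3, 3, 4, 3, 3]))
print(fixed.tolist(), varying.tolist())       # [[1, 3], [5, 7]] [[1, 3], [5, 8]]
```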
- libfmp.c7.c7s2_audio_matching.matches_dtw(pos, D, stepsize=2)
Derives matches from positions for DTW-based strategy
Notebook: C7/C7S2_AudioMatching.ipynb
- Parameters
pos (np.ndarray) – End positions of matches
D (np.ndarray) – Accumulated cost matrix
stepsize (int) – Parameter for step size condition (1 or 2) (Default value = 2)
- Returns
matches (np.ndarray) – Array containing matches (start, end)
- libfmp.c7.c7s2_audio_matching.matching_function_diag(C, cyclic=False)
Computes diagonal matching function
Notebook: C7/C7S2_DiagonalMatching.ipynb
- Parameters
C (np.ndarray) – Cost matrix
cyclic (bool) – If “True” then matching is done cyclically (Default value = False)
- Returns
Delta (np.ndarray) – Matching function
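A diagonal matching function averages the cost along each diagonal of length N, where N is the query length. The sketch below covers only the non-cyclic case and marks shifts where the query would run past the end with infinity; the name `matching_function_diag_sketch` and the toy data are assumptions.

```python
import numpy as np

def matching_function_diag_sketch(C):
    """Hypothetical helper: mean cost along each full diagonal (non-cyclic)."""
    N, M = C.shape
    Delta = np.full(M, np.inf)                 # shifts without a full diagonal stay inf
    rows = np.arange(N)
    for m in range(M - N + 1):
        Delta[m] = np.mean(C[rows, m + rows])  # cells (0, m), (1, m+1), ...
    return Delta

X = np.array([1, 2])
Y = np.array([5, 1, 2, 7])
C = np.abs(X[:, None] - Y[None, :]).astype(float)
Delta = matching_function_diag_sketch(C)
print(Delta)                                   # [2.5 0.  3.  inf]
```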
- libfmp.c7.c7s2_audio_matching.matching_function_diag_multiple(X, Y, tempo_rel_set=[1], cyclic=False)
Computes diagonal matching function using multiple query strategy
Notebook: C7/C7S2_DiagonalMatching.ipynb
- Parameters
X (np.ndarray) – First sequence (K x N matrix)
Y (np.ndarray) – Second sequence (K x M matrix)
tempo_rel_set (np.ndarray) – Set of relative tempo values (scaling) (Default value = [1])
cyclic (bool) – If “True” then matching is done cyclically (Default value = False)
- Returns
Delta_min (np.ndarray) – Matching function (obtained by minimizing over several matching functions)
Delta_N (np.ndarray) – Query length of best match for each time position
Delta_scale (np.ndarray) – Set of matching functions (for each of the scaled versions of the query)
- libfmp.c7.c7s2_audio_matching.mininma_from_matching_function(Delta, rho=2, tau=0.2, num=None)
Derives local minima positions of matching function in an iterative fashion
Notebook: C7/C7S2_DiagonalMatching.ipynb
- Parameters
Delta (np.ndarray) – Matching function
rho (int) – Parameter to exclude neighborhood of a matching position for subsequent matches (Default value = 2)
tau (float) – Threshold for maximum Delta value allowed for matches (Default value = 0.2)
num (int) – Maximum number of matches (Default value = None)
- Returns
pos (np.ndarray) – Array of local minima
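The iterative strategy can be sketched as: pick the global minimum, exclude its neighborhood, and repeat while values below `tau` remain. In this sketch the name `minima_sketch` is hypothetical and the excluded neighborhood is taken to be symmetric (`rho` positions on each side), which is an assumption about the exact window.

```python
import numpy as np

def minima_sketch(Delta, rho=2, tau=0.2, num=None):
    """Hypothetical helper: iteratively pick minima below tau, excluding neighborhoods."""
    Delta_tmp = np.array(Delta, dtype=float)   # work on a copy
    M = len(Delta_tmp)
    num = M if num is None else num
    pos = []
    while len(pos) < num and np.any(Delta_tmp < tau):
        m = int(np.argmin(Delta_tmp))
        pos.append(m)
        Delta_tmp[max(0, m - rho):min(m + rho + 1, M)] = np.inf
    return np.array(pos, dtype=int)

Delta = np.array([0.5, 0.1, 0.15, 0.6, 0.05, 0.3, 0.18, 0.9])
pos = minima_sketch(Delta, rho=2, tau=0.2)
print(pos.tolist())                            # [4, 1]
```

Position 6 (value 0.18) lies below the threshold but is not reported because it falls inside the neighborhood excluded around position 4.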
- libfmp.c7.c7s2_audio_matching.plot_matches(ax, matches, Delta, Fs=1, alpha=0.2, color='r', s_marker='o', t_marker='')
Plots matches into existing axis
Notebook: C7/C7S2_DiagonalMatching.ipynb
- Parameters
ax – Axis
matches – Array of matches (start, end)
Delta – Matching function
Fs – Feature rate (Default value = 1)
alpha – Transparency parameter for match visualization (Default value = 0.2)
color – Color used to indicate matches (Default value = ‘r’)
s_marker – Marker used to indicate start of matches (Default value = ‘o’)
t_marker – Marker used to indicate end of matches (Default value = ‘’)
- libfmp.c7.c7s2_audio_matching.quantize_matrix(C, quant_fct=None)
Quantize matrix values in a logarithmic manner (as done for CENS features)
Notebook: C7/C7S2_CENS.ipynb
- Parameters
C (np.ndarray) – Input matrix
quant_fct (list) – List specifying the quantization function (Default value = None)
- Returns
C_quant (np.ndarray) – Output matrix
- libfmp.c7.c7s2_audio_matching.scale_tempo_sequence(X, factor=1)
Scales a sequence (given as feature matrix) along time (second dimension)
Notebook: C7/C7S2_DiagonalMatching.ipynb
- Parameters
X (np.ndarray) – Feature sequences (given as K x N matrix)
factor (float) – Scaling factor (resulting in length “round(factor * N)”) (Default value = 1)
- Returns
X_new (np.ndarray) – Scaled feature sequence
N_new (int) – Length of scaled feature sequence
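Scaling a feature sequence along time amounts to resampling every feature band onto a grid of `round(factor * N)` points. The sketch below uses linear interpolation via `np.interp`; the interpolation type and the name `scale_tempo_sketch` are assumptions made for illustration.

```python
import numpy as np

def scale_tempo_sketch(X, factor=1.0):
    """Hypothetical helper: resample each row of X to round(factor * N) frames."""
    N = X.shape[1]
    N_new = int(round(factor * N))
    t = np.linspace(0, 1, N)            # original (normalized) time axis
    t_new = np.linspace(0, 1, N_new)    # resampled time axis
    X_new = np.array([np.interp(t_new, t, row) for row in X])
    return X_new, N_new

X = np.array([[0.0, 1.0, 2.0, 3.0]])
X_new, N_new = scale_tempo_sketch(X, factor=1.75)
print(N_new, X_new[0])                  # 7 [0.  0.5 1.  1.5 2.  2.5 3. ]
```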
- libfmp.c7.c7s3_version_id.compute_accumulated_score_matrix_common_subsequence(S)
Given the score matrix, compute the accumulated score matrix for common subsequence matching with step sizes {(1, 0), (0, 1), (1, 1)}
Notebook: C7/C7S3_CommonSubsequence.ipynb
- Parameters
S (np.ndarray) – Score matrix
- Returns
D (np.ndarray) – Accumulated score matrix
- libfmp.c7.c7s3_version_id.compute_optimal_path_common_subsequence(D, cellmax=True, n=0, m=0)
Given an accumulated score matrix, compute the score-maximizing path for common subsequence matching with step sizes {(1, 0), (0, 1), (1, 1)}
Notebook: C7/C7S3_CommonSubsequence.ipynb
- Parameters
D (np.ndarray) – Accumulated score matrix
cellmax (bool) – If “True”, score-maximizing cell will be computed (Default value = True)
n (int) – Index (first axis) of cell for backtracking start; only used when cellmax=False (Default value = 0)
m (int) – Index (second axis) of cell for backtracking start; only used when cellmax=False (Default value = 0)
- Returns
P (np.ndarray) – Score-maximizing path (array of index pairs)
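Common subsequence matching resembles local alignment: the accumulated score is clipped at zero so a path can start anywhere, and backtracking runs from the score-maximizing cell until the score drops to zero. The sketch below illustrates both stages with step sizes {(1, 0), (0, 1), (1, 1)}. The function names and the toy score matrix are assumptions, and ties between predecessors are broken in favor of the diagonal step.

```python
import numpy as np

def accumulated_score_sketch(S):
    """Hypothetical helper: accumulated score, clipped at zero."""
    N, M = S.shape
    D = np.zeros((N, M))
    for n in range(N):
        for m in range(M):
            prev = [0.0]                         # option: start a new path here
            if n > 0:
                prev.append(D[n - 1, m])
            if m > 0:
                prev.append(D[n, m - 1])
            if n > 0 and m > 0:
                prev.append(D[n - 1, m - 1])
            D[n, m] = max(0.0, max(prev) + S[n, m])
    return D

def optimal_path_sketch(D):
    """Hypothetical helper: backtrack from the maximal cell while the score is positive."""
    n, m = np.unravel_index(np.argmax(D), D.shape)
    P = []
    while D[n, m] > 0:
        P.append((int(n), int(m)))
        if n == 0 and m == 0:
            break
        if n == 0:
            n, m = 0, m - 1
        elif m == 0:
            n, m = n - 1, 0
        else:
            n, m = max([(n - 1, m - 1), (n - 1, m), (n, m - 1)], key=lambda c: D[c])
    P.reverse()
    return np.array(P)

S = np.array([[-1, 1, -1],
              [-1, -1, 1],
              [-1, -1, -1]], dtype=float)
D = accumulated_score_sketch(S)
P = optimal_path_sketch(D)
print(D.max(), P.tolist())                       # 2.0 [[0, 1], [1, 2]]
```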
- libfmp.c7.c7s3_version_id.compute_partial_matching(S)
Given the score matrix, compute the accumulated score matrix for partial matching
Notebook: C7/C7S3_CommonSubsequence.ipynb
- Parameters
S (np.ndarray) – Score matrix
- Returns
D (np.ndarray) – Accumulated score matrix
P (np.ndarray) – Partial match (array of index pairs)
- libfmp.c7.c7s3_version_id.compute_prf_metrics(I, score, I_Q)
Compute precision, recall, F-measures and other evaluation metrics for document-level retrieval
Notebook: C7/C7S3_Evaluation.ipynb
- Parameters
I (np.ndarray) – Array of items
score (np.ndarray) – Array containing the score values of the items
I_Q (np.ndarray) – Array of relevant (positive) items
- Returns
P_Q (float) – Precision
R_Q (float) – Recall
F_Q (float) – F-measures sorted by rank
BEP (float) – Break-even point
F_max (float) – Maximal F-measure
P_average (float) – Average precision
X_Q (np.ndarray) – Relevance function
rank (np.ndarray) – Array of rank values
I_sorted (np.ndarray) – Array of items sorted by rank
rank_sorted (np.ndarray) – Array of rank values sorted by rank
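The rank-based metrics can be sketched in a few lines of NumPy. This sketch assumes that higher scores mean better ranks, that at least one relevant item exists, and that the break-even point is the precision at rank |I_Q|. The name `prf_sketch` is hypothetical, and only a subset of the return values listed above is computed.

```python
import numpy as np

def prf_sketch(I, score, I_Q):
    """Hypothetical helper: precision, recall, and F-measure for every rank."""
    order = np.argsort(-score)                    # highest score gets rank 1
    I_sorted = I[order]
    X_Q = np.isin(I_sorted, I_Q).astype(float)    # relevance function over ranks
    M = int(X_Q.sum())                            # number of relevant items
    rank = np.arange(1, len(I) + 1)
    hits = np.cumsum(X_Q)
    P_Q = hits / rank
    R_Q = hits / M
    F_Q = 2 * P_Q * R_Q / np.maximum(P_Q + R_Q, 1e-12)
    BEP = P_Q[M - 1]                              # precision where rank = |I_Q|
    P_average = np.sum(P_Q * X_Q) / M             # mean precision at relevant ranks
    return P_Q, R_Q, F_Q, BEP, F_Q.max(), P_average

I = np.arange(5)
score = np.array([0.9, 0.2, 0.8, 0.4, 0.6])
I_Q = np.array([0, 4])                            # relevant items
P_Q, R_Q, F_Q, BEP, F_max, P_average = prf_sketch(I, score, I_Q)
print(np.round(P_Q, 2), BEP)                      # [1.   0.5  0.67 0.5  0.4 ] 0.5
```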
- libfmp.c7.c7s3_version_id.compute_sm_from_wav(x1, x2, Fs, N=4410, H=2205, ell=21, d=5, L_smooth=12, tempo_rel_set=np.array([0.66, 0.81, 1, 1.22, 1.5]), shift_set=np.array([0]), strategy='relative', scale=True, thresh=0.15, penalty=-2.0, binarize=False)
Compute a similarity matrix (SM)
Notebook: C7/C7S3_VersionIdentification.ipynb
- Parameters
x1 (np.ndarray) – First signal
x2 (np.ndarray) – Second signal
Fs (scalar) – Sampling rate of WAV files
N (int) – Window size for computing STFT-based chroma features (Default value = 4410)
H (int) – Hop size for computing STFT-based chroma features (Default value = 2205)
ell (int) – Smoothing length for computing CENS features (Default value = 21)
d (int) – Downsampling factor for computing CENS features (Default value = 5)
L_smooth (int) – Length of filter for enhancing SM (Default value = 12)
tempo_rel_set (np.ndarray) – Set of relative tempo values for enhancing SM (Default value = np.array([0.66, 0.81, 1, 1.22, 1.5]))
shift_set (np.ndarray) – Set of shift indices for enhancing SM (Default value = np.array([0]))
strategy (str) – Thresholding strategy for thresholding SM (‘absolute’, ‘relative’, ‘local’) (Default value = ‘relative’)
scale (bool) – If scale=True, then scaling of positive values to range [0,1] for thresholding SM (Default value = True)
thresh (float) – Threshold (meaning depends on strategy) (Default value = 0.15)
penalty (float) – Set values below threshold to value specified (Default value = -2.0)
binarize (bool) – Binarizes final matrix (positive: 1; otherwise: 0) (Default value = False)
- Returns
X (np.ndarray) – CENS feature sequence for first signal
Y (np.ndarray) – CENS feature sequence for second signal
Fs_feature (scalar) – Feature rate
S_thresh (np.ndarray) – Similarity matrix
I (np.ndarray) – Index matrix
- libfmp.c7.c7s3_version_id.get_induced_segments(P)
Given a path, compute the induced segments
Notebook: C7/C7S3_CommonSubsequence.ipynb
- Parameters
P (np.ndarray) – Path (list of index pairs)
- Returns
seg_X (np.ndarray) – Induced segment of first sequence
seg_Y (np.ndarray) – Induced segment of second sequence
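A path induces one segment in each sequence, namely the index range between the path's first and last cell along the respective axis. The sketch below assumes the segments are returned as full index ranges; the name `induced_segments_sketch` and the example path are made up for illustration.

```python
import numpy as np

def induced_segments_sketch(P):
    """Hypothetical helper: index ranges covered by a path in both sequences."""
    P = np.asarray(P)
    seg_X = np.arange(P[0, 0], P[-1, 0] + 1)   # first axis: start to end (inclusive)
    seg_Y = np.arange(P[0, 1], P[-1, 1] + 1)   # second axis: start to end (inclusive)
    return seg_X, seg_Y

P = np.array([[2, 5], [3, 6], [3, 7], [4, 8]])
seg_X, seg_Y = induced_segments_sketch(P)
print(seg_X.tolist(), seg_Y.tolist())          # [2, 3, 4] [5, 6, 7, 8]
```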