sorry for the delay in my response; your last comments made me appreciate much better how beets’ matching works. in particular, all my chatter above in this thread about merging chroma attributes with musicbrainz attributes was bogus: they are all already marshalled and ready. the details of matching all seem to revolve around the construction and use of the autotag.hooks.Distance object.

before going any further, i want to thank you again for your patient hand-holding as i come to better appreciate all that is in beets.
i wrote a little Distance.pprint() method (below) to enumerate all the features that go into the distance calculation. using the same ec7e7dde-55ff-33cc-b2df-20276f7371eb target as above (i’m calling it ec7e7), with the default weightings i get this:
album 0.0327272727273
artist 0.0914085914086
mediums 0.0
year 0.012539184953
* dist.tracks
0 0.0491803278689
1 0.0353535353535
2 0.125541125541
3 0.0374331550802
4 0.0522243713733
5 0.0369799691834
6 0.0376175548589
* tracks
track 1 2279052b-27b9-48bd-bc3b-56f9bb9ec6f8 1 Ein deutsches Requiem, op. 45: I. Selig sind, die da Leid tragen
track_id [0.0]
track_index [0.0]
track_length [0.0]
track_title [0.12962962962962962]
...
track 7 b29dc974-8a28-45d6-9cf3-ef239180252f 7 Ein deutsches Requiem, op. 45: VII. Selig sind die Toten, die in dem Herrn sterben
track_id [0.0]
track_index [0.0]
track_length [0.0]
track_title [0.4603174603174603]
...
choice: /Data/tmp/music-minTest/Barbara Hendricks, Jose Van Dam; Herbert Von Karajan_ Vienna Philharmonic Orchestra, Vienna Singverein/Brahms_ Ein Deutsches Requiem action.SKIP Brahms: Ein Deutsches Requiem Barbara Hendricks, Jose Van Dam; Herbert Von Karajan: Vienna Philharmonic Orchestra, Vienna Singverein 0.170705052658 ec7e7dde-55ff-33cc-b2df-20276f7371eb "album:[0.24]; tracks:[0.049180327868852465, 0.03535353535353535, 0.12554112554112554, 0.03743315508021391, 0.05222437137330755, 0.03697996918335902, 0.03761755485893417]; mediums:[0.0]; year:[0.27586206896551724]; artist:[0.6703296703296703]"
in the case of my target album, i can make it match with a distance within strong_rec_thresh by reducing the match.distance_weights.track_title weight:
album 0.0327272727273
artist 0.0914085914086
mediums 0.0
year 0.012539184953
* dist.tracks
0 0.0128805620609
1 0.00925925925926
2 0.0328798185941
3 0.00980392156863
4 0.0136778115502
5 0.00968523002421
6 0.00985221674877
* tracks
track 1 2279052b-27b9-48bd-bc3b-56f9bb9ec6f8 1 Ein deutsches Requiem, op. 45: I. Selig sind, die da Leid tragen
track_id [0.0]
track_index [0.0]
track_length [0.0]
track_title [0.12962962962962962]
...
track 7 b29dc974-8a28-45d6-9cf3-ef239180252f 7 Ein deutsches Requiem, op. 45: VII. Selig sind die Toten, die in dem Herrn sterben
track_id [0.0]
track_index [0.0]
track_length [0.0]
track_title [0.4603174603174603]
...
dist: Barbara Hendricks, Jose Van Dam; Herbert Von Karajan: Vienna Philharmonic Orchestra, Vienna Singverein - Brahms: Ein Deutsches Requiem / ec7e7dde-55ff-33cc-b2df-20276f7371eb: 0.15 "{'album': [0.24], 'tracks': [0.01288056206088993, 0.009259259259259259, 0.032879818594104306, 0.009803921568627453, 0.013677811550151976, 0.009685230024213076, 0.009852216748768473], 'mediums': [0.0], 'year': [0.27586206896551724], 'artist': [0.6703296703296703]}"
Success. Distance: 0.15
choice: /Data/tmp/music-minTest/Barbara Hendricks, Jose Van Dam; Herbert Von Karajan_ Vienna Philharmonic Orchestra, Vienna Singverein/Brahms_ Ein Deutsches Requiem action.APPLY Brahms: Ein Deutsches Requiem Barbara Hendricks, Jose Van Dam; Herbert Von Karajan: Vienna Philharmonic Orchestra, Vienna Singverein 0.15 ec7e7dde-55ff-33cc-b2df-20276f7371eb 7 7 0 0
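the per-track numbers in both runs look like a weighted average of the individual penalties. here is a minimal sketch of that arithmetic, not beets’ actual code: the function name weighted_distance is hypothetical, and the weight values are assumptions chosen because they reproduce the default-run numbers above (5.0, 1.0, 2.0 and 3.0 for track_id, track_index, track_length and track_title). the lowered track_title value of 1.0 is only an illustration, not the value i actually used.

```python
# Sketch only: assumes distance = sum(weight * penalty) / sum(weight * count).
# ASSUMED_WEIGHTS are assumptions that happen to reproduce the output above.
ASSUMED_WEIGHTS = {
    "track_id": 5.0,
    "track_index": 1.0,
    "track_length": 2.0,
    "track_title": 3.0,
}


def weighted_distance(penalties, weights):
    """Normalized distance for a dict mapping penalty name -> list of values."""
    raw = sum(weights[key] * sum(vals) for key, vals in penalties.items())
    maximum = sum(weights[key] * len(vals) for key, vals in penalties.items())
    return raw / maximum if maximum else 0.0


# Penalties copied from the "* tracks" section of the output above.
track_1 = {
    "track_id": [0.0],
    "track_index": [0.0],
    "track_length": [0.0],
    "track_title": [0.12962962962962962],
}
track_7 = dict(track_1, track_title=[0.4603174603174603])

d1 = weighted_distance(track_1, ASSUMED_WEIGHTS)
d7 = weighted_distance(track_7, ASSUMED_WEIGHTS)
print(d1, d7)  # compare with 0.0353535353535 and 0.125541125541 above

# Hypothetical lower track_title weight: the title mismatch now counts less.
lowered = dict(ASSUMED_WEIGHTS, track_title=1.0)
d1_low = weighted_distance(track_1, lowered)
d7_low = weighted_distance(track_7, lowered)
print(d1_low, d7_low)
```

under these assumptions the two title penalties map onto two of the values listed under dist.tracks in the default run, and lowering the track_title weight shrinks every per-track distance without touching the penalties themselves.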
to my mind, if i have an album in which every track has an identified track_id (acoustic fingerprint ID) attribute, that should trump (you should excuse the expression) minor mismatches in things like the track_title. do you agree, or are there mitigating issues?
further (and i suggest this with all humility), there seems to me to be a general bug in beets’ distance logic with respect to exact logical matches like an identical track_id: there seems to be no way to up-weight this feature, because it will have distance=0 on matches!? is there another mechanism whereby plugins could/should manipulate the distance calculation in logical situations like this (as opposed to weighted sums of distances)?
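one way to probe this question numerically is a small sketch, assuming the distance is a weighted average of penalties (sum of weight times penalty, divided by the sum of the weights of the penalties present). that form is an assumption, the helper names raw_and_max and normalized are hypothetical, and the weight values are illustrative only. under that assumption a zero penalty adds nothing to the numerator, but its weight still enters the denominator, so raising it does pull the normalized distance down, though only indirectly.

```python
# Sketch only: assumes a weighted-average distance; not beets' actual code.
def raw_and_max(penalties, weights):
    """Return (weighted penalty sum, largest possible weighted sum)."""
    raw = sum(weights[key] * sum(vals) for key, vals in penalties.items())
    maximum = sum(weights[key] * len(vals) for key, vals in penalties.items())
    return raw, maximum


def normalized(penalties, weights):
    raw, maximum = raw_and_max(penalties, weights)
    return raw / maximum if maximum else 0.0


# An exact track_id match (penalty 0.0) next to a title mismatch.
penalties = {
    "track_id": [0.0],
    "track_index": [0.0],
    "track_length": [0.0],
    "track_title": [0.4603174603174603],
}

# Illustrative weights (assumptions), then the same with track_id boosted.
base = {"track_id": 5.0, "track_index": 1.0,
        "track_length": 2.0, "track_title": 3.0}
boosted = dict(base, track_id=50.0)

raw_base, max_base = raw_and_max(penalties, base)
raw_boost, max_boost = raw_and_max(penalties, boosted)

print(raw_base == raw_boost)         # numerator is unaffected by the boost
print(normalized(penalties, base))   # distance with the illustrative weights
print(normalized(penalties, boosted))  # smaller: the denominator grew
```

so in this model an exact match can only dilute the other penalties, it can never override them; a hard "identical track_id wins" rule would still need some mechanism outside the weighted sum.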
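def pprint(self, lbl):
    # Debug helper added to autotag.hooks.Distance: dump every penalty.
    print("** " + lbl)
    # Album-level penalties; 'tracks' is listed separately below.
    for k in sorted(self._penalties.keys()):
        if k == 'tracks':
            continue
        print(k, self[k])
    # Per-track distances as they enter the album-level distance.
    print('* dist.tracks')
    for ti, tp in enumerate(self._penalties['tracks']):
        print(ti, tp)
    # Individual penalties of each track, ordered by track index.
    print('* tracks')
    sortTracks = sorted(self.tracks.keys(), key=lambda ti: ti.index)
    for ti in sortTracks:
        tinfo = self.tracks[ti]  # the per-track Distance object
        tid = ti.track_id
        print("track", ti.index, tid)
        for tk in sorted(tinfo._penalties):
            print(tk, tinfo._penalties[tk])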