Disambiguation issues (with Weezer, specifically)

I am importing my Weezer CDs, and I’m seeing weirdness with the disambiguation of the various self-titled CDs.

I have customized my disambiguation to the following:

paths:
    default: $albumartist/$album%aunique{albumartist album albumdisambig,albumtype year albumdisambig country label}%if{$albumdisambig, [%title{$albumdisambig}]}/$disc_and_track $title
    singleton: $albumartist/[non album tracks]/$title
    comp: Various Artists/$album%aunique{albumartist album albumdisambig,albumtype year albumdisambig country label}%if{$albumdisambig, [%title{$albumdisambig}]}/$disc_and_track $artist - $title

As you can see, I always append the disambiguation to the album name when one is present, so I get [Deluxe Edition] or whatever automatically, and it’s been working great for a while. The Weezer albums are strange, though, and I wonder how I can fix them.

I use the Chromaprint/MusicBrainz plugin for getting metadata, which means that the MB disambiguation comment is used for $albumdisambig, and that’s usually fine. The problem here is that the Release Group in MusicBrainz has a disambiguation comment of, say, “Blue Album”… and each individual release has a disambiguation comment of “Blue Album” as well, in addition to anything else like “Deluxe Edition”. That makes the following happen when using $albumdisambig:

Weezer [Blue Album, Blue Album, Deluxe Edition]
Weezer [Green Album, Green Album]
Weezer [Red Album, Red Album, Deluxe Edition]
Weezer [White Album, White Album]

Is there any non-manual way to fix this, such that if I removed and reimported I would get [Blue Album, Deluxe Edition] or [Green Album]? I know I could manually fix this once, but I’d rather make this more automated, especially for potential cases like this in the future. I’m even up for using a plugin like the inline plugin to do text manipulation to remove duplicate strings, but my Python is weak. Is $albumdisambig even available to that plugin since it’s not an actual album property?

Also - the duplicates plugin sees them all as duplicates. Is there any way to get the duplicates plugin to use the albumdisambig as a hint that it’s not the same album and import it without asking? I do still want it to ask if disambiguation fails, and I’d rather have it always ask than make a mistake, so if not that’s a small thing.

Interesting! That’s a perplexing problem. How would you suggest that this should work? Are you imagining a special case for when the release-group string is contained within the release string? Or something more general than that?

You can always use a plugin or inline field to manipulate data contained in existing fields, $albumdisambig included.

I’m not sure about the duplicates plugin; I thought that used IDs.

I’m not sure what the right fix is in the general case… I’ve been pondering that for a bit now. In this exact case, yes - removing the Release Group string since it’s exactly duplicated in the actual release comment would be the fix. I’m not sure that works all the time, though.

My admittedly weak Python fu, plus Google and Stack Exchange, have come up with this:

album_fields:
  unique_disambig: u', '.join(set([x.strip() for x in albumdisambig.split(',')]))

I’m certain this can be improved, but it works in this specific case. Just replace $albumdisambig with $unique_disambig when the processed string is needed. In my case, I only use it in the final output, changing [%title{$albumdisambig}] to [%title{$unique_disambig}] in my path template.

Very nice! That’s a creative solution. FWIW, I might recommend a sorted() call inside the join() argument to make sure the values show up in some stable order.
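
For anyone who wants to try the ordering suggestion outside of beets first, here is a minimal standalone sketch. The function names are made up for illustration and are not part of beets or the inline plugin. It compares the sorted() variant with an order-preserving variant based on dict.fromkeys, which assumes Python 3.7 or later.

```python
def unique_disambig_sorted(albumdisambig):
    """Drop repeated comma-separated parts and return them in alphabetical order."""
    parts = set(p.strip() for p in albumdisambig.split(','))
    # sorted() gives a stable result, but not necessarily the original order
    return ', '.join(sorted(p for p in parts if p))


def unique_disambig_ordered(albumdisambig):
    """Drop repeated comma-separated parts, keeping the first-seen order."""
    parts = (p.strip() for p in albumdisambig.split(','))
    # dict.fromkeys discards repeats and preserves insertion order (Python 3.7+)
    return ', '.join(dict.fromkeys(p for p in parts if p))


if __name__ == '__main__':
    for raw in ('Blue Album, Blue Album, Deluxe Edition',
                'Red Album, Red Album, Deluxe Edition',
                'Green Album, Green Album'):
        print(unique_disambig_sorted(raw), '|', unique_disambig_ordered(raw))
```

Note that alphabetical sorting puts "Deluxe Edition" before "Red Album", so the order-preserving variant may be preferable if the release-group comment should always come first. Either body could presumably be collapsed into a single expression for use in album_fields, but that has not been tested here.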