Newbie: How to deal with duplicates within an album?


I’m absolutely new to beets and I’m also absolutely no “console virtuoso”.

The file system of my music collection is organized in folders for artist and subfolders for album, which seems to be the most common way. After several re-installations of iTunes in the past, some album subfolders contain just one title (still OK, it may have been taken from a compilation and landed in a different place in the artist-album structure), but this title exists in several duplicates with slightly different filenames, like 01_01_title and 01_title and 01 title and so on…

When importing, beets first wants to recognize the album, and when I confirm that, it tries to match the duplicates with other titles from this album. So I end up with an album with 5 different titles, but all contain the same music.
When I choose “import as tracks” (before or after confirming the album name), I end up with just one folder for the artist, but no album subfolders or rather one “no_album” folder.

Is there a way to deal with the duplicates within an album?

Remember, I’m a newbie. My configuration file is still very “slim”, and my knowledge of and experience with beets are limited. So don’t hesitate to recommend very basic things.

Thanks in advance.

Good question. I don’t think beets has a great way to deal with that specific situation right now. The importer will try to match all the files in an “album” against the incoming metadata as well as it can muster. This proposal has the blueprint for how we might address this, but it’s easier said than done:

If there’s a pattern to the tracks you can ignore, you could consider matching those with your ignore configuration…
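As a rough way to preview what such a pattern would catch before putting it in the config, here is a minimal Python sketch. It assumes that the beets `ignore` option takes glob patterns and matches them against the file name alone; `preview_ignore` and the sample pattern are hypothetical, made up for illustration, and the file names are the ones from the question.

```python
from fnmatch import fnmatch


def preview_ignore(filenames, patterns):
    """Split file names into (ignored, kept) for a list of glob patterns."""
    ignored = [name for name in filenames
               if any(fnmatch(name, pattern) for pattern in patterns)]
    kept = [name for name in filenames if name not in ignored]
    return ignored, kept


# File names as described in the question (the extension is an assumption).
names = ["01 title.mp3", "01_title.mp3", "01_01_title.mp3", "01_01 title 1.mp3"]

# Hypothetical pattern: two digits, an underscore, two more digits, then anything.
ignored, kept = preview_ignore(names, ["[0-9][0-9]_[0-9][0-9]*"])
print("would be ignored:", ignored)
print("would be kept:", kept)
```

With these four names, only the two that start with `01_01` end up in `ignored`. A pattern that looks safe in such a preview could then be tried in the `ignore` list of the config file.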

Thanks for the quick answer.

I read the link.
I’m still far, far away from having knowledge about programming and Python, but I might be able to understand the selection and matching process…

So at first, beets tries to fill the album.
If my album has fewer tracks than MusicBrainz says, it tries hard to match these tracks and NEVER drops one. Once it has “matched” the tracks, it renames them and adjusts the ID3 tags. So they will not be recognized as duplicates later.

For now, I have the fromfilename plugin. Is it possible to give more weight to this parameter and would this help?
What about chroma? And giving weight to this?
Or does this all not help against the habit of matching ALL found tracks in an album?

Your proposal of finding a pattern to ignore might work… I will go through my data and see if all those duplicates I want to get rid of have a structure like two underscores in the filename. But I fear I will kill a lot of non-duplicates with this… and how would I install such an ignore filter?

Another idea I could follow: Importing everything as tracks works in principle. In the link you gave me, I saw a config file with some path statements:

    paths:
        default: /$albumartist/$album/$track $title
        singleton: /$albumartist/$album/$track $title
        comp: /Compilation/$album/$track $title
        albumtype:soundtrack: Soundtracks/$album/$track $title

    plugins: discogs
    directory: /volume1/homes/admin/fs_admin/Music_WIP/beets/

    import:
        move: yes
        write: yes
        log: beetslog.txt
        duplicate_action: skip

    discogs:
        user_token: REDACTED
        tokenfile: discogs_token.json
        apikey: REDACTED
        apisecret: REDACTED
        source_weight: 0.5

    match:
        ignored: missing_tracks unmatched_tracks
        strong_rec_thresh: 0.15

Would these path settings solve the issue that everything lands in a “no_album” folder, and instead restore the artist/album structure based on the information in the tags?

Indeed; none of that will address the core issue.

It’s worth a try, but keep in mind that beets won’t try to set the album or artist fields on non-album tracks. So you could try reorganizing your music like that in a first pass, running the duplicates plugin to remove duplicate tracks, and then doing a second pass where you import everything again as albums.

I decided on a different system many years ago: do not keep compilations (i.e. “Various Artists” releases) together, but split them. Songs found on compilations are stored in a special “[compilations]” folder beneath the artist folder, so I can easily find all titles I have from a single artist. (And any good software can easily reconstruct what was on a compilation release.)

That said, I do allow “duplicates” within the “[compilations]” subfolders, because the same song could have been on many compilations.

I do not allow duplicates within the same album (but I do sometimes keep different releases of an album, like the original and a later remaster). If it’s clear a song belongs to this album, then there’s no need to duplicate it (for instance, if I bought a single MP3 track on Amazon and later bought the CD release and ripped it as FLAC, the single MP3 belonging to this album gets deleted).

For me, this is the best method, because I like to keep everything I own from one artist together, and it also makes it easy to distinguish the files from compilations from those that actually belong to real albums.

Now I’m looking for a way to do this using beets … (I’ve been using MusicBrainz Picard before.)

I think posting my Picard file naming string wouldn’t be of much help, since beets might use different variable names?

Hello, this took some time; I was not able to use my PC at home yesterday, and I wrote my latest post in the office, just from remembering what I saw at home. Now I have done some small investigations.

First: The result of importing “as Tracks” is a structure like a top-level folder called “Non-Album” and subfolders by artist, but these subfolders are not divided into albums. Every track from one artist lands in this single artist folder, no matter to which album it belongs.

Honestly, if we don’t find a solution, I could even live with that.

But I tried to import the result a second time. The result: somehow beets recognized an album, which was imported completely, but it treated all the other tracks as dispensable and skipped them. I am quite sure I imported everything “as tracks” in the first run, and the first-pass result also looks like that. So maybe beets recognized this one album because all ten of its tracks were present in the folder it had to import, and the other five didn’t cause enough disturbance to prevent a good “album match”.
Anyway, I cannot live with the result of losing many tracks.

@Moonbase59: Your recommendations are good, but I already have the situation of having unwanted duplicates. Mostly they did not come from importing the same track from different sources, e.g. original album, found on compilation A, found on compilation B and so on. No, it must somehow have happened while re-installing iTunes and re-importing the whole library. A lot of album folders contain up to five duplicates of a track, named “01 title” (might be the original), then “01_01 title” and “01_01 title 1” and “01_01 title 2” and so on. And the same with “02 other title” and so on. For the human eye walking through the folders it’s easy to see and to fix by killing the duplicates, but it’s too much data to do that by hand, although it’s by far not as much data as you have.

So for the moment, I don’t see a solution within beets. But I will ask someone who knows programming in Python. It’s no use working with simple patterns to ignore something; those added numbers in the filename are just too short. But I think it should be feasible to compare all filenames within one folder, recognize the track name that repeats in all duplicates as the actual pattern we’re looking for, then strip off those varying numbers around the track name, get rid of the duplicates and keep just one item per track name. This could run standalone, before I import with beets.
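The idea described above could be sketched roughly like this in Python. This is only a minimal sketch under assumptions, not a finished tool: it assumes the varying parts are leading number groups (“01_01 ”) and a trailing counter (“ 2”), and that the shortest filename is the original. The names `normalize`, `duplicate_groups` and `scan_folder` are made up for illustration. It only reports groups and deletes nothing, and titles that really end in a separate number (like “Symphony 5”) would be grouped wrongly.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumption: duplicates differ only by leading number groups and a trailing counter.
LEADING_NUMBERS = re.compile(r"^(?:\d+[\s_.-]+)+")
TRAILING_COUNTER = re.compile(r"[\s_]+\d+$")


def normalize(filename):
    """Reduce a filename to the bare track name, e.g. '01_01 title 2.mp3' -> 'title'."""
    stem = Path(filename).stem
    stem = LEADING_NUMBERS.sub("", stem)
    stem = TRAILING_COUNTER.sub("", stem)
    return stem.replace("_", " ").strip().lower()


def duplicate_groups(filenames):
    """Group filenames that share a track name and extension; shortest name first."""
    groups = defaultdict(list)
    for name in filenames:
        key = (normalize(name), Path(name).suffix.lower())
        groups[key].append(name)
    return [sorted(names, key=lambda n: (len(n), n))
            for names in groups.values() if len(names) > 1]


def scan_folder(folder):
    """Report duplicate groups among the files directly inside one album folder."""
    return duplicate_groups(p.name for p in Path(folder).iterdir() if p.is_file())
```

Calling `scan_folder` on each album folder and printing the result gives a list to check by eye; in each group the first entry is the candidate to keep, and the rest could be moved to a separate folder rather than deleted right away.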

What do you think about that?

Using Python (if you know it or have someone do it) is actually a great idea, because you can do almost anything easily. Apart from just checking for patterns in filenames, you could even inspect closer by using Mutagen (the Python tagging module) or calling up fpcalc to see if the files are (acoustically) dupes.

So yes, it might just be a good idea.
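For the fpcalc part, a rough sketch could look like the following. It assumes that the `fpcalc` tool from Chromaprint is installed and on the PATH, and that it supports the `-json` option with `duration` and `fingerprint` fields in its output; the function names are hypothetical. It only groups files whose fingerprints are exactly equal, which should catch plain copies of the same file but not different encodings of the same song.

```python
import json
import subprocess
from collections import defaultdict


def fpcalc_fingerprint(path):
    """Ask fpcalc for (rounded duration, fingerprint); assumes 'fpcalc -json' is available."""
    result = subprocess.run(["fpcalc", "-json", str(path)],
                            capture_output=True, text=True, check=True)
    data = json.loads(result.stdout)
    return round(data["duration"]), data["fingerprint"]


def group_by_fingerprint(paths, fingerprint=fpcalc_fingerprint):
    """Return lists of paths whose fingerprints are identical."""
    groups = defaultdict(list)
    for path in paths:
        groups[fingerprint(path)].append(path)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

The `fingerprint` argument can be swapped out, for example for a function that reads tags with Mutagen, so the grouping logic can be tried without fpcalc installed.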