Hi,
I’m in the process of importing a large music collection into Beets, just under 200k tracks. I’m about a third of the way in, and I’d like to get some feedback on the problems I’m encountering and possible solutions. The collection has already mostly been tagged with Picard, renamed and organized into a folder structure, so I’m using Beets in metadata-only mode (copy: no, write: no, incremental: yes, manually writing tags to files when a chunk is done). I’m using it for the extra tagging provided by the Discogs and Beatport plugins, for finding duplicates, and for other features and scripting down the road.
Most of the issues are related to reimporting and duplicate handling:
- Rescanning does not handle moved/renamed tracks. Instead it finds a new duplicate (even though Beets reports errors for the missing files).
- Rescanning an album that was manually tagged outside of Beets causes it to find a duplicate, and if one does select “remove old”, the single album on disk is deleted. Selecting “keep both” maintains a duplicate entry in the DB.
- Rescanning in incremental mode does not handle deleted tracks, so dead entries remain in the DB. Instead of skipping already visited folders in incremental mode, I believe the scan should go deeper before deciding to skip. Beets should still check the folder contents and file timestamps and proceed with the import if they don’t match the DB.
- Rescanning in non-incremental mode requires you to re-enter all manual choices, even for albums that were already imported and tagged. This should only be necessary for new albums. Maybe provide a parameter to let users choose between keeping previous choices and being asked again.
- Currently tracks tagged via Discogs or Beatport are not automatically re-detected during re-import, unlike MB.
- Beets should be able to detect multi-disc albums even when the folders are not labeled CD_ or Disc_. On the first disc it might complain about missing tracks, but as soon as the second disc is scanned, it should detect it as belonging to the same album instead of another incomplete duplicate. This could be limited to folders contained in the same parent folder.
- Albums without “album” tags that also cannot be autotagged are all detected as duplicates of album " - ". By now I have a long list of those. For these, the duplicate detection should be disabled, or at least switched to something else, like based on the fingerprint.
- Due to the issues with importing, there is no good workflow for dealing with albums falsely detected as duplicates, unless you manually tag them via Beets (not so fun, sorry).
- Finally, there are cases where I’m unable to determine WHY an album or track is detected as a duplicate.
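To make the incremental-mode suggestion above a bit more concrete, here is a rough sketch in plain Python. It does not use any Beets internals: the function names and the idea of a stored per-folder snapshot are my own assumptions for illustration, not how Beets currently tracks visited folders.

```python
import os


def folder_snapshot(folder):
    """Map each file name directly inside `folder` to its modification time."""
    snapshot = {}
    with os.scandir(folder) as entries:
        for entry in entries:
            if entry.is_file():
                snapshot[entry.name] = entry.stat().st_mtime
    return snapshot


def needs_reimport(folder, stored_snapshot):
    """Return True if files were added, removed, or modified since the snapshot.

    `stored_snapshot` is hypothetical: it stands for whatever was recorded
    about the folder at the time of the previous import.
    """
    return folder_snapshot(folder) != stored_snapshot
```

An already visited folder would then only be skipped when `needs_reimport` returns False. Deleted, added, and retagged files all change the snapshot, so they would trigger a rescan of that folder only.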
Other issues
- Transliteration, or rather aliasing, of foreign characters (Cyrillic, Asian) isn’t working, while in Picard it is.
- Occasional import hangs, both random and reproducible.
- Feature Request: Setting distance threshold value for automatic “Use as is”
- “beet write” changes files on disk even when writing is pointless. For example, the bpm field is stored as a float in the DB but as an int in the tags. This is detected as a change on every “beet write”, causing redundant disk writes, which is inconvenient on copy-on-write filesystems, for instance.
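For the “beet write” point, this is roughly the comparison I have in mind, again as a plain Python sketch and not Beets’ actual code. The field dictionaries, the function names, and the rule that bpm is rounded to an int when written to tags are all assumptions on my part.

```python
def normalize_for_tag(field, value):
    """Convert a value to the form it would have once stored in the file tag."""
    # Assumption: bpm ends up as a whole number in the tags.
    if field == "bpm" and value is not None:
        return int(round(float(value)))
    return value


def changed_fields(db_fields, tag_fields):
    """Return the names of fields whose normalized DB and tag values differ."""
    return sorted(
        name
        for name, value in db_fields.items()
        if normalize_for_tag(name, value)
        != normalize_for_tag(name, tag_fields.get(name))
    )
```

If `changed_fields` returns an empty list for a file, the write could be skipped entirely. With a DB bpm of 128.4 and a tag bpm of 128, nothing would be reported as changed.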
I’m even considering looking into some of those issues, but I’m no Python coder. I have some experience with PHP but any tips for setting up the dev environment would be welcome. A doc page for devs about the recommended IDE setup, testing procedures, required packages, other recommended tools etc. would be ideal.
Cheers