Where Does the Time Go? Importing a Large Music Collection

I am in the throes of importing a large set of tunes for a local radio station. The music collection is approximately 550,000 tracks organized as Artist-Album-Track going back many years (that is to say, there are archeological layers of disarray in the tagging).

What I’m seeing (or think I’m seeing) is that the overall speed at which beets imports an artist’s collection has dropped, and an import sometimes takes a lot longer than I think it would take to do the job manually.

The source and destination are network volumes hosted by a dedicated local NAS on a fairly mediocre gigabit network (sadly, but this is radio, after all). The database resides locally on a Mac Pro with an internal SSD and a goodly amount of RAM.

I’m currently importing with a bash script that works through a list of 100 artists per session: it runs the import for one artist, then moves on to the next until the list is done. The import runs in quiet mode (the -q flag) and uses both MusicBrainz and Discogs along the way.
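
In rough outline, it amounts to something like this (the paths and the list file name here are simplified stand-ins, not the real ones):

    #!/bin/bash
    # artists.txt: one artist directory name per line (placeholder name).
    while IFS= read -r artist; do
        # Non-interactive import of one artist directory at a time.
        beet import -q "/Volumes/Archive/Music/$artist"
    done < artists.txt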

My question, then: does anyone have insight into what is going on behind the scenes, and how I might optimize the process to speed things along? As an example, a list of 100 artists is currently taking about 3 days to complete, but I believe I was able to do 400 or 500 in about 6-8 hours in the past.

Thinking aloud: should I be tending to the local database and performing some kind of maintenance or indexing between sessions? The music database is currently at 168 MB.
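
For what it’s worth, the kind of maintenance I have in mind would be something along these lines, since the beets library file is ordinary SQLite (the path below is just an example; mine lives elsewhere):

    # Compact the library database and refresh query-planner statistics
    # between import sessions (path is an example, not my actual path).
    sqlite3 /path/to/library.db "VACUUM; ANALYZE;"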

I appreciate any feedback. Thank you for your assistance.

Interesting question! One high-level thing to check: does your shell script import a single album per beet import invocation, or does it hand over a whole artist directory? If the former, it’s missing out on the pipeline parallelism in the importer, which can really help (see the sketch below).
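
For concreteness, the difference is roughly this (paths are made up):

    # Per-album invocations: each run pays beets startup cost and the
    # importer pipeline only ever sees one album at a time.
    for album in "/music/Some Artist"/*/; do
        beet import -q "$album"
    done

    # One invocation for the whole artist directory: the importer can
    # overlap metadata lookups for one album with copying/writing another.
    beet import -q "/music/Some Artist"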

It’s not exactly easy to figure out this kind of performance bottleneck, but there are certainly some things you can try. The first steps I’d take would be:

  • Try importing a single album all by itself, in a single beet import command. Does that take about as long as you’d expect, given the average rate in your background shell-script setup?
  • If so, try using verbose mode (-vv) to see what’s happening. There will be lots of logging for this one album, and it might be instructive to see where things hang up.
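
Concretely, something along these lines (the path is just a placeholder):

    # Import one album on its own, timed and with full verbosity, to see
    # where the time goes (network lookups vs. reading files off the NAS).
    time beet -vv import "/music/Some Artist/Some Album"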

It’s not terribly uncommon for network filesystems to make beets really slow. It needs to do plenty of random access on the files to read their metadata, and depending on the protocol, that can be somewhat inefficient. So without more information, my first assumption would be that that’s what’s going on. It could be worth a shot to install beets directly on your NAS and see whether that helps at all.
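
If the NAS can run Python, a rough sketch of that experiment (the package manager and paths will vary by NAS; these are examples):

    # Run on the NAS itself, so metadata reads hit local disks rather
    # than going over SMB/NFS.
    pip install beets
    beet import -q "/volume1/music/Some Artist"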

Hi! I have a smaller collection (50,000 tracks) and the database file is 350 MB.
Very big albums can take a stupidly long time: if you have 3 releases all called “Complete Works” with 1,500 tracks each, beets can run for days before it figures out which one is the right match. In those cases you need to tell it the correct MBID up front (using the -S option, if I remember correctly).
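If memory serves, that looks something like this (the ID is a placeholder for the real MusicBrainz release ID, and the path is made up):

    # Pin the match to one known release so beets skips the huge
    # candidate search for giant albums.
    beet import -S "<release-mbid>" "/music/Complete Works"
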
If your tag history is a mess, then I admire your courage in just letting beets sort it out in quiet mode. I tend to do the tagging in Picard first (since it has a GUI) and then use beet import -A, or at least check what beets is importing. The MusicBrainz database is far from complete, and with a collection that big there is a very good chance that some of your albums are simply not in MusicBrainz; importing without checking could mess up your metadata (unless you don’t write it back to the files).

In my experience, the best workflow is:
  1. Tag your files using Picard (the official MusicBrainz tagger).
  2. Import your files into beets using the -A option.
This works best because Picard has a GUI that lets you check cover art and tags on individual files, but Picard doesn’t handle big libraries well, so feed it smaller batches. Beets, on the other hand, is best for organizing large libraries: it’s command-line only and allows great flexibility for maintaining the database (finding duplicates, for example) once the files have been well tagged.
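As a rough sketch of that two-step flow (paths are examples):

    # Step 1: tag a manageable batch of albums in Picard (GUI, not shown).
    # Step 2: import them into beets with autotagging turned off, so the
    # Picard tags are taken as-is.
    beet import -A "/music/tagged-in-picard/Some Artist"
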
This is using MusicBrainz data only, as I believe the way MusicBrainz organises the data is superior to the way Discogs does it. On the other hand, there is a script that makes it easy to import Discogs releases into MusicBrainz.