Any suggestions for improving import/update throughput?

Hi,

I am new to beets. I have been using Picard for many years and all my files are properly tagged, using MB IDs as much as possible (I've also been adding lots of albums to MB whenever they weren't available there). I was interested in some of the extra functionality that beets can offer: advanced database queries, mass tagging, and plugins like LastImport to bring my 15 years of listening history into my own database. So at this stage I am not doing any tagging in beets; I just wanted to import my library, which is accessible over a network share.

However, my library is fairly large (over 350K tracks) and I found the beets import rather slow. It started well, but as the DB grew I started seeing more than 5 seconds between albums, which would have meant about 2 days for the full import to complete.
After a bit of investigation I found that the import was CPU bound (it was using a full CPU core on the computer running the import): it was spending its time doing full scans of the ever-growing items table, which kept slowing it down. I also found that creating an index on the path column helps a lot in my case, e.g.:

CREATE INDEX items_path_idx ON items(path);

This helped increase the throughput of the import from 1 album every 5 seconds to about 3 albums per second on average, so the import would complete in less than 4 hours.
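In case anyone wants to verify this on their own library, SQLite can show the query plan directly. A minimal check, assuming the default database location (adjust the path if your library.db lives elsewhere):

# beets stores path as a BLOB, so compare against a blob literal;
# x'2f6d75736963' is just hex for "/music", used here as a placeholder.
sqlite3 ~/.config/beets/library.db \
  "EXPLAIN QUERY PLAN SELECT * FROM items WHERE path = x'2f6d75736963';"
# With the index it reports: SEARCH items USING INDEX items_path_idx (path=?)
# Without it:                SCAN items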

This is not bad for an initial import, but every time I rerun the import after adding more files to the library, it takes about the same time, and an update takes about 6 hours…
So if I have to run update + import every time I update existing files and add new ones (about once a week), that's 10 hours in total, which is a bit of a showstopper for me.

At this stage I am running out of ideas about how to make it faster, so any help would be appreciated. In comparison, an incremental scan in LMS completes in 10-20 minutes depending on the amount of changes, which is perfectly acceptable to me. If I could get beets down to no more than an hour, that would also be fine.

Thanks for any input, all help appreciated! :wink:

Can you post what commands you're using? It sounds like you're doing a full import every time you get a new album. That is, you're importing 350k tracks, getting another 10, then importing 350k+10 tracks. If so, proper usage looks like:

beet import /bigfolder/
# days later, big folder is done
beet import /tinynewalbum/
# no need to reimport existing albums, this import will be quick
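There's also the incremental option, which records finished directories in beets' state file and skips them on later runs. If you really do want to point beets at the whole folder every time, something like this should only touch what's new:

# -i / --incremental: skip directories that were already imported
beet import -i /bigfolder/
# or turn it on permanently in your config:
#   import:
#     incremental: yes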

If you have a new path format, use move; see the FAQ (beets 1.6.1 documentation).

Still, Beets has been known to perform at a “meh” level in many areas for a while. Your problems wouldn’t terribly surprise me even if you’re doing everything right.

Yes, indeed, I am importing the full folder, but not after each album; only once every 1-2 weeks on average. In the meantime I will have added anywhere between 50 and 100 new albums and may also have updated tags on a few dozen more (mainly after adding them to MB because they were unknown there). It's impossible to keep track of each and every added and modified album, and manually running 50 to 100 separate commands to add them to beets is not very practical either.
In LMS I just use "look for new and changed music" and it figures out on its own which files it has to scan. I assumed an import command in beets would do the same, but apparently I was wrong… In beets I have to run update to find all modified albums, then import to add all new ones. As mentioned, I am not using beets to tag or move/reorganize files; all I want is for them to be added to the database so I can run my custom reports and other plugins.
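For reference, the weekly refresh I have in mind boils down to something like this, with /music standing in for my share (the -A flag skips autotagging, since everything is already tagged in Picard):

beet update               # pick up tag changes on files already in the DB
beet import -A -i /music  # add new files as-is (-A: no autotag, -i: incremental)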

Oh. That sounds like you’re doing it right then.

Well, actually it turns out I've been doing some things wrong after all… Apparently paths are case sensitive in beets, so when importing files from my network share, beet import \\server\share is not at all the same thing as \\SERVER\share, for example… With a different upper/lower case combination, all files get imported again, so what I thought was an incremental import turned out to be a full import all over again. After a few imports I ended up with the same files imported 3 or 4 times, and my database had grown to over 1 million items…

I have now restarted from scratch with a new database and will always use the same spelling for my network share, i.e. always \\server\share and nothing else.

I have also created a unique index in my database, which will prevent the creation of duplicates:

CREATE UNIQUE INDEX enforce_unique_paths_idx ON items(lower(CAST(path AS TEXT)));

So if I first import using \\server\share, then I'm forced to use the same spelling every time; otherwise the import will simply fail with a unique key violation, which prevents such inconsistencies from being created again.
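Note that creating the unique index itself fails if case-duplicates already exist, so it doubles as a consistency check. You can also list any offenders first with a grouping query (again assuming the default database path):

# List paths that differ only in case; these would block the unique index.
sqlite3 ~/.config/beets/library.db \
  "SELECT lower(CAST(path AS TEXT)) AS p, count(*) FROM items GROUP BY p HAVING count(*) > 1;"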

Now if I run a full import again without adding any new albums, it completes in just over an hour, which is already much better. The update command still takes as long as before (around 6 hours), no improvement there, so I will probably schedule it less frequently than the weekly import.