"beet modify" takes too long

galicarnax · March 10, 2020, 6:09pm

I’ve imported my music library into beets, it’s about ~50 thousand tracks. Now I create custom fields in some items. Say, beet modify some nice track foo:bar, which creates the foo field for the track and assigns bar to it. This works instantaneously. But then when I query the DB with beet ls foo:bar it takes about 5 seconds to output the track. Even worse, if I want to remove all foo fields from all tracks that have them, I use beet modify foo!, and it takes about a minute, even if there only a few tracks that have foo. Is this an expected behavior for beet, or perhaps some specific bug in my case? If it’s the former, I wonder what is the performance bottleneck? I’ve worked with SQLite in C++, and it’s so speedy, especially with rather simple database structures as the one beets uses (only a few tables). Is this the python’s fault, then? (UPD: using beet v. 1.4.7 on Ubuntu 19.10).

adrian · March 10, 2020, 11:33pm

Indeed, this is currently pretty slow in beets. The reason is that flexible attributes (“custom fields”) above are implemented in a “schemaless” way that prohibits fast querying in general. You can read about what we call “slow queries” internally in this old blog post:
http://beets.io/blog/flexattr.html

To summarize, the current solution works by fetching everything from the database and then filtering it Python-side. (This does not happen for built-in fields.) It would definitely be possible to do better, but it would be a bit of a hack. It’s not possible to make all complex queries (imagine ones with lots of ands, ors, regex queries, etc.) fast in general, I think… or at least I don’t know how to generate the SQL to do that. But it’s not too hard to imagine what the SQL would look like for simple queries, like foo:bar alone. So we could consider hacking something up to make queries that just do that into “fast queries” with some special-purpose logic.

If you’re interested, please feel free to dive into the source for the beets.dbcore layer and let us know if you have any questions!

galicarnax · March 11, 2020, 3:19am

Thanks for clarifying. So the reason is that it’s difficult to translate an arbitrary user query into SQL query, if I understood correctly. I’m not sure, though, that I still understand exactly why. If you can form arbitrarily complex SQL queries involving only built-in fields, I see no reason why introducing flexattrs into the query should make a big difference. It would just introduce a JOIN, but otherwise it would use the same parsing logic to convert a user query into SQL. Well, maybe I’m still missing something. I’m not as well acquainted with python as I am with C++, but I might look into the source.

Just not to create another topic, I have another small question: Is it possible to query beet DB on track level, but show results on album level? If I use beet ls -a, then I see albums, but the queries involving track fields are ignored. I’d like to see list of albums in which at least one track matched a query.

adrian · March 11, 2020, 11:43am

That’s not currently possible, but it sounds interesting! Some way to say “at least one track matches this track-level query” could be cool. If you get a chance, can you perhaps consider filing a feature request?

galicarnax · March 15, 2020, 7:12am

Filed a feature request: https://github.com/beetbox/beets/issues/3515

Topic		Replies	Views
Call for testers: better queries for album-level fields & a performance improvement Development	6	906	April 28, 2019
How to use Custom Fields? Help	13	3704	October 28, 2017
Query album flexible attribute and get a list of tracks? Help	7	524	March 20, 2021
How to access flex attributes outside of Beets? Help	2	808	September 26, 2017
Using bitrate_mode in beet list Help	6	423	July 6, 2022

"beet modify" takes too long

Related topics