I’ve imported my music library into beets, it’s about ~50 thousand tracks. Now I create custom fields in some items. Say, beet modify some nice track foo:bar
, which creates the foo
field for the track and assigns bar
to it. This works instantaneously. But then when I query the DB with beet ls foo:bar
it takes about 5 seconds to output the track. Even worse, if I want to remove all foo
fields from all tracks that have them, I use beet modify foo!
, and it takes about a minute, even if there only a few tracks that have foo
. Is this an expected behavior for beet, or perhaps some specific bug in my case? If it’s the former, I wonder what is the performance bottleneck? I’ve worked with SQLite in C++, and it’s so speedy, especially with rather simple database structures as the one beets uses (only a few tables). Is this the python’s fault, then? (UPD: using beet v. 1.4.7 on Ubuntu 19.10).
Indeed, this is currently pretty slow in beets. The reason is that flexible attributes (“custom fields”) above are implemented in a “schemaless” way that prohibits fast querying in general. You can read about what we call “slow queries” internally in this old blog post:
http://beets.io/blog/flexattr.html
To summarize, the current solution works by fetching everything from the database and then filtering it Python-side. (This does not happen for built-in fields.) It would definitely be possible to do better, but it would be a bit of a hack. It’s not possible to make all complex queries (imagine ones with lots of ands, ors, regex queries, etc.) fast in general, I think… or at least I don’t know how to generate the SQL to do that. But it’s not too hard to imagine what the SQL would look like for simple queries, like foo:bar
alone. So we could consider hacking something up to make queries that just do that into “fast queries” with some special-purpose logic.
If you’re interested, please feel free to dive into the source for the beets.dbcore
layer and let us know if you have any questions!
Thanks for clarifying. So the reason is that it’s difficult to translate an arbitrary user query into SQL query, if I understood correctly. I’m not sure, though, that I still understand exactly why. If you can form arbitrarily complex SQL queries involving only built-in fields, I see no reason why introducing flexattrs into the query should make a big difference. It would just introduce a JOIN
, but otherwise it would use the same parsing logic to convert a user query into SQL. Well, maybe I’m still missing something. I’m not as well acquainted with python as I am with C++, but I might look into the source.
Just not to create another topic, I have another small question: Is it possible to query beet DB on track level, but show results on album level? If I use beet ls -a
, then I see albums, but the queries involving track fields are ignored. I’d like to see list of albums in which at least one track matched a query.
That’s not currently possible, but it sounds interesting! Some way to say “at least one track matches this track-level query” could be cool. If you get a chance, can you perhaps consider filing a feature request?
Filed a feature request: https://github.com/beetbox/beets/issues/3515