I am an English speaker and, like many others, am terrible at handling accented letters. I can’t even remember the keyboard shortcuts for the ones which come up most frequently, like e-acute or o-umlaut, let alone how to type Dvořák! Also, the reality is that some of the data in my beets database uses accents incorrectly, so sometimes even typing the correct accents doesn’t work!
So, I would like to be able to do queries which can handle a range of accent mismatches.
I looked into this and discovered that there is a complex Unicode standard for doing it properly, called asymmetric matching. However, there is a much simpler approach which seems to handle many common cases: translate Unicode characters into plain, unaccented ASCII and then compare those.
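As a rough illustration of the idea (this is not the plugin's actual code), the simplest form of accent-stripping can be sketched with Python's standard unicodedata module: decompose each character (NFKD), then drop the combining marks and anything else outside ASCII.

```python
import unicodedata


def bare_ascii(text: str) -> str:
    """Approximate a string with plain ASCII by stripping accents.

    NFKD decomposition splits an accented letter such as 'ř' into a
    base letter plus a combining mark; encoding to ASCII with
    errors='ignore' then drops the non-ASCII code points, leaving
    just the base letters.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(bare_ascii("Dvořák"))     # -> Dvorak
print(bare_ascii("Motörhead"))  # -> Motorhead
```

Note one limitation of this naive sketch: characters with no Unicode decomposition (such as 'ø' or 'ß') are simply dropped rather than transliterated, which is one reason a dedicated transliteration library handles more cases than the standard library alone.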
Thanks to the work of the beets team, that is very simple to do in a beets plugin. So I have created a small plugin for bare-ASCII matching, called bareasc. It seems to work for me, for my most common use cases (and allows me to type ‘dvorak’!). But I would welcome feedback on it. Is it useful for anyone else? What are the main limitations? Is performance an issue in real life? Should this be shipped with beets?
The plugin is available as pull-request #3883 for beets. Or you can download the plugin file itself from beets/bareasc.py at bareasc · GrahamCobb/beets · GitHub .
If you add bareasc to the plugin list in your config.yaml, you can then enter bare-ASCII searches at the beets command line (or in the beets web interface) by prefixing them with #. For example:
beet ls '#dvorak'
Note: # is a special character in most shells, which is why it is in quotes in the command line above.
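Because the database itself may contain wrong accents, the comparison has to fold both sides: the query pattern and the stored value. A minimal sketch of that substring match (the function names here are mine for illustration, not the plugin's):

```python
import unicodedata


def fold(text: str) -> str:
    # Strip accents via NFKD decomposition, drop non-ASCII code
    # points, and lower-case for case-insensitive comparison.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii").lower()


def bareasc_match(pattern: str, value: str) -> bool:
    """True if the folded pattern occurs within the folded value."""
    return fold(pattern) in fold(value)


# Matches even though the query has no accents at all:
print(bareasc_match("dvorak", "Antonín Dvořák"))  # -> True
# And a query with wrong accents still matches, since both sides fold:
print(bareasc_match("dvörak", "Antonín Dvořák"))  # -> True
```

Folding both sides is what makes mis-accented database entries findable too, at the cost of a little extra work per comparison.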
- Is this useful?
- What limitations do you hit?
- Is this worth making part of beets?
I am particularly interested in the limitations and whether they outweigh the benefits. I assume, for example, that it is useless for non-Latin alphabets, but does it cause any problems in those cases?