Queries without accents

I am an English speaker and, like many others, am terrible at handling accented letters. I can’t even remember the keyboard shortcuts for the ones that come up most frequently, like e-acute or o-umlaut, let alone how to type Dvořák! On top of that, some of the data in my beets database uses accents incorrectly, so sometimes even typing the correct accents doesn’t work!

So, I would like to be able to do queries which can handle a range of accent mismatches.

I looked into this and discovered that there is a complex Unicode standard for how to do this, called Asymmetric Matching. However, there is a much simpler approach that seems to handle many common cases: translate the Unicode characters into simple, unaccented ASCII and then compare those.
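To make the "translate to unaccented ASCII, then compare" idea concrete, here is a minimal sketch using only the Python standard library. Note that this is not what the plugin does internally: it is a rough stand-in that decomposes characters and drops the accent marks, so it handles far fewer cases than a proper transliteration library. The names bare_ascii and bare_match are made up for this illustration, and the case-insensitive comparison is my own assumption.

```python
import unicodedata


def bare_ascii(text: str) -> str:
    """Fold text to unaccented ASCII (stdlib approximation only)."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Dropping everything outside ASCII removes the combining marks.
    # It also silently drops any character that has no ASCII base letter.
    return decomposed.encode("ascii", "ignore").decode("ascii")


def bare_match(pattern: str, value: str) -> bool:
    """Substring match after folding both sides to bare ASCII.

    Lower-casing both sides is an assumption made for this sketch.
    """
    return bare_ascii(pattern).lower() in bare_ascii(value).lower()


print(bare_ascii("Dvořák"))                        # Dvorak
print(bare_match("dvorak", "Antonín Dvořák"))      # True
print(bare_match("dvorak", "Antonin Dvorak"))      # True
```

Because both the query and the stored value are folded before comparing, a mismatch in either direction (accents missing from the query, or missing from the data) still matches.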

Thanks to the work of the beets team, that is very simple to do in a beets plugin. So I have created a small plugin for bare-ASCII matching (called bareasc). It seems to work for me, for my most common use cases (and allows me to type ‘dvorak’!). But I would welcome feedback on it. Is it useful for anyone else? What are the main limitations? Is performance an issue in real life? Should this be shipped with beets?

The plugin is available as pull request #3883 for beets. Or you can download the plugin file itself (bareasc.py) from the bareasc branch of GrahamCobb/beets on GitHub.

If you add bareasc to the plugins list in your config.yaml, you can then enter bare-ASCII searches at the beets command line (or in the beets web interface) by prefixing them with #. For example:

beet ls '#dvorak'

Note: # is a special character in most shells, which is why it is in quotes in the command line above.

  • Is this useful?
  • What limitations do you hit?
  • Is this worth making part of beets?

I am particularly interested in the limitations and whether they outweigh the benefits. I assume, for example, that it is useless for non-Latin alphabets, but does it cause any problems in those cases?

By the way, I should make it clear that the hard work in this is all done in Sean Burke’s Unidecode library! Many thanks are due to Sean. All I have done is link that into the beets extensible queries infrastructure.

The plugin has been updated to add a bareasc command. This works exactly the same as the normal beets list command but applies the bare-ASCII transformation to the output so you can see what bare-ASCII means with your own data.

Note: the bareasc command does not automatically use a bare-ASCII query: you still need to specify the # prefix if you want to use a bare-ASCII query.

This plugin has now been accepted into beets and is available in the GitHub version of beets. I am still interested in comments. Feel free to mention them here, or raise an issue on GitHub.