Date queries malfunction?

I’m using beets 1.4.9 on Ubuntu Studio 14.04.5 (Python 2.7.6) and try to select titles that were released this month. beet info shows they have original_date: 2019-09-13 and date: 2019-09-13.

All the following queries return no results:

beet ls original_date:2019-09
beet ls 'original_date:2019-09-01..2019-09-30'
beet ls original_date::'^2019-09.*'

I was expecting all of these to return all titles that were released in September, 2019.

Further testing shows that even these turn up no results:

beet ls original_date:2019-09-13
beet ls date:2019-09-13

The only way to come up with some results seems the following, but that’s of course totally awkward for more complex queries:

beet ls original_year:2019 original_month:9
beet ls year:2019 month:09 day:13

My mistake (why?) or a bug?

This is confusing, but here’s the thing: date and original_date are not actually beets fields. They are MediaFile fields—i.e., things that show up on the actual, physical metadata tags on files. (But even there, they’re just pseudo-fields that reflect the three underlying components that are stored in tags.) So they show up in info, which (unless you use the flag to inspect library data) tells you about on-disk file metadata. But because they’re not beets database fields, you can’t query them.

Hmm. That is extremely unfortunate, and quite confusing for a user (she wouldn’t know that a field shown in info or a display format is not in the DB and thus not queryable).

So if I understand correctly, there is an abstraction layer for tag data but not for database content? The docs show a lot of nice example queries for ‘added’ and ‘mtime’, I think these should absolutely also work for release and original dates.

You say you store a ‘date’ as separate y/m/d columns in the DB. I wonder if that originally stems from the idea to be able to represent incomplete dates (which SQLite apparently can’t handle in date columns). The real tags in the files are always one tag only, of the form YYYY[-MM[-DD]], so MediaFile must actually deconstruct them into separate fields for the DB.

For one of the files in question, I can see beet info shows only original_year and no original_month or original_day, resulting in original_date: 2019-01-01.

It’s nice that one can perform date queries against ‘added’ and ‘mtime’, but from a user’s perspective, it makes much more sense to be able to search for release dates and/or original dates.

I wonder how complicated it would be to add that, without too much performance penalty.

Or—possibly—let MediaFile construct one ‘date’ field that allowed zeroes for non-existant date parts and store it in the DB as a 10-char text field. The only problem then would be date calculations, because these couldn’t be done without either the DB handling it (like MySQL) or a Python class wrapper.

Maybe something like Django’s PartialDate might help. It is of course also a kludge, insofar as it stores into a DateTime field and uses the seconds as a precision indicator. Oh well, but it works and doesn’t break the DB or datetime calculations.

ID3v2 frames like TDRC, TDRL and TDOR allow exactly the same content (YYYY, YYYY-MM, YYYY-MM-DD, and even some more), so this might be a viable solution to read these (possibly incomplete) tags and even write them back correctly. Same applies of course for the date fields in other file formats.

If stored in the DB as either DateTime or even VARCHAR(10), it would also be easy doing date queries and comparisons. The gut feeling says VARCHAR(10) (sortable, human readable) but from a performance standpoint I’d probably go for DateTime fields (less conversions to/from Python, and DateTime as a text field in SQLite could still be used).

As opposed to storing 3 different columns per date in the DB, I think this would make easy queries possible and performant, and still allow easy resolving to year/month/day variables, if needed.

Just thinking out loud …

Yep, that’s MediaFile. The only place this abstraction layer shows up in the beets UI is in the info plugin, which is meant to act like a replacement for CLI tools like id3info. In its default configuration, beet info doesn’t actually use any of the beets infrastructure—it reads tag data directly using MediaFile.

Indeed:

I don’t think it would be too hard to implement the pseudo-field as outlined in #2872 on top of the existing database schema. It certainly wouldn’t be hard to do it in an inefficient way, which would be a good start, and then we could come back around and make it fast.

I tend to prefer keeping things like dates in one “unit”, be it database field or a Python datetime object. There are so many well thought-out things in these classes, like date validation (would be hard using three DB fields), comparison and range calculations. Even in the DB layer, like with complicated JOINs or such.

Would it be great problem to “break” the existing DB structure and revert to datetime fields maybe, instead of breaking dates apart and storing the parts separately? I have a strong feeling this will turn out to be the better way.

As I said, the real tags in audio file metadata are also always “date” type fields, they are never broken into separate values. (Well, apart from TDAT/TYER and TORY in ID3v2.3 which has been remedied in ID3v2.4.)

A few quick points:

  • I think it’s possible to have your cake and eat it too with a logical date field. Doing in this in Python—exposing a datetime, even if it’s backed by three different fields physically in the database—seems to have no real downsides, right? You can still do validation, comparisons, range calculations, etc.
  • It’s worth thinking carefully about how that possibly-partial date would be stored. Here is the list of all the SQLite data types. I think the only reasonable storage type would be a string, which would make range queries difficult.
  • Yes, in general, it is a problem to change the database schema like this. :smiley: There are lots of large beets databases out there that we need to remain backwards-compatible with. A migration is possible but would be complex to implement.

Wouldn’t something like this work (quick check hacking along in sqlite3)?

CREATE TABLE test (date1 TEXT);
INSERT INTO test VALUES (datetime('now'));
INSERT INTO test VALUES (datetime('now', '-1 year'));

SELECT * FROM test;
2019-10-01 13:34:19
2018-10-01 13:34:19

SELECT * FROM test WHERE date1 BETWEEN datetime("2019-01-01") AND datetime("2019-12-31"); 
2019-10-01 13:34:19

SELECT strftime('%Y', date1) FROM test;
2019
2018

Using one text field for the date in SQLite, this all works easily, even when using something like a PartialDate("2019-10") Python object (it would store 2019-10-01 00:00:01 in the field).

I wonder how you’re currently doing ranges with 3 separate columns (any pointer?).

To be honest, I’m unsure how I would do a queryable logical date field.

That’s an interesting proposal, but it seems to have two downsides:

  • It’s not possible to distinguish between complete and partial dates. That is, if we convert PartialDate("2019-10") to the string 2019-10-01 00:00:01 to store it, how would we know to “recover” that value instead of PartialDate("2019-10-01")?
  • Matching and range queries now require manipulating strings instead of just comparing integers.

Here’s how to do queries with the three-column format we currently have:

  • date:2019-10: Generate the SQL query WHERE year=2019 AND month=10.
  • date:2019-10..: Generate the SQL query WHERE (year=2019 AND month>=10) OR (year>2019).

And so on for more complex range queries. We’re not currently doing any of this, but it seems like it would be “fun” to implement!

The Django guys use a “flagging” system which I’ve also seen used elsewhere. It’s not elegant but probably has the least downsides. What’s used is actually only the “date” part of a “datetime” value (they could use down to minutes). The “seconds” serve as a flag how “partial” the date is: 0=year, 1=year+month, 2=year/month/day (if I remember the values correctly).

So it’s still a “valid” and comparable “datetime” (even compatible with Python’s datetime) but “PartialDate” knows to resolve back by using the seconds flag. Not elegant, but creative. And most compatible with existing databases and programming languages, however they might store their dates. A convention, so to say. At least more compatible than mySQL’s “zero” y/m/d (even if that’s easier to read for humans).

See https://github.com/ktowen/django_partial_date/blob/master/partial_date/fields.py

A good thing with this strategy is they could be mapped 1:1 into what’s currently stored in ID3v2, Vorbis and APE date tags, i.e. YYYY, YYYY-MM, or YYYY-MM-DD. Would be a snap to use in MediaFile, and allow “real” datetimes be used in Python and the DB.