What does beets do with already existing .lrc files?

Alando2048 · September 30, 2024, 5:06am

Hi, I’ve got a bunch of albums (over 1000) that all have their own individual synced “songname.lrc” (exact name match to the .FLAC files except for extension) in the folders.

In the past with albums that don’t have their own files, I have noticed that sometimes slightly out-of-sync lyrics are imported.

In my config I have:
lyrics:
    force: no
    synced: yes
    source: lrclib genius

I have 1 problem that I really want to clarify before I potentially mess up my library.
The .FLAC files have unsynced lyrics embedded, but the .lrc files contain the synced lyrics for music players to use.

Since the .lrc files are separate, does beets embed the contents of the .lrc files to the .FLAC file, or does it simply copy it to the new folder, or does it disregard the .lrc files and sync new lyrics from the web?

Once that is cleared up, would I need to change my lyrics configuration? I just want my synced lyrics kept. I really don’t want to lose them or have new ones imported from the internet unless the .lrc files are missing.

jackwilsdon · September 30, 2024, 10:10am

beets won’t import files alongside your tracks (except maybe cover art?). There’s a couple of solutions in this thread for copying additional files into your library during import: Extrafiles vs. copyartifacts plugin or something else for lyrics?

Alando2048 · September 30, 2024, 8:33pm

Thanks for the quick response.

I ended up using a workaround that I came up with to recursively embed the lyrics into all .FLAC files’ metadata with ffmpeg, then delete all the .lrc files.

Once that’s done, I finally run “beet import”, and now I can enjoy my properly organized music library with properly synced lyrics.

It’s pretty easy to do on Linux:

#!/bin/bash
shopt -s globstar

for file in **/*.flac; do
    base="${file%.*}"
    lrc_file="$base.lrc"

    if [[ -f "$lrc_file" ]]; then
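        # Only replace the original and delete the .lrc if ffmpeg succeeded
        if ffmpeg -i "$file" -map 0 -c copy -metadata LYRICS="$(cat "$lrc_file")" "${base}_with_lyrics.flac"; then
            mv "${base}_with_lyrics.flac" "$file"
            rm "$lrc_file"
            echo "Lyrics embedded into $file"
        else
            echo "ffmpeg failed for $file, keeping $lrc_file" >&2
        fi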
    else
        echo "No .lrc file found for $file"
    fi
done

Hope this helps someone out one day. Cheers!

Casual_Tea · September 30, 2024, 9:12pm

Since @jackwilsdon linked to my old thread I might as well add how I solved a good portion of my lyrics troubles.

While this approach might work if you only have flac files and if all your .lrc files match them, it doesn’t cover other formats or cases where the .lrc file does not match the song.

Initially I only wrote a script that recursively checks for unmatched .lrc files (so I could fix them manually). After helping another user (who had trouble with my initial, rough script), I started expanding it a bit. I kept adding things, and the result is lyrict, which can check for unmatched lyrics files, import and export lyrics (synced and unsynced) from and to multiple audio formats (and tags), and standardize the timestamp format of the lyrics, for example from [0:00.000]TEXT to [00:00.000]TEXT.
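To illustrate the timestamp standardization mentioned above, here is a minimal sketch in Python. It is not lyrict's actual code: the function name `normalize_timestamps` and the accepted timestamp shapes are assumptions made for this example.

```python
import re

# Matches LRC timestamps such as [0:00.000], [1:02.5] or [00:12.34].
# Which shapes to accept is an assumption of this sketch.
_TIMESTAMP = re.compile(r"\[(\d{1,2}):(\d{2})\.(\d{1,3})\]")


def normalize_timestamps(lrc_text: str) -> str:
    """Rewrite every timestamp as [mm:ss.xxx] with zero padding."""

    def _fix(match: re.Match) -> str:
        minutes, seconds, fraction = match.groups()
        # Pad the fraction on the right: ".5" means 500 ms, not 5 ms.
        return f"[{int(minutes):02d}:{seconds}.{fraction.ljust(3, '0')}]"

    return _TIMESTAMP.sub(_fix, lrc_text)


if __name__ == "__main__":
    print(normalize_timestamps("[0:00.000]TEXT"))
```

Lines without a recognizable timestamp pass through unchanged, so running it over unsynced text should be harmless.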

Feel free to check it out and report any errors you find.

Alando2048 · September 30, 2024, 11:13pm

Hey! Thanks for your input. I’ll definitely give your work a look because it certainly seems to align with what I’m looking for.

You’re right about that; my approach really only applies to someone with a somewhat “managed” library. I actually did adapt this to all of the other formats which I’ve used in my library and I’ll share it here:

#!/bin/bash
shopt -s globstar #nocaseglob (todo: ignore capitalization)

# Commonly used Extensions
extensions=("flac" "alac" "wav" "mp3" "m4a" "ogg" "opus")

for ext in "${extensions[@]}"; do
    for file in **/*."$ext"; do
        base="${file%.*}"
        lrc_file="$base.lrc"
    
        if [[ -f "$lrc_file" ]]; then
            # Keep basename and match extension of original file
            temp_file="${base}_temp.${file##*.}"
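            # Embed lyrics; only replace the original and remove the .lrc on success
            if ffmpeg -i "$file" -map 0 -c copy -metadata LYRICS="$(cat "$lrc_file")" "$temp_file"; then
                mv "$temp_file" "$file"
                rm "$lrc_file"
                echo "Lyrics embedded into $file"
            else
                echo "ffmpeg failed for $file, keeping $lrc_file" >&2
            fi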
        else
            echo "No .lrc file found for $file"
        fi
    done
done

As you can see at the top of the code, I was actually considering .lrc files whose capitalization doesn’t match the audio file.
I know my current library and put it together personally, so I don’t need nocaseglob, but I wanted to make things easier for future additions.

Basically, it would compare filenames case-insensitively and then embed if the basenames of the .lrc and audio files match.

I have experience with Python, but much more with bash, so that’s my approach for the time being. As I said at the beginning, I’ll definitely look into your work as well.

A smarter approach on my end as well would be to find the .lrc file and then look for a matching file regardless of extension in the same directory.
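A rough Python sketch of that idea follows: start from the .lrc name and pick out the entries in the same directory that share its basename, ignoring case and extension. The function name `media_for_lrc` is made up for this example, and it works on a plain list of file names so it can be tried without touching a real library.

```python
from pathlib import PurePath
from typing import Iterable, List


def media_for_lrc(lrc_name: str, directory_listing: Iterable[str]) -> List[str]:
    """Return the names in directory_listing that share the .lrc file's stem.

    The comparison ignores case and extension; .lrc files are skipped.
    """
    wanted = PurePath(lrc_name).stem.casefold()
    matches = []
    for name in directory_listing:
        candidate = PurePath(name)
        if candidate.suffix.casefold() == ".lrc":
            continue  # never match the lyrics file itself
        if candidate.stem.casefold() == wanted:
            matches.append(name)
    return matches
```

In practice the listing could come from `os.listdir(folder)`. Note that this also matches non-audio siblings such as a `.txt` or `.jpg` with the same basename, so a real script would probably filter by extension as well.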

Casual_Tea · October 1, 2024, 12:56am

That was the sole purpose of my script in the beginning. I specified the formats it should check, it then found all .lrc files and looked for matching audio files of these formats. If it did not find any, it added the .lrc file path to a list of unmatched files to be fixed.

Later on that became creating multiple lists based on found/not found and which format the lyrics are associated with to create meaningful lists which are then further used for importing .lrc files efficiently (why try to embed a .lrc file for hundreds of thousands of songs if you already have a list of those that do have matching .lrc files?). I put quite a lot of thought into my script and tried to keep it as flexible and efficient as I could.
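The list-building idea described above could look roughly like this in Python. This is only a sketch under my own assumptions, not the actual script: the name `partition_lrc_files`, the extension set, and the exact-case matching on POSIX-style paths are all choices made for the example.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable, List, Tuple

# Assumed set of audio formats to check; adjust as needed.
AUDIO_EXTENSIONS = {".flac", ".mp3", ".m4a"}


def partition_lrc_files(paths: Iterable[str]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Split .lrc paths into (matched grouped by audio format, unmatched)."""
    audio_by_key = {}  # (directory, stem) -> audio extension
    lrc_files = []
    for raw in paths:
        path = PurePosixPath(raw)
        suffix = path.suffix.lower()
        if suffix == ".lrc":
            lrc_files.append(path)
        elif suffix in AUDIO_EXTENSIONS:
            audio_by_key[(path.parent, path.stem)] = suffix

    matched = defaultdict(list)
    unmatched = []
    for path in lrc_files:
        suffix = audio_by_key.get((path.parent, path.stem))
        if suffix is None:
            unmatched.append(str(path))  # needs manual fixing
        else:
            matched[suffix].append(str(path))
    return dict(matched), unmatched
```

With lists like these, a later import step only has to touch the files that are known to have a matching .lrc.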

I’m always looking for feedback.

Alando2048 · October 1, 2024, 6:01am

Just thought I’d give an update on my solution. This does exactly what I need for synced lyrics specifically. It uses bash on Linux (my library is stored on a machine running Ubuntu Server).

I’ve optimized it to match lyrics to media instead of media to lyrics (find the .lrc files first, then look for media that matches them).

#!/bin/bash
# Allow matching entire directory tree, case insensitive matching.
shopt -s globstar nocaseglob

# Loop through all directories
for lyric in **/*.lrc; do
    # Filepath without extension.
    base="${lyric%.*}"
    # Find all files with same basename
    embedded=""
    for media in "$base".*; do
        # Ensure identified file is not the synced lyrics file.
        if [[ "$media" != "$lyric" ]]; then
            # Add "_temp" to filename for output.
            temp_file="${media%.*}_temp.${media##*.}"
            # Embed lyrics to media file; replace the original only if ffmpeg succeeded.
            if ffmpeg -i "$media" -map 0 -c copy -metadata LYRICS="$(cat "$lyric")" "$temp_file"; then
                mv "$temp_file" "$media"
                embedded="yes"
                # Lets you know which file has synced lyrics now.
                echo "Lyrics embedded into $media"
            fi
        fi
    done
    # Remove the .lrc file once all matches are done (it's embedded, no longer needed).
    if [[ -n "$embedded" ]]; then
        rm "$lyric"
    fi
done

Alando2048 · October 5, 2024, 6:11am

I’ll keep updating here as I get closer to what I’m looking for. (The script needs ffmpeg and ffprobe and can easily be adapted for non-bash shells; I’ll do that later.) I have beets set up in a virtual environment, so the activate/deactivate lines can be removed if you have it installed system-wide and on your PATH.

I also have Plex Media Server set up (which unfortunately doesn’t support ID3 tags very well)… So for lyrics I’m dependent on the .lrc files.

Quick overview of what this script does:

1. Scans media files based on the list of extensions.
2. Reads the lyrics embedded in each file.
3. If/then/else:
   - If they’re synced, leave the file alone and delete the .lrc if it exists (my import and library folders are separate, which is why the second part is done).
   - Otherwise, if they’re unsynced, look for synced lyrics (.lrc file) and embed them.
   - If they’re unsynced and no .lrc file exists, strip the lyrics from the file so beets can autotag it with synced lyrics. If the tag is blank, leave the file alone.
4. Runs beet update, import and convert.
5. Extracts the lyrics (.lrc and .txt) from all media files in the library (and exports a list of songs with unsynced lyrics to the library’s parent folder) for Plex Media Server to read.
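The if/then/else part of the overview above can be sketched as a small pure function in Python. The function name `plan_action` and the action labels are invented for this illustration, and the timestamp pattern is an assumption (it also accepts three-digit fractions).

```python
import re
from typing import Optional

# A line starting with a timestamp like [01:23.45] is treated as synced.
# Accepting 2 or 3 fractional digits is an assumption of this sketch.
SYNCED = re.compile(r"^\[\d{2}:\d{2}\.\d{2,3}\]", re.MULTILINE)


def plan_action(embedded: Optional[str], lrc_exists: bool) -> str:
    """Return 'keep', 'embed', 'strip' or 'skip' for one media file."""
    if embedded:
        if SYNCED.search(embedded):
            return "keep"  # already synced; a leftover .lrc can be deleted
        # Unsynced lyrics: prefer the local .lrc, else strip for autotagging.
        return "embed" if lrc_exists else "strip"
    # No embedded lyrics at all.
    return "embed" if lrc_exists else "skip"
```

Keeping the decision separate from the ffmpeg or tagging calls makes it easy to check each branch before anything is written to disk.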

ROOT_DIR is just where the script is (the /home/$USER directory in my case)

#!/bin/bash
shopt -s globstar nocaseglob

# Get the directory of the script
ROOT_DIR="$(dirname "$(realpath "$0")")"
# State Import/Library folders
IMPORT_DIR="/home/${USER}/Downloads/Music"
LIBRARY_DIR="/home/${USER}/Music/Library"

# List of extensions of media files.
extensions=("flac" "m4a" "mp3")

### Embed Synced Lyrics in music to be imported
# Go to Import Directory
cd "$IMPORT_DIR" || exit 1
find . -type f -name '*.txt' -delete

for ext in "${extensions[@]}"; do
	if [[ "$ext" == "flac" ]]; then
		tags="LYRICS"
	else
		tags="lyrics"
	fi

	for file in **/*."$ext"; do
		# Get lyrics from metadata
		lyrics=$(ffprobe -loglevel error -show_entries format_tags="$tags" -of default=noprint_wrappers=1:nokey=1 "$file")
		# If not empty, check if it's synced and try to embed if not.
		if ! [ -z "$lyrics" ]; then
			# If synced, leave untouched
			if echo "$lyrics" | grep -qE '^\[[0-9]{2}:[0-9]{2}\.[0-9]{2}\]'; then
				echo "Synced lyrics already embedded in '$file'."
				if [ -e "${file%.*}.lrc" ]; then rm "${file%.*}.lrc"; fi
			# If not synced, look for .lrc file and embed
			elif [ -e "${file%.*}.lrc" ]; then
				lyrics="${file%.*}.lrc"
				temp_file="${file%.*}_temp.${file##*.}"
				ffmpeg -i "$file" -map 0 -c copy -metadata "$tags"="$(cat "$lyrics")" "$temp_file"
				mv "$temp_file" "$file"
				rm "$lyrics"
			# Otherwise, strip for autotagging in beets
			else
				echo "Stripping lyrics from '$file' for autotagging."
				temp_file="${file%.*}_temp.${file##*.}"
				ffmpeg -i "$file" -map 0 -c copy -metadata "$tags"="" "$temp_file"
				mv "$temp_file" "$file"
			fi
		# Embed synced lyrics if available
		elif [ -e "${file%.*}.lrc" ]; then
			lyrics="${file%.*}.lrc"
			temp_file="${file%.*}_temp.${file##*.}"
			ffmpeg -i "$file" -map 0 -c copy -metadata "$tags"="$(cat "$lyrics")" "$temp_file"
			mv "$temp_file" "$file"
			rm "$lyrics"
		# Otherwise, just state it's empty
		else
			echo "'$file' has no lyrics"
		fi
	done
done

### Import all Music to Library
source "$ROOT_DIR/beets/bin/activate"
beet update
beet import "$IMPORT_DIR"
beet convert -y
deactivate

### Export (hopefully now Synced) Lyrics to .lrc file for Plex Media Server
cd "$LIBRARY_DIR" || exit 1

for ext in "${extensions[@]}"; do
	if [[ "$ext" == "flac" ]]; then
		tags="LYRICS"
	else
		tags="lyrics"
	fi
	
	for file in **/*."$ext"; do
		# Check if any exported lyrics exists, delete unsynced if both exists
		if [ -e "${file%.*}.lrc" ] && [ -e "${file%.*}.txt" ]; then
			rm "${file%.*}.txt"
		# If only unsynced exists
		elif [ -e "${file%.*}.txt" ]; then
			# Read Lyrics from file
			lyrics=$(ffprobe -loglevel error -show_entries format_tags="$tags" -of default=noprint_wrappers=1:nokey=1 "$file")		
			# Ensure lyrics aren't empty
			if ! [ -z "$lyrics" ]; then
				# If synced, delete unsynced and save synced
				if echo "$lyrics" | grep -qE '^\[[0-9]{2}:[0-9]{2}\.[0-9]{2}\]'; then
					echo "Synced lyrics for '$file' found, removing unsynced."
					rm "${file%.*}.txt"
					echo "$lyrics" > "${file%.*}.lrc"
				else
					echo "No synced lyrics found, leaving files intact."
					echo "$file" >> "unsynced.txt"
				fi
			fi
		# If no lyrics exists.
		elif [ ! -e "${file%.*}.txt" ] && [ ! -e "${file%.*}.lrc" ]; then
			# Read Lyrics from file
			lyrics=$(ffprobe -loglevel error -show_entries format_tags="$tags" -of default=noprint_wrappers=1:nokey=1 "$file")		
			# Ensure lyrics aren't empty
			if ! [ -z "$lyrics" ]; then
				# If synced, delete unsynced and save synced
				if echo "$lyrics" | grep -qE '^\[[0-9]{2}:[0-9]{2}\.[0-9]{2}\]'; then
					echo "Exporting synced lyrics for '$file'."
					echo "$lyrics" > "${file%.*}.lrc"
				else
					echo "Exporting unsynced lyrics for '$file'."
					echo "$lyrics" > "${file%.*}.txt"
					echo "$file" >> "unsynced.txt"
				fi
			fi
		else
			echo "Lyrics already exists for '$file'"
		fi
	done
done

Alando2048 · October 5, 2024, 11:58pm

Back again with another update, transitioning to Python (still learning, eventually [hopefully] I’ll get to the point where this can be a plugin).

I’m leaving the old one up just in case anyone wants something that’ll run without python (though you’d still need ffmpeg and ffprobe).

Now I’m at a point where I have an import.sh file which calls a import.py script.

import.sh:

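#!/bin/bash
# Get the directory of the script
ROOT_DIR="$(dirname "$(realpath "$0")")"
# Use $HOME, since ~ is not expanded inside quotes
IMPORT="$HOME/Downloads/Music"
LIBRARY="$HOME/Music"

# Embed synced lyrics into files before importing with beets
# Strip unsynced lyrics from files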
python3 ./import.py -em -dir "$IMPORT"

# Import all Music to Library
source "$ROOT_DIR/beets/bin/activate"
beet update
beet import "$IMPORT"
beet convert -y
deactivate

# Export lyrics from media files now that they're tagged
python3 ./import.py -ex -dir "$LIBRARY"

import.py:

import os, glob, re, argparse
from mutagen import File, MutagenError

extensions = ['flac', 'mp3', 'm4a']
lyrics_tags = ['LYRICS', 'lyrics', '\xa9lyr']
extension_to_tag = {
    'flac': 'LYRICS',
    'mp3': 'lyrics',
    'm4a': '\xa9lyr'
}

def get_tag_for_extension(extension):
    return extension_to_tag.get(extension, None)

def find_tag(audio, lyrics_tags):
    for tag in lyrics_tags:
        try:
            if tag in audio:
                return tag
        except ValueError:
            continue  # Ignore the error and continue checking other tags
    return None

def find_audio(folder):
    print(f"Scanning '{folder}'")
    audio_files = []
    for ext in extensions:
        audio_files.extend(glob.glob(f'{folder}/**/*.{ext}', recursive=True))
    return audio_files

def embed_lyrics_from_file(filepath, audio, lrc_filename, tag):
    if os.path.isfile(lrc_filename):
        with open(lrc_filename, 'r', encoding='utf-8') as lrc_file:
            lyrics = lrc_file.read()
        if lyrics:
            audio[tag] = lyrics
            audio.save()
            print(f"Embedded synced lyrics in '{filepath}'")
        os.remove(lrc_filename)
        print(f"Deleted '{lrc_filename}' after embedding lyrics.")

def embed_lyrics(importdir):
    audio_files = find_audio(importdir)
    for filepath in audio_files:
        try:
            audio = File(filepath)
            if audio:
                lrc_filename = f"{os.path.splitext(filepath)[0]}.lrc"
                tag = find_tag(audio, lyrics_tags)
                if tag:
                    lyrics = audio.get(tag)
                    if lyrics:
                        lyrics_str = lyrics[0] if isinstance(lyrics, list) else lyrics
                        if bool(re.search(r'\[\d{2}:\d{2}\.\d{2}\]|\[\d{1,2}\.\d{2}\]', lyrics_str)):
                            print(f"Synced lyrics already in '{filepath}'")
                        else:
                            print(f"Stripping lyrics from '{filepath}'")
                            del audio[tag]
                            audio.save()
                            print(f"Stripped lyrics from '{filepath}'")
                    # No harm in embedding it
                    embed_lyrics_from_file(filepath, audio, lrc_filename, tag)
                else:
                    # Get the extension without the dot
                    ext = os.path.splitext(filepath)[1][1:]
                    tag = get_tag_for_extension(ext)
                    if tag:
                        embed_lyrics_from_file(filepath, audio, lrc_filename, tag)
                    else:
                        print(f"No tag found for '{filepath}' and no extension map available")
            else:
                print(f"Audio file not readable: '{filepath}'")
        except MutagenError as e:
            print(f"Error processing '{filepath}': {e}")

def extract_lyrics(musicdir):
    audio_files = find_audio(musicdir)
    unsynced = []
    nolyrics = []
  
    for filepath in audio_files:
        try:
            audio = File(filepath)
            if audio:
                tag = find_tag(audio, lyrics_tags)
                if tag:
                    lyrics = audio.get(tag)
                    if lyrics:
                        lyrics_str = lyrics[0] if isinstance(lyrics, list) else lyrics
                        if bool(re.search(r'\[\d{2}:\d{2}\.\d{2}\]|\[\d{1,2}\.\d{2}\]', lyrics_str)):
                            lyrics_type = 'lrc'
                        else:
                            lyrics_type = 'txt'
                            unsynced.append(filepath)
                        # Write the lyrics out for both synced and unsynced files
                        output_filename = f"{os.path.splitext(filepath)[0]}.{lyrics_type}"
                        if not os.path.isfile(output_filename):
                            with open(output_filename, 'w', encoding='utf-8') as output:
                                output.write(f"{lyrics_str}\n")
                    else:
                        print(f"No lyrics found for '{filepath}'")
                        nolyrics.append(filepath)
                else:
                    print(f"No lyrics tag found for '{filepath}'")
                    nolyrics.append(filepath)
            else:
                print(f"Could not read metadata for '{filepath}'")
        except MutagenError as e:
            print(f"Error processing '{filepath}': {e}")
    if unsynced:
        print("There are some files with unsynced lyrics \nCheck 'unsynced.txt' for more.")
        
        with open('unsynced.txt', 'w') as output:
            for item in unsynced:
                output.write(f"{item}\n")
    if nolyrics:
        print("There are some files with no lyrics \nCheck 'nolyrics.txt' for more.")
        with open('nolyrics.txt', 'w') as output:
            for item in nolyrics:
                output.write(f"{item}\n")

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('-em', action='store_true')
    parser.add_argument('-ex', action='store_true')
    parser.add_argument('-dir', type=str, required=True)
    args = parser.parse_args()
    
    directory = os.path.expanduser(args.dir)
  
    if args.em:
        print(f"Embedding lyrics in '{directory}'")
        embed_lyrics(directory)
    elif args.ex:
        print(f"Extracting lyrics in '{directory}'")
        extract_lyrics(directory)
    else:
        print("No valid option provided.")
    
if __name__ == "__main__":
    main()

The script is set up to either embed or extract lyrics from audio files using the mutagen library, with support for multiple audio file types (flac, m4a, mp3). It’s driven by command-line arguments: -em triggers embedding lyrics, -ex triggers extracting lyrics, and -dir "string" specifies the directory to use.
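As a side note on the command-line handling, argparse can enforce that exactly one mode is chosen. The sketch below is one possible tightening, not what the script above does; the helper name `parse_args` and the help texts are assumptions for this example.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse -em / -ex / -dir, requiring exactly one of -em and -ex."""
    parser = argparse.ArgumentParser(description="Embed or extract lyrics.")
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("-em", action="store_true", help="embed .lrc files into audio")
    mode.add_argument("-ex", action="store_true", help="extract lyrics to .lrc/.txt")
    parser.add_argument("-dir", required=True, help="directory to process")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print("embed" if args.em else "extract", args.dir)
```

With this, passing neither flag or both makes argparse print a usage error and exit instead of falling through to a "No valid option provided." branch.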

It also writes lists of files without lyrics and with unsynced lyrics (nolyrics.txt and unsynced.txt) to the current working directory, which is next to the script if you run it from there.

Alando2048 · October 8, 2024, 1:02am

Updated the python code to work around some errors I’ve encountered and more output for debugging.

import.sh

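#!/bin/bash
# Get the directory of the script
ROOT_DIR="$(dirname "$(realpath "$0")")"
# Use $HOME, since ~ is not expanded inside quotes
IMPORT="$HOME/Downloads/Music"
LIBRARY="$HOME/Music"

# Embed synced lyrics into files before importing with beets
# Strip unsynced lyrics from files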
python3 ./import.py -em -dir "$IMPORT"

# Import all Music to Library
source "$ROOT_DIR/beets/bin/activate"
beet update
beet import "$IMPORT"
beet convert -y
deactivate

# Export lyrics from media files now that they're tagged
python3 ./import.py -ex -dir "$LIBRARY"

import.py

import os, glob, re, argparse
from mutagen import File, MutagenError

extensions = ['flac', 'mp3', 'm4a']
lyrics_tags = ['LYRICS', 'lyrics', '\xa9lyr']
extension_to_tag = {
    'flac': 'LYRICS',
    'mp3': 'lyrics',
    'm4a': '\xa9lyr'
}

def get_tag_for_extension(extension):
    return extension_to_tag.get(extension, None)

def find_tag(audio, lyrics_tags):
    for tag in lyrics_tags:
        try:
            if tag in audio:
                return tag
        except ValueError:
            continue  # Ignore the error and continue checking other tags
    return None

def find_audio(folder):
    print(f"Scanning '{folder}'")
    audio_files = []
    for ext in extensions:
        audio_files.extend(glob.glob(f'{folder}/**/*.{ext}', recursive=True))
    return audio_files

def embed_lyrics_from_file(filepath, audio, lrc_filename, tag):
    if os.path.isfile(lrc_filename):
        with open(lrc_filename, 'r', encoding='utf-8') as lrc_file:
            lyrics = lrc_file.read()
        if lyrics:
            audio[tag] = lyrics
            audio.save()
            print(f"Embedded synced lyrics in '{filepath}'")
        os.remove(lrc_filename)
        print(f"Deleted '{lrc_filename}' after embedding lyrics.")

def embed_lyrics(importdir):
    audio_files = find_audio(importdir)
    for filepath in audio_files:
        try:
            audio = File(filepath)
            if audio:
                #Remove Unsynced Lyrics
                txt_filename = f"{os.path.splitext(filepath)[0]}.txt"
                if os.path.exists(txt_filename):
                    os.remove(txt_filename)
                # Proceed to scan and embed Synced Lyrics
                lrc_filename = f"{os.path.splitext(filepath)[0]}.lrc"
                tag = find_tag(audio, lyrics_tags)
                if tag:
                    lyrics = audio.get(tag)
                    if lyrics:
                        lyrics_str = lyrics[0] if isinstance(lyrics, list) else lyrics
                        if bool(re.search(r'\[\d{2}:\d{2}\.\d{2}\]|\[\d{1,2}\.\d{2}\]', lyrics_str)):
                            print(f"Synced lyrics already in '{filepath}'")
                        else:
                            print(f"Stripping lyrics from '{filepath}'")
                            del audio[tag]
                            audio.save()
                            print(f"Stripped lyrics from '{filepath}'")
                    # No harm in embedding it
                    embed_lyrics_from_file(filepath, audio, lrc_filename, tag)
                else:
                    # Get the extension without the dot
                    ext = os.path.splitext(filepath)[1][1:]
                    tag = get_tag_for_extension(ext)
                    if tag:
                        embed_lyrics_from_file(filepath, audio, lrc_filename, tag)
                    else:
                        print(f"No tag found for '{filepath}' and no extension map available")
            else:
                print(f"Audio file not readable: '{filepath}'")
        except MutagenError as e:
            print(f"Error processing '{filepath}': {e}")

def extract_lyrics(musicdir):
    audio_files = find_audio(musicdir)
    unsynced = []
    nolyrics = []
  
    for filepath in audio_files:
        try:
            audio = File(filepath)
            if audio:
                tag = find_tag(audio, lyrics_tags)
                if tag:
                    lyrics = audio.get(tag)
                    if lyrics:
                        lyrics_str = lyrics[0] if isinstance(lyrics, list) else lyrics
                        if bool(re.search(r'\[\d{2}:\d{2}\.\d{2}\]|\[\d{1,2}\.\d{2}\]', lyrics_str)):
                            lyrics_type = 'lrc'
                        else:
                            lyrics_type = 'txt'
                            unsynced.append(filepath)
                        output_filename = f"{os.path.splitext(filepath)[0]}.{lyrics_type}"
                        if not os.path.isfile(output_filename):
                            print(f"Extracting lyrics from '{filepath}'")
                            with open(output_filename, 'w', encoding='utf-8') as output:
                                output.write(f"{lyrics_str}\n")
                else:
                    print(f"No lyrics tag found for '{filepath}'")
                    nolyrics.append(filepath)
            else:
                print(f"Could not read metadata for '{filepath}'")
        except MutagenError as e:
            print(f"Error processing '{filepath}': {e}")
    if unsynced:
        print("There are some files with unsynced lyrics \nCheck 'unsynced.txt' for more.")
        
        with open('unsynced.txt', 'w') as output:
            for item in unsynced:
                output.write(f"{item}\n")
    if nolyrics:
        print("There are some files with no lyrics \nCheck 'nolyrics.txt' for more.")
        with open('nolyrics.txt', 'w') as output:
            for item in nolyrics:
                output.write(f"{item}\n")

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('-em', action='store_true')
    parser.add_argument('-ex', action='store_true')
    parser.add_argument('-dir', type=str, required=True)
    args = parser.parse_args()
    
    directory = os.path.expanduser(args.dir)
  
    if args.em:
        print(f"Embedding lyrics in '{directory}'")
        embed_lyrics(directory)
    elif args.ex:
        print(f"Extracting lyrics in '{directory}'")
        extract_lyrics(directory)
    else:
        print("No valid option provided.")
    
if __name__ == "__main__":
    main()

Alando2048 · October 11, 2024, 8:44am

Got it running with no errors thus far. Special thanks to @Casual_Tea: I used some of your code to debug some issues when extracting lyrics from MP3 files. Some files simply had LRC content embedded as plain lyrics (which is what beets does), while others were tagged properly (SYLT format, which most of the well-known taggers don’t support).

Most errors were caused by the wonky lyrics implementation in MP3; so far they all seem to be resolved.
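One detail that is easy to get wrong when turning SYLT-style entries into LRC text is rounding: formatting seconds as a float with two decimals can turn 59.999 s into "60.00". Here is a minimal sketch that sticks to integer arithmetic. The helper names are made up for this example, and it assumes the timestamps are in milliseconds and arrive as plain (text, milliseconds) pairs.

```python
from typing import Iterable, Tuple


def ms_to_lrc_timestamp(ms: int) -> str:
    """Format milliseconds as [mm:ss.xx], truncating to centiseconds."""
    minutes, remainder = divmod(ms, 60_000)
    seconds, millis = divmod(remainder, 1_000)
    return f"[{minutes:02d}:{seconds:02d}.{millis // 10:02d}]"


def entries_to_lrc(entries: Iterable[Tuple[str, int]]) -> str:
    """Join (text, milliseconds) pairs into LRC text, ordered by time."""
    ordered = sorted(entries, key=lambda entry: entry[1])
    return "\n".join(f"{ms_to_lrc_timestamp(ms)}{text}" for text, ms in ordered)
```

Sorting on the numeric timestamp rather than on the formatted line also keeps the order right for tracks longer than 99 minutes.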

import.sh

#!/bin/bash
# Get the directory of the script
ROOT_DIR="$(dirname "$(realpath "$0")")"
IMPORT="$HOME/Downloads/Music"
LIBRARY="$HOME/Music"

python3 ./import.py -em -d "$IMPORT"

### Import all Music to Library
source "$ROOT_DIR/beets/bin/activate"
beet update
beet import "$IMPORT"
beet convert -y
deactivate

python3 ./import.py -ex -d "$LIBRARY"

import.py

import os, glob, re, argparse

from mutagen import MutagenError
from mutagen.id3 import ID3, USLT, SYLT, Encoding, ID3NoHeaderError
from mutagen.mp3 import MP3
from mutagen.flac import FLAC
from mutagen.mp4 import MP4

extensions = ['flac', 'mp3', 'm4a']
lyrics_tags = ['LYRICS', 'lyrics', '\xa9lyr']
mp3_frames = [SYLT, USLT]

extension_to_tag = {
    'flac': 'LYRICS',
    'mp3': USLT, ## Mapping MP3 to USLT as SYLT support is wonky in FFMPEG and MP3TAG, embeds in the SYLT format where possible
    'm4a': '\xa9lyr'
}

# 1 - Find Audio
def find_audio(folder):
    print(f"Scanning '{folder}'")
    audio_files = []
    for ext in extensions:
        audio_files.extend(glob.glob(f'{folder}/**/*.{ext}', recursive=True))
    return audio_files

# 2 - Read Lyrics File (if any)
def read_lyrics_file(filepath):
    if os.path.isfile(filepath):
        print(f"Local lyrics found: '{filepath}'")
        with open(filepath, 'r', encoding='utf-8') as file:
            return file.read()
    else:
        print(f"No local lyrics file found.")
        
# 3 - Initialize? it
def init_audio(filepath, ext):
    audio = None
    if ext == 'mp3':
        try:
            audio = MP3(filepath, ID3=ID3)
        except ID3NoHeaderError:
            audio = MP3(filepath)
            audio.add_tags()
    elif ext == 'm4a':
        audio = MP4(filepath)
    elif ext == 'flac':
        audio = FLAC(filepath)
    return audio
    
# 4 - look for tags in the audio
def find_tag(audio, ext):
    if ext == 'mp3':
        for tag in audio.tags.values():
            for frame in mp3_frames:
                if isinstance(tag, frame):
                    return tag
    else:
        for tag in lyrics_tags:
            try:
                if tag in audio:
                    return tag
            except ValueError:
                continue  # Ignore the error and continue checking other tags
    # If no tags found, state and set one based on ext
    print("No tags found in file, setting based on extension.")
    return extension_to_tag[ext]

# NOT USABLE FOR BEETS
'''
# 5 - Be able to parse lyrics
## START CREDIT: Casual_Tea @ discourse.beets.io
def lrc_to_sylt(lyrics):
    sylt_lyrics = []
    timestamp_pattern = re.compile(r'^\[(\d{1,2}):(\d{2})\.(\d{2})\] *(.*)$')
    lines = lyrics.split('\n')
    for line in lines:
        match = timestamp_pattern.match(line)
        if match:
            minutes, seconds, milliseconds, text = match.groups()
            timestamp = (int(minutes) * 60000) + (int(seconds) * 1000) + int(milliseconds.ljust(3, '0'))
            sylt_lyrics.append((text, timestamp))
    return sorted(sylt_lyrics, key=lambda x: x[1])
'''
def sylt_to_lrc(sylt_frame):
    lrc_lines = []
    # A SYLT frame keeps its (text, timestamp) pairs in the .text attribute
    for text, timestamp in sylt_frame.text:
        minutes = timestamp // 60000
        seconds = (timestamp % 60000) / 1000
        lrc_timestamp = f"[{minutes:02}:{seconds:05.2f}]"
        lrc_line = f"{lrc_timestamp}{text}"
        lrc_lines.append(lrc_line)
    lrc_lines.sort()
    lrc_content = "\n".join(lrc_lines)
    return lrc_content
## END CREDIT: Casual_Tea @ discourse.beets.io

# 6 - Try to get lyrics from the file itself
def read_lyrics_audio(audio, ext, tag):
    lyrics = None  # stays None when an MP3 has neither USLT nor SYLT frames
    try:
        if ext == 'mp3':
            for val in audio.tags.values():
                if isinstance(val, USLT):
                    lyrics = val.text
                elif isinstance(val, SYLT):
                    return sylt_to_lrc(val)
        else:
            lyrics = audio.get(tag)
        return lyrics[0] if isinstance(lyrics, list) else lyrics
    except Exception as e:
        print(f"Error reading lyrics from audio file: {e}")
        return None

# 7 - Be able to strip all
def strip_tags(audio, ext):
    if ext == 'mp3':
        for tag in mp3_frames:
            try:
                audio.tags.delall(tag.__name__)
            except (KeyError, ValueError):
                continue
    else:
        for tag in lyrics_tags:
            try:
                del audio[tag]
            except (KeyError, ValueError):
                continue

# 8 - Embed from file
def embed_lyrics_from_file(audio, ext, tag, lrc_filepath):
    lyrics = read_lyrics_file(lrc_filepath)
    if lyrics:
        if ext == 'mp3':
            strip_tags(audio, ext)
            ## THESE ARE IRRELEVANT FOR BEETS, leaving because it's more "correct"
            ### lyrics_str = lrc_to_sylt(lyrics)
            ### audio.tags.setall("USLT", [SYLT(encoding=Encoding.UTF8, lang='eng', format=2, type=1, text=lyrics_str)])
            # USLT only takes encoding, lang, desc and text (format/type belong to SYLT)
            audio.tags.add(USLT(encoding=Encoding.UTF8, lang='eng', desc='', text=lyrics))
            audio.save(v2_version=3)
        else:
            audio[tag] = lyrics
            audio.save()
        os.remove(lrc_filepath)
        print(f"Lyrics embedded. Deleted '{lrc_filepath}' after embedding lyrics.")

# MAIN FUNCTIONS #
def embed(folder):
    audio_files = find_audio(folder)
    for filepath in audio_files:
        print(f"Scanning '{filepath}'")
        try:
            ext = os.path.splitext(filepath)[1][1:]
            audio = init_audio(filepath, ext)
            if audio:
                print(f"Initialized '{filepath}'")
                lrc_filepath = f"{os.path.splitext(filepath)[0]}.lrc"
                txt_filepath = f"{os.path.splitext(filepath)[0]}.txt"
                tag = find_tag(audio, ext)
                if tag:
                    lyrics = read_lyrics_audio(audio, ext, tag)
                    if lyrics:
                        if bool(re.search(r'\[\d{2}:\d{2}\.\d{2}\]|\[\d{1,2}\.\d{2}\]', lyrics)):
                            print(f"Synced lyrics already in '{filepath}'")
                        else:
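                            print(f"No synced lyrics embedded, stripping for autotagging")
                            strip_tags(audio, ext)
                            # Persist the removal even when no .lrc file is embedded afterwards
                            if ext == 'mp3':
                                audio.save(v2_version=3)
                            else:
                                audio.save()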
                        ## Maybe local file has been updated, embed anyways
                        embed_lyrics_from_file(audio, ext, tag, lrc_filepath)
                    else:
                        embed_lyrics_from_file(audio, ext, tag, lrc_filepath)
                else:
                    print(f"No embedded or usable tag found for '{filepath}'")
            else:
                print(f"Audio file not readable: '{filepath}'")
        except MutagenError as e:
            print(f"Error processing '{filepath}': {e}")

def extract(folder):
    audio_files = find_audio(folder)
    unsynced = []
    nolyrics = []
  
    for filepath in audio_files:
        try:
            ext = os.path.splitext(filepath)[1][1:]
            audio = init_audio(filepath, ext)
            if audio:
                print(f"Initialized '{filepath}'")
                tag = find_tag(audio, ext)
                if tag:
                    lyrics = read_lyrics_audio(audio, ext, tag)
                    if lyrics:
                        if bool(re.search(r'\[\d{2}:\d{2}\.\d{2}\]|\[\d{1,2}\.\d{2}\]', lyrics)):
                            lyrics_type = 'lrc'
                        else:
                            lyrics_type = 'txt'
                            unsynced.append(filepath)
                        output_filepath = f"{os.path.splitext(filepath)[0]}.{lyrics_type}"
                        if not os.path.isfile(output_filepath):
                            print(f"Extracting lyrics from '{filepath}'")
                            with open(output_filepath, 'w', encoding='utf-8') as output:
                                output.write(f"{lyrics}\n")
                else:
                    print(f"No lyrics tag found for '{filepath}'")
                    nolyrics.append(filepath)
            else:
                print(f"Could not read metadata for '{filepath}'")
        except MutagenError as e:
            print(f"Error processing '{filepath}': {e}")
    if unsynced:
        print("There are some files with unsynced lyrics \nCheck 'unsynced.txt' for more.")
        with open('unsynced.txt', 'w') as output:
            for item in unsynced:
                output.write(f"{item}\n")
    if nolyrics:
        print("There are some files with no lyrics \nCheck 'nolyrics.txt' for more.")
        with open('nolyrics.txt', 'w') as output:
            for item in nolyrics:
                output.write(f"{item}\n")

### MAIN FUNCTION ###
def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('-em', action='store_true')
    parser.add_argument('-ex', action='store_true')
    parser.add_argument('-d', type=str, required=True)
    args = parser.parse_args()
    
    dir = os.path.expanduser(args.d)
  
    if args.em:
        print(f"Embedding lyrics in '{dir}'")
        embed(dir)
    elif args.ex:
        print(f"Extracting lyrics in '{dir}'")
        extract(dir)
    else:
        print("No valid option provided.")
    
if __name__ == "__main__":
    main()
