When I ran an import against my library, it ate up all 256 GB of RAM and crashed after two hours. Importing per directory works, though. Is there a way to throttle the import into chunks, or would a for-loop script over each directory accomplish the same thing?
There is already a certain amount of throttling, since the number of pending tasks for each stage of the importer is limited (to a fairly large value). I don’t see how that number of tasks could use this much RAM, so additional throttling probably won’t solve the problem. More likely something (a plugin?) is keeping references to the tasks beyond the import process. Anyway, I’m just speculating without further details. You might want to try disabling plugins and see whether the issue persists. It is generally good practice to post at least your beets config when reporting such issues.
The for-loop would work too, of course.
beet config
directory: /cloud/music
library: ~/data/musiclibrary.db
threaded: yes
import:
    copy: no
    write: yes
    move: yes
    autotag: yes
    log: ~/beetslog.txt
    incremental: yes
    quiet: yes
original_date: yes
per_disc_numbering: yes
embedart:
    auto: yes
art_filename: albumart
plugins: mbcollection inline fetchart lastgenre rewrite fromfilename bucket
mbcollection:
    auto: yes
    collection: library
    remove: no
pluginpath: ~/data/
ui:
    color: yes
paths:
    default: $albumartist/$album/$track $title
    singleton: $albumartist/$artist - $title
    comp: $albumartist/$album/$track $title
    albumtype:soundtrack: Soundtracks/$album/$track $title
duplicate_action: keep
musicbrainz:
    user: nnn
    pass: REDACTED
    auto: yes
    collection: library
fetchart:
    auto: yes
    cautious: yes
    sources: filesystem coverart itunes amazon albumart
    minwidth: 0
    maxwidth: 0
    quality: 0
    enforce_ratio: no
    cover_names:
        - cover
        - front
        - art
        - album
        - folder
    google_key: REDACTED
    google_engine: 001442825323518660753:hrh5ch1gjzm
    fanarttv_key: REDACTED
    lastfm_key: REDACTED
    store_source: no
    high_resolution: no
lastgenre:
    auto: yes
    source: album
    whitelist: yes
    min_weight: 10
    count: 1
    fallback:
    canonical: no
    force: yes
    separator: ', '
    prefer_specific: no
    title_case: yes
replace:
    '[\\/]': _
    ^\.: _
    '[\x00-\x1f]': _
    '[<>:"\?\*\|]': _
    \.$: _
edit:
    itemfields:
        - album
        - albumartist
        - artist
        - track
        - title
        - year
    albumfields:
        - albumartist
        - album
        - year
        - albumtype
match:
    strong_rec_thresh: 0.04
    medium_rec_thresh: 0.25
    rec_gap_thresh: 0.25
    max_rec:
        source: strong
        artist: strong
        album: strong
        media: strong
        mediums: strong
        year: strong
        country: strong
        label: strong
        catalognum: strong
        albumdisambig: strong
        album_id: strong
        tracks: strong
        missing_tracks: medium
        unmatched_tracks: medium
        track_title: strong
        track_artist: strong
        track_index: strong
        track_length: strong
        track_id: strong
chroma:
    auto: no
bucket:
    bucket_alpha:
        - _
        - A
        - B
        - C
        - D
        - E
        - F
        - G
        - H
        - I
        - J
        - K
        - L
        - M
        - N
        - O
        - P
        - Q
        - R
        - S
        - T
        - U
        - V
        - W
        - X
        - Y
        - Z
    bucket_alpha_regex:
        _: ^[^A-Z]
    bucket_year: []
    extrapolate: no
pathfields: {}
item_fields: {}
album_fields: {}
rewrite: {}
I had a very brief look at the code of those plugins and didn’t see any obvious way they would leak memory. If you want to debug this further, I’d nevertheless suggest attempting a large import with all plugins disabled in order to narrow down the culprit.
I guess that by far the easiest way forward for you would be the for-loop.
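Something like this minimal Python sketch would do it (a plain shell for-loop over the subdirectories works just as well); the assumption that each top-level directory under /cloud/music is a sensible unit to import is mine, so adjust it to your layout:

# Hypothetical helper: run "beet import" once per top-level directory so each
# session stays small. With "incremental: yes" in the config, directories that
# were already imported are skipped, so re-running the loop is safe.
import subprocess
from pathlib import Path

MUSIC_ROOT = Path("/cloud/music")  # assumption: the same root as in the config above

for directory in sorted(p for p in MUSIC_ROOT.iterdir() if p.is_dir()):
    subprocess.run(["beet", "import", str(directory)], check=False)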
Some more thoughts on debugging: if it’s not one of the plugins, I think the next step would be to build a test case that generates such huge import sessions (maybe stub out importer.read_tasks instead of actually generating thousands or millions of files?) and then hook up Python’s tracemalloc in the album_imported event to detect the site where the leaking memory is allocated. Such a test case would be nice to have in general, so I have put it on my todo list, but I don’t think I’ll look into it anytime soon.
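For illustration, here is a rough sketch of what that hook could look like as a throwaway debugging plugin (the module name memdebug and the snapshot-diff details are made up for this example, not an existing beets plugin): it takes a tracemalloc snapshot on every album_imported event and logs the allocation sites that grew the most since the previous album.

# Hypothetical debugging plugin, e.g. saved as ~/data/memdebug.py (the
# configured pluginpath) and enabled by adding "memdebug" to the plugins list.
import tracemalloc

from beets.plugins import BeetsPlugin


class MemDebugPlugin(BeetsPlugin):
    def __init__(self):
        super().__init__()
        tracemalloc.start(25)  # keep up to 25 frames per allocation traceback
        self._last_snapshot = tracemalloc.take_snapshot()
        self.register_listener('album_imported', self.album_imported)

    def album_imported(self, lib, album):
        snapshot = tracemalloc.take_snapshot()
        # Compare against the snapshot from the previous album to see what grew.
        for stat in snapshot.compare_to(self._last_snapshot, 'lineno')[:10]:
            self._log.info('{}', stat)
        self._last_snapshot = snapshot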