LiteRECORDS



Powerful New Vocal Remover AI - Instructions

Awesome!! I'm glad it worked! The multi-genre model I uploaded is much better than the original base model. However, I'm going to be coming out with an even better one this week. So far the new one I'm making now is outperforming the one I posted.

I'm looking forward to the update...
 
I downloaded the new baseline... how do you get it to activate or batch process, or use a genre-specific process? Or does it recognize what type of music it is? I'm lost lol
 
tried to train but got this error at the end

1 +- 03_bill_mix.mp3 +- 03_bill_inst.mp3
2 +- 04_fasc_mix.mp3 +- 04_fasc_inst.mp3
3 +- 01_amd_mix.mp3 +- 01_amd_inst.mp3
4 +- 02_beat_mix.mp3 +- 02_beat_inst.mp3
0%| | 0/4 [00:00<?, ?it/s]C:\Users\Robert\AppData\Local\Programs\Python\Python37\lib\site-packages\librosa\core\audio.py:161: UserWarning: PySoundFile failed. Trying audioread instead.
warnings.warn('PySoundFile failed. Trying audioread instead.')
C:\Users\Robert\AppData\Local\Programs\Python\Python37\lib\site-packages\librosa\core\audio.py:161: UserWarning: PySoundFile failed. Trying audioread instead.
warnings.warn('PySoundFile failed. Trying audioread instead.')
100%|████████████████████████████████████████████████████████████████████████████████████| 4/4 [01:53<00:00, 28.25s/it]
0it [00:00, ?it/s]
# epoch 0
* inner epoch 0
Traceback (most recent call last):
File "train.py", line 223, in <module>
main()
File "train.py", line 194, in main
X_train, y_train, model, optimizer, args.batchsize, instance_loss)
File "train.py", line 75, in train_inner_epoch
return sum_loss / len(X_train)
ZeroDivisionError: division by zero
 
tried to train but got this error at the end (full traceback quoted above)

This error is due to your training set being too small. You need a bare minimum of 15 pairs to start training at all. And if you're training from scratch like this, you'll need at LEAST 50-75 pairs for it to be effective. Your training/validation numbers won't move with sets smaller than 50; you'll end up wasting your system resources and being sorely disappointed with your model's performance.

If you're working with a set of 15-50 pairs, just fine-tune one of the baseline models instead (commands are in the main thread). I also figured out how to train effectively with a GPU, so train with your GPU if you have one.
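To see where that ZeroDivisionError comes from: train.py divides the summed loss by the number of training examples, and with too few pairs the validation split can leave the training list empty. A minimal sketch of the failing pattern with a guard added (`mean_loss` is a hypothetical helper for illustration, not the repo's actual code):

```python
def mean_loss(sum_loss, X_train):
    # train.py returns sum_loss / len(X_train) unconditionally; if the
    # validation split consumes all of a tiny dataset, X_train is empty
    # and the division raises ZeroDivisionError.
    if len(X_train) == 0:
        raise ValueError(
            "training set is empty - add more mix/instrumental pairs"
        )
    return sum_loss / len(X_train)
```

With the recommended minimum of pairs, the split leaves a non-empty training list and the average loss is well-defined.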
 
A new model has been posted to the main page! Please make sure to use it with the new A.I. provided, as it won't work with the old one.
 
Hey Anjok! First of all thank you for this awesome AI, it works really well and does a great job separating the tracks.
But now I have a problem with the new model uploaded.
When I tried to run using GPU I get the following error:
Traceback (most recent call last):
File "inference.py", line 104, in <module>
main()
File "inference.py", line 64, in main
pred = model.predict(X_window)
File "C:\Users\KennA\Documents\vocal-removerV2\lib\nets.py", line 79, in predict
h = self.full_band_net(self.bridge(h))
File "C:\Users\KennA\Documents\vocal-removerV2\lib\nets.py", line 34, in __call__
h = self.dec1(h, e1)
File "C:\Users\KennA\Documents\vocal-removerV2\lib\layers.py", line 79, in __call__
x = spec_utils.crop_center(x, skip)
File "C:\Users\KennA\Documents\vocal-removerV2\lib\spec_utils.py", line 20, in crop_center
return torch.cat([h1, h2], dim=1)
RuntimeError: CUDA out of memory. Tried to allocate 384.00 MiB (GPU 0; 2.00 GiB total capacity; 948.49 MiB already allocated; 308.74 MiB free; 137.51 MiB cached)
This didn't happen with the old version. Is there a way to solve this? Using the CPU is really slow. Thank you!
 
GPU not much cop, mate...
So how does it work after you have trained it? Does it recognize whether it's a rock song, etc.?
 
Hey Anjok! First of all, thank you for this awesome AI, it works really well and does a great job separating the tracks. But when I tried to run the new model on my GPU, I got the CUDA out-of-memory error above. This didn't happen with the old version. Is there a way to solve this? Using the CPU is really slow. Thank you!

You're welcome! I'm glad you've enjoyed it!

To answer your question, this new model is bigger and has more layers, so it requires more VRAM. Sadly, your GPU might not have enough memory for this one :(
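For a sense of scale, the failed allocation in the traceback (384.00 MiB) is exactly the size of a float32 tensor with about 100 million elements. A quick back-of-the-envelope helper (the example shape is illustrative, not the model's actual dimensions) shows how fast activation memory grows as layers and channels increase:

```python
def tensor_mib(*shape, bytes_per_elem=4):
    # Size in MiB of a float32 tensor with the given shape; every extra
    # decoder layer or doubled channel count adds allocations like this,
    # which is how a larger model overflows a 2 GiB card.
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_elem / 1024 ** 2

# A hypothetical 512-channel spectrogram activation:
# tensor_mib(1, 512, 1024, 192) -> 384.0 MiB, matching the failed allocation
```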
 
I made it to the conversion step and then got an error I can't figure out;

C:\Users\xxxx\Documents\vocal-remover>python inference.py --input Daredevil.mp3 --gpu 0
C:\Users\xxxx\AppData\Local\Programs\Python\Python37\lib\site-packages\librosa\util\decorators.py:9: NumbaDeprecationWarning: An import was requested from a module that has moved location.
Import requested from: 'numba.decorators', please update to use 'numba.core.decorators' or pin to Numba version 0.48.0. This alias will not be present in Numba version 0.50.0.
from numba.decorators import jit as optional_jit
C:\Users\xxxx\AppData\Local\Programs\Python\Python37\lib\site-packages\librosa\util\decorators.py:9: NumbaDeprecationWarning: An import was requested from a module that has moved location.
Import of 'jit' requested from: 'numba.decorators', please update to use 'numba.core.decorators' or pin to Numba version 0.48.0. This alias will not be present in Numba version 0.50.0.
from numba.decorators import jit as optional_jit
loading model... done
C:\Users\xxxx\AppData\Local\Programs\Python\Python37\lib\site-packages\librosa\core\audio.py:161: UserWarning: PySoundFile failed. Trying audioread instead.
warnings.warn('PySoundFile failed. Trying audioread instead.')
loading wave source... Traceback (most recent call last):
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python37\lib\site-packages\librosa\core\audio.py", line 129, in load
with sf.SoundFile(path) as sf_desc:
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python37\lib\site-packages\soundfile.py", line 629, in __init__
self._file = self._open(file, mode_int, closefd)
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python37\lib\site-packages\soundfile.py", line 1184, in _open
"Error opening {0!r}: ".format(self.name))
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python37\lib\site-packages\soundfile.py", line 1357, in _error_check
raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
RuntimeError: Error opening 'Daredevil.mp3': File contains data in an unknown format.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "inference.py", line 104, in <module>
main()
File "inference.py", line 39, in main
args.input, args.sr, False, dtype=np.float32, res_type='kaiser_fast')
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python37\lib\site-packages\librosa\core\audio.py", line 162, in load
y, sr_native = __audioread_load(path, offset, duration, dtype)
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python37\lib\site-packages\librosa\core\audio.py", line 186, in __audioread_load
with audioread.audio_open(path) as input_file:
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python37\lib\site-packages\audioread\__init__.py", line 116, in audio_open
raise NoBackendError()
audioread.exceptions.NoBackendError
 
Yep, that fixed it! Thanks :) I also needed to use WAV files (MP3s won't work, at least not on my system).
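If you're not sure whether a file is really a WAV (an MP3 renamed to .wav will still fail and trip librosa's audioread fallback, producing the NoBackendError above), a cheap stdlib header probe can tell you before you hand it to inference.py. A minimal sketch:

```python
def looks_like_wav(path):
    # Canonical WAV files start with a RIFF header: bytes 0-3 are
    # b"RIFF" and bytes 8-11 are b"WAVE". Anything else (e.g. an MP3's
    # ID3 tag) will fail this check.
    with open(path, "rb") as f:
        head = f.read(12)
    return head[:4] == b"RIFF" and head[8:12] == b"WAVE"
```

This only checks the container header, not whether the audio inside is decodable, but it catches the common renamed-file case.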

I threw a couple of the toughest conversions I know of at it to see. On the whole, the primary AI over in the other topic is superior, at least presently: this one leaves more trace vocal across the tracks, like a lingering echo instead of the static the other one gives... but both are in the same ballpark, which is way ahead of any other program.

One fascinating thing, though, is that this one actually seems to handle some things BETTER than the other, though I'd need to do more testing to confirm. I'd say this is the exception, not the rule, but for example: on Steven Wilson - Blackest Eyes it does a poorer job on the verse sections, yet a superior job on the bridge. On Smashing Pumpkins - JellyBelly it does a poorer job on the overall vocals, since there is trace bleed where the other AI eliminates it entirely... but on certain parts of that song the other AI completely fails to remove any vocals at all, and while this one doesn't do a perfect job by any stretch, it does a noticeably better job on those parts.

I've had very little time to test, and I know there are more builds to come (which I look forward to), so these are just very early observations. Genre-specific models fascinate me as well... what if one part of a rock song converts better with a pop-oriented model, and can be spliced in with the rest of the song converted with the rock model to create a complete product? I'm already getting that vibe just comparing this model with the other one. Even failing that, more diverse coverage of quality results is likely.
 
I eventually got it working, but it must hog everything on the laptop... chucked a Halestorm song at it: 15 and a half hours... no chance. So I threw it onto my son's gaming desktop, and 16 seconds later the song was done.

PS C:\Users\PC\Documents\vocal-removerV2> python inference.py --input SLFNEW.wav --gpu 0
loading model... done
loading wave source... done
stft of wave source... done
100%|██████████████████████████████████████████████████████████████████████████████████| 43/43 [00:18<00:00, 2.30it/s]
inverse stft of instruments... done
inverse stft of vocals... done
PS C:\Users\PC\Documents\vocal-removerV2> python inference.py --input SMITH.wav --gpu 0
loading model... done
loading wave source... done
stft of wave source... done
100%|██████████████████████████████████████████████████████████████████████████████████| 69/69 [00:28<00:00, 2.39it/s]
inverse stft of instruments... done
inverse stft of vocals... done
PS C:\Users\PC\Documents\vocal-removerV2> python inference.py --input ZZYZX.wav --gpu 0
loading model... done
loading wave source... done
stft of wave source... done
100%|██████████████████████████████████████████████████████████████████████████████████| 54/54 [00:22<00:00, 2.35it/s]
inverse stft of instruments... done
inverse stft of vocals... done
PS C:\Users\PC\Documents\vocal-removerV2> python inference.py --input HILL.wav --gpu 0
loading model... done
loading wave source... done
stft of wave source... done
100%|██████████████████████████████████████████████████████████████████████████████████| 50/50 [00:23<00:00, 2.11it/s]
inverse stft of instruments... done
inverse stft of vocals... done
PS C:\Users\PC\Documents\vocal-removerV2> python inference.py --input CHAOS.wav --gpu 0
loading model... done
loading wave source... done
stft of wave source... done
100%|██████████████████████████████████████████████████████████████████████████████████| 40/40 [00:18<00:00, 2.20it/s]
inverse stft of instruments... done
inverse stft of vocals... done
PS C:\Users\PC\Documents\vocal-removerV2> python inference.py --input JAD.wav --gpu 0
loading model... done
loading wave source... done
stft of wave source... done
100%|██████████████████████████████████████████████████████████████████████████████████| 39/39 [00:17<00:00, 2.29it/s]
inverse stft of instruments... done
inverse stft of vocals... done
PS C:\Users\PC\Documents\vocal-removerV2>
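The repeated PowerShell invocations above can be scripted. A hedged sketch that loops inference.py over every WAV in a folder, assuming only the --input and --gpu flags shown in the transcript:

```python
import pathlib
import subprocess

def build_command(wav_path, gpu=0):
    # Mirror the manual invocations above:
    #   python inference.py --input <file> --gpu 0
    return ["python", "inference.py", "--input", str(wav_path), "--gpu", str(gpu)]

def batch_convert(folder, gpu=0):
    # Process every .wav in the folder sequentially; check=True stops on
    # the first failure so errors aren't silently skipped.
    for wav in sorted(pathlib.Path(folder).glob("*.wav")):
        subprocess.run(build_command(wav, gpu), check=True)
```

Run it from the vocal-removerV2 directory so inference.py resolves, e.g. `batch_convert(".")`.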
 
I eventually got it working, but it must hog everything on the laptop... chucked a Halestorm song at it: 15 and a half hours... no chance. So I threw it onto my son's gaming desktop, and 16 seconds later the song was done. (Full run log quoted above.)

How good were the results?
 
Personally, I'd like to see a GUI for the current version, and then a way to simply add updates later, either via certain file types or a simple update command tied to the GitHub repo.
 
How good were the results?

Compared to other programs, and against the earlier version, V2 is smoking. The only drawback is you definitely need a top-notch PC to get things done fast. If this is based on 300-odd pairs, the 1,000-pair edition that Anjok might release is going to be immense. I think I may need to rob a bank for a new PC.
 
Compared to other programs, and against the earlier version, V2 is smoking...

Yes, I also need another computer, but for now I think I'm gonna dedicate my old laptop to just doing these conversions.
 
Genre-specific models fascinate me as well... what if one part of a rock song converts better with a pop-oriented model, and can be spliced in with the rest of the song converted with the rock model to create a complete product? I'm already getting that vibe just comparing this model with the other one. Even failing that, more diverse coverage of quality results is likely.

This is actually something I'm testing now! I had to give my PC a break from training for a bit because I didn't want to burn it out. I'm almost done building a new one that's going to be 100x more powerful. Once I get my remaining parts in the mail, I'm going to start training aggressively with new settings and different batch sizes.
 