LiteRECORDS



Powerful New Vocal Remover AI - Instructions

Awesome!! I'm glad it worked! The multi-genre model I uploaded is much better than the original base model. However, I'm going to be coming out with an even better one this week. So far the new one I'm making now is outperforming the one I posted.

I'm looking forward to the update...
 
I downloaded the new baseline... how do you get it to activate or batch process, or use a genre-specific process? Or does it recognize what type of music it is? I'm lost lol
 
tried to train but got this error at the end

1 +- 03_bill_mix.mp3 +- 03_bill_inst.mp3
2 +- 04_fasc_mix.mp3 +- 04_fasc_inst.mp3
3 +- 01_amd_mix.mp3 +- 01_amd_inst.mp3
4 +- 02_beat_mix.mp3 +- 02_beat_inst.mp3
0%| | 0/4 [00:00<?, ?it/s]C:\Users\Robert\AppData\Local\Programs\Python\Python37\lib\site-packages\librosa\core\audio.py:161: UserWarning: PySoundFile failed. Trying audioread instead.
warnings.warn('PySoundFile failed. Trying audioread instead.')
C:\Users\Robert\AppData\Local\Programs\Python\Python37\lib\site-packages\librosa\core\audio.py:161: UserWarning: PySoundFile failed. Trying audioread instead.
warnings.warn('PySoundFile failed. Trying audioread instead.')
100%|████████████████████████████████████████████████████████████████████████████████████| 4/4 [01:53<00:00, 28.25s/it]
0it [00:00, ?it/s]
# epoch 0
* inner epoch 0
Traceback (most recent call last):
File "train.py", line 223, in <module>
main()
File "train.py", line 194, in main
X_train, y_train, model, optimizer, args.batchsize, instance_loss)
File "train.py", line 75, in train_inner_epoch
return sum_loss / len(X_train)
ZeroDivisionError: division by zero
 
tried to train but got this error at the end (full traceback quoted above)

This error is due to your training set being too small. You need a bare minimum of 15 pairs to start training at all. And if you're training from scratch like this, you'll need at LEAST 50-75 pairs for it to be effective. Your training/validation numbers won't move with sets smaller than 50; you'll end up wasting your system resources and being sorely disappointed with your model's performance.

If you're working with a set of 15-50 pairs, just fine-tune one of the baseline models instead (commands are in the main thread). I also figured out how to train effectively with a GPU, so train with your GPU if you have one.
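To see where that ZeroDivisionError comes from: train.py divides the summed loss by the number of training examples, and with too few pairs the validation split can leave the training list empty. A minimal sketch of the failing pattern with a guard added (`mean_loss` is a hypothetical helper for illustration, not the repo's actual code):

```python
def mean_loss(sum_loss, X_train):
    # train.py returns sum_loss / len(X_train) unconditionally; if the
    # validation split consumes all of a tiny dataset, X_train is empty
    # and the division raises ZeroDivisionError.
    if len(X_train) == 0:
        raise ValueError(
            "training set is empty - add more mix/instrumental pairs"
        )
    return sum_loss / len(X_train)
```

With the recommended minimum of pairs, the split leaves a non-empty training list and the average loss is well-defined.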
 
A new model has been posted to the main page! Please make sure to use it with the new A.I. provided, as it won't work with the old one.
 
Hey Anjok! First of all thank you for this awesome AI, it works really well and does a great job separating the tracks.
But now I have a problem with the new model uploaded.
When I tried to run using GPU I get the following error:
Traceback (most recent call last):
File "inference.py", line 104, in <module>
main()
File "inference.py", line 64, in main
pred = model.predict(X_window)
File "C:\Users\KennA\Documents\vocal-removerV2\lib\nets.py", line 79, in predict
h = self.full_band_net(self.bridge(h))
File "C:\Users\KennA\Documents\vocal-removerV2\lib\nets.py", line 34, in __call__
h = self.dec1(h, e1)
File "C:\Users\KennA\Documents\vocal-removerV2\lib\layers.py", line 79, in __call__
x = spec_utils.crop_center(x, skip)
File "C:\Users\KennA\Documents\vocal-removerV2\lib\spec_utils.py", line 20, in crop_center
return torch.cat([h1, h2], dim=1)
RuntimeError: CUDA out of memory. Tried to allocate 384.00 MiB (GPU 0; 2.00 GiB total capacity; 948.49 MiB already allocated; 308.74 MiB free; 137.51 MiB cached)
This didn't happen with the old version. Is there a way to solve this? Using the CPU is really slow. Thank you!
 
GPU not much cop, mate...
So how does it work after you have trained it? Does it recognize whether it's a rock song, etc.?
 
Hey Anjok! First of all, thank you for this awesome AI, it works really well and does a great job separating the tracks. But when I tried to run the new model on my GPU, I got the CUDA out-of-memory error above. This didn't happen with the old version. Is there a way to solve this? Using the CPU is really slow. Thank you!

You're welcome! I'm glad you've enjoyed it!

To answer your question, this new model is bigger and has more layers, so it requires more VRAM. Sadly, your GPU might not have enough memory for this one :(
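For a sense of scale, the failed allocation in the traceback (384.00 MiB) is exactly the size of a float32 tensor with about 100 million elements. A quick back-of-the-envelope helper (the example shape is illustrative, not the model's actual dimensions) shows how fast activation memory grows as layers and channels increase:

```python
def tensor_mib(*shape, bytes_per_elem=4):
    # Size in MiB of a float32 tensor with the given shape; every extra
    # decoder layer or doubled channel count adds allocations like this,
    # which is how a larger model overflows a 2 GiB card.
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_elem / 1024 ** 2

# A hypothetical 512-channel spectrogram activation:
# tensor_mib(1, 512, 1024, 192) -> 384.0 MiB, matching the failed allocation
```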
 
I made it to the conversion step and then got an error I can't figure out;

C:\Users\xxxx\Documents\vocal-remover>python inference.py --input Daredevil.mp3 --gpu 0
C:\Users\xxxx\AppData\Local\Programs\Python\Python37\lib\site-packages\librosa\util\decorators.py:9: NumbaDeprecationWarning: An import was requested from a module that has moved location.
Import requested from: 'numba.decorators', please update to use 'numba.core.decorators' or pin to Numba version 0.48.0. This alias will not be present in Numba version 0.50.0.
from numba.decorators import jit as optional_jit
C:\Users\xxxx\AppData\Local\Programs\Python\Python37\lib\site-packages\librosa\util\decorators.py:9: NumbaDeprecationWarning: An import was requested from a module that has moved location.
Import of 'jit' requested from: 'numba.decorators', please update to use 'numba.core.decorators' or pin to Numba version 0.48.0. This alias will not be present in Numba version 0.50.0.
from numba.decorators import jit as optional_jit
loading model... done
C:\Users\xxxx\AppData\Local\Programs\Python\Python37\lib\site-packages\librosa\core\audio.py:161: UserWarning: PySoundFile failed. Trying audioread instead.
warnings.warn('PySoundFile failed. Trying audioread instead.')
loading wave source... Traceback (most recent call last):
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python37\lib\site-packages\librosa\core\audio.py", line 129, in load
with sf.SoundFile(path) as sf_desc:
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python37\lib\site-packages\soundfile.py", line 629, in __init__
self._file = self._open(file, mode_int, closefd)
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python37\lib\site-packages\soundfile.py", line 1184, in _open
"Error opening {0!r}: ".format(self.name))
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python37\lib\site-packages\soundfile.py", line 1357, in _error_check
raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
RuntimeError: Error opening 'Daredevil.mp3': File contains data in an unknown format.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "inference.py", line 104, in <module>
main()
File "inference.py", line 39, in main
args.input, args.sr, False, dtype=np.float32, res_type='kaiser_fast')
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python37\lib\site-packages\librosa\core\audio.py", line 162, in load
y, sr_native = __audioread_load(path, offset, duration, dtype)
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python37\lib\site-packages\librosa\core\audio.py", line 186, in __audioread_load
with audioread.audio_open(path) as input_file:
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python37\lib\site-packages\audioread\__init__.py", line 116, in audio_open
raise NoBackendError()
audioread.exceptions.NoBackendError
 
Yep, that fixed it! Thanks :) I also needed to use WAV files (MP3s won't work, at least not on my system).
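If you're not sure whether a file is really a WAV (an MP3 renamed to .wav will still fail and trip librosa's audioread fallback, producing the NoBackendError above), a cheap stdlib header probe can tell you before you hand it to inference.py. A minimal sketch:

```python
def looks_like_wav(path):
    # Canonical WAV files start with a RIFF header: bytes 0-3 are
    # b"RIFF" and bytes 8-11 are b"WAVE". Anything else (e.g. an MP3's
    # ID3 tag) will fail this check.
    with open(path, "rb") as f:
        head = f.read(12)
    return head[:4] == b"RIFF" and head[8:12] == b"WAVE"
```

This only checks the container header, not whether the audio inside is decodable, but it catches the common renamed-file case.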

I threw a couple of the toughest conversions I know of at it to see. On the whole, the primary AI over in the other topic is superior, at least presently: this one leaves more trace vocal across the tracks, like a lingering echo instead of the static the other one gives... but both are in the same ballpark, which is way ahead of any other program.

One fascinating thing, though, is that this one actually seems to handle some things BETTER than the other, though I'd need to do more testing to confirm. I'd say this is the exception, not the rule, but for example: on Steven Wilson - Blackest Eyes it does a poorer job on the verse sections, yet a superior job on the bridge. On Smashing Pumpkins - JellyBelly it does a poorer job on the overall vocals, since there is trace bleed where the other AI eliminates it entirely... but on certain parts of that song the other AI completely fails to remove any vocals at all, and while this one doesn't do a perfect job by any stretch, it does a noticeably better job on those parts.

I've had very little time to test, and I know there are more builds to come (which I look forward to), so these are just very early observations. Genre-specific models fascinate me as well... what if one part of a rock song converts better with a pop-oriented model, and can be spliced in with the rest of the song converted with the rock model to create a complete product? I'm already getting that vibe just comparing this model with the other one. Even failing that, more diverse coverage of quality results is likely.
 
I eventually got it working, but it must hog everything on the laptop... chucked a Halestorm song at it: 15 and a half hours... no chance. So I threw it onto my son's gaming desktop, and 16 seconds later the song was done.

PS C:\Users\PC\Documents\vocal-removerV2> python inference.py --input SLFNEW.wav --gpu 0
loading model... done
loading wave source... done
stft of wave source... done
100%|██████████████████████████████████████████████████████████████████████████████████| 43/43 [00:18<00:00, 2.30it/s]
inverse stft of instruments... done
inverse stft of vocals... done
PS C:\Users\PC\Documents\vocal-removerV2> python inference.py --input SMITH.wav --gpu 0
loading model... done
loading wave source... done
stft of wave source... done
100%|██████████████████████████████████████████████████████████████████████████████████| 69/69 [00:28<00:00, 2.39it/s]
inverse stft of instruments... done
inverse stft of vocals... done
PS C:\Users\PC\Documents\vocal-removerV2> python inference.py --input ZZYZX.wav --gpu 0
loading model... done
loading wave source... done
stft of wave source... done
100%|██████████████████████████████████████████████████████████████████████████████████| 54/54 [00:22<00:00, 2.35it/s]
inverse stft of instruments... done
inverse stft of vocals... done
PS C:\Users\PC\Documents\vocal-removerV2> python inference.py --input HILL.wav --gpu 0
loading model... done
loading wave source... done
stft of wave source... done
100%|██████████████████████████████████████████████████████████████████████████████████| 50/50 [00:23<00:00, 2.11it/s]
inverse stft of instruments... done
inverse stft of vocals... done
PS C:\Users\PC\Documents\vocal-removerV2> python inference.py --input CHAOS.wav --gpu 0
loading model... done
loading wave source... done
stft of wave source... done
100%|██████████████████████████████████████████████████████████████████████████████████| 40/40 [00:18<00:00, 2.20it/s]
inverse stft of instruments... done
inverse stft of vocals... done
PS C:\Users\PC\Documents\vocal-removerV2> python inference.py --input JAD.wav --gpu 0
loading model... done
loading wave source... done
stft of wave source... done
100%|██████████████████████████████████████████████████████████████████████████████████| 39/39 [00:17<00:00, 2.29it/s]
inverse stft of instruments... done
inverse stft of vocals... done
PS C:\Users\PC\Documents\vocal-removerV2>
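The repeated PowerShell invocations above can be scripted. A hedged sketch that loops inference.py over every WAV in a folder, assuming only the --input and --gpu flags shown in the transcript:

```python
import pathlib
import subprocess

def build_command(wav_path, gpu=0):
    # Mirror the manual invocations above:
    #   python inference.py --input <file> --gpu 0
    return ["python", "inference.py", "--input", str(wav_path), "--gpu", str(gpu)]

def batch_convert(folder, gpu=0):
    # Process every .wav in the folder sequentially; check=True stops on
    # the first failure so errors aren't silently skipped.
    for wav in sorted(pathlib.Path(folder).glob("*.wav")):
        subprocess.run(build_command(wav, gpu), check=True)
```

Run it from the vocal-removerV2 directory so inference.py resolves, e.g. `batch_convert(".")`.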
 
I eventually got it working, but it must hog everything on the laptop... chucked a Halestorm song at it: 15 and a half hours... no chance. So I threw it onto my son's gaming desktop, and 16 seconds later the song was done. (Full run log quoted above.)

How good were the results?
 
Personally, I'd like to see a GUI for the current version, and then a way to simply add updates later, either via certain file types or a simple update command tied to the GitHub repo.
 
How good were the results?

Compared to other programs, and against the earlier version, V2 is smoking. The only drawback is you definitely need a top-notch PC to get things done fast. If this is based on 300-odd pairs, the 1,000-pair edition that Anjok might release is going to be immense. I think I may need to rob a bank for a new PC.
 
Compared to other programs, and against the earlier version, V2 is smoking...

Yes, I also need another computer, but for now I think I'm gonna dedicate my old laptop to just doing these conversions.
 
Genre-specific models fascinate me as well... what if one part of a rock song converts better with a pop-oriented model, and can be spliced in with the rest of the song converted with the rock model to create a complete product? I'm already getting that vibe just comparing this model with the other one. Even failing that, more diverse coverage of quality results is likely.

This is actually something I'm testing now! I had to give my PC a break from training for a bit because I didn't want to burn it out. I'm almost done building a new one that's going to be 100x more powerful. Once I get my remaining parts in the mail, I'm going to start training aggressively with new settings and different batch sizes.
 