Running RVC WebUI inference from the command line

Background
Conclusion
Prerequisites
Method
References

Background

With the model inference feature of the RVC WebUI, you can convert the voice in an audio file into a different voice (voice changing).

GitHub - RVC-Project/Retrieval-based-Voice-Conversion-WebUI: Easily train a good VC model with voice data


I wanted to run this conversion from the command line without launching the WebUI in a browser, so I looked into how to do it.

Conclusion

Use tools/infer_cli.py.

https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/blob/main/tools/infer_cli.py

Example:

python tools/infer_cli.py --input_path in.wav  --model_name your_voice_model.pth  --device cuda:0  --opt_path out.wav --index_path None
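To convert several files, one option is to call the same command from a small Python script. This is a minimal sketch, assuming it runs from the repository root; it only uses the flags shown in the command above, and the directory names in the usage line (inputs, outputs) are hypothetical. Each call starts a new process, so the model is reloaded for every file.

```python
import subprocess
import sys
from pathlib import Path


def build_infer_cmd(input_path: Path, output_path: Path, model_name: str,
                    device: str = "cuda:0") -> list[str]:
    """Build the same command line as the example above for one file."""
    return [
        sys.executable, "tools/infer_cli.py",  # run with the current interpreter
        "--input_path", str(input_path),
        "--model_name", model_name,
        "--device", device,
        "--opt_path", str(output_path),
        "--index_path", "None",
    ]


def convert_dir(input_dir: Path, output_dir: Path, model_name: str) -> None:
    """Run infer_cli.py once per .wav file in input_dir (model reloaded each time)."""
    output_dir.mkdir(parents=True, exist_ok=True)
    for wav in sorted(input_dir.glob("*.wav")):
        cmd = build_infer_cmd(wav, output_dir / wav.name, model_name)
        subprocess.run(cmd, check=True)  # raise CalledProcessError on a non-zero exit
```

Usage (hypothetical paths): convert_dir(Path("inputs"), Path("outputs"), "your_voice_model.pth"). Building the argument list separately keeps it easy to print or check before anything is executed.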

That's all. The rest of this article was written before I noticed this script.

Prerequisites

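Environment: Ubuntu 20.04
This article targets RVC WebUI commit 08b22f036eb779673352cbb7250fc93e373cea26 (2023/10/11). It will probably stop working soon after the code changes.
The packages RVC WebUI needs are installed and the required models are already in place.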
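Because the scripts in the next section import RVC modules by relative path, they must be run from the repository root. A minimal sketch of a sanity check, assuming the layout of the commit above: it looks for the modules that cli.py imports (configs/config.py, infer/modules/vc/modules.py), the .env file read by load_dotenv(), and the assets/weights directory. The .env and assets/weights entries are my assumptions about that commit's layout.

```python
from pathlib import Path

# Assumed layout of the RVC WebUI repository root at the commit above.
REQUIRED_PATHS = [
    "configs/config.py",
    "infer/modules/vc/modules.py",
    ".env",
    "assets/weights",
]


def missing_paths(repo_root: Path) -> list[str]:
    """Return the entries of REQUIRED_PATHS that do not exist under repo_root."""
    return [p for p in REQUIRED_PATHS if not (repo_root / p).exists()]
```

Usage: missing_paths(Path.cwd()) returns an empty list when the current directory looks like the repository root.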

Method

Create the following cli.py and cli.sh in the RVC WebUI root directory, Retrieval-based-Voice-Conversion-WebUI.

import argparse
import sys

from dotenv import load_dotenv
import soundfile

from configs.config import Config
from infer.modules.vc.modules import VC

parser = argparse.ArgumentParser(
    prog="Retrieval-Based Voice Conversion",
    description="A Voice Conversion framework based on VITS",
)
parser.add_argument("-v", "--voice", type=str, required=True)
parser.add_argument("-i", "--input_filepath", type=str, required=True)
parser.add_argument("-o", "--output_filepath", type=str, required=True)
parser.add_argument("--sid", type=int, default=0)
parser.add_argument("--transpose", type=int, default=0)
parser.add_argument("--f0_filepath", type=str, default=None)
parser.add_argument("--f0_method", type=str, default="harvest")
parser.add_argument("--index_filepath", type=str, default="")
parser.add_argument("--index_ratio", type=float, default=1)
parser.add_argument("--filter_radius", type=int, default=3)
parser.add_argument("--resample_rate", type=int, default=0)
parser.add_argument("--rms_mix_ratio", type=float, default=1)
parser.add_argument("--protect", type=float, default=0.33)
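args = parser.parse_args()
# Config() parses sys.argv with its own argparse parser and would reject the
# options above, so keep only the program name before creating it.
sys.argv = sys.argv[:1]

# Load settings such as weight_root from the .env file in the repository root
load_dotenv()
config = Config()

# Load the model; --voice is a file name inside assets/weights/
vc = VC(config)
vc.get_vc(
    args.voice,
    None,
    None,
)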

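# Perform inference (positional arguments follow vc_single's parameter order)
info, (output_samplerate, audio_output) = vc.vc_single(
    args.sid,
    args.input_filepath,
    args.transpose,
    args.f0_filepath,
    args.f0_method,
    args.index_filepath,
    "",  # file_index2 (unused)
    args.index_ratio,
    args.filter_radius,
    args.resample_rate,
    args.rms_mix_ratio,
    args.protect,
)

# On failure, vc_single returns an error message and (None, None)
if audio_output is None:
    sys.exit(info)

# Write the output file; soundfile infers the format from the extension (e.g. .wav)
soundfile.write(
    args.output_filepath,
    audio_output,
    output_samplerate,
)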
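The script resets sys.argv before calling Config() because, as far as I can tell from the code at that commit, Config() builds its own argparse parser and calls parse_args() with no arguments, which reads sys.argv[1:]. Options such as --voice are unknown to that parser, so it would print an error and exit. The sketch below reproduces this with a hypothetical stand-in parser (it is not RVC's Config class) that only knows a --port option.

```python
import argparse
import contextlib
import io


def make_config_like_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for a Config-style parser that only knows its own options.
    parser = argparse.ArgumentParser(prog="config")
    parser.add_argument("--port", type=int, default=7865)
    return parser


def parses_cleanly(argv: list[str]) -> bool:
    """Return True if the stand-in parser accepts argv, False if it would exit."""
    parser = make_config_like_parser()
    try:
        # argparse prints the error to stderr before exiting; silence it here.
        with contextlib.redirect_stderr(io.StringIO()):
            parser.parse_args(argv)
    except SystemExit:
        return False
    return True


# cli.py's own options, as they would remain in sys.argv[1:] without the reset.
cli_args = ["--voice", "your_voice_model.pth", "-i", "in.wav", "-o", "out.wav"]
print(parses_cleanly(cli_args))  # False: --voice etc. are unrecognized
print(parses_cleanly([]))        # True: what the parser sees after the reset
```

tools/infer_cli.py does the same thing with sys.argv = sys.argv[:1] right after parsing its own arguments.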

#!/bin/bash

path_in=in.wav
path_out=out.wav
path_model=your_voice_model.pth # file name (not a path) of a model inside assets/weights/

python cli.py \
    --voice "${path_model}" \
    --input_filepath "${path_in}" \
    --output_filepath "${path_out}"

Edit the file paths in cli.sh, make it executable, and run it.

chmod +x cli.sh
./cli.sh
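Since the model is given as a file name inside assets/weights/ rather than as a path (as I read the code at that commit, get_vc joins it with the weight_root directory from .env), passing assets/weights/your_voice_model.pth is an easy mistake. A minimal sketch of a helper that reduces either form to the bare file name and checks that the file exists; the default WEIGHTS_DIR and the helper name are assumptions for illustration.

```python
from pathlib import Path

# Assumed weights directory, matching the comment in cli.sh.
WEIGHTS_DIR = Path("assets/weights")


def to_model_name(path_or_name: str, weights_dir: Path = WEIGHTS_DIR) -> str:
    """Reduce a model path to the bare file name that --voice expects.

    Raises FileNotFoundError if no file with that name exists in weights_dir.
    """
    name = Path(path_or_name).name
    if not (weights_dir / name).is_file():
        raise FileNotFoundError(f"{name} not found in {weights_dir}/")
    return name
```

Both to_model_name("your_voice_model.pth") and to_model_name("assets/weights/your_voice_model.pth") return "your_voice_model.pth" when that file exists in assets/weights/.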

References

rvc_command_line

The code in this article is based on the following repository.

https://github.com/hydrusbeta/rvc_command_line/tree/main
MIT License

RVC WebUI repository

The RVC WebUI wiki describes how to run inference, but its code is outdated and did not seem to work with the current version.

Q7:How to train and infer without the WebUI?

https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/wiki/FAQ-(Frequently-Asked-Questions)#q7how-to-train-and-infer-without-the-webui

An issue about that code not being maintained:

myinfer.py has not been updated #299

https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/issues/299

RVC Eval Simplified

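For real-time voice conversion (rather than file-based conversion), the following repository supports the command line. I tried modifying it to read from and write to files, but in what I tried I could not get the conversion to work.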

https://github.com/esnya/rvc-eval
MIT License