So far, the most usable speech-to-text solution is still aadnk/faster-whisper-webui

Link: https://huggingface.co/spaces/aadnk/faster-whisper-webui

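I compared it with several newer alternatives. Some of them add a job queue and resource visualization, but they are less efficient, mainly because they do not optimize VRAM usage. For example, https://github.com/jhj0517/Whisper-WebUI needs 25 GB of VRAM with the v3 model, so it would not even run on my 3090 (24 GB), and it does not support word-level annotation either.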

To make aadnk/faster-whisper-webui more convenient to use, you can add a docker-compose.yml to its directory:

services:
  faster-whisper:
    image: registry.gitlab.com/aadnk/whisper-webui:latest
    container_name: faster-whisper
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
    ports:
      - "7001:7860"
    volumes:
      - ./cache/whisper:/root/.cache/whisper
      - ./cache/huggingface:/root/.cache/huggingface
    restart: on-failure:15
    environment:
      - HTTP_PROXY=http://192.168.1.200:16236
      - HTTPS_PROXY=http://192.168.1.200:16236
      - NO_PROXY=localhost,127.0.0.1,::1
    command: >
      app.py
      --whisper_implementation whisper
      --input_audio_max_duration -1
      --server_name 0.0.0.0
      --auto_parallel True
      --default_vad silero-vad
      --default_model_name large-v2

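Note that large-v2 can be changed to large-v3, which the latest version already supports. The HTTP_PROXY settings are there so that the app can reach gradio through a proxy at startup; without them, startup fails. The address 192.168.1.200:16236 is the proxy on my own network, so substitute yours. Setting the proxy through environment is flexible because it applies only to this container and does not affect other Docker containers. For details, see my previous blog post.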
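Since the container can take a while to start (or fail and be restarted by the on-failure policy), a small readiness check can be handy after running `docker compose up -d`. The following is only a minimal sketch in Python using the standard library. The function name `wait_for_webui` is made up for illustration, and it assumes the web UI answers its root page with HTTP 200 once it is ready, which I have not verified against the project itself. The URL follows from the "7001:7860" port mapping above.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_webui(url: str, timeout: float = 300.0, interval: float = 5.0) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds elapse.

    Hypothetical helper: assumes the web UI returns 200 on its root page when ready.
    """
    # Use an opener with no proxies, so HTTP_PROXY/HTTPS_PROXY set in the local
    # shell do not interfere with reaching the service on the local host.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener.open(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            # Not listening yet, still starting up, or answered with an error status.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For the compose file above, the call would be `wait_for_webui("http://localhost:7001")`. It returns True as soon as the page responds and False if nothing answers within the timeout, in which case `docker compose logs faster-whisper` is the next place to look.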
