VoiceMode MCP: Installation and Live Demo

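Why use it

Key features

Local Whisper option: speech is transcribed on your machine and never uploaded to the cloud
Multiple TTS backends: OpenAI, ElevenLabs, and local Coqui
Push-to-talk or voice-activation mode
Streams partial responses, so you can "hear Claude thinking"
Runs in the terminal alongside the Claude Code CLI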

Live demo

See it in action


Installation

Choose your client

Claude Desktop · macOS: ~/Library/Application Support/Claude/claude_desktop_config.json · Windows: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "voicemode-mcp": {
      "command": "uvx",
      "args": [
        "voice-mode"
      ]
    }
  }
}

Open Claude Desktop → Settings → Developer → Edit Config. Save the file, then restart the app.
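Hand-editing is fine for a fresh file, but if claude_desktop_config.json already lists other servers it is easy to clobber them. A minimal Python sketch of a safe merge, assuming the JSON shape shown above; claude_desktop_config_path, add_voicemode and install are hypothetical helper names, not part of VoiceMode or Claude Desktop.

```python
import json
import os
import platform
from pathlib import Path


def claude_desktop_config_path() -> Path:
    """Return the Claude Desktop config path on macOS or Windows (paths listed above)."""
    system = platform.system()
    if system == "Darwin":
        return Path.home() / "Library" / "Application Support" / "Claude" / "claude_desktop_config.json"
    if system == "Windows":
        return Path(os.environ["APPDATA"]) / "Claude" / "claude_desktop_config.json"
    raise RuntimeError(f"No known Claude Desktop config path for {system}")


def add_voicemode(config: dict) -> dict:
    """Return a copy of config with voicemode-mcp added; other servers and keys are kept."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    # Same entry as the JSON example above.
    servers["voicemode-mcp"] = {"command": "uvx", "args": ["voice-mode"]}
    merged["mcpServers"] = servers
    return merged


def install(path: Path) -> None:
    """Read the config file if it exists, merge the entry, and write it back."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(add_voicemode(config), indent=2), encoding="utf-8")
```

Calling install(claude_desktop_config_path()) updates the real file; restart Claude Desktop afterwards, as with a manual edit. Because add_voicemode returns a new dict, the original config is not modified until it is written.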

Cursor · global: ~/.cursor/mcp.json · per project: .cursor/mcp.json

{
  "mcpServers": {
    "voicemode-mcp": {
      "command": "uvx",
      "args": [
        "voice-mode"
      ]
    }
  }
}

Cursor uses the same mcpServers format as Claude Desktop. Project-level settings take precedence over global ones.
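One way to read "project-level settings take precedence": combine both files and let a project entry replace a global entry with the same server name. That per-name merge rule is an assumption made for illustration, not documented Cursor behavior; load_servers and effective_servers are hypothetical helpers.

```python
import json
from pathlib import Path


def load_servers(path: Path) -> dict:
    """Return the mcpServers map from an mcp.json file, or {} if the file is missing."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8")).get("mcpServers", {})


def effective_servers(global_servers: dict, project_servers: dict) -> dict:
    """Merge both maps; on a name clash the project-level entry wins (assumed rule)."""
    return {**global_servers, **project_servers}


# Hypothetical usage: global file in the home directory, project file in the repo.
# servers = effective_servers(
#     load_servers(Path.home() / ".cursor" / "mcp.json"),
#     load_servers(Path(".cursor") / "mcp.json"),
# )
```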

Cline · VS Code → Cline → MCP Servers → Edit

{
  "mcpServers": {
    "voicemode-mcp": {
      "command": "uvx",
      "args": [
        "voice-mode"
      ]
    }
  }
}

Click the MCP Servers icon in the Cline sidebar, then choose "Edit Configuration".

Windsurf · ~/.codeium/windsurf/mcp_config.json

{
  "mcpServers": {
    "voicemode-mcp": {
      "command": "uvx",
      "args": [
        "voice-mode"
      ]
    }
  }
}

Same format as Claude Desktop. Restart Windsurf for the change to take effect.

Continue · ~/.continue/config.json

{
  "mcpServers": [
    {
      "name": "voicemode-mcp",
      "command": "uvx",
      "args": [
        "voice-mode"
      ]
    }
  ]
}

Continue uses an array of server objects instead of a map keyed by server name.
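Converting the Claude Desktop-style map into Continue's array is mechanical: each key becomes a "name" field on its object. A minimal sketch; to_continue_format is a hypothetical helper.

```python
import json


def to_continue_format(mcp_servers: dict) -> list:
    """Turn {"name": {...}} into [{"name": "name", ...}], keeping the other fields."""
    return [{"name": name, **entry} for name, entry in mcp_servers.items()]


claude_style = {"voicemode-mcp": {"command": "uvx", "args": ["voice-mode"]}}
print(json.dumps({"mcpServers": to_continue_format(claude_style)}, indent=2))
```

Printed with indent=2, the result matches the Continue example above.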

Zed · ~/.config/zed/settings.json

{
  "context_servers": {
    "voicemode-mcp": {
      "command": {
        "path": "uvx",
        "args": [
          "voice-mode"
        ]
      }
    }
  }
}

Add the entry under context_servers. Zed hot-reloads the settings when you save.
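Zed's shape differs from the other clients: the executable moves to command.path and args nest inside command. A sketch of that mapping, assuming only the shape shown above; to_zed_context_servers is a hypothetical helper.

```python
import json


def to_zed_context_servers(mcp_servers: dict) -> dict:
    """Map {"name": {"command": c, "args": a}} to {"name": {"command": {"path": c, "args": a}}}."""
    return {
        name: {"command": {"path": entry["command"], "args": list(entry.get("args", []))}}
        for name, entry in mcp_servers.items()
    }


claude_style = {"voicemode-mcp": {"command": "uvx", "args": ["voice-mode"]}}
print(json.dumps({"context_servers": to_zed_context_servers(claude_style)}, indent=2))
```

The printed output matches the Zed example above.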

Claude Code (CLI)

claude mcp add voicemode-mcp -- uvx voice-mode

One command does it. Verify with claude mcp list; remove with claude mcp remove voicemode-mcp.
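In a setup script you may prefer to run the same registration programmatically. A minimal sketch using subprocess; the helper names are hypothetical, and it assumes the claude CLI is installed and on PATH.

```python
import shutil
import subprocess


def claude_mcp_add_argv(name: str, command: list) -> list:
    """Build argv for: claude mcp add <name> -- <command...>"""
    return ["claude", "mcp", "add", name, "--", *command]


def register_voicemode() -> None:
    """Register voicemode-mcp, then list servers to verify; fails fast if claude is missing."""
    if shutil.which("claude") is None:
        raise FileNotFoundError("claude CLI not found on PATH")
    subprocess.run(claude_mcp_add_argv("voicemode-mcp", ["uvx", "voice-mode"]), check=True)
    subprocess.run(["claude", "mcp", "list"], check=True)
```

Everything after `--` is treated as the server command rather than as options to claude mcp add.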

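Use cases

Hands-on usage: VoiceMode

Drive a Claude Code session hands-free while reading docs on another screen

👤 Developers who read docs or design mockups on one screen and write code on the other ⏱ ~30 min · intermediate

When to use: you're reading a design doc and want to dictate changes without switching windows.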

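Prerequisites

Microphone and speakers: confirm system audio works, e.g. with say "hello" on macOS or a similar command
Whisper model ready: run voice-mode install-whisper to download a local model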

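Steps

Start voice mode

Use voicemode. Listen for prompts and speak responses. Repeat after me: "ready"

→ TTS plays "ready"
Dictate a change

[spoken] Update src/auth.ts: use bcrypt instead of plain SHA256 for passwords.

→ Speech is transcribed correctly; the change is applied; TTS confirms it is done
Review

[spoken] Read me the diff.

→ TTS reads the diff in chunks, and you can pause it

Result: a complete session in which your hands never leave the work in front of you.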

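Pitfalls

TTS talks over you while you're speaking: enable push-to-talk mode or set a wake word

Pairs with: filesystem

Coding by voice, for accessibility needs or RSI recovery

👤 Developers with RSI, low vision, or a preference for voice input ⏱ ~60 min · intermediate

When to use: you temporarily can't type but still need to keep producing work.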

Prerequisites

Acceptable ambient noise: a quiet room; a headset mic works better than a laptop mic

Steps

Baseline test

[spoken] Use voicemode. Read the latest git diff out loud, pausing between files.

→ Clear TTS readout
Workflow

[spoken] Refactor the user model in src/models/user.ts. Move password hashing into a method. Show me the plan first.

→ The plan is read aloud; changes are applied only after you confirm

Result: a complete programming session with no keyboard input.

Pitfalls

TTS mispronounces code symbols: configure a TTS phoneme dictionary for common programming terms

Combinations

Pair with other MCPs for 10x leverage

voicemode-mcp + filesystem

Dictated code changes are applied directly to the repository

I'll dictate changes; apply them in files after reading each back.

voicemode-mcp + github

Review the diff by voice, then dictate the PR description

Read me the staged changes, then open a PR with a description I'll dictate.

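Tools

Capabilities this MCP exposes

Tool	Input parameters	When to call	Cost
start_listening	mode: "ptt" | "vad"	Start a voice session	Free (local Whisper) or OpenAI Whisper API pricing
speak	text: str, voice?: str	Whenever Claude needs to say something aloud	Depends on the TTS provider
transcribe_last	none	Get what the user just said	One Whisper call
stop_listening	none	End the voice session	Free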
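The table implies an order for one voice turn: open a session, read the transcript, answer aloud, and close. A sketch of that order against a stand-in client; FakeVoiceClient and one_turn are hypothetical and only mirror the tool names and parameters in the table. Nothing here talks to the real VoiceMode server.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeVoiceClient:
    """Hypothetical stand-in that records calls to the four tools in the table."""
    heard: list = field(default_factory=list)  # queued utterances to "transcribe"
    calls: list = field(default_factory=list)  # log of (tool, argument) pairs
    listening: bool = False

    def start_listening(self, mode: str) -> None:
        # "ptt" = push-to-talk, "vad" = voice activation
        if mode not in ("ptt", "vad"):
            raise ValueError(f"unknown mode: {mode}")
        self.listening = True
        self.calls.append(("start_listening", mode))

    def transcribe_last(self) -> str:
        if not self.listening:
            raise RuntimeError("call start_listening first")
        self.calls.append(("transcribe_last", None))
        return self.heard.pop(0) if self.heard else ""

    def speak(self, text: str, voice: Optional[str] = None) -> None:
        self.calls.append(("speak", text))

    def stop_listening(self) -> None:
        self.listening = False
        self.calls.append(("stop_listening", None))


def one_turn(client: FakeVoiceClient, mode: str = "ptt") -> str:
    """Run one listen -> transcribe -> speak cycle and always close the session."""
    client.start_listening(mode=mode)
    try:
        said = client.transcribe_last()
        client.speak(f"You said: {said}")
        return said
    finally:
        client.stop_listening()
```

The try/finally mirrors the intent of stop_listening in the table: the session is closed even if a step in the middle fails.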

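Cost and limits

What it costs to run

API pricing: local is free. OpenAI Whisper: $0.006/minute. ElevenLabs TTS: about $0.30 per 1k characters.
Tokens per call: the audio itself is not counted in tokens; only the transcribed text and Claude's replies are
Cost: the local setup is free; cloud providers bill by usage
Tip: local Whisper + Coqui TTS is completely free but lower quality; start with the cloud providers, then downgrade to local later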
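To see where the money goes, the quoted rates can be turned into a per-session estimate. A small sketch using only the two prices listed above (provider pricing may change); session_cost is a hypothetical helper.

```python
WHISPER_USD_PER_MIN = 0.006          # OpenAI Whisper rate quoted above
ELEVENLABS_USD_PER_1K_CHARS = 0.30   # approximate ElevenLabs rate quoted above


def session_cost(minutes_spoken: float, chars_spoken_back: int) -> float:
    """Estimated cloud cost in USD: STT on your speech plus TTS on Claude's replies."""
    stt = minutes_spoken * WHISPER_USD_PER_MIN
    tts = chars_spoken_back / 1000 * ELEVENLABS_USD_PER_1K_CHARS
    return round(stt + tts, 4)
```

For example, 30 minutes of dictation plus 20,000 characters read back comes to about $0.18 + $6.00 = $6.18, so with cloud providers TTS, not STT, dominates the bill.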

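Security

Permissions, secrets, blast radius

Minimum permissions: microphone, speakers

Credential storage: TTS/STT API keys are kept in environment variables

Data egress: unless you run locally, your voice audio is sent to the STT provider and the text to be spoken is sent to the TTS provider

Don't use cloud STT for conversations containing confidential audio unless you trust the provider's data-retention policy.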

Troubleshooting

Common errors and fixes

Mic not detected

System audio permissions: grant your terminal or Claude Code access to the microphone

Verify: `voice-mode test-mic` prints input levels

TTS sounds robotic

Defaults to local Coqui; switch to OpenAI tts-1-hd with VOICE_MODE_TTS=openai
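To make that switch persistent, clients that use the mcpServers format (such as Claude Desktop) accept an env map in the server entry, which is passed to the launched process. A minimal sketch; VOICE_MODE_TTS=openai comes from the fix above, and with_tts_env is a hypothetical helper. The OpenAI backend still needs its API key from the environment, as described under Security, so the sketch does not write any key into the file.

```python
import json


def with_tts_env(entry: dict, provider: str = "openai") -> dict:
    """Return a copy of a server entry with VOICE_MODE_TTS set in its env map."""
    updated = dict(entry)
    env = dict(updated.get("env", {}))  # keep any variables already configured
    env["VOICE_MODE_TTS"] = provider
    updated["env"] = env
    return updated


entry = {"command": "uvx", "args": ["voice-mode"]}
print(json.dumps({"mcpServers": {"voicemode-mcp": with_tts_env(entry)}}, indent=2))
```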

Lag between my speech and response

Use local Whisper-tiny for STT; cloud options add 500 ms or more of latency

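Alternatives

VoiceMode compared with other options

Alternative	When to use it instead	Trade-off
macOS Dictation + say command	You only need basic OS-level voice input and output	No integration with Claude's output; one-way only
Superwhisper / Wispr Flow	You want a polished native macOS dictation app	No MCP integration; no agent-level workflows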
