Skills your agents can run
Plug-and-play capabilities — vetted, versioned, and runnable from any Mighty agent.
High-quality voice synthesis with 9 personas, 11 languages, streaming, and voice cloning using Voice.ai API.
This skill should be used whenever users need YouTube video scripts written. On first use, collects comprehensive preferences including script type, tone, target audience, style, video length, hook style, use of humor, personality, and storytelling approach. Generates complete, production-ready YouTube scripts tailored to user's specifications for any topic. Maintains database of preferences and past scripts for consistent style.
Use this skill to create podcast episodes, interviews, dialogues, and conversation-style audio. Triggers: "create podcast", "podcast episode", "interview audio", "dialogue", "conversation", "two hosts discussing", "audio show", "radio show", "audio drama" Orchestrates: script generation, multi-speaker TTS, intro/outro music, and audio assembly.
Terminal-based Spotify playback and search via spogo (preferred) or spotify_player.
Generate AI-powered podcast-style audio narratives using Azure OpenAI's GPT Realtime Mini model via WebSocket. Use this when building text-to-speech features, audio narrative generation, podcast creation from content, or integrating with the Azure OpenAI Realtime API for real audio output. Covers full-stack implementation from a React frontend to a Python FastAPI backend with WebSocket streaming.
解析视频分享链接,获取无水印视频下载地址。当用户想要下载视频、解析抖音/快手/小红书/B站链接、获取无水印视频时使用此 skill。
High-quality voice synthesis with 18 personas, 32 languages, sound effects, batch processing, and voice design using ElevenLabs API.
Download YouTube videos with customizable quality and format options. Use this skill when the user asks to download, save, or grab YouTube videos. Supports various quality settings (best, 1080p, 720p, 480p, 360p), multiple formats (mp4, webm, mkv), and audio-only downloads as MP3.
Analyze any video source with combined audio transcription and visual frame analysis. Supports YouTube URLs, local file uploads, live camera streams, RTSP/HTTP streams, and screen recordings.
Convert text to speech using Hume AI (or OpenAI) API. Use when the user asks for an audio message, a voice reply, or to hear something "of vive voix".
Writes or reviews lyrics with professional prosody, rhyme craft, and quality checks. Use when writing new lyrics, revising existing lyrics, or when the user says "let's work on a track."
Text-to-speech via Inworld.ai API. Use when generating voice audio from text, creating spoken responses, or converting text to MP3/audio files. Supports multiple voices, speaking rates, and streaming for long text.
获取关注的 YouTube 博主/播客的最新更新。列出最近两天的新视频,包含标题、发布时间和简要描述。 触发词:"获取播客更新"、"最近有什么新播客"、"YouTube 更新"、"播客更新"、"有什么新的播客"。 用于每日检查关注的 AI/科技播客是否有新内容更新。
Download YouTube videos in various formats and qualities. Use when you need to save videos for offline viewing, extract audio, download playlists, or get specific video formats.
Download videos, audio, subtitles, and clean paragraph-style transcripts from YouTube and any other yt-dlp supported site. Use when asked to “download this video”, “save this clip”, “rip audio”, “get subtitles”, “get transcript”, or to troubleshoot yt-dlp/ffmpeg and formats/playlists.
Download YouTube videos in various formats and qualities. Use when you need to save videos for offline viewing, extract audio, download playlists, or get specific video formats.
Transform your OpenClaw workspace into personalized AI-powered podcast briefings. Get daily audio updates on your work, priorities, and strategy in 8 compelling styles—from documentary narrator to strategic advisor. Connects directly to your agent's memory and files. Includes 3 free hours of podcast generation with your Superlore.ai API key. Schedule morning, midday, or evening briefings that keep you informed without reading screens.
Text-to-Speech service using Doubao (Volcano Engine) API with 200+ voices, interactive voice selection, and multilingual support
Create comprehensive audio design including music cues, sound effects, Foley, and score direction
Display, understand, and act on video and audio. Display: ingest content from local files, URLs, RTSP/live streams, or real-time desktop recordings and return real-time context and playable stream links. Understand: extract frames, build visual, semantic, and temporal indexes, and search moments with timestamps and automatic clips. Act: perform transcoding and normalization (codec, frame rate, resolution, aspect ratio), timeline editing (subtitles, text/image overlays, branding, audio overlays, voiceover, translation), media asset generation (images, audio, video), and real-time alerts for live-streamed or desktop-captured events.