VibeVoice is an open-source research framework designed to advance collaboration in the speech synthesis community. We welcome contributions from researchers and developers.
Project Overview
VibeVoice is developed and maintained by Microsoft Research. The project aims to push the boundaries of expressive, long-form, multi-speaker conversational audio generation.
VibeVoice is licensed under the MIT License, which allows free use, modification, and distribution with proper attribution.
Getting Started
Repository Access
- GitHub: https://github.com/microsoft/VibeVoice
- Hugging Face: microsoft/vibevoice collection
- Project Page: https://microsoft.github.io/VibeVoice
- Technical Report: arXiv:2508.19205
Installation
Before contributing, set up your development environment using one of the following options:
Using Docker (Recommended)
From Source
Ways to Contribute
Code Contributions
We welcome pull requests for:
- Bug fixes and stability improvements
- Performance optimizations
- New features aligned with the project roadmap
- Documentation improvements
- Test coverage expansion
Before starting significant work, please open an issue to discuss your proposed changes with the maintainers.
Research Collaboration
Contribute to the research direction:
- Share experimental results and findings
- Propose new architectures or training strategies
- Contribute benchmark evaluations
- Test multilingual capabilities and share observations
Testing and Feedback
Help improve VibeVoice by:
- Testing the models in your use cases
- Reporting bugs and unexpected behavior
- Sharing performance metrics on different hardware
- Providing feedback on documentation clarity
- Suggesting new features or improvements
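When sharing performance metrics on different hardware, one widely used speed measure for speech synthesis is the real-time factor (RTF): wall-clock generation time divided by the duration of the audio produced, so values below 1.0 mean faster than real time. This guide does not prescribe a metric; the sketch below assumes RTF, and `synthesize` is a hypothetical placeholder for whatever inference call you are benchmarking, assumed to return `(samples, sample_rate)`.

```python
import time
from typing import Callable, Sequence, Tuple

# Hypothetical signature (not a VibeVoice API): text in, (audio samples, sample rate in Hz) out.
SynthesizeFn = Callable[[str], Tuple[Sequence[float], int]]


def real_time_factor(synthesize: SynthesizeFn, text: str, warmup: int = 1, runs: int = 3) -> float:
    """Average RTF = total generation time / total audio duration (lower is faster)."""
    if runs < 1:
        raise ValueError("runs must be at least 1")

    # Warm-up calls are not timed, so one-time costs such as lazy initialization are excluded.
    for _ in range(warmup):
        synthesize(text)

    total_seconds = 0.0
    total_audio_seconds = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        samples, sample_rate = synthesize(text)
        total_seconds += time.perf_counter() - start
        total_audio_seconds += len(samples) / sample_rate

    if total_audio_seconds == 0:
        raise ValueError("synthesize() produced no audio")
    return total_seconds / total_audio_seconds
```

If inference runs asynchronously on a GPU, make sure `synthesize` returns only after the audio is fully computed, otherwise the timing will be too optimistic. Reporting the GPU model and input length alongside the number makes results easier to compare across machines.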
Current Roadmap
Active development areas include:
VibeVoice-Realtime Roadmap
- Add more voices (expand available speakers/voice timbres)
- Implement streaming text input function to feed new tokens while audio is still being generated
- Merge the models into the official Hugging Face transformers repository
Multilingual Exploration
Experimental support for nine additional languages (DE, FR, IT, JP, KR, NL, PL, PT, ES) has been added. We welcome:
- Testing and quality evaluations
- Bug reports for specific languages
- Comparative analysis with English performance
- Suggestions for improvement
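For comparative analysis against English, one common approach (assumed here, not prescribed by this guide) is to transcribe the generated audio with a speech recognizer and score the transcript against the input text: word error rate (WER) for space-delimited languages, and character error rate (CER) for languages such as Japanese where text is not split by spaces. The sketch below covers only the scoring step; the recognizer is not shown, and `error_rate` is a hypothetical helper name.

```python
from typing import List


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance over token sequences, computed one DP row at a time."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (or match at cost 0)
            )
        prev = curr
    return prev[-1]


def error_rate(reference: str, hypothesis: str, unit: str = "word") -> float:
    """WER when unit='word', CER (ignoring spaces) when unit='char'."""
    if unit == "word":
        ref, hyp = reference.split(), hypothesis.split()
    elif unit == "char":
        ref = list(reference.replace(" ", ""))
        hyp = list(hypothesis.replace(" ", ""))
    else:
        raise ValueError("unit must be 'word' or 'char'")
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

Applying the same text normalization (case, punctuation) to every language before scoring keeps the per-language numbers comparable.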
Submission Guidelines
Opening Issues
When reporting bugs or requesting features:
- Check existing issues to avoid duplicates
- Use descriptive titles
- Include:
  - Model version and variant
  - Hardware configuration
  - Steps to reproduce (for bugs)
  - Expected vs. actual behavior
  - Relevant code snippets or logs
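Part of the information requested above can be gathered by script. This minimal sketch assumes a Python environment in which PyTorch may or may not be installed; `collect_environment` is a hypothetical helper, and the model version and variant are passed in by the reporter because this guide does not describe a programmatic way to query them.

```python
import json
import platform
from typing import Optional


def collect_environment(model_variant: Optional[str] = None) -> dict:
    """Gather basic software and hardware details for a bug report."""
    info = {
        "model_variant": model_variant,  # filled in by the reporter
        "python": platform.python_version(),
        "os": platform.platform(),
        "machine": platform.machine(),
    }
    try:
        import torch  # optional: only reported if installed
    except ImportError:
        info["torch"] = None
        return info

    info["torch"] = str(torch.__version__)
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["cuda"] = torch.version.cuda
        info["gpus"] = [
            torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())
        ]
    return info


if __name__ == "__main__":
    # Replace the placeholder with the exact model version and variant you used.
    print(json.dumps(collect_environment("<model version and variant>"), indent=2))
```

Pasting the printed JSON into the issue, together with the steps to reproduce, covers most of the checklist in one block.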
Pull Requests
When submitting code:
- Fork the repository
- Create a feature branch
- Make your changes with clear commit messages
- Test thoroughly on your hardware
- Update documentation as needed
- Submit a PR with a detailed description
PR Best Practices
- Keep changes focused and atomic
- Follow existing code style and conventions
- Add tests for new functionality
- Update README or docs if behavior changes
- Reference related issues in your PR description
Responsible AI Principles
Contribution Standards
Ensure that your contributions:
- Do not facilitate deepfakes or disinformation
- Include appropriate safety guardrails
- Maintain or improve content verification capabilities
- Support transparency and AI disclosure
- Respect privacy and consent principles
Voice Customization
To mitigate deepfake risks, voice prompts are provided in an embedded format. Users who require voice customization should contact the team directly.
If you build applications or services on top of VibeVoice, consider the following safeguards:
- Implement authentication and authorization
- Include audit logging capabilities
- Provide clear usage documentation
- Consider consent and verification mechanisms
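As an illustration of the authorization and audit-logging points above, here is a minimal sketch of a decorator that checks the caller before running a synthesis function and writes one JSON audit record per request, whether it succeeds, is denied, or fails. The names `audited`, `is_authorized`, and the logger name are hypothetical, and the wrapped function stands in for your own inference code; none of this is a VibeVoice API. The input text is stored as a SHA-256 hash rather than verbatim, which is one way to keep an audit trail without retaining user content.

```python
import functools
import hashlib
import json
import logging
import time
from typing import Any, Callable

# Hypothetical logger name; attach a durable handler in your own application.
audit_log = logging.getLogger("vibevoice_app.audit")
audit_log.setLevel(logging.INFO)


def audited(is_authorized: Callable[[str], bool]) -> Callable:
    """Wrap a synthesis function with an authorization check and an audit record."""

    def decorator(synthesize: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(synthesize)
        def wrapper(user_id: str, text: str, **kwargs: Any) -> Any:
            record = {
                "timestamp": time.time(),
                "user": user_id,
                # Hash lets a known input be matched later without storing the raw text.
                "text_sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
                "params": sorted(kwargs),  # parameter names only, not values
            }
            try:
                if not is_authorized(user_id):
                    raise PermissionError(f"user {user_id!r} is not authorized")
                result = synthesize(text, **kwargs)
                record["status"] = "ok"
                return result
            except Exception as exc:
                record["status"] = type(exc).__name__
                raise
            finally:
                # Runs on success, denial, and failure alike.
                audit_log.info(json.dumps(record))

        return wrapper

    return decorator
```

Usage would look like decorating your own function with `@audited(is_authorized=lambda user: user in allowed_users)` and calling it as `synthesize(user_id, text, ...)`. In a real deployment the audit logger needs a persistent handler (file, database, or log service) configured by the application.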
Community and Support
Getting Help
- GitHub Issues: For bug reports and feature requests
- GitHub Discussions: For questions and general discussion (if enabled)
- Project Page: microsoft.github.io/VibeVoice for demos and examples
- Colab Demo: Try VibeVoice-Realtime
Sharing Your Work
If you build something with VibeVoice:
- Share your project on GitHub with the vibevoice topic
- Disclose the use of AI-generated content
- Consider contributing improvements back to the project
- Link to the VibeVoice project page for attribution
It is best practice to disclose the use of AI when sharing AI-generated content, in accordance with responsible AI principles.
Security Reporting
Microsoft takes security seriously. For security issues:
- Do not report them through public GitHub issues
- Review guidance at https://aka.ms/SECURITY.md
- Follow Microsoft’s official security reporting procedures
License
VibeVoice is released under the MIT License. By contributing to VibeVoice, you agree that your contributions will be licensed under the MIT License.
Recognition
Contributors are recognized through:
- GitHub contributor graphs
- Acknowledgment in release notes (for significant contributions)
- Community recognition in project documentation