
Kouon-Project/DSRX


DiffSinger (Kouon Project fork of the OpenVPI-maintained version, 2024-11)


This is a refactored and enhanced version of DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism, based on the original paper and implementation. It provides:

  • Cleaner code structure: useless and redundant files are removed and the rest are reorganized.
  • Better sound quality: the sampling rate of synthesized audio is raised to 44.1 kHz from the original 24 kHz.
  • Higher fidelity: improved acoustic models and diffusion sampling acceleration algorithms are integrated.
  • More controllability: introduced variance models and parameters for prediction and control of pitch, energy, breathiness, etc.
  • Production compatibility: functionalities are designed to match the requirements of production deployment and the SVS communities.
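The shallow diffusion mechanism named above starts the reverse process not from pure noise but from an auxiliary mel-spectrogram diffused to an intermediate step k, using the standard closed form x_k = sqrt(ᾱ_k)·x_0 + sqrt(1 − ᾱ_k)·ε. The following is a minimal, dependency-free sketch of that starting step only, not this repository's code; the linear beta schedule, its endpoints, and the helper names are assumptions for illustration.

```python
import math
import random

def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.06):
    """Cumulative products alpha_bar_t for a linear beta schedule.

    The schedule shape and endpoints are illustrative assumptions."""
    alpha_bars = []
    prod = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def shallow_start(aux_mel, k, alpha_bars, rng=random):
    """Diffuse an auxiliary mel frame to step k (0-based here).

    x_k = sqrt(alpha_bar_k) * x_0 + sqrt(1 - alpha_bar_k) * eps, eps ~ N(0, 1).
    The reverse process would then run k steps instead of the full schedule."""
    ab = alpha_bars[k]
    return [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0)
            for x in aux_mel]

# Usage: a 100-step schedule, starting the reverse process at step 40.
alpha_bars = linear_alpha_bars(100)
x_k = shallow_start([0.5, -1.0, 2.0], 40, alpha_bars, random.Random(0))
```

Choosing a smaller k keeps more of the auxiliary prediction and needs fewer reverse steps; a larger k relies more on the diffusion model.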
Architecture diagrams: Overview, Variance Model, Acoustic Model.

User Guidance

Still Working...

Architecture & Algorithms

TBD

Development Resources

TBD

References

Original Paper & Implementation

Generative Models & Algorithms

Dependencies & Submodules

In this fork:

  • LoRA, used for LoRA fine-tuning
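As a general illustration of the LoRA technique (not this fork's implementation), a frozen weight W is augmented with a trainable low-rank update, y = xWᵀ + (α/r)·xAᵀBᵀ, where A has shape r × in and B has shape out × r. The pure-Python sketch below uses hypothetical helper names; B is conventionally initialized to zeros so that fine-tuning starts from the base model's output.

```python
def transpose(m):
    # Swap rows and columns of a list-of-lists matrix.
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    # (n x k) @ (k x m) -> (n x m)
    return [[sum(a_ik * b[k][j] for k, a_ik in enumerate(row))
             for j in range(len(b[0]))] for row in a]

def lora_forward(x, w, a, b, alpha):
    """Frozen base projection plus a scaled low-rank update.

    x: (n x in), w: (out x in) frozen, a: (r x in), b: (out x r) trainable."""
    r = len(a)  # LoRA rank
    base = matmul(x, transpose(w))                      # (n x out)
    delta = matmul(matmul(x, transpose(a)), transpose(b))  # (n x out)
    scale = alpha / r
    return [[bv + scale * dv for bv, dv in zip(brow, drow)]
            for brow, drow in zip(base, delta)]

# Usage: with B = 0 the output equals the frozen layer's output.
identity = [[1, 0], [0, 1]]
print(lora_forward([[1, 2]], identity, [[1, 1]], [[0], [0]], 1.0))  # [[1.0, 2.0]]
```

Only A and B (r·(in + out) values) are trained, which is why LoRA suits fine-tuning a large pretrained model on a small dataset.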

Disclaimer

Any organization or individual is prohibited from using any functionality included in this repository to generate someone's speech without his/her consent, including but not limited to government leaders, political figures, and celebrities. If you do not comply with this provision, you could be in violation of copyright laws.

License

This forked DiffSinger repository is licensed under the Apache 2.0 License.

About

A DiffSinger fork by Kouon Project, improved from the openvpi fork (November 2024 version).
