Deok-Hyeon Cho

I am a Ph.D. candidate in the Department of Artificial Intelligence at Korea University, working in the Pattern Recognition & Machine Learning Lab under the supervision
of Prof. Seong-Whan Lee.

Research Interests:
Text-to-Speech · Conversational Speech Synthesis · Full Duplex Speech-to-Speech ·
Emotion Recognition · Voice Conversion · Multimodal Affective Modeling

Korea University

Ph.D. Candidate

Sep. 2022 – Present

RexSoft

Developer Intern

Jan. 2022 – Feb. 2022

Visang Education

Data Scientist Intern

Jun. 2021 – Jul. 2021

Hanyang University ERICA

B.S. in Applied Mathematics

Mar. 2016 – Jul. 2022

News

  • Jan. 2026 — One paper accepted to ICLR 2026 (ComVo)
  • Nov. 2025 — Received the Excellence Award at the AI Frontier Challenge
  • Oct. 2025 — Started collaboration with Murf AI on full duplex speech-to-speech conversational AI
  • Oct. 2025 — Started collaboration with Thomas Crown on controllable emotion transfer TTS
  • May 2025 — Three papers accepted to INTERSPEECH 2025 (DiEmo-TTS, EmoSphere-SER, Spotlight-TTS)
  • Apr. 2025 — One paper published in IEEE Transactions on Affective Computing (EmoSphere++)
  • Jan. 2025 — One paper published in IEEE Transactions on Affective Computing (DurFlex-EVC)
  • Sep. 2024 — One paper accepted to IEEE SMC 2024 (PromotiCon)
  • Jun. 2024 — One paper accepted to INTERSPEECH 2024 (EmoSphere-TTS)
  • May 2024 — Started collaboration with Samsung Research on natural filler speech synthesis
  • Jul. 2022 — Joined the Pattern Recognition & Machine Learning Lab, Korea University
  • Jan. 2022 — Worked as a Developer Intern at RexSoft
  • Nov. 2021 — Received the Excellence Award in the Dacon AI Competition
  • Jun. 2021 — Worked as a Data Scientist Intern at Visang Education
  • Mar. 2016 — Started the B.S. in Applied Mathematics at Hanyang University ERICA

Research

Affectron: Emotional Speech Synthesis with Affective and Contextually Aligned Nonverbal Vocalizations
D.-H. Cho, H.-S. Oh, S.-B. Kim, and S.-W. Lee
Under Review, 2026
Toward Complex-Valued Neural Networks for Waveform Generation
H.-S. Oh, D.-H. Cho, S.-B. Kim, and S.-W. Lee
International Conference on Learning Representations (ICLR), 2026
DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech
D.-H. Cho, H.-S. Oh, S.-B. Kim, and S.-W. Lee
Conference of the International Speech Communication Association (INTERSPEECH), 2025
Spotlight-TTS: Spotlighting the Style via Voiced-Aware Style Extraction and Style Direction Adjustment for Expressive Text-to-Speech
N.-G. Kim, D.-H. Cho, S.-B. Kim, and S.-W. Lee
Conference of the International Speech Communication Association (INTERSPEECH), 2025
EmoSphere-SER: Enhancing Speech Emotion Recognition through Spherical Representation with Auxiliary Classification
D.-H. Cho, H.-S. Oh, S.-B. Kim, and S.-W. Lee
Conference of the International Speech Communication Association (INTERSPEECH), 2025
EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector
D.-H. Cho, H.-S. Oh, S.-B. Kim, and S.-W. Lee
IEEE Transactions on Affective Computing (TAFFC), 2025
DurFlex-EVC: Duration-Flexible Emotional Voice Conversion Leveraging Discrete Representations Without Text Alignment
H.-S. Oh, S.-H. Lee, D.-H. Cho, and S.-W. Lee
IEEE Transactions on Affective Computing (TAFFC), 2025
PromotiCon: Prompt-Based Emotion Controllable Text-to-Speech via Prompt Generation and Matching
J.-E. Lee, S.-B. Kim, D.-H. Cho, and S.-W. Lee
IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2024
EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech
D.-H. Cho, H.-S. Oh, S.-B. Kim, S.-H. Lee, and S.-W. Lee
Conference of the International Speech Communication Association (INTERSPEECH), 2024

Projects

Full Duplex Speech-to-Speech Conversational AI
Institution: Murf AI, USA
Duration: Oct. 2025 – Present
Collaborative research on full duplex speech-to-speech conversational AI, targeting more natural and interactive spoken dialogue systems.
Voltron: Cross-speaker Controllable Emotion Transfer TTS
Institution: Thomas Crown, USA
Duration: Sep. 2025 – Oct. 2025
Industry collaboration on controllable cross-speaker emotion transfer for text-to-speech systems, focusing on expressive and transferable emotional style modeling.
Development of Interjection Utterances for Natural Speech Synthesis
Institution: Samsung Research, Korea
Duration: May 2024 – Dec. 2024
Research collaboration on improving the naturalness of speech synthesis through the development of interjection and filler-style utterances.

Awards & Service

Awards

Excellence Award
2025
Extreme-Noise Speech Recognition & Restoration AI Model Development Competition
AI Frontier Challenge — Korea Artificial Intelligence Association (KAIA)
Excellence Award
2021
Credit Card User Delinquency Prediction AI Competition
Hanyang University & Dacon

Academic Service

Reviewer
Served as a reviewer for journals and conferences in affective computing, speech, and machine learning.
IEEE Transactions on Affective Computing (TAFFC)
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
IEEE Signal Processing Letters (SPL)
IEEE International Conference on Systems, Man, and Cybernetics (SMC)
Conference of the International Speech Communication Association (INTERSPEECH)

Patents

KR Patents
Method for Cross-Speaker Emotion Transfer in Text-to-Speech Using Disentangled Emotion Representations via Self-Supervised Distillation
10-2025-0130127
Method and System for Expressive Text-to-Speech via Voiced-Aware Style Extraction and Style Direction Adjustment
10-2025-0116457
Apparatus and Method for Speech Synthesis
10-2025-0088028
Method, Device, and Program for Synthesizing Voices Expressing Emotions Based on Prompts
10-2024-0099370
Emotional Expression Voice Generation Apparatus and Method Capable of Controlling Emotional Style and Intensity Using Continuous Emotional Dimensions
10-2024-0029066