Deok-Hyeon Cho

I am a Ph.D. candidate in the Department of Artificial Intelligence at Korea University, working in the Pattern Recognition & Machine Learning Lab under the supervision
of Prof. Seong-Whan Lee.

Research Interests:
Text-to-Speech · Conversational Speech Synthesis · Full Duplex Speech-to-Speech ·
Emotion Recognition · Voice Conversion · Multimodal Affective Modeling

Korea University

Ph.D. Candidate

Sep. 2022 – Present

RexSoft

Developer Intern

Jan. 2022 – Feb. 2022

Visang Education

Data Scientist Intern

Jun. 2021 – Jul. 2021

Hanyang University ERICA

B.S. in Applied Mathematics

Mar. 2016 – Jul. 2022

News

  • Jan. 2026 — One paper accepted to ICLR 2026 (ComVo)
  • Nov. 2025 — Received the Excellence Award at the AI Frontier Challenge
  • Oct. 2025 — Started collaboration with Murf AI on full duplex speech-to-speech conversational AI
  • Oct. 2025 — Started collaboration with Thomas Crown on controllable emotion transfer TTS
  • May 2025 — Three papers accepted to INTERSPEECH 2025 (DiEmo-TTS, EmoSphere-SER, Spotlight-TTS)
  • Apr. 2025 — One paper published in IEEE Transactions on Affective Computing (EmoSphere++)
  • Jan. 2025 — One paper published in IEEE Transactions on Affective Computing (DurFlex-EVC)
  • Sep. 2024 — One paper accepted to IEEE SMC 2024 (PromotiCon)
  • Jun. 2024 — One paper accepted to INTERSPEECH 2024 (EmoSphere-TTS)
  • May 2024 — Started collaboration with Samsung Research on natural filler speech synthesis
  • Jul. 2022 — Joined the Pattern Recognition & Machine Learning Lab, Korea University
  • Jan. 2022 — Worked as a Developer Intern at RexSoft
  • Nov. 2021 — Received the Excellence Award in the Dacon AI Competition
  • Jun. 2021 — Worked as a Data Scientist Intern at Visang Education
  • Mar. 2016 — Started the B.S. in Applied Mathematics at Hanyang University ERICA

Research

Affectron: Emotional Speech Synthesis with Affective and Contextually Aligned Nonverbal Vocalizations
D.-H. Cho, H.-S. Oh, S.-B. Kim, and S.-W. Lee
Under Review, 2026
Toward Complex-Valued Neural Networks for Waveform Generation
H.-S. Oh, D.-H. Cho, S.-B. Kim, and S.-W. Lee
International Conference on Learning Representations (ICLR), 2026
DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech
D.-H. Cho, H.-S. Oh, S.-B. Kim, and S.-W. Lee
Conference of the International Speech Communication Association (INTERSPEECH), 2025
Spotlight-TTS: Spotlighting the Style via Voiced-Aware Style Extraction and Style Direction Adjustment for Expressive Text-to-Speech
N.-G. Kim, D.-H. Cho, S.-B. Kim, and S.-W. Lee
Conference of the International Speech Communication Association (INTERSPEECH), 2025
EmoSphere-SER: Enhancing Speech Emotion Recognition through Spherical Representation with Auxiliary Classification
D.-H. Cho, H.-S. Oh, S.-B. Kim, and S.-W. Lee
Conference of the International Speech Communication Association (INTERSPEECH), 2025
EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector
D.-H. Cho, H.-S. Oh, S.-B. Kim, and S.-W. Lee
IEEE Transactions on Affective Computing (TAFFC), 2025
DurFlex-EVC: Duration-Flexible Emotional Voice Conversion Leveraging Discrete Representations Without Text Alignment
H.-S. Oh, S.-H. Lee, D.-H. Cho, and S.-W. Lee
IEEE Transactions on Affective Computing (TAFFC), 2025
PromotiCon: Prompt-Based Emotion Controllable Text-to-Speech via Prompt Generation and Matching
J.-E. Lee, S.-B. Kim, D.-H. Cho, and S.-W. Lee
IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2024
EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech
D.-H. Cho, H.-S. Oh, S.-B. Kim, S.-H. Lee, and S.-W. Lee
Conference of the International Speech Communication Association (INTERSPEECH), 2024

Projects

Full Duplex Speech-to-Speech Conversational AI
Institution: Murf AI, USA
Duration: Oct. 2025 – Present
Collaborative research on full duplex speech-to-speech conversational AI, targeting more natural and interactive spoken dialogue systems.
Voltron: Cross-speaker Controllable Emotion Transfer TTS
Institution: Thomas Crown, USA
Duration: Sep. 2025 – Oct. 2025
Industry collaboration on controllable cross-speaker emotion transfer for text-to-speech systems, focusing on expressive and transferable emotional style modeling.
Development of Interjection Utterances for Natural Speech Synthesis
Institution: Samsung Research, Korea
Duration: May 2024 – Dec. 2024
Research collaboration on improving the naturalness of speech synthesis through the development of interjection and filler-style utterances.

Awards & Service

Awards

Excellence Award
2025
Extreme-Noise Speech Recognition & Restoration AI Model Development Competition
AI Frontier Challenge — Korea Artificial Intelligence Association (KAIA)
Excellence Award
2021
Credit Card User Delinquency Prediction AI Competition
Hanyang University & Dacon

Academic Service

Reviewer
Served as a reviewer for journals and conferences in affective computing, speech, and machine learning.
IEEE Transactions on Affective Computing (TAFFC)
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
IEEE Signal Processing Letters (SPL)
IEEE International Conference on Systems, Man, and Cybernetics (SMC)
Conference of the International Speech Communication Association (INTERSPEECH)

Patents

KR Patents
Method for Cross-Speaker Emotion Transfer in Text-to-Speech Using Disentangled Emotion Representations via Self-Supervised Distillation
10-2025-0130127
Method and System for Expressive Text-to-Speech via Voiced-Aware Style Extraction and Style Direction Adjustment
10-2025-0116457
Apparatus and Method for Speech Synthesis
10-2025-0088028
Method, Device, and Program for Synthesizing Voices Expressing Emotions Based on Prompts
10-2024-0099370
Emotional Expression Voice Generation Apparatus and Method Capable of Controlling Emotional Style and Intensity Using Continuous Emotional Dimensions
10-2024-0029066