Pohang University of Science and Technology
Research Scientist at Yahoo
My research interests lie in various areas of speech and language processing as well as machine learning. I am currently a postdoctoral research fellow in the Language Technologies Institute at Carnegie Mellon University. I work primarily on statistical dialog modeling, including dialog state tracking and dialog strategy learning. I have developed dialog state tracking systems that achieved state-of-the-art performance on average in the 2013 Dialog State Tracking Challenge. I have also developed a rapid sparse Bayesian reinforcement learning algorithm for online dialog strategy optimization through interactions with real users. I am also interested in applying spoken language technologies to computer-assisted language learning.
Dialog State Tracking Challenge (DSTC), Advisory Committee
The REAL Challenge, Scientific Committee
SIGDIAL, Program Committee
ACL, Program Committee
Interspeech, Program Committee
IJCNLP, Program Committee
Working on statistical dialog modeling
PIOLINK is a leading application networking company in Korea that manufactures application switches, network load balancers, and web security switches. As a senior team member, I worked on the design and development of multiprocessor and multitasking systems. I implemented and maintained firmware and software for embedded Linux systems on PowerPC and MIPS. I wrote numerous device drivers, including communications processor drivers (Broadcom BCM1250, BCM1480), SSL chip drivers (Britestream BN1010; Cavium CN1000), network chip drivers (Broadcom BCM5690, BCM5464SR), and peripheral device drivers (PCI, HT, system monitoring chips, RTC & NVRAM, flash memory, UART).
CORECESS manufactures telecommunications equipment for broadband access networks, such as optical link technologies (GEPON and WDM PON), intelligent multilayer switches, VDSL, and DSLAMs. I worked on the design and development of quality of service (Corecess QoS architecture design and implementation), layer 2 network protocols (RSTP, LACP), and a logging file system. I implemented and maintained system software for embedded Linux and pSOS. I wrote numerous device drivers, including programmable network processor drivers (Agere APP550), network chip drivers (SwitchCore CXE 1000, CXE16; Galileo Galnet II, Galnet II+, Galnet III), and peripheral device drivers (PCI, IIC, system monitoring chips, RTC & NVRAM, flash memory, UART).
Incremental Dialog Processing (IDP) enables Spoken Dialog Systems to gradually process minimal units of user speech in order to give the user an early system response. In this paper, we present an application of IDP that shows its effectiveness in a task-oriented dialog system. We have implemented an IDP strategy and deployed it for one month on a real-user system. We compared the resulting dialogs with dialogs produced over the previous month without IDP. Results show that the incremental strategy significantly improved system performance by eliminating long and often off-task utterances that generally produce poor speech recognition results. User behavior is also affected; the user tends to shorten utterances after being interrupted by the system.
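As a rough illustration of the idea (not the deployed system's code), the sketch below consumes growing partial ASR hypotheses and responds as soon as a usable concept is found, instead of waiting for the end of a long, possibly off-task utterance. The route-number grammar and response strings are invented for the example.

```python
import re

def parse(prefix):
    """Toy understanding step: look for a bus route number in the ASR prefix."""
    m = re.search(r"\broute (\d+)\b", prefix)
    return {"route": m.group(1)} if m else None

def incremental_dialog_loop(partial_hypotheses):
    """Consume growing partial ASR hypotheses; give an early system response
    as soon as a complete-enough frame is found, rather than waiting for the
    full utterance."""
    for prefix in partial_hypotheses:
        frame = parse(prefix)
        if frame is not None:
            return "Going to route %s. Where are you leaving from?" % frame["route"]
    return "Sorry, which route do you want?"

partials = ["i would", "i would like route", "i would like route 61 because my friend"]
print(incremental_dialog_loop(partials))
```

In a real system the loop would also decide *when* to interrupt; here the first confident parse triggers the response.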
During the recent Dialog State Tracking Challenge (DSTC), a fundamental question was raised: “Would better performance in dialog state tracking translate to better performance of the optimized policy by reinforcement learning?” Also, during the challenge system evaluation, another non-trivial question arose: “Which evaluation metric and schedule would best predict improvement in overall dialog performance?” This paper aims to answer these questions by applying an off-policy reinforcement learning method to the output of each challenge system. The results give a positive answer to the first question. Thus the effort to separately improve the performance of dialog state tracking as carried out in the DSTC may be justified. The answer to the second question also draws several insightful conclusions on the characteristics of different evaluation metrics and schedules.
This study examines a dialog-based language learning game (DB-LLG) realized in a 3D environment built with game content. We designed the DB-LLG so that users can conduct interactive conversations with game characters in various immersive environments. A pilot test identified several technologies as essential to the construction of the DB-LLG, such as dialog management, hint generation, and grammar error detection and feedback. We describe the technical details of our system, POSTECH Immersive English Study (Pomy). We evaluated the performance of each technology using a simulator and through field tests with users.
Many dialog state tracking algorithms have been limited to generative modeling due to the influence of the Partially Observable Markov Decision Process framework. Recent analyses, however, raised fundamental questions on the effectiveness of the generative formulation. In this paper, we present a structured discriminative model for dialog state tracking as an alternative. Unlike generative models, the proposed method affords the incorporation of features without having to consider dependencies between observations. It also provides a flexible mechanism for imposing relational constraints. To verify the effectiveness of the proposed method, we applied it to the Let’s Go domain (Raux et al., 2005). The results show that the proposed model is superior to the baseline and generative model-based systems in accuracy, discrimination, and robustness to mismatches between training and test datasets.
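A minimal sketch of the discriminative idea, not the paper's actual model: each slot-value hypothesis is scored with overlapping features of the whole observation sequence (something a generative model could not exploit without modeling their dependencies), then the scores are normalized with a softmax. The features and weights below are invented; in the paper they are learned from data.

```python
import math

def score(hypothesis, observations, weights):
    """Log-linear score over arbitrary, possibly correlated features of the
    full observation history -- no independence assumptions needed."""
    feats = {
        "max_conf": max(o["conf"] for o in observations if o["value"] == hypothesis),
        "n_mentions": sum(1 for o in observations if o["value"] == hypothesis),
        "latest": 1.0 if observations[-1]["value"] == hypothesis else 0.0,
    }
    return sum(weights[k] * v for k, v in feats.items())

def track(observations, weights):
    """Softmax-normalize scores over all hypothesized values."""
    values = {o["value"] for o in observations}
    raw = {v: math.exp(score(v, observations, weights)) for v in values}
    z = sum(raw.values())
    return {v: s / z for v, s in raw.items()}

# Two noisy mentions of "downtown" and one of "airport":
obs = [{"value": "downtown", "conf": 0.4},
       {"value": "airport", "conf": 0.3},
       {"value": "downtown", "conf": 0.7}]
weights = {"max_conf": 2.0, "n_mentions": 0.5, "latest": 1.0}
belief = track(obs, weights)
print(max(belief, key=belief.get))
```

The structured model in the paper additionally imposes relational constraints across slots, which this sketch omits.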
For robust spoken conversational interaction, many dialog state tracking algorithms have been developed. Few studies, however, have reported the strengths and weaknesses of each method. The Dialog State Tracking Challenge (DSTC) is designed to address this issue by comparing various methods on the same domain. In this paper, we present a set of techniques that build a robust dialog state tracker with high performance: wide-coverage and well-calibrated data selection, feature-rich discriminative model design, generalization improvement techniques and unsupervised prior adaptation. The DSTC results show that the proposed method is superior to other systems on average on both the development and test datasets.
This paper proposes an incremental sparse Bayesian learning method to allow continuous dialog strategy learning from interactions with real users. Since conventional reinforcement learning (RL) methods require a huge number of dialogs to reach convergence, it has been essential to use a simulated user when training dialog policies. The disadvantage of this approach is that the trained dialog policies always lag behind the optimal one for live users. To tackle this problem, a few studies applying online RL methods to dialog management have emerged and shown very promising results. However, these methods are limited to learning online the weight parameters of the basis functions in the model, and so need batch learning on a fixed data set or heuristics to find appropriate values for other meta-parameters such as sparsity-controlling thresholds, basis function parameters, and noise parameters. The proposed method attempts to overcome this limitation and achieve fully incremental, fast dialog strategy learning by adopting a sparse Bayesian learning method for value function approximation. To verify the proposed method, three different experimental conditions were used: artificial data, a simulated user, and real users. The experiment on the artificial data showed that the proposed method successfully learns all the parameters in an incremental manner. The experiment on training and evaluating dialog policies with a simulated user clearly demonstrated that the proposed method is much faster than conventional RL methods. A live user study showed that the dialog strategy learned from real users performed as well as the best past systems, although it slightly underperformed the one trained on simulated dialogs due to the difficulty of eliciting user feedback.
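As a simpler illustration of the underlying idea of Bayesian value function approximation over basis functions (not the paper's sparse Bayesian algorithm), the sketch below performs an online Bayesian linear-regression update of V(s) ≈ w·φ(s). Note that the noise variance and prior precision are fixed by hand here; those are exactly the kinds of meta-parameters the proposed method learns incrementally.

```python
import numpy as np

def make_posterior(dim, alpha=1.0):
    """Prior over basis-function weights w: N(0, alpha^-1 I)."""
    return np.zeros(dim), np.eye(dim) / alpha

def update(mean, cov, phi, target, noise_var=0.1):
    """One-step Bayesian linear-regression update of V(s) ~= w . phi(s),
    given a single (state features, observed return) pair."""
    phi = np.asarray(phi, dtype=float)
    s = phi @ cov @ phi + noise_var            # predictive variance
    gain = cov @ phi / s                       # Kalman-style gain vector
    mean = mean + gain * (target - phi @ mean) # shift toward the new target
    cov = cov - np.outer(gain, phi @ cov)      # shrink posterior uncertainty
    return mean, cov

m, C = make_posterior(2)
m, C = update(m, C, [1.0, 0.0], 1.0)
print(round(float(m[0]), 3))
```

Repeated updates with consistent targets drive the posterior mean toward the true weights while the covariance shrinks, which is what makes uncertainty-aware, fully online learning possible.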
This paper describes a POMDP-based Let’s Go system that incorporates belief tracking and dialog policy optimization into the dialog manager of the reference system for the Spoken Dialog Challenge (SDC). Since all components except the dialog manager were kept the same, component-wise comparison can be performed to investigate the effect of belief tracking and dialog policy optimization on overall system performance. In addition, since unsupervised methods were adopted to learn all required models, reducing human labor and development time, the effectiveness of the unsupervised approaches relative to conventional supervised approaches can be investigated. The resulting system participated in the 2011 SDC and showed performance comparable to the base system, which had been enhanced from the reference system for the 2010 SDC. This demonstrates the capability of the proposed method to rapidly produce an effective system with minimal human labor and expert knowledge.
This paper proposes an unsupervised approach to user simulation in order to automatically furnish updates and assessments of a deployed spoken dialog system. The proposed method adopts a dynamic Bayesian network to infer the unobservable true user action from which the parameters of other components are naturally derived. To verify the quality of the simulation, the proposed method was applied to the Let’s Go domain (Raux et al., 2005) and a set of measures was used to analyze the simulated data at several levels. The results showed a very close correspondence between the real and simulated data, implying that it is possible to create a realistic user simulator that does not necessitate human intervention.
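A toy illustration of the simulation idea, with invented probability tables: the true user action is sampled conditioned on the system action, then passed through an error model to produce the noisy observation the dialog system would actually receive. The real method infers these distributions from machine-transcribed logs with a dynamic Bayesian network rather than hand-specifying them.

```python
import random

# Invented conditional tables: P(true user action | system action)
USER_MODEL = {
    "ask_origin": {"inform_origin": 0.8, "silence": 0.2},
    "ask_dest":   {"inform_dest": 0.9, "silence": 0.1},
}
# Invented ASR error model: P(observed action | true user action)
ERROR_MODEL = {
    "inform_origin": {"inform_origin": 0.7, "inform_dest": 0.3},
    "inform_dest":   {"inform_dest": 0.85, "inform_origin": 0.15},
    "silence":       {"silence": 1.0},
}

def sample(dist, rng):
    """Draw one outcome from a {outcome: probability} table."""
    r, acc = rng.random(), 0.0
    for outcome, p in dist.items():
        acc += p
        if r < acc:
            return outcome
    return outcome  # guard against floating-point round-off

def simulate_turn(system_action, rng=random):
    """One simulated turn: latent true action, then its noisy observation."""
    true_action = sample(USER_MODEL[system_action], rng)
    observed = sample(ERROR_MODEL[true_action], rng)
    return true_action, observed
```

Sampling many turns against a deployed policy yields simulated dialogs whose statistics can be compared with real logs, as done in the paper's multi-level analysis.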
This paper proposes the use of unsupervised approaches to improve components of partition-based belief tracking systems. The proposed method adopts a dynamic Bayesian network to learn the user action model directly from a machine-transcribed dialog corpus. It also addresses confidence score calibration, improving the observation model in an unsupervised manner using dialog-level grounding information. To verify the effectiveness of the proposed method, we applied it to the Let’s Go domain (Raux et al., 2005). Overall system performance was measured for several comparative models. The results show that the proposed method can learn an effective user action model without human intervention. In addition, the calibrated confidence score was verified by demonstrating its positive influence on the user action model learning process and on overall system performance.
The demand for computer-assisted language learning systems that can provide corrective feedback on language learners’ speaking has increased. However, it is not a trivial task to detect grammatical errors in oral conversations because of the unavoidable errors of automatic speech recognition systems. To provide corrective feedback, a novel method to detect grammatical errors in speaking performance is proposed. The proposed method consists of two sub-models: the grammaticality-checking model and the error-type classification model. We automatically generate grammatical errors that learners are likely to commit and construct error patterns based on the articulated errors.
When a particular speech pattern is recognized, the grammaticality-checking model performs a binary classification based on the similarity between the error patterns and the recognition result, using the confidence score. The error-type classification model chooses the error type based on the most similar error pattern and the error frequency extracted from a learner corpus. The grammaticality-checking method largely outperformed the two comparative models, by 56.36% and 42.61% in F-score, while keeping the false positive rate very low. The error-type classification model exhibited very high performance, with a 99.6% accuracy rate. Because high precision and a low false positive rate are important criteria in a language-tutoring setting, the proposed method will be helpful for intelligent computer-assisted language learning systems.
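A highly simplified sketch of the grammaticality-checking step, with invented error patterns and an invented threshold: the recognition result is compared against stored error patterns, and the similarity, scaled by the ASR confidence score, drives a binary grammatical/ungrammatical decision plus an error type.

```python
import difflib

# Invented error patterns; the real system generates these automatically
# from errors that learners are likely to commit.
ERROR_PATTERNS = {
    "i am agree with you": "subject-verb",
    "he go to school": "agreement",
    "i am interesting in music": "word-form",
}

def check_grammaticality(utterance, asr_confidence, threshold=0.75):
    """Return (is_grammatical, error_type). Flag the utterance only when it
    closely matches a known error pattern AND recognition is confident,
    keeping the false positive rate low."""
    best_pattern, best_sim = None, 0.0
    for pattern in ERROR_PATTERNS:
        sim = difflib.SequenceMatcher(None, utterance, pattern).ratio()
        if sim > best_sim:
            best_pattern, best_sim = pattern, sim
    if best_pattern is not None and best_sim * asr_confidence >= threshold:
        return False, ERROR_PATTERNS[best_pattern]
    return True, None
```

Scaling similarity by confidence means low-confidence recognitions are never flagged, which matches the tutoring requirement that false alarms are costlier than misses.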
This study introduces the educational assistant robots that we developed for foreign language learning and explores the effectiveness of robot-assisted language learning (RALL), which is in its early stages. To this end, a course was designed in which students have meaningful interactions with intelligent robots in an immersive environment. A total of 24 elementary school students, ranging in age from ten to twelve, were enrolled in English lessons. A pre-test/post-test design was used to investigate the cognitive effects of the RALL approach on the students’ oral skills. No significant difference was found in listening skills, but speaking skills improved with a large effect size at the 0.01 significance level. Descriptive statistics and the pre-test/post-test design were used to investigate the affective effects of the RALL approach. The results showed that RALL improved students’ satisfaction, interest, confidence, and motivation at the 0.01 significance level.
This paper presents an automated method to generate realistic grammatical errors that can serve crucial functions for advanced technologies in computer-assisted language learning (CALL), including generating corrective feedback in dialog-based CALL (DB-CALL) systems, simulating a language learner to optimize tutoring strategies, and generating context-dependent grammar quizzes as educational materials. The goal of this study is to make the grammatical errors generated by automatic simulators more realistic. To this end, expert knowledge of language learners’ error characteristics was imported into a statistical modeling system using Markov logic, which provides a theoretically sound way to encode knowledge into probabilistic first-order logic. We learned the weights of the first-order formulas from a learner corpus. The improved quality of the proposed method was demonstrated through comparative experiments using automatic evaluations (precision and recall rates and Kullback–Leibler divergence between error distributions) and human assessments. The proposed method increased precision by 6% and recall by 8.33%, averaged across all proficiency levels. It also exhibited a relative improvement of 37.5% in average Kullback–Leibler divergence. Judgments by human evaluators showed that the proposed method increased the average scores in two different evaluation tasks by 7 and 0.411, respectively. Finally, we present a measure of labor savings to help predict the time and cost associated with this method for those who plan to exploit grammatical error simulation in their applications. The results indicate that the proposed method could reduce grammatical error generation time by 59% on average.
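A toy sketch of weighted-rule error simulation, not Markov logic itself: each hand-written rule proposes an error when its precondition holds, and one applicable rule is sampled with probability proportional to its weight. The rules and weights below are invented and stand in for the learned first-order formula weights.

```python
import random

def drop_article(tokens):
    """Article-omission error, e.g. 'the store' -> 'store'."""
    if "the" in tokens:
        out = list(tokens)
        out.remove("the")
        return out
    return None

def agreement_error(tokens):
    """Subject-verb agreement error, e.g. 'goes' -> 'go'."""
    if "goes" in tokens:
        return [("go" if t == "goes" else t) for t in tokens]
    return None

# (rule, weight): higher weight = error committed more often, mimicking
# the role of learned formula weights.
RULES = [(drop_article, 2.0), (agreement_error, 1.0)]

def simulate_error(tokens, rng=random):
    """Sample one applicable rule proportionally to its weight and apply it."""
    candidates = [(rule(tokens), w) for rule, w in RULES if rule(tokens) is not None]
    if not candidates:
        return tokens  # no rule applies: sentence passed through unchanged
    total = sum(w for _, w in candidates)
    r = rng.random() * total
    for out, w in candidates:
        r -= w
        if r <= 0:
            return out
    return candidates[-1][0]
```

Proficiency-dependent weights would let the same rule set produce different error distributions for beginner and advanced learners, which is the effect the learned weights capture.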
I participated in writing the project proposal and designing the system architecture. In particular, I am developing a component for grammaticality judgment and provision of feedback on oral output.
I was responsible for project management and participated in designing the system architecture, which spans various technologies such as speech, vision, and haptics. I am developing dialog strategies that take into consideration students’ proficiency level, emotion, and gameplay.
I was responsible for investigating dialog management for English tutoring. I am working on data collection, robust language understanding, dialog management, and corrective feedback generation. I am researching an ASR combination approach to generate feedback on both global and local errors.
I was responsible for speech and language processing of intelligent robots. I collaborated with English teachers to make educational material and developed communicative robots capable of providing recast feedback in response to students’ errors. I participated in a pilot project for elementary students.
I was responsible for project management. I designed and implemented a hybrid language understanding component for robust language understanding and corrective feedback generation.
I was responsible for project management. I designed and implemented a spoken dialog system for English conversation practice in the immigration domain. I developed a method to provide recast feedback and suggest expressions in the case of a timeout.