Survey on Rare Word Techniques in Neural Machine Translation

A comprehensive survey on the techniques used to address the rare word problem in NMT.

April 25th, 2019

During Y3 Fall, I took MATH5471, Statistical Learning Models for Text and Graph Data, with Prof. Yangqiu Song. The course covers many of the machine learning techniques used in NLP, from traditional statistical language models to more recent neural sequence models.

For the final project, I conducted a survey on the techniques used to address the rare word problem in Neural Machine Translation.

Modelling rare or unseen words is a difficult task in machine translation, as there is often little or no data for the model to learn from. Many different approaches have been introduced to handle these rare words, and my survey covers a few of the notable or popular ones.

Final Project