Introduction
The VariKN language modeling toolkit provides tools for training n-gram language models. The supported methods include:
- Absolute discounting
- Kneser-Ney smoothing
- Revised Kneser pruning
- Kneser-Ney growing
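
For readers unfamiliar with the first method in the list, the following is a minimal sketch of interpolated absolute discounting for a bigram model. It is not VariKN code and does not use the toolkit's interfaces. The function name `train_absolute_discounting`, the fixed discount of 0.5, and the unigram fallback for unseen histories are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_absolute_discounting(tokens, discount=0.5):
    """Return a function prob(word, history) for an interpolated bigram model.

    Hypothetical helper for illustration only; not part of VariKN.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = sum(unigrams.values())

    history_count = Counter()     # c(h): how often h occurs as a bigram history
    followers = defaultdict(set)  # distinct words observed after h
    for (h, w), c in bigrams.items():
        history_count[h] += c
        followers[h].add(w)

    def prob(word, history):
        p_unigram = unigrams[word] / total
        c_h = history_count[history]
        if c_h == 0:
            # Unseen history: fall back entirely to the unigram estimate.
            return p_unigram
        # Subtract a fixed discount from every observed bigram count.
        discounted = max(bigrams[(history, word)] - discount, 0.0) / c_h
        # Redistribute the removed mass according to the unigram distribution.
        backoff_weight = discount * len(followers[history]) / c_h
        return discounted + backoff_weight * p_unigram

    return prob
```

With a discount between 0 and 1, the probabilities over the vocabulary sum to one for any seen history. Kneser-Ney smoothing keeps the same discounting step but replaces the lower-order unigram estimate with one based on how many distinct contexts each word follows.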
Installation
See the file install.
Commands and Interfaces
The provided commands and interfaces are described in commands.html.
Scientific publications
- Description of the algorithms, especially Revised Kneser pruning and Kneser-Ney growing: Vesa Siivola, Teemu Hirsimäki and Sami Virpioja, "On Growing and Pruning Kneser-Ney Smoothed N-Gram Models", IEEE Transactions on Audio, Speech and Language Processing, 15(5):1617-1624, 2007.
- Guidelines on typical training parameters: Vesa Siivola, Mathias Creutz and Mikko Kurimo, "Morfessor and VariKN machine learning tools for speech and language technology", Proceedings of the 8th International Conference on Speech Communication and Technology (INTERSPEECH'07), 2007.