VariKN

A toolkit for producing n-gram language models. Its highlights are implementations of the Kneser-Ney growing and revised Kneser pruning methods.


Introduction

The VariKN language modeling toolkit provides tools for training n-gram language models. The supported methods include:
  • Absolute discounting
  • Kneser-Ney smoothing
  • Revised Kneser pruning
  • Kneser-Ney growing
The package provides accurate pruning for Kneser-Ney smoothed models. The growing algorithm also makes it possible to train very high-order n-gram models. Models can be written in the ARPA LM format, which is compatible with most other common tools in the field.
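
To make the simplest of the listed methods concrete, here is a minimal Python sketch of interpolated absolute discounting for a bigram model. It is an illustration only and is not taken from the VariKN sources: the function name, the discount value 0.75, and the <s> and </s> boundary markers are all assumptions chosen for the example. Kneser-Ney smoothing follows the same discount-and-interpolate pattern but replaces the plain unigram distribution with one based on continuation counts, which this sketch does not do.

```python
from collections import Counter


def train_bigram_absolute_discounting(sentences, discount=0.75):
    """Return prob(word, prev) for an interpolated, absolutely discounted bigram model.

    Hypothetical helper for illustration; not part of VariKN.
    """
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[1:])              # every predicted token
        bigrams.update(zip(tokens, tokens[1:]))  # (previous, current) pairs

    total = sum(unigrams.values())
    context_count = Counter()  # how often each context was seen
    followers = Counter()      # number of distinct words seen after each context
    for (prev, _word), count in bigrams.items():
        context_count[prev] += count
        followers[prev] += 1

    def prob(word, prev):
        p_unigram = unigrams[word] / total
        seen = context_count[prev]
        if seen == 0:
            # Unseen context: fall back to the unigram estimate.
            return p_unigram
        # Subtract a fixed discount from every observed bigram count ...
        discounted = max(bigrams[(prev, word)] - discount, 0.0) / seen
        # ... and give the freed probability mass to the lower-order model.
        backoff_weight = discount * followers[prev] / seen
        return discounted + backoff_weight * p_unigram

    return prob
```

For example, after prob = train_bigram_absolute_discounting(["a b a b", "a c"]), the call prob("b", "a") is larger than prob("c", "a"), and the probabilities of all vocabulary words following "a" sum to one.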

Installation

See the file install.

Commands and Interfaces

The provided commands and interfaces are described in commands.html.

Scientific publications

Links to other interesting GitHub projects

  • Aalto ASR speech recognition system handles long n-gram contexts gracefully
  • Morfessor provides an unsupervised method for producing morpheme-like sub-word units