Continuous Broadcast News Acoustic Models

HUB96-HUB97 (or HUB97-HUB98 depending on whom you ask) *

 

 

HUB97 – ? hours of English training data was used from the HUB97 training set.

 

Forced alignment: Before forced alignment, Context Independent models were generated by running Baum-Welch until convergence. All 100 hours of acoustic training data were used in this pass. The converged acoustic models were then used for forced alignment. 18521 of the 19224 utterances in the HUB97 data set were aligned successfully and were therefore used for training.
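
As a quick check on how much data survived alignment, the utterance counts above can be turned into a yield figure. This is only arithmetic on the two numbers quoted in this note; the variable names are illustrative.

```python
# Utterance counts quoted in the forced alignment note above.
aligned = 18521
total = 19224

dropped = total - aligned            # utterances that failed alignment
yield_pct = 100.0 * aligned / total  # share of utterances kept for training

print(f"{aligned} of {total} utterances kept ({yield_pct:.1f}%), {dropped} dropped")
```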

 

3 state vs. 5 state: Two sets of models were trained, 3 state and 5 state. All other variables were held constant except skipstate, which was set to no for the 3 state models and yes for the 5 state models. (Forced alignment was not redone, so the 5 state models were trained on the same utterances as the 3 state models.)
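
To make the effect of the skipstate setting concrete, here is a minimal sketch of the two topologies as masks of allowed transitions. It assumes the usual left-to-right layout with one extra non-emitting exit state as the last column; the function name and the mask layout are illustrative and are not taken from the training scripts.

```python
def left_to_right_topology(n_states, skip):
    """Return an n_states x (n_states + 1) mask of allowed transitions.

    Rows are emitting states; the last column is an assumed non-emitting
    exit state. A 1 means the transition is allowed.
    """
    mask = [[0] * (n_states + 1) for _ in range(n_states)]
    for i in range(n_states):
        mask[i][i] = 1          # self loop
        mask[i][i + 1] = 1      # step to the next state
        if skip and i + 2 <= n_states:
            mask[i][i + 2] = 1  # skip over the next state
    return mask


three_state = left_to_right_topology(3, skip=False)
five_state = left_to_right_topology(5, skip=True)

for name, mask in (("3 state, no skip", three_state), ("5 state, skip", five_state)):
    print(name)
    for row in mask:
        print(" ", row)
```

With skips enabled, a 5 state model can still be traversed in three frames, which is one common reason for pairing skip transitions with the longer topology.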

 

Training Variables:

 

Gaussians: Context Dependent models were built with 1, 2, 4, 8, 16 and 32 gaussians to allow for speed/accuracy testing. The HUB97 training set is not large enough to adequately train 32 gaussians, so the 32 gaussian models should be considered inadequate. They are still included should someone wish to test them. (See http://www.cs.cmu.edu/~rsingh/sphinxman/FAQ.html for comments about the size of the data and the number of gaussians.)

 

Gausubvq: After the acoustic models were built, gausubvq was run on each set of gaussian models to produce the sub-vector quantized form of the acoustic models. gausubvq was called with the following command line arguments:

 Means variances 24,0-11/25,12-23/26,27-38 <num_cluster> 0.0001 1 <filename>

Several runs were performed to iterate over <num_cluster>, again for testing purposes. Quantized versions should exist for 512, 1024, 2048 and 4096 clusters.
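
The sub-vector specification passed to gausubvq can be hard to read at a glance. The sketch below expands it into explicit feature indices, assuming the usual reading of the format: "/" separates sub-vectors, "," separates entries, and "a-b" is an inclusive range. The helper parse_svspec is hypothetical and is not part of gausubvq.

```python
def parse_svspec(spec):
    """Expand a sub-vector spec string into a list of index lists."""
    subvectors = []
    for part in spec.split("/"):          # one sub-vector per "/" field
        dims = []
        for item in part.split(","):      # entries within a sub-vector
            if "-" in item:
                lo, hi = item.split("-")  # inclusive range "a-b"
                dims.extend(range(int(lo), int(hi) + 1))
            else:
                dims.append(int(item))
        subvectors.append(dims)
    return subvectors


SVSPEC = "24,0-11/25,12-23/26,27-38"  # spec string quoted above
subvectors = parse_svspec(SVSPEC)

for dims in subvectors:
    print(len(dims), dims)
```

Under that reading, the spec splits a 39 dimensional feature vector into three sub-vectors of 13 dimensions each.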

 

Results: I will supply these later.

 

 

* I will reference HUB96 and HUB97 because this is mostly the way that CMU references this data. The difference in naming comes from whether the year refers to when the data was collected or when it was delivered. I reference collection time.