D:/SPHINX:

 

Most of the files I am working on live under this directory, including the latest versions of everything. Most of my testing and evaluation also happens here, and the acoustic and language models that I frequently use are kept here as well. It is designed to encompass everything.

 

 

S3.3:

This is the current working SPHINX III 3.3 code base. It contains a front end that allows for stream processing. However, 3.3 neither splits nor parses the incoming audio data; it assumes the audio has already been segmented. Various models under S3.3 are designed to deal with segmentation. (Note: at this point there is no WIN32 tool that will create a .ctl file; all segmentation at this time is done live.)

* Src/Decode_audiofile:

This is the current working directory for decoding audio files. As of 2/6/2001 it contains the most recent versions of everything. (This code base was developed from Seg_ad.)

This code decodes a raw audio file (ideally extracted from a media file via ExtractAudio). It performs segmentation and melcep computation live, in a single-pass decode of the raw audio file.

It takes as input a SPHINX III argument file, the raw audio filename and the output filename.

It produces an output file listing each recognized word with its start and end frame (in speech frames, 10 ms/frame), its acoustic score, its language model score and its type.
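
Since each speech frame covers 10 ms, frame indices convert to seconds by dividing by 100. The sketch below shows that conversion in Python. The whitespace-separated layout and the column order are assumptions made purely for illustration and should be checked against a real output file; `WordHyp`, `parse_line` and `frames_to_seconds` are hypothetical names, not part of the SPHINX code.

```python
from dataclasses import dataclass

FRAMES_PER_SECOND = 100  # 10 ms per speech frame, as stated above


@dataclass
class WordHyp:
    word: str
    start_frame: int
    end_frame: int
    acoustic_score: int
    lm_score: int
    word_type: str


def frames_to_seconds(frame: int) -> float:
    """Convert a speech-frame index to seconds (10 ms per frame)."""
    return frame / FRAMES_PER_SECOND


def parse_line(line: str) -> WordHyp:
    # ASSUMPTION: six whitespace-separated fields in this order, with
    # integer scores. The real file layout may differ.
    word, start, end, ascr, lscr, word_type = line.split()
    return WordHyp(word, int(start), int(end), int(ascr), int(lscr), word_type)


if __name__ == "__main__":
    hyp = parse_line("HELLO 120 155 -48213 -1021 word")
    print(hyp.word, frames_to_seconds(hyp.start_frame),
          frames_to_seconds(hyp.end_frame))
```

The sample line in the `__main__` block is made up; it only demonstrates that frame 120 corresponds to 1.2 seconds into the audio.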

See Perl/CreateTRS.pl for a script that generates a format compatible with Claude Barras (DGA) and LDC’s Transcriber program.

The file S3.html contains information on the parameters for the argument file needed by SPHINX 3.

* Src/decodeaudiofile_broken:

Contains working files from the first version of decoding a raw audio file. It is based on cont_seg. The distributed version of cont_seg has a bug that prevents it from working correctly with files. (Cont_seg was designed to process audio from a sound card with a single speaker.) This work has been replaced/updated by decode_audiofile.

* Src/live:

This is where the live code is for running SPHINX III on audio captured from a sound card. It works, but it has barely been tested.

* Src/win32:

Contains the Microsoft Visual Studio project for building SPHINX 3.3.

 

 

 

Seg_ad:

This directory contains my working files for a segmentation algorithm based solely on adc or raw data. It is mostly a working/testing directory; the files developed here were moved directly into s3.3/src/decode_audiofile, where further progress was made.

 

ExtractAudio:

Contains the code for using the filter that Darmesh wrote. The code is designed for extracting and converting the audio from a media file. The audio is extracted and downsampled to 16 kHz, 16-bit, mono. (I don’t know which channel is extracted.)
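
A quick sanity check on an extracted file is to work out its duration from its size. At 16 kHz, 16-bit, mono, one second of audio is 16000 x 2 = 32000 bytes. The Python sketch below does this arithmetic, assuming the file is headerless raw PCM (an assumption; any header would skew the result). The function names are hypothetical, and the frame count is only approximate because it ignores how the front end windows the signal.

```python
SAMPLE_RATE_HZ = 16_000   # 16 kHz, as stated above
BYTES_PER_SAMPLE = 2      # 16-bit samples
CHANNELS = 1              # mono
FRAME_MS = 10             # speech frame length used by the decoder


def raw_audio_duration_seconds(num_bytes: int) -> float:
    """Duration of headerless raw PCM audio of the given size in bytes."""
    bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS
    return num_bytes / bytes_per_second


def approx_frame_count(num_bytes: int) -> int:
    """Approximate number of whole 10 ms frames in the audio."""
    samples = num_bytes // (BYTES_PER_SAMPLE * CHANNELS)
    samples_per_frame = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 160 samples
    return samples // samples_per_frame


if __name__ == "__main__":
    size = 320_000  # example size in bytes; use os.path.getsize(path) for a real file
    print(raw_audio_duration_seconds(size), approx_frame_count(size))
```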

 

Perl:

This directory contains misc. Perl scripts for processing different things. Most of them are hacking scripts of little importance. If one seems useful, I’ll list it here.

*    CreateTRS.pl:

This script takes the output of Decode_audiofile and creates a .trs file that can be read by Transcriber. This is very useful for checking alignment or quickly comparing the output of the decoder with the audio. (It is also possible to edit/manipulate word boundaries.)
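
To give a rough idea of what such a conversion involves, here is a simplified Python sketch. It is not CreateTRS.pl. The element and attribute names (`Trans`, `Episode`, `Section`, `Turn`, `Sync`) reflect my understanding of the Transcriber format and should be verified against its DTD; a real .trs file also needs an XML declaration and DOCTYPE. The function name `build_trs` and the input tuple layout are hypothetical.

```python
import xml.etree.ElementTree as ET

FRAMES_PER_SECOND = 100  # 10 ms per speech frame


def build_trs(words, audio_filename):
    """Build a minimal .trs-style XML string.

    words: list of (word, start_frame, end_frame) tuples in time order.
    ASSUMPTION: one section and one turn cover the whole file, and each
    word gets its own Sync point at its start time.
    """
    last_frame = max((end for _, _, end in words), default=0)
    end_time = f"{last_frame / FRAMES_PER_SECOND:.2f}"

    trans = ET.Element("Trans", audio_filename=audio_filename)
    episode = ET.SubElement(trans, "Episode")
    section = ET.SubElement(episode, "Section", type="report",
                            startTime="0", endTime=end_time)
    turn = ET.SubElement(section, "Turn", startTime="0", endTime=end_time)

    for word, start_frame, _end_frame in words:
        sync = ET.SubElement(
            turn, "Sync", time=f"{start_frame / FRAMES_PER_SECOND:.2f}")
        sync.tail = f"\n{word}\n"  # the text following a Sync point

    return ET.tostring(trans, encoding="unicode")


if __name__ == "__main__":
    print(build_trs([("HELLO", 120, 155), ("WORLD", 156, 190)], "example"))
```

Placing a Sync point at every word start is what makes word boundaries visible and editable in the tool, at the cost of a very fine-grained transcript.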

 

HUB97_models:

This directory contains the HUB97 continuous acoustic models (as well as a large BN language model). The documentation explaining the models and how they were trained is here: BN_AM_HUB97.doc or BN_AM_HUB97.htm. (The .doc may be more up to date, purely due to laziness.)

 

Fastdecoder:

This directory contains the s3.2 source code with MS Visual Studio projects for building. The project files make use of the Intel Compiler for optimal performance.

Fastdecoder_timing:

This directory contains the output of many different testing runs comparing the speed/accuracy of different system tweaks. Different hardware was also compared (e.g., PC133 CAS2 memory vs. RDRAM PC800 memory).

 

 

 

Some sources of SPHINX Documentation

 

Ravi’s talk on the Sphinx 3 transition to Sphinx 3.2 is here: HTML or PPT.

Ravi’s slides for his exit and codewalk talks are here: HTML

Ravi’s Sphinx 3 decoder talk is here: local HTML

 

Rita’s SPHINX training FAQ is here: local HTML or remote HTML (Remote promises to be more up to date.)

 

CMU Speech web page: www.speech.cs.cmu.edu