# Transcription

Video **transcription** consists of transcribing the audio content of a video into text.

> This process may also be called __Automatic Speech Recognition__ or __Speech to Text__ in a more general context.

Provides a common API to multiple transcription backends, currently:
- `openai-whisper` CLI
- `faster-whisper` (*via* `whisper-ctranslate2` CLI)

> Potential candidates could be: `whisper-cpp`, `vosk`, ...

## Requirements

- Python 3
- PIP

And at least one of the following transcription backends:

- Python:
  - `openai-whisper`
  - `whisper-ctranslate2>=0.4.3`

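Before wiring up a backend, it can help to check that its CLI is actually reachable. This is only a sketch using Node's `child_process`, not part of this package's API; the `backendAvailable` helper is our own, and `whisper` is the command name used in the examples below:

```typescript
import { spawnSync } from 'node:child_process'

// Returns true when the given CLI can be spawned from the current PATH.
// We only care whether the process could start, not its exit status.
function backendAvailable (command: string): boolean {
  const result = spawnSync(command, ['--help'], { stdio: 'ignore' })
  return result.error === undefined
}

console.log(backendAvailable('whisper')) // true only if openai-whisper is installed
```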
## Usage

Create a transcriber manually:

```typescript
import { OpenaiTranscriber } from '@peertube/peertube-transcription'

(async () => {
  // Optional if you want to use a local installation of the transcription engines
  const binDirectory = 'local/pip/path/bin'

  // Create a transcriber powered by the OpenAI Whisper CLI
  const transcriber = new OpenaiTranscriber({
    name: 'openai-whisper',
    command: 'whisper',
    languageDetection: true,
    binDirectory
  });

  // If not installed globally, install the transcription engine (uses pip under the hood)
  await transcriber.install('local/pip/path')

  // Transcribe
  const transcriptFile = await transcriber.transcribe({
    mediaFilePath: './myVideo.mp4',
    model: 'tiny',
    format: 'txt'
  });

  console.log(transcriptFile.path);
  console.log(await transcriptFile.read());
})();
```

Using a local model file:

```typescript
import { WhisperBuiltinModel } from '@peertube/peertube-transcription/dist'

const transcriptFile = await transcriber.transcribe({
  mediaFilePath: './myVideo.mp4',
  model: await WhisperBuiltinModel.fromPath('./models/large.pt'),
  format: 'txt'
});
```

You may use the built-in factory if you're happy with the default configuration:

```typescript
import { transcriberFactory } from '@peertube/peertube-transcription'

transcriberFactory.createFromEngineName({
  engineName: transcriberName,
  logger: compatibleWinstonLogger,
  transcriptDirectory: '/tmp/transcription'
})
```

> For further usage examples, see [../tests/src/transcription/whisper/transcriber/openai-transcriber.spec.ts](../tests/src/transcription/whisper/transcriber/openai-transcriber.spec.ts)

## Lexicon

- ONNX: Open Neural Network eXchange. A model specification; the ONNX Runtime runs these models.
- GPTs: Generative Pre-Trained Transformers
- LLM: Large Language Models
- NLP: Natural Language Processing
- MLP: Multilayer Perceptron
- ASR: Automatic Speech Recognition
- WER: Word Error Rate
- CER: Character Error Rate
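Since WER is the standard metric for judging transcription quality, here is a minimal, self-contained sketch of how it is computed (word-level edit distance divided by the reference length); the `wer` helper is our own illustration, not part of this package:

```typescript
// Word Error Rate: edit distance between the word sequences,
// divided by the number of words in the reference transcript.
function wer (reference: string, hypothesis: string): number {
  const ref = reference.toLowerCase().split(/\s+/).filter(Boolean)
  const hyp = hypothesis.toLowerCase().split(/\s+/).filter(Boolean)

  // Dynamic-programming edit distance (substitutions, insertions, deletions)
  const d: number[][] = Array.from({ length: ref.length + 1 }, (_, i) =>
    Array.from({ length: hyp.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  )
  for (let i = 1; i <= ref.length; i++) {
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1
      d[i][j] = Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    }
  }
  return d[ref.length][hyp.length] / ref.length
}

console.log(wer('the cat sat on the mat', 'the cat sat on mat')) // one deletion over six words ≈ 0.167
```

CER is computed the same way, but over characters instead of words.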