You are viewing a single comment's thread from:
RE: Automating Multi-Lingual and Multi-Speaker Closed-Captioning and Transcripting Workflow with srt2vtt
The story gets murkier by the minute. Besides the YouTube timing issue,
did you watch the sample YouTube video in my post? Try switching between the "English" captions and "English (auto-generated)" to see the difference in display and timing (both of which were excellent for most use cases I would envision, at least with this particular audio sample).
Also note that when speaker IDs are inserted into the text, we should not try to align them, since they are not actually present in the audio.
YouTube's captioning runs before speaker IDs are added, and Gentle simply skips over them. But of course, as I explained, word-by-word captioning may be mostly unnecessary if all audio tracks are "corrected" simultaneously, as follows:
1. Record multitrack audio.
2. Load all tracks into a multitrack audio editor, cut noise/"utterances" across all clips simultaneously, and export the individual edited tracks.
3. Get Speech-To-Text with time codes for each track separately.
...
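The final step, getting time-coded Speech-To-Text per track and then combining the results, could be sketched roughly as below. The track names, segment format, and merge logic are all my own assumptions for illustration, not anything from the actual workflow:

```python
# Hypothetical sketch: after per-track Speech-To-Text, merge the
# time-coded segments from all tracks into one speaker-labeled
# transcript, ordered by start time.

tracks = {
    # speaker -> list of (start_sec, end_sec, text) from that track's STT
    "Alice": [(0.0, 2.1, "Hi there"), (5.3, 6.0, "Sure")],
    "Bob":   [(2.4, 5.0, "Hello, how are you?")],
}

# Flatten to (start, end, speaker, text) tuples and sort by start time.
segments = sorted(
    (start, end, speaker, text)
    for speaker, segs in tracks.items()
    for start, end, text in segs
)

for start, end, speaker, text in segments:
    print(f"[{start:6.2f}-{end:6.2f}] {speaker}: {text}")
```

Because each speaker's track was transcribed separately, the speaker labels come for free and no diarization is needed.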
Sure, I agree that multitrack cleaning is the way to go, but SteemPowerPics didn't know how, even when I showed it should be possible...
I'm not sure about the speaker IDs: if the speaker IDs appear as words/names in the transcript, how can you skip them? With [something] like square brackets?
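If the IDs really are marked with square brackets, stripping them before alignment would be straightforward. A minimal sketch, assuming a bracketed tag convention like `[Alice]:` (the tag format and function name are my own, not from the post or from Gentle's API):

```python
import re

# Hypothetical convention: speaker IDs appear as "[Name]:" at the
# start of a caption line. Strip them so only spoken words are
# passed to the forced aligner.
SPEAKER_ID = re.compile(r"\[[^\]]+\]:?\s*")

def strip_speaker_ids(line: str) -> str:
    """Remove bracketed speaker tags like '[Alice]:' from a caption line."""
    return SPEAKER_ID.sub("", line).strip()

print(strip_speaker_ids("[Alice]: did you watch the sample video?"))
# -> did you watch the sample video?
```

Since the bracketed tags never occur in the audio, removing them up front keeps the aligner from trying to match text that was never spoken.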