
RE: Automating Multi-Lingual and Multi-Speaker Closed-Captioning and Transcripting Workflow with srt2vtt

in #beyondbitcoin, 7 years ago (edited)

The story gets murkier by the minute. Besides the YouTube timing issue:

Did you watch the sample YouTube video in my post? Try switching between the "English" captions and "English (auto-generated)" to see the difference in display and timing (both of which were excellent for most use cases I would envision, at least with this particular audio sample).

Also note that when speaker IDs are inserted into the text, we should not try to align them, since they are not actually present in the audio.
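Since the inserted speaker IDs have no counterpart in the audio, one approach is to strip them before handing the transcript to an aligner and keep them aside for re-insertion afterwards. A minimal sketch, assuming a bracketed `[Name]:` label convention (the format and function names are illustrative, not anything fixed by srt2vtt):

```python
import re

# Assumed label format: "[Alice]:" or "[Speaker 2]:" at the start of a line.
# This convention is an assumption for illustration only.
SPEAKER_ID = re.compile(r"^\[(?P<name>[^\]]+)\]:\s*", re.MULTILINE)

def strip_speaker_ids(transcript: str) -> str:
    """Remove bracketed speaker labels so the aligner only sees spoken words."""
    return SPEAKER_ID.sub("", transcript)

def extract_speaker_ids(transcript: str) -> list[str]:
    """Collect the labels so they can be re-inserted after alignment."""
    return [m.group("name") for m in SPEAKER_ID.finditer(transcript)]

text = "[Alice]: hello there\n[Bob]: hi Alice"
print(strip_speaker_ids(text))    # hello there / hi Alice
print(extract_speaker_ids(text))  # ['Alice', 'Bob']
```

The same regex could be reused after alignment to map each label back onto the cue where its speaker's words begin.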

YouTube's captioning is generated before speaker IDs are added, and Gentle simply skips over them. But of course, as I explained, word-by-word captioning may be mostly unnecessary if all audio tracks are simultaneously "corrected" as follows:

  1. Record multitrack audio
    1a. Load all tracks into a multitrack audio editor, cut noise/"utterances" across all clips simultaneously, export the individual edited tracks
  2. Get Speech-To-Text with time codes for each track separately
    ...
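Once each track has its own time-coded transcript, the per-speaker cues can be merged into a single multi-speaker caption stream simply by sorting on start time. A minimal sketch, assuming each STT result arrives as a list of `(start_seconds, end_seconds, text)` tuples per speaker (the data shape and names are assumptions for illustration):

```python
from dataclasses import dataclass

@dataclass
class Cue:
    start: float  # seconds
    end: float
    speaker: str
    text: str

def merge_tracks(per_track: dict[str, list[tuple[float, float, str]]]) -> list[Cue]:
    """Flatten per-speaker STT cues into one stream, ordered by start time."""
    merged = [
        Cue(start, end, speaker, text)
        for speaker, cues in per_track.items()
        for start, end, text in cues
    ]
    merged.sort(key=lambda c: c.start)
    return merged

def to_srt_time(t: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 3.5 -> 00:00:03,500."""
    h, rem = divmod(int(t), 3600)
    m, s = divmod(rem, 60)
    ms = round((t - int(t)) * 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

tracks = {
    "Alice": [(0.0, 2.0, "Welcome everyone.")],
    "Bob":   [(1.5, 3.0, "Thanks for having me.")],
}
for i, cue in enumerate(merge_tracks(tracks), 1):
    print(i)
    print(f"{to_srt_time(cue.start)} --> {to_srt_time(cue.end)}")
    print(f"[{cue.speaker}]: {cue.text}\n")
```

Because the tracks were trimmed together in step 1a, the per-track time codes stay on a shared clock, which is what makes this naive sort-merge safe.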

Sure, I agree that multitrack cleaning is the way to go, but SteemPowerPics didn't know how, even when I showed that it should be possible...

I'm not sure about the speaker IDs. If the speaker IDs appear as words/names in the transcript, how can you skip them? By marking them with [something] like square brackets?
