That's a VERY complex problem with no easy answer. You could possibly pull it off by creating a ruleset yourself, though.

Basically the rules for when to start a new sentence can be defined in terms of what comes before and after the full stop, so write that ruleset and iterate through the words.
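Something like this might be a starting point. It's only a rough sketch: the two word lists are made up for illustration (they'd need tuning per speaker), and `add_full_stops` is just a name I picked.

```python
# Hypothetical rule set, made up for illustration; tune it per speaker.
SENTENCE_STARTERS = {"so", "okay", "now", "anyway"}          # words that often open a sentence
NEVER_BEFORE_STOP = {"and", "the", "a", "to", "of", "but"}   # words that rarely end one


def add_full_stops(text):
    """Walk through the words and put a full stop before likely sentence starters."""
    out = []
    for word in text.split():
        prev = out[-1] if out else None
        starts_sentence = (
            prev is not None
            and word.lower() in SENTENCE_STARTERS
            # Rule on what comes BEFORE the full stop:
            and prev.lower() not in NEVER_BEFORE_STOP
            and prev.lower() not in SENTENCE_STARTERS
            and not prev.endswith((".", "?", "!"))
        )
        if starts_sentence:
            out[-1] = prev + "."
            word = word.capitalize()
        out.append(word)
    return " ".join(out)


print(add_full_stops("we fixed the wallet so now it syncs and so it works"))
```

With these example rules the "so" after "wallet" gets a full stop in front of it, while the "so" after "and" is left alone. It will still guess wrong plenty of times, so a manual pass afterwards is probably unavoidable.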

Yeah, I basically took the YouTube auto-captions, which I had already cleaned up for difficult words like "grid coin" and BOINC, as a vtt subtitle file.

I got rid of the vtt timecodes with GNU tools like sed.
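For anyone who'd rather not wrestle with sed, roughly the same cleanup can be sketched in Python. This is not the exact command I ran. It assumes a typical vtt layout (a `WEBVTT` header, optional `Kind:`/`Language:` lines, cue timing lines containing `-->`), and `vtt_to_text` is a name made up for this example.

```python
import re

# A cue timing line looks like "00:00:01.000 --> 00:00:03.000 ..." (hours optional).
TIMECODE = re.compile(r"^(\d{2,}:)?\d{2}:\d{2}\.\d{3}\s+-->\s+")
# Inline tags such as <c> or <00:00:01.500> that may appear inside caption text.
INLINE_TAG = re.compile(r"<[^>]+>")
# Header lines to skip; "Kind:" and "Language:" are an assumption about the file.
HEADER_PREFIXES = ("WEBVTT", "NOTE", "Kind:", "Language:")


def vtt_to_text(vtt):
    """Drop headers, timecodes and inline tags; return the caption text as one string."""
    lines = []
    for line in vtt.splitlines():
        line = line.strip()
        if not line or line.startswith(HEADER_PREFIXES) or TIMECODE.match(line):
            continue
        line = INLINE_TAG.sub("", line).strip()
        # Auto-captions tend to repeat a line in consecutive cues; keep it once.
        if line and (not lines or lines[-1] != line):
            lines.append(line)
    return " ".join(lines)


sample = """WEBVTT

00:00:01.000 --> 00:00:03.000
hello and welcome

00:00:03.000 --> 00:00:05.000
hello and welcome
to the <c>whale</c> tank
"""
print(vtt_to_text(sample))
```

Note that this sketch doesn't handle numeric cue identifiers, so if the file has those they would end up in the text.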

I then loaded up the vtt in TextEdit and used cmd-F to highlight words. I noticed that @CM-Steem aka customminer uses stop words like "So" a lot, so I put periods before those.

https://steemit.com/gridcoin/@nutela/gridcoin-whaletank-rough-transcript-friday-8th-aug-2017

Here's the video:

I edited up to 15 mins or so.

You wouldn't believe how much text one can fill by simply talking for 15 minutes. Way too much work to do by hand.

You could try to make use of the natural pauses in speech to add the full stops as well.
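One way this could work, if the timecodes are kept instead of stripped: treat a long enough gap between the end of one cue and the start of the next as a pause. This is only a sketch. It assumes the cue times actually follow the speech, which may not hold for auto-generated captions, and both the `min_pause` threshold and the function names are invented for the example.

```python
import re

# Start and end timestamps of a cue timing line (hours optional).
CUE = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)


def to_seconds(h, m, s, ms):
    return int(h or 0) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def punctuate_by_pauses(vtt, min_pause=0.7):
    """Add a full stop wherever the gap between two cues is at least min_pause seconds."""
    cues = []       # each entry: [start, end, list of text lines]
    current = None
    for line in vtt.splitlines():
        line = line.strip()
        match = CUE.match(line)
        if match:
            g = match.groups()
            current = [to_seconds(*g[:4]), to_seconds(*g[4:]), []]
            cues.append(current)
        elif not line:
            current = None          # a blank line ends the cue
        elif current is not None:
            current[2].append(line)

    parts = []
    prev_end = None
    for start, end, text_lines in cues:
        text = " ".join(text_lines)
        if not text:
            continue
        if parts and start - prev_end >= min_pause:
            parts[-1] += "."
            text = text[0].upper() + text[1:]
        parts.append(text)
        prev_end = end
    return " ".join(parts)


sample = """WEBVTT

00:00:00.000 --> 00:00:02.000
welcome to the whale tank

00:00:02.100 --> 00:00:04.000
with customminer

00:00:05.500 --> 00:00:07.000
so let us start
"""
print(punctuate_by_pauses(sample))
```

In the sample, only the 1.5 second gap before the last cue is long enough to count as a pause, so that is the only place a full stop is added.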

Hey that's a great idea! I wonder how to get that, though. I was wondering if YouTube would offer any insight, but their tool is closed off. IBM Watson looks much cooler and even has a GitHub link, but I'm not so sure of the quality. It couldn't keep up when testing in real time (with Loopback), but then again real time is maybe too much to ask.
