Embedding Google speech recognition in a Python desktop application without limits

celestialme (46) in #programming • 8 years ago (edited)

Google offers a speech recognition API for desktop applications, but you need a key for it, and the free key only covers 60 minutes of audio per month. For light usage, the API is the easiest option:

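import speech_recognition as sr
from mtranslate import translate

r = sr.Recognizer()

def run():
    # Loop instead of calling run() recursively, which kept the microphone
    # open and would eventually hit Python's recursion limit.
    while True:
        with sr.Microphone() as source:
            print('listening')
            audio = r.listen(source)
        try:
            # Send the recording to Google's speech recognition service
            result = r.recognize_google(audio, language='en-US')
        except sr.UnknownValueError:
            print('could not understand audio')
            continue
        print(result)
        print(translate(result))

run()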



It's just a couple of lines of code, but as I mentioned, it's limited: you can only convert 60 minutes of audio to text per month.
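To see how quickly that 60-minute monthly allowance gets used up, here is a minimal sketch that measures the length of WAV recordings with Python's standard wave module before you send them. The helper names wav_duration_seconds and remaining_minutes are hypothetical, and the 60-minute budget is simply the free-tier figure quoted above; the code does not query Google for your actual quota.

```python
import wave

# Free-tier figure quoted in this post; adjust it to your own plan (assumption)
FREE_MINUTES_PER_MONTH = 60

def wav_duration_seconds(path):
    """Return the playing time of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()

def remaining_minutes(paths, budget=FREE_MINUTES_PER_MONTH):
    """Minutes left in the monthly budget after transcribing every file in `paths`."""
    used_minutes = sum(wav_duration_seconds(p) for p in paths) / 60
    return budget - used_minutes
```

For example, a single 90-second recording leaves 58.5 of the 60 minutes.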

Here starts the main part of the tutorial. Since Google's JavaScript speech recognition API is free and has no usage limits, we are going to use it instead. We will use https://www.google.com/intl/en/chrome/demos/speech.html as the "server" that takes our requests. Speech recognition through Google's JavaScript API only works in Google Chrome, so we have to stick with Chrome. To embed Chrome in our desktop application, we will use Selenium to automate Chrome and to interact with the page's DOM elements.
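Because Selenium drives the page through its DOM, you can also pass values from Python into the page's JavaScript: WebDriver's execute_script exposes any extra arguments as arguments[0], arguments[1], and so on. The sketch below is a hypothetical helper, set_language, that picks a language in the demo page's select_language dropdown by its visible text instead of a numeric index. It assumes the options are labelled with language names such as "English" and that the page defines updateCountry(), which the script in this post also relies on.

```python
# JavaScript run inside the demo page; arguments[0] is the label passed from Python
SET_LANGUAGE_JS = """
const label = arguments[0];
const select = document.getElementById("select_language");
const index = Array.from(select.options).findIndex(o => o.text === label);
if (index < 0) { return false; }
select.selectedIndex = index;
updateCountry();  // the demo page's own function that refreshes the dialect list
return true;
"""

def set_language(driver, label):
    """Select `label` in the demo page's language dropdown (hypothetical helper).

    `driver` is a Selenium WebDriver. Returns True if an option with that
    exact text was found and selected, False otherwise.
    """
    return bool(driver.execute_script(SET_LANGUAGE_JS, label))
```

Usage: after browser.get(...) on the demo page, call set_language(browser, "English") and check the returned flag before you start dictating.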

You will need to install Selenium (pip install selenium) and download chromedriver.exe. Place chromedriver.exe in your Python Scripts folder if you want to avoid giving Selenium its path. When you run this code from the interpreter, a black console window pops up; that is chromedriver. Then the browser starts and you can begin dictating: you have 4 seconds to say everything you want. After that, your speech is recognized and the result is printed. From there, you can rework this code and embed it however you like; you get the general idea.
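Dropping chromedriver.exe into the Python Scripts folder only works because that folder is usually on PATH (an assumption about your Python install). A quick way to confirm Selenium will be able to find it is a minimal check with the standard library's shutil.which; find_chromedriver is a hypothetical helper name.

```python
import shutil

def find_chromedriver(search_path=None):
    """Return the full path of chromedriver if it is on PATH, else None.

    On Windows, shutil.which also tries the PATHEXT extensions, so
    chromedriver.exe is found when searching for "chromedriver".
    `search_path` overrides the PATH environment variable (handy for testing).
    """
    return shutil.which("chromedriver", path=search_path)

if __name__ == "__main__":
    driver_path = find_chromedriver()
    if driver_path is None:
        print("chromedriver not found; put it in a folder that is on PATH")
    else:
        print("using", driver_path)
```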
   

from time import sleep

from selenium import webdriver

print('initializing')

option = webdriver.ChromeOptions()
option.add_argument("--incognito")
# Grant microphone access automatically instead of showing the permission prompt
option.add_argument("--use-fake-ui-for-media-stream")

# No driver path is given here: chromedriver.exe lives in the Python Scripts folder, which is on PATH.
# (Older Selenium releases called this argument chrome_options=; newer ones use options=.)
browser = webdriver.Chrome(options=option)

browser.get("https://www.google.com/intl/en/chrome/demos/speech.html")

# Select entry 11 of the demo page's language dropdown, then call the page's
# updateCountry() so the dialect list matches the new language
browser.execute_script('return document.getElementById("select_language").selectedIndex = 11')
browser.execute_script('return updateCountry()')

print('listening')
# Click the microphone button to start recognition, talk for 4 seconds, then click it again to stop
browser.execute_script('return document.getElementById("start_img").click()')
sleep(4)
browser.execute_script('return document.getElementById("start_img").click()')

# The recognized text ends up in the page's final_span element
print(browser.execute_script("return document.getElementById('final_span').innerText"))

browser.quit()
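If you read final_span immediately after stopping, the last phrase may not have arrived yet (an assumption about timing, not something the demo page documents). A minimal sketch of a reusable, hypothetical dictate helper polls the element for a short while instead; it uses the same start_img and final_span ids as the script above and works with any Selenium WebDriver.

```python
import time

START_STOP_JS = 'document.getElementById("start_img").click()'
FINAL_TEXT_JS = "return document.getElementById('final_span').innerText"

def dictate(driver, seconds=4.0, timeout=5.0, poll=0.25):
    """Record for `seconds` on the Chrome speech demo page and return the transcript.

    Clicks the microphone button to start, waits, clicks it again to stop, then
    polls #final_span until it contains text or `timeout` seconds have passed.
    Returns "" if nothing was recognized in time.
    """
    driver.execute_script(START_STOP_JS)  # start listening
    time.sleep(seconds)
    driver.execute_script(START_STOP_JS)  # stop listening
    deadline = time.monotonic() + timeout
    while True:
        text = driver.execute_script(FINAL_TEXT_JS)  # None if the element is empty/missing
        if text:
            return text
        if time.monotonic() >= deadline:
            return ""
        time.sleep(poll)
```

With the browser set up as in the script above, print(dictate(browser)) replaces the two clicks, the sleep(4), and the final print.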


I know this tutorial isn't very polished; I was short on time and rushing, but I wanted to share it.