|
176 | 176 | "source": [
177 | 177 | "### Modules We’ll Need for Audio Recording and Playback (2 of 2)\n",
178 | 178 | "```\n",
179 | | - "pip install pyaudio # Windows Users conda install pyaudio\n",
180 | | - "pip install pydub \n",
181 | | - "```\n",
182 | | - "\n",
183 | | - "These are also installable now with conda, which will auto install `portaudio` if necessary:\n",
184 | | - "```\n",
185 | | - "conda install pyaudio \n",
186 | | - "conda install pydub \n",
187 | | - "```\n",
188 | | - "\n",
189 | | - "**Mac users** might first need to execute\n",
190 | | - ">`conda install -c conda-forge portaudio`\n"
| 179 | + "pip install sounddevice \n",
| 180 | + "pip install simpleaudio \n",
| 181 | + "```"
191 | 182 | ]
192 | 183 | },
193 | 184 | {

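A quick way to confirm both installs succeeded (a minimal sketch, not part of the notebook) is to import the modules and ask `sounddevice` to enumerate the audio hardware it can see:

```python
import sounddevice as sd
import simpleaudio  # imported only to confirm the install worked

# print the input/output devices PortAudio detected
print(sd.query_devices())
```
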
322 | 313 | "### Other Imported Modules\n",
323 | 314 | "```python\n",
324 | 315 | "import keys  # contains your API keys for accessing Watson services\n",
325 | | - "import pyaudio  # used to record from mic\n",
326 | | - "import pydub  # used to load a WAV file\n",
327 | | - "import pydub.playback  # used to play a WAV file\n",
328 | | - "import wave  # used to save a WAV file\n",
| 316 | + "import wave \n",
| 317 | + "import simpleaudio as sa\n",
| 318 | + "import sounddevice as sd\n",
| 319 | + "from scipy.io.wavfile import write\n",
329 | 320 | "```\n",
330 | 321 | "\n",
331 | | - "* **`pyaudio`** for **recording audio** \n",
332 | | - "* **`pydub`** and **`pydub.playback`** to **load and play audio files**\n",
| 322 | + "* **`sounddevice`** for **recording audio** \n",
| 323 | + "* **`simpleaudio`** to **load and play audio files**\n",
333 | 324 | "* **`wave`** to save **WAV (Waveform Audio File Format) files**"
334 | 325 | ]
335 | 326 | },

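The hunk shows only the imports; how they cooperate is spelled out in later cells. As a hedged sketch of the overall flow, a record-save-play round trip with these exact modules might look like this (the 44.1 kHz rate, 5-second length, and `test.wav` name are illustrative choices, not from the notebook):

```python
import sounddevice as sd
import simpleaudio as sa
from scipy.io.wavfile import write

SAMPLE_RATE = 44100  # samples per second (CD quality)
SECONDS = 5          # length of the recording

# record 5 seconds of 16-bit mono audio from the default microphone
frames = sd.rec(int(SECONDS * SAMPLE_RATE), samplerate=SAMPLE_RATE,
                channels=1, dtype='int16')
sd.wait()  # block until the recording completes

write('test.wav', SAMPLE_RATE, frames)  # save the samples as a WAV file

# reload the WAV file and play it back
play_obj = sa.WaveObject.from_wave_file('test.wav').play()
play_obj.wait_done()  # block until playback completes
```
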
370 | 361 | "### Main Program: Function `run_translator` (2 of 6)\n",
371 | 362 | "* **Step 2**: Call **`speech_to_text`**\n",
372 | 363 | "  * **Speech to Text service** transcribes text using **predefined models**\n",
373 | | - "    * Most languages have **broadband** (**>=16kHZ**) and **narrowband** (**<16kHZ**) models (based on **audio quality**)\n",
374 | | - "    * App **captures** audio at **44.1 kHZ**, so we use **`'en-US_BroadbandModel'`**\n",
| 364 | + "    * The service now offers general multimedia models and models optimized for telephone audio \n",
375 | 365 | "\n",
376 | 366 | "```python\n",
377 | 367 | "    # Step 2: Transcribe the English speech to English text\n",
378 | 368 | "    english = speech_to_text(\n",
379 | | - "        file_name='english.wav', model_id='en-US_BroadbandModel')\n",
| 369 | + "        file_name='english.wav', model_id='en-US_Multimedia')\n",
380 | 370 | "    print('English:', english)  # display transcription\n",
381 | 371 | "```"
382 | 372 | ]

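Model names change as IBM retires old ones (as this commit itself shows), so it can help to ask the service what it currently supports. A sketch, assuming the `stt` client created in the `speech_to_text` function later in the notebook:

```python
# list every speech model this service instance currently offers
for model in stt.list_models().get_result()['models']:
    print(model['name'])  # e.g. 'en-US_Multimedia', 'en-US_Telephony'
```
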
415 | 405 | "metadata": {},
416 | 406 | "source": [
417 | 407 | "### Main Program: Function `run_translator` (4 of 6)\n",
418 | | - "* **Voice `'es-US_SofiaVoice'`** is for Spanish as spoken in the U.S.\n",
| 408 | + "* **Voice `'es-US_SofiaV3Voice'`** is for Spanish as spoken in the U.S.\n",
419 | 409 | "\n",
420 | 410 | "```python \n",
421 | 411 | "    # Step 4: Synthesize the Spanish text into Spanish speech \n",
422 | | - "    text_to_speech(text_to_speak=spanish, voice_to_use='es-US_SofiaVoice',\n",
| 412 | + "    text_to_speech(text_to_speak=spanish, \n",
| 413 | + "        voice_to_use='es-US_SofiaV3Voice',\n",
423 | 414 | "        file_name='spanish.wav')\n",
424 | 415 | "```"
425 | 416 | ]

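Voice names are versioned the same way (`SofiaVoice` became `SofiaV3Voice`), so the analogous check applies. A sketch, assuming the `tts` client created in the `text_to_speech` function later in the notebook:

```python
# list every voice this Text to Speech instance currently offers
for voice in tts.list_voices().get_result()['voices']:
    print(voice['name'])  # e.g. 'es-US_SofiaV3Voice'
```
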
458 | 449 | "### Main Program: Function `run_translator` (6 of 6)\n",
459 | 450 | "* **Steps 6–10** repeat previous steps for **Spanish speech to English speech**: \n",
460 | 451 | "  * **Step 6** **records** the Spanish audio\n",
461 | | - "  * **Step 7** **transcribes** the **Spanish audio** to Spanish text using predefined model **`'es-ES_BroadbandModel'`**\n",
| 452 | + "  * **Step 7** **transcribes** the **Spanish audio** to Spanish text using predefined model **`'es-ES_Multimedia'`**\n",
462 | 453 | "  * **Step 8** **translates** the **Spanish text** to English text using predefined model **`'es-en'`** (Spanish-to-English)\n",
463 | | - "  * **Step 9** **creates** the **English audio** using **`'en-US_AllisonVoice'`**\n",
| 454 | + "  * **Step 9** **creates** the **English audio** using **`'en-US_AllisonV3Voice'`**\n",
464 | 455 | "  * **Step 10** **plays** the English **audio**"
465 | 456 | ]
466 | 457 | },

475 | 466 | "\n",
476 | 467 | "    # Step 7: Transcribe the Spanish speech to Spanish text\n",
477 | 468 | "    spanish = speech_to_text(\n",
478 | | - "        file_name='spanishresponse.wav', model_id='es-ES_BroadbandModel')\n",
| 469 | + "        file_name='spanishresponse.wav', \n",
| 470 | + "        model_id='es-ES_Multimedia')\n",
479 | 471 | "    print('Spanish response:', spanish)\n",
480 | 472 | "\n",
481 | 473 | "    # Step 8: Translate the Spanish text to English text\n",

484 | 476 | "\n",
485 | 477 | "    # Step 9: Synthesize the English text to English speech\n",
486 | 478 | "    text_to_speech(text_to_speak=english,\n",
487 | | - "        voice_to_use='en-US_AllisonVoice',\n",
| 479 | + "        voice_to_use='en-US_AllisonV3Voice',\n",
488 | 480 | "        file_name='englishresponse.wav')\n",
489 | 481 | "\n",
490 | 482 | "    # Step 10: Play the English audio\n",

507 | 499 | "```python\n",
508 | 500 | "def speech_to_text(file_name, model_id):\n",
509 | 501 | "    \"\"\"Use Watson Speech to Text to convert audio file to text.\"\"\"\n",
510 | | - "    # create Watson Speech to Text client \n",
511 | | - "    # OLD: stt = SpeechToTextV1(iam_apikey=keys.speech_to_text_key)\n",
512 | | - "    authenticator = IAMAuthenticator(keys.speech_to_text_key)  # *** NEW\n",
513 | | - "    stt = SpeechToTextV1(authenticator=authenticator)  # *** NEW\n",
| 502 | + "    authenticator = IAMAuthenticator(keys.speech_to_text_key) \n",
| 503 | + "    stt = SpeechToTextV1(authenticator=authenticator)\n",
514 | 504 | "```"
515 | 505 | ]
516 | 506 | },

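The hunk ends at client creation; the rest of the function is cut off by the diff context. A minimal sketch of the remaining steps, assuming WAV input and the JSON layout shown in the next cell:

```python
    # open the audio file and submit it for transcription
    with open(file_name, 'rb') as audio_file:
        result = stt.recognize(audio=audio_file,
                               content_type='audio/wav',
                               model=model_id).get_result()

    # return the first alternative's transcript
    return result['results'][0]['alternatives'][0]['transcript']
```
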
552 | 542 | "  * Useful when transcribing **live audio**, such as a newscast\n",
553 | 543 | "  * [Method `recognize`’s arguments and JSON response details](https://www.ibm.com/watson/developercloud/speech-to-text/api/v1/python.html?python#recognize-sessionless).\n",
554 | 544 | "* **`getResult` method** returns **JSON** containing **`transcript`**:\n",
555 | | - "    \n"
| 545 | + "```json\n",
| 546 | + "{\n",
| 547 | + "  \"result_index\": 0,\n",
| 548 | + "  \"results\": [\n",
| 549 | + "    {\n",
| 550 | + "      \"final\": true,\n",
| 551 | + "      \"alternatives\": [\n",
| 552 | + "        {\n",
| 553 | + "          \"transcript\": \"where is the nearest bathroom \",\n",
| 554 | + "          \"confidence\": 0.96\n",
| 555 | + "        }\n",
| 556 | + "      ]\n",
| 557 | + "    }\n",
| 558 | + "  ]\n",
| 559 | + "}\n",
| 560 | + "```"
556 | 561 | ]
557 | 562 | },
558 | 563 | {

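Given that structure, extracting the transcript (and its confidence) is plain dictionary and list indexing; `response` here stands for the `DetailedResponse` object:

```python
result = response.get_result()  # the dict shown above
best = result['results'][0]['alternatives'][0]
print(best['transcript'])   # where is the nearest bathroom
print(best['confidence'])   # 0.96
```
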
627 | 632 | "    \"\"\"Use Watson Language Translator to translate English to Spanish \n",
628 | 633 | "    (en-es) or Spanish to English (es-en) as specified by model.\"\"\"\n",
629 | 634 | "    # create Watson Translator client\n",
630 | | - "    # OLD: language_translator = LanguageTranslatorV3(version='2018-05-01', iam_apikey=keys.translate_key)\n",
631 | | - "    authenticator = IAMAuthenticator(keys.translate_key)  # *** NEW\n",
| 635 | + "    authenticator = IAMAuthenticator(keys.translate_key) \n",
632 | 636 | "    language_translator = LanguageTranslatorV3(version='2018-05-31',\n",
633 | | - "        authenticator=authenticator)  # *** NEW\n",
| 637 | + "        authenticator=authenticator)\n",
634 | 638 | "\n",
635 | 639 | "    # perform the translation\n",
636 | 640 | "    translated_text = language_translator.translate(\n",

650 | 654 | "metadata": {},
651 | 655 | "source": [
652 | 656 | "### Function `translate` Returns a **`DetailedResponse`** (4 of 4)\n",
653 | | - "* **`getResult` method** returns **JSON** containing **translation**: \n",
654 | | - "    \n"
| 657 | + "* **`getResult` method** returns **JSON** containing **translation** \"donde es el baño más cercano\": \n",
| 658 | + "```json\n",
| 659 | + "{\n",
| 660 | + "  \"translations\": [\n",
| 661 | + "    {\n",
| 662 | + "      \"translation\": \"donde es el ba\\u00f1o m\\u00e1s cercano \"\n",
| 663 | + "    }\n",
| 664 | + "  ],\n",
| 665 | + "  \"word_count\": 5,\n",
| 666 | + "  \"character_count\": 30\n",
| 667 | + "}\n",
| 668 | + "```"
655 | 669 | ]
656 | 670 | },
657 | 671 | {

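As with the transcription result, the translation comes out by indexing into the dict; `response` again stands for the `DetailedResponse`:

```python
result = response.get_result()  # the dict shown above
print(result['translations'][0]['translation'])  # donde es el baño más cercano
print(result['word_count'], result['character_count'])  # 5 30
```
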
702 | 716 | "    \"\"\"Use Watson Text to Speech to convert text to specified voice\n",
703 | 717 | "    and save to a WAV file.\"\"\"\n",
704 | 718 | "    # create Text to Speech client\n",
705 | | - "    # OLD: tts = TextToSpeechV1(iam_apikey=keys.text_to_speech_key)\n",
706 | | - "    authenticator = IAMAuthenticator(keys.text_to_speech_key)  # *** NEW\n",
| 719 | + "    authenticator = IAMAuthenticator(keys.text_to_speech_key)\n",
707 | 720 | "    tts = TextToSpeechV1(authenticator=authenticator)\n",
708 | 721 | "\n",
709 | 722 | "    # open file and write the synthesized audio content into the file\n",

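The diff context cuts off before the synthesis call itself. A minimal sketch of the remaining lines, assuming WAV output (the `synthesize` call with `accept` and `voice` keyword arguments is the documented ibm_watson API):

```python
    # request WAV audio for the text and write the bytes to the file
    with open(file_name, 'wb') as audio_file:
        audio_file.write(
            tts.synthesize(text_to_speak, accept='audio/wav',
                           voice=voice_to_use).get_result().content)
```
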
1251 | 1264 | "name": "python",
1252 | 1265 | "nbconvert_exporter": "python",
1253 | 1266 | "pygments_lexer": "ipython3",
1254 | | - "version": "3.10.11"
| 1267 | + "version": "3.11.5"
1255 | 1268 | }
1256 | 1269 | },
1257 | 1270 | "nbformat": 4,