I have a video with a self-created SRT caption file. I want to take the translated captions, keep the original SRT timing, and send them through the Text-to-Speech feature to create voiceovers in various languages that approximate the timing in the SRT file. That way, I can completely replace my original voice audio track with the voiceover that Text-to-Speech creates from the translated captions.
I'm already able to copy paragraphs of text from my translated captions and send them through the Text-to-Speech tool to have them spoken aloud in the translated language. What this lacks, however, is timing that matches the audio narration to what is shown on screen during my video. Reviewing the caption transcript, I have 1194 blocks of text in my 65-minute file. It is a daunting task to translate all of those snippets of text by hand and then add each one to my project as a separate audio file.
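To make the request concrete, here is a rough sketch in Python of the timing part of what I'm describing. It only reads the translated SRT text and works out, for each caption block, when the spoken clip should start and how much time it has before the next block begins. The names `Cue`, `parse_srt`, and `build_schedule` are my own for illustration, not part of any tool, and the actual Text-to-Speech call is left out because I don't know what interface the feature exposes.

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIME_RE = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Cue:
    index: int      # 1-based position of the caption block
    start_ms: int   # when the caption appears
    end_ms: int     # when the caption disappears
    text: str       # caption text, joined onto one line


def _to_ms(h, m, s, ms):
    """Convert hour/minute/second/millisecond strings to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(srt_text):
    """Split SRT text into cues; blocks without a timing line are skipped."""
    cleaned = srt_text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    cues = []
    for block in re.split(r"\n\s*\n", cleaned):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIME_RE.search(line)
            if match:
                g = match.groups()
                text = " ".join(l.strip() for l in lines[i + 1:] if l.strip())
                cues.append(Cue(len(cues) + 1, _to_ms(*g[:4]), _to_ms(*g[4:]), text))
                break
    return cues


def build_schedule(cues):
    """For each cue, give the clip start, its caption slot, and the most
    time it can run before it would overlap the next cue."""
    schedule = []
    for i, cue in enumerate(cues):
        slot_ms = cue.end_ms - cue.start_ms
        if i + 1 < len(cues):
            max_ms = cues[i + 1].start_ms - cue.start_ms
        else:
            max_ms = slot_ms  # assumption: the last clip gets only its own slot
        schedule.append(
            {"start_ms": cue.start_ms, "slot_ms": slot_ms, "max_ms": max_ms, "text": cue.text}
        )
    return schedule


# Made-up two-block sample standing in for a translated caption file.
SAMPLE_SRT = """1
00:00:01,000 --> 00:00:03,500
Hola y bienvenidos.

2
00:00:04,000 --> 00:00:06,250
Hoy vamos a empezar
con lo esencial.
"""

schedule = build_schedule(parse_srt(SAMPLE_SRT))
```

Each entry in `schedule` would then be handed to Text-to-Speech, with the resulting clip placed on the timeline at `start_ms` and, if it runs longer than `max_ms`, sped up or trimmed so it does not collide with the next block. Doing that automatically for all 1194 blocks is the part I can't do by hand.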