this post was submitted on 10 Oct 2023

Hacker News

This community serves to share top posts on Hacker News with the wider fediverse.

There is a discussion on Hacker News, but feel free to comment here as well.

Great initiative and good idea, but I just tested it with a YouTube video and the success rate was really horrible. This was for a native language in my country.

Seigest@lemmy.ca 1 point 1 year ago

Just tried it. It's got some benefits but overall it has a huge flaw.

Seems like it's just detecting the audio, then forming a voice profile and transcribing the speech to text. It then translates the text and has the voice profile read it in English.

Here was my test. I had an English audio segment built using speech synthesis. I also had that voice read out the same text translated to French. I had to manually modify some acronyms in both cases to have them read properly, but this mostly works and has for quite some time. The only issue is that the French audio clip has more degradation and the voice sounds like it's being absorbed into the void. Also, the word choices are "too formal and awkward" according to my French speaker.

So with this new thing I took the English voice clip and put it into the new tool with instructions to translate it into French. In theory this new AI should produce French translated audio that is similar to what I can get just using speech synthesis. But it failed.

For one thing, it seems the text interpretation understands acronyms but the synthesizer doesn't, so those come out wrong and can't be corrected. This includes words like "eLearning" and website addresses. Given my organization has an acronym as its name, this is a huge issue.

That being said, test 2 was a short video of a narrator reading out some speech with music in the background. It handled this pretty well and actually kept the music in. However, the voice profile was a bit more robotic, as the video is only about 1 min long.

Suggestion: we need to be able to edit the script. This way we can alter mispronounced words. This also needs to happen prior to the part where our character count gets reduced.
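
To make the suggestion concrete, here is a rough sketch of the kind of editing step I mean, assuming the tool handed us the translated script as plain text before synthesis. The override table and the `apply_overrides` name are made up for illustration and are not part of any real tool:

```python
import re

# Hypothetical pronunciation overrides: written form -> spoken form.
# These entries are illustrative only.
OVERRIDES = {
    "eLearning": "e-learning",
    "FAQ": "F A Q",
}


def apply_overrides(script: str, overrides: dict[str, str]) -> str:
    """Replace whole-word matches of each key with its spoken form."""
    if not overrides:
        return script
    # Longest keys first, so a longer term wins over a shorter one it contains.
    keys = sorted(overrides, key=len, reverse=True)
    alternatives = "|".join(re.escape(k) for k in keys)
    # Lookarounds keep us from rewriting the inside of a longer word.
    pattern = re.compile(r"(?<!\w)(" + alternatives + r")(?!\w)")
    return pattern.sub(lambda m: overrides[m.group(1)], script)


fixed = apply_overrides("Our eLearning FAQ is online.", OVERRIDES)
```

This is basically what I already do by hand with plain speech synthesis. The point is that it has to run on the script before the synthesizer sees it.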