Ratha (papertygre) wrote,

Speech-to-text utility idea

Something that I think would be useful: a device or piece of software that lets you record speech -- lectures, notes while driving, interviews, diary entries -- and auto-transcribes it into text. It then presents the text in an interface where you can click on any word and hear the audio that word came from. So if the transcriber couldn't match a word, it shows up in the text as question marks or something, and you can click on the question marks to hear the source audio and fill in the missing word. Or if a word looks funny and you think the transcriber got it wrong, you can click on it, hear it, and make a correction. Bonus points if the transcriber can be trained from the corrections.
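To make the click-a-word idea a bit more concrete, here is a rough sketch of the data the interface would need, assuming the speech recognizer can hand back a start and end time for every word. The names (`Word`, `Transcript`, `audio_span`, `correct`) are made up for illustration and are not from any real product or library.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

UNKNOWN = "???"  # placeholder shown for words the recognizer could not match


@dataclass
class Word:
    text: Optional[str]  # None when the recognizer could not match the word
    start: float         # offset into the recording, in seconds
    end: float

    def display(self) -> str:
        return self.text if self.text is not None else UNKNOWN


class Transcript:
    """Hypothetical word-aligned transcript; every word keeps its audio span."""

    def __init__(self, words: List[Word]) -> None:
        self.words = words
        # (start, end, corrected_text) triples that could later be fed back
        # to the recognizer as training data.
        self.corrections: List[Tuple[float, float, str]] = []

    def render(self) -> str:
        return " ".join(w.display() for w in self.words)

    def audio_span(self, index: int) -> Tuple[float, float]:
        """Audio range to play when the word at `index` is clicked."""
        w = self.words[index]
        return (w.start, w.end)

    def correct(self, index: int, new_text: str) -> None:
        """Fill in or fix a word and remember the correction."""
        w = self.words[index]
        w.text = new_text
        self.corrections.append((w.start, w.end, new_text))


demo = Transcript([Word("keep", 0.0, 0.4), Word(None, 0.4, 0.9), Word("diary", 0.9, 1.5)])
print(demo.render())       # keep ??? diary
print(demo.audio_span(1))  # (0.4, 0.9)
demo.correct(1, "a")
print(demo.render())       # keep a diary
```

Actually playing the audio is left out on purpose; the point is only that each word (including the question-mark ones) carries enough information to find its slice of the recording.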

This would be an improvement over other transcription interfaces I've seen, because with those you normally seem to have to listen through from the beginning to get to the part you want to hear. That's time-consuming, since you can't skim audio the way you can skim text. (Jott is good, but it's designed for short messages, and I am imagining something designed for long-form content.) Having the transcript readily available would make it actually useful to record things like class lectures, or to keep an audio diary, because you can search and skim through the material; or if you want to hear a particular passage, you can select as much text as you want to hear and issue a Play command.
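The search and "select some text, hit Play" parts could work roughly like this, again assuming each word comes with start and end times. This is only a sketch: `TimedWord`, `selection_span`, and `find_phrase` are hypothetical names, and the sample timings are invented.

```python
from typing import List, Tuple

# (text, start_seconds, end_seconds) for one transcribed word
TimedWord = Tuple[str, float, float]


def selection_span(words: List[TimedWord], first: int, last: int) -> Tuple[float, float]:
    """Audio range covering the selected words, from index first to last inclusive."""
    if not (0 <= first <= last < len(words)):
        raise IndexError("selection is outside the transcript")
    return (words[first][1], words[last][2])


def find_phrase(words: List[TimedWord], phrase: str) -> List[Tuple[float, float]]:
    """Audio range of every place the phrase occurs (case-insensitive)."""
    target = phrase.lower().split()
    if not target:
        return []
    texts = [w[0].lower() for w in words]
    hits = []
    for i in range(len(texts) - len(target) + 1):
        if texts[i:i + len(target)] == target:
            hits.append(selection_span(words, i, i + len(target) - 1))
    return hits


words = [
    ("keep", 0.0, 0.4), ("an", 0.4, 0.6), ("audio", 0.6, 1.1),
    ("diary", 1.1, 1.7), ("audio", 5.0, 5.5), ("diary", 5.5, 6.0),
]
print(selection_span(words, 1, 2))        # (0.4, 1.1)
print(find_phrase(words, "audio diary"))  # [(0.6, 1.7), (5.0, 6.0)]
```

So skimming happens in the text, and the audio only gets involved once you know which span you want to hear.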

Maybe the interface could be like OneNote and let you build up a reference library by maintaining time/date metadata for each item and letting you add tags. (Aside: am I weird to want hierarchical tags? Or, put another way, multi-folder categorization?)
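For what it's worth, one way hierarchical tags could work is to treat each tag as a slash-separated path and let an item carry several of them, which is basically the multi-folder idea. This is a sketch under that assumption; `Recording`, `has_tag`, and `items_under` are invented names, the sample items and dates are made up, and this is not how OneNote does it.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Set


@dataclass
class Recording:
    title: str
    recorded_at: datetime                       # time/date metadata for the item
    tags: Set[str] = field(default_factory=set)  # paths like "school/physics"


def has_tag(item: Recording, tag: str) -> bool:
    """True if the item carries `tag` itself or any tag nested under it."""
    return any(t == tag or t.startswith(tag + "/") for t in item.tags)


def items_under(library: List[Recording], tag: str) -> List[Recording]:
    """Everything filed under `tag`, including items in its sub-tags."""
    return [r for r in library if has_tag(r, tag)]


library = [
    Recording("Lecture 3", datetime(2008, 3, 1, 9, 0), {"school/physics", "ideas"}),
    Recording("Drive notes", datetime(2008, 3, 2, 18, 30), {"ideas/software"}),
    Recording("Diary", datetime(2008, 3, 2, 22, 0), {"personal"}),
]
print([r.title for r in items_under(library, "ideas")])   # ['Lecture 3', 'Drive notes']
print([r.title for r in items_under(library, "school")])  # ['Lecture 3']
```

Here "Lecture 3" lives in two "folders" at once, and asking for "ideas" also pulls in anything filed under "ideas/software".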

