Nature of Code / Week 4-7 / Final Project
Somnolent Listener – The hearing-impaired note-taker and visualizer
Somnolent Listener is a speech recognition system that listens to speech, interprets it and produces results in various forms. It is prone to making some mistakes while listening, hence the adjective ‘somnolent’ in its title.
The entire idea is centered on making speeches and lectures more interactive. The listener records speech, produces real-time subtitles, maintains a transcript of the entire recording and also takes notes. There are three associated visualizations that can be switched using the arrow keys or the number keys. Visualizations one and two are particle systems that change with the volume level of the input. Visualization three uses the recorded transcript to generate keywords. Each keyword has a weight based on the number of times it is repeated, and each keyword is also associated with other keywords based on how close together they were spoken in time. In effect, it acts as an automatic note-taking system for the student or listener.
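As a rough illustration of the keyword idea described above, here is a minimal sketch in plain JavaScript. It is not the project's actual code. The input shape (a time-sorted array of { word, time } pairs), the stop-word list, the function name buildKeywordGraph and the five-second window are all assumptions made for the example.

```javascript
// Hypothetical stop-word list; a real one would be much longer.
const STOP_WORDS = new Set(["the", "a", "an", "is", "of", "and", "to", "in", "it", "that"]);

// timedWords: [{ word, time }], sorted by time (seconds since recording started).
// windowSeconds: how close two keywords must be to count as associated (assumed value).
function buildKeywordGraph(timedWords, windowSeconds = 5) {
  const weights = new Map(); // keyword -> number of times it was spoken
  const links = new Map();   // "a|b" -> number of times a and b were spoken close together

  // Normalize case and drop short words and stop words.
  const keywords = timedWords
    .map(({ word, time }) => ({ word: word.toLowerCase(), time }))
    .filter(({ word }) => word.length > 2 && !STOP_WORDS.has(word));

  keywords.forEach((current, i) => {
    // Weight: one more repetition of this keyword.
    weights.set(current.word, (weights.get(current.word) || 0) + 1);

    // Association: look back at earlier keywords inside the time window.
    for (let j = i - 1; j >= 0; j--) {
      const earlier = keywords[j];
      if (current.time - earlier.time > windowSeconds) break;
      if (earlier.word === current.word) continue;
      // Sort the pair so that "a|b" and "b|a" share one key.
      const key = [earlier.word, current.word].sort().join("|");
      links.set(key, (links.get(key) || 0) + 1);
    }
  });

  return { weights, links };
}
```

The weights map could drive the size of each keyword in the visualization, and the links map could drive how strongly two keywords are drawn together. Counting repetitions is the only weighting described here; anything smarter would be a separate design choice.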
1. The system acts as an automatic note-taking device for a student or listener. This lets a person give undivided attention to the speaker or lecturer without worrying about capturing everything that’s spoken.
2. For a person whose first language isn’t English, it can often be difficult to follow what is being said. The subtitles help such an audience member understand the words.
3. The transcript can be used to look back at the lecture and find references. Or perhaps, if one is sleepy or inattentive in class, they could use it to find out what was said.
4. The visualizations provide information about the amplitude levels. They can also be changed to represent frequencies, wavelengths, et cetera. This is not really something new, but it looks cool, haha. And perhaps it can make a speech more interesting.
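To make the volume-driven particle systems a little more concrete, here is a small sketch of how an amplitude reading could be turned into particle parameters. It is only an illustration under assumptions. The input level is assumed to be normalized between 0 and 1. The function name particleParamsFromLevel, the smoothing factor and the output ranges are all made up for the example and are not taken from the project.

```javascript
// Linear interpolation between a and b for t in [0, 1].
function mix(a, b, t) {
  return a + (b - a) * t;
}

// level: current amplitude, assumed normalized to [0, 1].
// smoothedLevel: value returned by the previous call (0 on the first frame).
// smoothing: 0 = follow the input exactly, closer to 1 = react more slowly.
function particleParamsFromLevel(level, smoothedLevel = 0, smoothing = 0.8) {
  const clamped = Math.min(1, Math.max(0, level));
  // Exponential smoothing keeps the particles from flickering on every sample.
  const smooth = smoothedLevel * smoothing + clamped * (1 - smoothing);
  return {
    smoothedLevel: smooth,                    // feed this back in on the next frame
    spawnCount: Math.round(mix(1, 20, smooth)), // louder input -> more particles
    speed: mix(0.5, 6, smooth),               // louder input -> faster particles
    size: mix(2, 24, smooth),                 // louder input -> bigger particles
  };
}
```

In a draw loop, the returned smoothedLevel would be passed back in on the next frame, so a sudden loud word swells the particle system over a few frames rather than jumping instantly.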
I’d like to add more visualizations and more capabilities to this system. I also intend to make the listener more sophisticated. Plus, the keyword selection and matching algorithm requires further work. I’m interested in classifying keywords and finding different methods to group words together.
I’m also interested in using this project in an installation context, wherein the listener listens to an input, records it and then repeats it back with the mistakes, in different ways, perhaps representing what a computer hears versus what it understands.
The entire code directory can be found here: link
Daniel Shiffman, instructor and mentor for Nature of Code
Luke DuBois, for the p5.speech library
Jason Sigal, for the p5.sound library