There has been significant progress with the talkTable, and the code is in good shape. I have integrated speech recognition into the program using Luke Dubois' speech.js library, which basically acts as an interface between p5 and Google's speech recognition API. There were certain issues that I faced while doing this:
- The speech.js library doesn't provide a way to stop speech detection once it has started. I made changes to the library to get around this. The changes were very basic: I simply added a method that ends the speech detection process.
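For reference, the change amounted to something like this. This is a minimal sketch, not the actual speech.js source; `startRec` and `stopRec` are illustrative names, and the fallback constructor is only there so the sketch loads outside the browser:

```javascript
// Sketch of the speech.js change: the library wraps the browser's
// webkitSpeechRecognition object, so stopping is just a matter of
// exposing the underlying stop() call.
function SpeechRec() {
  // Fall back to a stub when webkitSpeechRecognition isn't available
  // (e.g. outside Chrome), so the sketch still loads.
  var Recognition = (typeof webkitSpeechRecognition !== 'undefined')
    ? webkitSpeechRecognition
    : function () { this.start = function () {}; this.stop = function () {}; };

  this.rec = new Recognition();
  this.listening = false;

  this.startRec = function () {
    this.rec.start();
    this.listening = true;
  };

  // The added method: end an in-progress detection session.
  this.stopRec = function () {
    this.rec.stop();
    this.listening = false;
  };
}
```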
- The Google API automatically stops running after a certain amount of time (roughly 1.5–2 minutes). I figured this might be happening because Google's API is designed to trigger speech detection on a keypress, and its purpose revolves around detecting a single search query. I added a running interval to the program that re-initiates the API every 30 seconds; if it's already running, the program does nothing.
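The keep-alive logic can be sketched like this. It is a minimal sketch, assuming the speech object exposes a `listening` flag and a start method; both names are illustrative:

```javascript
// Sketch of the 30-second keep-alive: every tick, restart recognition
// only if it has stopped. If it's already running, do nothing.
function makeKeepAlive(speech) {
  return function () {
    if (!speech.listening) {
      speech.startRec();
    }
  };
}

// In the p5 sketch's setup(), something like:
// setInterval(makeKeepAlive(mySpeech), 30000);
```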
- The API calls couldn't be made from the server created by the p5 IDE. Instead of diving into the server code for the p5 editor, I resorted to using a different server. With Sharif's help, I created a Python server that runs on localhost and can easily talk to the API.
Now that I have worked around these challenges, I have a continuous workflow that uses speech recognition to interpret what the users are talking about. The speech detection event is triggered when the users select a discussion topic. The only thing I still wish to add is a broader data set of words for the program to recognize. I considered using an AI extension, but I figured that AI chat bots serve a different objective than what I'm trying to achieve: the bots are meant to chat with the user, whereas my program aims at making the users talk amongst themselves, stepping in only intermittently to maintain the conversation. Therefore, I will be creating my own data set for the program. The advantages of doing this are:
- The program can be personalized to have certain opinions and characteristics. I much prefer this over a generic response machine.
- The program can be curated to listen to the users and instigate them to keep talking.
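The data set could take a shape like this. A minimal sketch: the keywords, prompts, and the `findPrompt` helper are all illustrative, not the final data set:

```javascript
// Sketch of a custom data set: recognized keywords mapped to prompts
// that carry an opinion and nudge the users to keep talking.
var prompts = {
  travel: "Where would you go if you could leave tomorrow?",
  food:   "I have strong opinions about pizza. What's the best topping?",
  music:  "Does anyone here actually like jazz?"
};

// Scan a transcript for the first keyword we have a prompt for.
function findPrompt(transcript) {
  var words = transcript.toLowerCase().split(/\s+/);
  for (var i = 0; i < words.length; i++) {
    if (prompts.hasOwnProperty(words[i])) {
      return prompts[words[i]];
    }
  }
  return null; // stay quiet and let the users keep talking
}
```

Returning `null` when nothing matches is deliberate: the program should interject only when it has something on-topic to say.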
I ran into many issues with the physical computing bit. The issues are as follows:
- FSRs have minds of their own. First of all, there is a lot of noise in the analog readings that they produce. Since the readings stay within a certain range, I can work around this problem. The bigger problem is that if their position is disturbed, they start producing different ranges. They are highly sensitive to placement, and are also very fragile. I constructed my own FSRs, and they lack the durability of manufactured ones.
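Working around the noise can be as simple as averaging the incoming analog values. A minimal sketch on the p5 side, assuming readings arrive one at a time (over serial, for instance); the window size is illustrative:

```javascript
// Sketch of noise smoothing for the FSR: a moving average over the
// last windowSize readings. Returns the current smoothed value.
function makeSmoother(windowSize) {
  var buffer = [];
  return function (reading) {
    buffer.push(reading);
    if (buffer.length > windowSize) buffer.shift(); // drop oldest
    var sum = 0;
    for (var i = 0; i < buffer.length; i++) sum += buffer[i];
    return sum / buffer.length;
  };
}
```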
- Finishing and fabrication. I'm very new to fabrication and it intimidates me. This is my first proper attempt at fabricating and consolidating circuits. Right now, the circuit is all over the place: it doesn't fit into the designated box, and it loses connections whenever it's physically disturbed.
To solve these issues, I’m considering the solutions stated below:
- I might be using a switch instead of FSRs (as suggested by Benedetta). The only advantage with an FSR is that it gives me an approximate idea of the weight of the object, which can help me in confirming that it is a phone that’s been inserted into the socket and not some other object. But at this point, using a switch seems like a more favorable approach.
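The check the FSR buys me over a plain switch looks roughly like this. A sketch with illustrative numbers: the band would have to be calibrated against real readings, and a switch collapses the whole thing to a boolean:

```javascript
// Sketch of the FSR's one advantage: the smoothed reading must fall
// inside a calibrated "phone weight" band before the socket counts
// as occupied. Bounds below are made up for illustration.
var PHONE_MIN = 300; // calibrated lower bound of the analog range
var PHONE_MAX = 700; // calibrated upper bound

function isPhoneInserted(reading) {
  return reading >= PHONE_MIN && reading <= PHONE_MAX;
}
```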
- I have fabricated the boxes, the switch panels, sockets and socket covers. The overall structure looks swell. However, I’m facing issues with optimizing my circuit and trying to make it highly durable. Mithru is helping me out with solutions on better circuit design techniques. I will be updating this post with pictures on how it progresses.
User testing has been extremely helpful to give me implementation ideas. Following is some of the feedback that I’m planning to incorporate into the project:
- Projecting the visualization on the table instead of using a computer screen. This was suggested by Dominic, and sounded like a great idea. It makes the project actually look like a talkTable, first of all. Also, it takes away the awkward positioning of the laptop in the middle of the table. I think it just makes the whole conversation setting more natural.
- Using speech synthesis with, or instead of, the text. This was suggested to me by Benedetta. Spoken word is a good alternative because text can be distracting in the middle of a conversation. I will implement this depending on the time I can find.
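If I go this route, the browser's built-in Web Speech API should cover it. A minimal sketch; the guard is only there so the function degrades gracefully outside the browser:

```javascript
// Sketch of the synthesis option: speak a prompt aloud using the
// browser's speechSynthesis interface. Returns whether it could speak.
function speakPrompt(text) {
  if (typeof speechSynthesis === 'undefined') {
    return false; // not running in a browser; nothing to say
  }
  var utterance = new SpeechSynthesisUtterance(text);
  speechSynthesis.speak(utterance);
  return true;
}
```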
- I also tested the project in the ITP lobby to make sure that the audio inputs work fine in the presence of noise. This test was performed to make sure that the device doesn’t malfunction during the show. The words were comprehensible if the user spoke into the mic, which was great.
To send across an idea, the structure of the device will look something like this. Right now, the project is in a pretty dismal state since I'm still working on the fabrication and rewiring the circuit. I will soon be posting pictures of the progress that I make.