Archive for February, 2011

27 Feb 11

Id machine: The Voice

…well, I’ve gone and opened the proverbial can of worms that is the human voice. I’ve done so because I’m trying to define a series of Archetypes for sound (particularly the voice) in order to give my project parameters to work from.

Essentially, what I’m trying to do is create a dysfunctional form of conversation between human and machine. In order to create this sense of dysfunction, I first need to define what the chaotic function will be. This is where my project cannot (as yet) live up to my desire to create a machine, a system that decides for itself to be dysfunctional. Until I can begin serious work on Artificial Intelligence, I will have to swallow my dose of irony. The fact of the matter is that the dysfunction present in this project is just that: presentational. It will be an illusion, a programmed function. In other words, its function IS to be dysfunctional.

Anyway, back to the topic at hand: the voice. This is going to be the primary input to this system, so I’d best understand it as well as I can.

I started with a few experiments of my own. At first I opened the Analyser patch in MAX/MSP, but I found it hard to get a coherent or clear graphic from simply recording myself talking. I knew that background noise (e.g. from my laptop) played some part in the sporadic nature of the image made, but also that this alone was not enough to produce such erratic data. So I decided to try holding a single note, then repeating it at different pitches. I used GarageBand to make these simple recordings.
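Out of curiosity, here’s roughly the kind of number-crunching involved, sketched in Python rather than Max: read a held-note recording and pick out its strongest frequency component. A minimal sketch only; the filename is a stand-in for one of my GarageBand files.

```python
import numpy as np
from scipy.io import wavfile

# "held_note.wav" is a stand-in for one of my GarageBand recordings.
rate, data = wavfile.read("held_note.wav")
if data.ndim > 1:
    data = data.mean(axis=1)  # mix a stereo file down to mono

spectrum = np.abs(np.fft.rfft(data))              # magnitude spectrum
freqs = np.fft.rfftfreq(len(data), d=1.0 / rate)  # bin frequencies

print(f"dominant frequency: {freqs[np.argmax(spectrum)]:.1f} Hz")
```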

I found the Analyser patch fascinating, but a little complex; I wasn’t sure what was being processed, or how to harness the numbers being crunched out into some simple equations. Once I could do this, the core functionality of my system would be in place. I then learned that a previous patch had been made purely to analyse pitch; this was my next step to understanding.

As I continued my experiments with analysing the voice in terms of pitch, I noticed that the pitch of my own voice was often too low to register properly, so I took samples of female voice input and displayed them through the Max patch instead.
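One likely culprit (my assumption, not something I’ve verified inside the patch) is analysis window length: a low fundamental needs a frame long enough to contain several full cycles before a pitch tracker can lock onto it. A rough autocorrelation sketch in Python shows the idea:

```python
import numpy as np

def estimate_pitch(frame, rate, fmin=70.0, fmax=500.0):
    """Rough pitch estimate for one audio frame via autocorrelation."""
    frame = frame - frame.mean()
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(rate / fmax), int(rate / fmin)  # lag range to search
    lag = lo + np.argmax(corr[lo:hi])
    return rate / lag

# At 44.1 kHz a 100 Hz voice repeats only every ~441 samples, so a short
# analysis frame holds too few cycles for the estimate to be reliable.
rate = 44100
t = np.arange(2048) / rate
print(estimate_pitch(np.sin(2 * np.pi * 100 * t), rate))  # ~100.0 Hz
```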

Through a combination of research and experimentation I discovered that there is an astonishing complexity to the human voice. A wide array of factors go into the make-up of a voice, and each is individual to the speaker; it’s not just the physicality of the vocal cords, chest, throat etc., but cultural influence too. Watch the video below to see a real-time physical impression of pitch and amplitude in the human voice.

The deeper I researched into the phenomenon of the voice, the more bewilderingly complex my understanding became. In the process of picking archetypes (for my system to work with) I found a source stating that the vocal range of a human was 200-7000 Hz, so I thought dividing this into 4 ranges would suffice… that was until I discovered another source suggesting that the TRUE range of the voice was 300-3000 Hz, and that these were the correct ‘Fundamental’ voice frequency (VF) limits. I supposed that this smaller range of data would give me more precise results, yet still capture all the pitch variations of the voice. That was until I read further into the source, which went on to explain that the average male voice (85-180 Hz) and female voice (165-255 Hz) actually fall below the lowest mark of this VF range.
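To make the archetype question concrete, here’s a sketch of what four ranges might look like if I took the 300-3000 Hz ‘fundamental’ limits at face value; the equal-width band edges are my own assumption, and the result for an average speaking pitch shows exactly the problem described above:

```python
# Hypothetical archetype scheme: split the 300-3000 Hz 'fundamental'
# VF band into four equal-width ranges and classify an incoming pitch.
# The band edges are my own assumption, not a settled design.
BANDS = [(300, 975), (975, 1650), (1650, 2325), (2325, 3000)]

def archetype(freq_hz):
    for i, (lo, hi) in enumerate(BANDS, start=1):
        if lo <= freq_hz < hi:
            return f"Archetype {i}"
    return "out of range"

print(archetype(120))    # 'out of range': an average male voice misses the band
print(archetype(1200))   # 'Archetype 2'
```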

I have to admit this immediately confused me, but with a little patience and more reading I began to get the picture. Harmonics: it’s all relative to these. Harmonics are essentially multiples of a tone; there is a direct mathematical relationship between a note, or frequency, and its harmonics.
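The relationship in numbers, taking A2 at 110 Hz as an arbitrary example:

```python
fundamental = 110.0  # A2, roughly within the male vocal range
for n in range(1, 6):
    print(f"harmonic {n}: {n * fundamental:.0f} Hz")
# harmonic 1 is the fundamental itself: 110, 220, 330, 440, 550 Hz
```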

In fact there are always harmonic frequencies in sound. There are so many factors that affect the voice that it’s important to think of any vocal sound as a rich, layered set of frequencies rather than a single tone.

Take a look at these spectrograms of ‘overtone’ singing; it’s fascinating to see the relationship between all the frequencies taking place.

The fact is, in terms of human hearing there is a lot of complex compensation and interpretation taking place in the mind.

The mind associates a set of harmonics with a fundamental tone; in effect, the other harmonics give an impression of the fundamental tone even when it is barely present. What does this mean in terms of a machine? Will I need to calibrate the system differently? Will it not perform this compensation… does that matter? This is going to complicate things somewhat, and I can see that in the future of this project I’m going to need an acoustics expert to truly tame the voice.
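A machine could at least fake the same compensation. Here’s one possible way, sketched in Python: even when the fundamental itself is weak or absent from the spectrum, the spacing between adjacent harmonic peaks gives it away (the peak frequencies below are invented for illustration):

```python
import numpy as np

# Invented harmonic peaks: multiples of 110 Hz, with the 110 Hz
# fundamental itself missing from the list.
peaks = np.array([220.0, 330.0, 440.0, 550.0])

# The spacing between adjacent harmonics equals the fundamental.
fundamental = np.diff(peaks).mean()
print(f"implied fundamental: {fundamental:.0f} Hz")  # 110 Hz
```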

But for now the real question is: how do I pick my Archetypes? Well, I’m going to start with frequency. I’m going to make an automated patch that records ‘blind’, as it were, then analyse the extremities of the frequency input and use them as working parameters to create a mean, an average number for the pitch input.
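In other words, something like this, a Python sketch of the logic only, with invented readings standing in for what the automated patch would log:

```python
import numpy as np

# Invented pitch readings, standing in for what the automated
# 'blind' recorder would log over a session.
readings_hz = np.array([112.0, 98.0, 145.0, 131.0, 104.0, 158.0])

lo, hi = readings_hz.min(), readings_hz.max()  # frequency extremities
mean_pitch = readings_hz.mean()                # working average

print(f"range: {lo:.0f}-{hi:.0f} Hz, mean: {mean_pitch:.0f} Hz")
```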

26 Feb 11

Point of view Id machine demo

 

22 Feb 11

Id machine: Max patch development

Working with Kavi I’m able to build upon the original patch. Here the function of saving information into segregated folders has been designed; now I need to process the information from the Analysis patch and link the two together. Next I will be creating an Archetype from a measurable function of the human voice. There is so much potential for work on this project, and I’m excited about the future possibilities of this work.
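Restating the folder idea as a Python sketch, to be clear about the logic; the paths, file names and fields here are my assumptions, not what the Max patch literally writes:

```python
import csv
import os
import time

# One segregated folder per session; paths and fields are assumptions.
session = time.strftime("session_%Y%m%d_%H%M%S")
folder = os.path.join("recordings", session)
os.makedirs(folder, exist_ok=True)

# Save one example analysis frame into that session's own folder.
with open(os.path.join(folder, "analysis.csv"), "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["time_s", "pitch_hz", "loudness"])
    writer.writerow([0.0, 112.0, 0.43])
```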

22 Feb 11

Id machine P.O.V animated sketch

Continuing my first endeavors in using Flash to illustrate how elements of my project will work, this time from a first-person point of view: an infographic of how the viewer/user will experience Id machine.

Perspective seems to be something of an issue in this animation; from a point-of-view shot it’s difficult to distinguish between the actions of the Gramophone and those of the user (via microphone input). I will explore ways to make this clearer. Use of text could be a simple but effective way to solve this problem.

16 Feb 11

Id machine animation sketching

 

Sketching thoughts about how best to demonstrate both the participant identification process and the kinetic social cues for Id Machine.

 

16 Feb 11

Id machine Audio analysis

It’s hard to conceive sometimes that sound is just another form of information: vibrating waves of energy… physical stuff.

Searching through MAX/MSP forums for ways to obtain data and visualise sound, I came across the Sonograph patch. Technically, what it produces is a spectrogram: a spectrogram depicts the attack and decay of energy (amplitude/intensity), as well as the frequencies of the sound being generated… a sort of audio footprint.
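Under the hood a spectrogram is just a run of short-time Fourier transforms over successive frames of the signal. A minimal Python sketch of the idea, using a decaying 440 Hz tone as a stand-in signal:

```python
import numpy as np

def spectrogram(signal, rate, frame=1024, hop=512):
    """Magnitude spectrogram: the spectrum of each short, windowed frame."""
    window = np.hanning(frame)
    frames = [signal[i:i + frame] * window
              for i in range(0, len(signal) - frame, hop)]
    return np.abs(np.fft.rfft(frames, axis=1))  # rows: time, cols: frequency

# Stand-in signal: one second of a 440 Hz tone that decays away,
# so both frequency and the decay of energy show up over time.
rate = 44100
t = np.arange(rate) / rate
sig = np.sin(2 * np.pi * 440 * t) * np.exp(-3 * t)
print(spectrogram(sig, rate).shape)  # (time frames, frequency bins)
```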

An image of a low-pitched voice (left) and a high-pitched voice (right), translated into a sonograph in MAX/MSP.

 

 

Analyser MAX external patch

I finally came across this complex patch for analysing live audio, thanks to Tristan Jehan’s work at MIT: a patch that combines pitch tracking with loudness, brightness and noisiness estimation, as well as spectral decomposition. Now I need to learn how to convert the data output into something useful for my system.
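To get my head around those outputs, here’s a rough sketch of how three of them could be computed from a single frame’s spectrum. This is not Jehan’s external itself, just my own approximation: loudness as energy, brightness as the spectral centroid, and noisiness as spectral flatness:

```python
import numpy as np

def describe_frame(frame, rate):
    """Loudness, brightness and noisiness of one windowed audio frame."""
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / rate)
    loudness = float(np.sum(mag ** 2))                     # total energy
    brightness = float(np.sum(freqs * mag) / np.sum(mag))  # spectral centroid (Hz)
    flatness = float(np.exp(np.mean(np.log(mag + 1e-12))) /
                     (np.mean(mag) + 1e-12))               # ~1 means noise-like
    return loudness, brightness, flatness

rate = 44100
t = np.arange(1024) / rate
print(describe_frame(np.sin(2 * np.pi * 440 * t), rate))
```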

 

15 Feb 11

Robots are about people: Cynthia Breazeal

I’m finding Cynthia Breazeal’s work really fascinating, and what’s more, it’s useful research into robotic systems that use social cues to produce an emotive response from humans. Her attitude toward the potential relationship between Artificial Intelligence and people is really engaging: machines that ‘talk our language’ in terms of proto-conversation, presenting us with physical cues that work subconsciously, informing our impression of their intent and in doing so allowing us to relate to them. Still, there’s something fundamentally missing in these interactions: the ‘drives’ of the robots are manufactured. The robot does not make decisions without being told to do so; the intent behind its actions is not initiated by the robot, rather the drives are down to the programmer’s whim. This is an area of A.I. I want to explore further as Id machine grows and develops. I’m hoping to find a way to design a machine that thinks for itself, that decides its own actions. Can I encourage or train a machine to ‘want’ to be dysfunctional, instead of forcing it?



