DJ Phonetic

Beatbox with historical speech

What is DJ Phonetic?

DJ Phonetic is a free online tool for making beats with the kicks, snares, and hi-hats that are hidden within human speech.

Who made DJ Phonetic?

This project was created by Brian Foo, an artist who likes to make creative tools powered by open access cultural heritage material from libraries, archives, and museums. He was the 2020 Innovator in Residence at the Library of Congress, where he created Citizen DJ, a tool for creating hip hop music using the Library's free-to-use audio and visual material.

Why did you create this project?

I've always been fascinated by the inflections, intonations, tonality, accents, and cadences of human speech, qualities that are often lost through the process of transcription. This project attempts to center rather than obscure these speech qualities by turning them into musical instruments, asking the user to listen carefully for the hidden beats, rhythms, and melodies embedded in human speech.

Where do you get your source material from?

All the audio recordings used are in the public domain and come from publicly accessible collections such as the Library of Congress's American English Dialect Recordings, the University of Virginia's Miller Center, and Michigan State University's G. Robert Vincent Voice Library. The original recordings are linked within the app's interface.

How do you choose what parts of speech correspond to different drums?

I simply mapped specific phonemes (e.g., the /b/ in "ball", the /k/ in "kite", the /t/ in "tap") to specific drums (kick, snare, hi-hat). This is obviously a subjective process. For reference and inspiration, I looked to beatboxing, the art of mimicking drum sounds with one's mouth, lips, tongue, and voice. I then used basic audio analysis techniques to select the best drum candidates based on audio characteristics such as brightness, sharpness, tonality, and loudness. For example, a good snare sound would be loud and short, with high sharpness and low tonality.
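As a rough illustration of this two-step idea (map phonemes to drums, then rank the candidates by their audio features), here is a minimal Python sketch. It is not the project's actual code. It assumes ARPAbet-style phoneme labels, a hypothetical PHONEME_TO_DRUM table that pairs the examples above in order (/b/ with kick, /k/ with snare, /t/ with hi-hat), and features that have already been measured and normalized to the range 0 to 1. The Candidate class, the snare_score weighting, and the max_duration cutoff are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical mapping, pairing the examples in the text in order.
PHONEME_TO_DRUM = {"B": "kick", "K": "snare", "T": "hihat"}


@dataclass
class Candidate:
    word: str
    phoneme: str
    duration: float   # seconds
    loudness: float   # assumed precomputed, normalized 0..1
    sharpness: float  # assumed precomputed, normalized 0..1
    tonality: float   # assumed precomputed, normalized 0..1


def snare_score(c: Candidate, max_duration: float = 0.25) -> float:
    """Favor loud, short, sharp, non-tonal sounds (equal weights assumed)."""
    shortness = max(0.0, 1.0 - c.duration / max_duration)
    return c.loudness + shortness + c.sharpness + (1.0 - c.tonality)


def best_snares(candidates, n=1):
    """Keep only phonemes mapped to 'snare', then return the top n by score."""
    snares = [c for c in candidates if PHONEME_TO_DRUM.get(c.phoneme) == "snare"]
    return sorted(snares, key=snare_score, reverse=True)[:n]


# Made-up example values, for illustration only.
candidates = [
    Candidate("kite", "K", 0.06, 0.9, 0.8, 0.1),
    Candidate("can", "K", 0.12, 0.5, 0.4, 0.3),
    Candidate("ball", "B", 0.08, 0.9, 0.2, 0.6),
]
print([c.word for c in best_snares(candidates)])  # ['kite']
```

Summing equally weighted features keeps the example readable; a real selection step would likely tune the weights per drum and per recording.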

How did you align the transcript to the audio so precisely?

I use a tool called the Montreal Forced Aligner to automatically align the source audio to the transcript with precision down to the phoneme. The Montreal Forced Aligner is built on top of Kaldi, an open source speech recognition toolkit.
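Once each phoneme has a start and end time, cutting it out of the recording comes down to converting seconds into sample indices. The Python sketch below assumes the aligner's phone-level output has already been parsed into (start_seconds, end_seconds, label) tuples (for example, from the phone tier of a TextGrid file) and that the audio is a list of samples. The function names, the ARPAbet labels in the example, and the small pad value are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical parsed alignment: (start_seconds, end_seconds, phone_label).
Interval = Tuple[float, float, str]


def find_phones(intervals: List[Interval], label: str) -> List[Interval]:
    """Return every aligned interval whose phone label matches exactly."""
    return [iv for iv in intervals if iv[2] == label]


def cut_clip(samples: Sequence[float], sample_rate: int,
             start: float, end: float, pad: float = 0.005) -> Sequence[float]:
    """Slice samples between start and end (in seconds), widened by pad."""
    first = max(0, int(round((start - pad) * sample_rate)))
    last = min(len(samples), int(round((end + pad) * sample_rate)))
    return samples[first:last]


# Example: "kite" aligned as K AY1 T, with 1 second of dummy audio at 1 kHz.
intervals = [(0.10, 0.18, "K"), (0.18, 0.30, "AY1"), (0.30, 0.36, "T")]
samples = list(range(1000))
t_start, t_end, _ = find_phones(intervals, "T")[0]
clip = cut_clip(samples, 1000, t_start, t_end)
print(len(clip))  # 70 samples: 60 for the phone plus 5 of padding on each side
```

A few milliseconds of padding helps avoid clipping the attack of a consonant when the alignment boundary lands slightly late.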

Are the drum sounds really just from speech?

Yes, the drums you hear are taken directly from the source material. However, some basic filters (e.g., highpass, lowpass, and high-shelf) are applied to boost or reduce the bass, depending on the type of drum.
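The exact filter types and settings DJ Phonetic uses are not described here. As a generic illustration of what a highpass and a lowpass filter do to a clip, the Python sketch below implements textbook first-order (one-pole RC) filters. The cutoff values are hypothetical, and a high-shelf filter, which needs a second-order (biquad) design, is left out.

```python
import math


def _rc_dt(cutoff_hz, sample_rate):
    """Time constant of an RC filter with the given cutoff, and the sample period."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    return rc, dt


def lowpass(samples, cutoff_hz, sample_rate):
    """One-pole lowpass: keeps the low end (e.g., to fatten a kick)."""
    rc, dt = _rc_dt(cutoff_hz, sample_rate)
    alpha = dt / (rc + dt)
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)  # move a fraction of the way toward the input
        out.append(y)
    return out


def highpass(samples, cutoff_hz, sample_rate):
    """One-pole highpass: strips the low end (e.g., to thin out a hi-hat)."""
    rc, dt = _rc_dt(cutoff_hz, sample_rate)
    alpha = rc / (rc + dt)
    out, y, prev_x = [], 0.0, 0.0
    for x in samples:
        y = alpha * (y + x - prev_x)  # passes changes, decays on steady input
        out.append(y)
        prev_x = x
    return out


# A constant (DC) signal is pure "bass": lowpass keeps it, highpass removes it.
dc = [1.0] * 2000
print(round(lowpass(dc, 200, 8000)[-1], 6), round(highpass(dc, 200, 8000)[-1], 6))
```

First-order filters roll off gently (about 6 dB per octave), which is enough to show the idea; steeper slopes or shelving would call for biquad filters.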

Can I download what I create?

Unfortunately, this feature is not available. This app focuses on playing with human speech to make beats rather than on being a full-blown drum machine or digital audio workstation (DAW). However, you can record your screen while you use DJ Phonetic to capture your performance.

Can I download the individual drum sounds?

Yes! Simply right-click (or press and hold on a touch device) on text within the transcript and click the download icon. You can then save the audio clip to your computer as a .wav file and use it in any music-making software.
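The app handles the download for you in the browser. For anyone working with the open-source code and their own recordings, the Python sketch below shows one way to save a clip as a .wav file using only the standard library's wave module. It assumes mono audio stored as floats between -1.0 and 1.0 and writes 16-bit PCM; the write_wav name and those format choices are assumptions, not necessarily what DJ Phonetic produces.

```python
import io
import struct
import wave


def write_wav(samples, sample_rate, dest):
    """Write float samples in [-1.0, 1.0] to dest as 16-bit mono PCM WAV.

    dest can be a file path or a writable binary file-like object.
    """
    frames = b"".join(
        # Clamp to [-1, 1], scale to the 16-bit range, pack little-endian.
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
        for s in samples
    )
    with wave.open(dest, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(frames)


# Example: write a tiny clip into memory instead of onto disk.
buf = io.BytesIO()
write_wav([0.0, 0.5, -0.5, 1.0], 44100, buf)
```

Passing a filename such as "snare.wav" instead of the in-memory buffer writes the file to disk.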

Is this project open source?

Yes, the code is open source and can be found in this repository along with documentation for using your own audio recordings.

I found an error in one of your transcripts

Thank you! Please send the error and your correction to hello@brianfoo.com.