[Yet another paper I wrote for school. I don't think I wrote it as well as I could have, but I'd still rather share it than not]
In 1965, the futurist and inventor Ray Kurzweil performed a very strange piano piece on the TV show *I’ve Got a Secret*. His secret was that the piece had been composed by a special computer he built himself. He wasn’t the first person to come up with the idea of computer-generated music, just one of the earliest people to demonstrate it. Music AIs have a long history; they have been around for more than half a century. The panelists on that old TV show guessed Kurzweil’s secret almost immediately, but after decades of technological progress, it is no longer as easy to tell whether a song was written by a human or by a machine. The field of computer-generated music has grown not only in the technical complexity of its algorithms, but also in the depth and melodic quality of its songs. In this essay, I will go over my research to show how algorithms are now able to do most of the work of songwriting, and I will speculate about what that means for the future. To be clear, I am not saying that computers can replace human songwriters today. They might in the far future, but the current technology isn’t quite there yet. Rather, I am claiming that the best songwriting programs can currently take over more than half of the work it takes to write a song, maybe even more than 90% of it.
But first, let us look at the history of the field, back when computers did much less of the work. To fully appreciate how general and automatic music-making tools have become, it is useful to look at how they used to be. Back in 1957, producing the computer-generated string quartet *Illiac Suite* required the programmers to code the rules of traditional string quartets explicitly. The programmers had four methods “to assume rules by: a) traditional theory (eg prohibition of fifths and octaves parallel), b) dictated by the imagination of the programmer, even without relationship with music, c) derived from statistical analysis of other compositions and, finally, d) rules auto-generated by computer.” (Nunzio) All automatic music generators need to encode musical rules either explicitly or implicitly, and in the *Illiac Suite* they were mostly explicit. However, because the rules about string quartets were hard-coded, the algorithms could not be used as is to make a piano piece, let alone jazz or heavy metal. Even more general tools like Flow Composer have rules about harmony and timing explicitly programmed in. This is probably because programs based only on machine learning and training data tend to sound very discordant and atonal unless they are trained on lots of data. That is perhaps good if you want to make contemporary music in the style of John Cage, but not very useful for most genres. Either way, it was difficult at the time to make an algorithm compose music in multiple genres.
Much more recently, IBM claims that its Watson Beat can create an entire track of beats, melodies, and ambiance from just a 20 second sample, regardless of what style those 20 seconds are in. (Watch) How might this be useful? To test it out, IBM showed the program to the R&B band Phony PPL. In an interview, lead guitarist Elijah Rawk said, “Every artist, musician ever will tell you that writer’s block is real, and there are days which you feel like, all right, today I’m just going to knock out as many songs I can, and you get stuck on the first idea. So I think Watson is a good kind of, it’s a good platform for when you’re in that rut. … Just play a couple notes and it’ll give you some kind of an inspiration. Go a whole other way than you might’ve thought originally.” Lead singer Elbee Thrie followed up by saying that Watson doesn’t make the song for you. It can’t output anything good if you don’t input anything good, but Watson can definitely spice up whatever you give it or give you new ideas. (Watch) Neither of them feels envious of Watson; they see it more as a companion than as a competitor. Phony PPL still have the final say in what the song sounds like, but Watson helps them skip over most of the hard work required to generate the first draft. Even though the songs still require human input, the program saves a significant amount of labor, allowing Phony PPL to produce more songs with the same amount of work.
As impressive as Watson Beat is, it still requires 20 seconds of human inspiration to start with. Flow Composer requires even less input: you only have to select a genre or a style. The Sony Computer Science Lab in Paris demonstrated Flow Composer by using it to write “Daddy’s Car,” a computer-generated pop song which has received more than one million views on YouTube. To write a song with Flow Composer, the user selects a genre or style, note length limits, and optionally a background beat, and the program generates a song in the form of simplified sheet music, or a “lead sheet.” Then the user can keep the measures which sound good, edit them, and re-roll the measures which don’t. Behind the scenes, Flow Composer uses something called “Markov constraints” to create music of a specific style. Markov chains are fast algorithms for generating information because they only go left to right, generating the next note based only on the previous few notes. The drawback is that it’s difficult for the process to create music with long, overarching patterns, since it can’t look back very far. Sony’s breakthrough was in finding ways to bend the Markov chain to fit certain constraints without making it look back far. (Pachet 1077) This means that it can quickly spit out words that fit into a sonnet or notes that fit into a jazz tune. Like Watson Beat, Flow Composer requires you to do the remaining editing yourself, but it is very helpful in generating the first version. You might want to move certain notes up or down or delete a measure to make it sound just right, but Flow Composer still made most of the decisions about which notes to put where.
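To make the "left to right" idea concrete, here is a minimal sketch of a plain Markov chain melody generator in Python. It is not Flow Composer's method and it leaves out the constraint handling entirely; the training melody, the function names, and the choice to look back only one note are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def build_transitions(melody, order=1):
    """Map each run of `order` consecutive notes to the notes that followed it."""
    transitions = defaultdict(list)
    for i in range(len(melody) - order):
        state = tuple(melody[i:i + order])
        transitions[state].append(melody[i + order])
    return transitions


def generate(transitions, seed, length, rng):
    """Extend `seed` left to right, picking each note from the last few notes only."""
    order = len(seed)
    notes = list(seed)
    while len(notes) < length:
        state = tuple(notes[-order:])
        choices = transitions.get(state)
        if not choices:
            # Dead end: this state was never followed by anything in training.
            break
        notes.append(rng.choice(choices))
    return notes


# Hypothetical training data: a short nursery-rhyme-like melody.
training_melody = ["C", "C", "G", "G", "A", "A", "G",
                   "F", "F", "E", "E", "D", "D", "C"]

transitions = build_transitions(training_melody, order=1)
melody = generate(transitions, seed=("C",), length=12, rng=random.Random(0))
print(" ".join(melody))
```

Every step in the output is a note-to-note move that appears somewhere in the training melody, which is why the result sounds locally plausible. Nothing in the loop knows where the phrase is supposed to end up, which is the long-range weakness described above.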
Despite the pitfalls of purely data-driven approaches, Alphabet’s subsidiary DeepMind isn’t afraid to make music generators with little explicit programming but high data consumption. DeepMind is experimenting with making music using WaveNet, a neural net originally designed to generate speech. Neural nets require a lot more data than Markov chains, but are potentially more accurate when well trained. WaveNet has no explicit music theory, and it does not even think in terms of individual notes. Rather, it analyzes raw waveforms and generates the next piece of the sound wave bit by bit. This fidelity lets the system capture the precise and subtle details of musical notes. Instead of using a single simple neural net, it stacks multiple layers, with each layer passing data to the next and filtering the input further at each level. (Oord 3) The multiple layers allow the system to learn very abstract patterns that stretch over long spans of time. Trained on recordings of humans talking, it learns to mimic breath noises and subtle lip smacks, and trained on classical music, it learns when to speed up and when to crescendo without being explicitly programmed to. The WaveNet team says that even though it was built for the human voice, the results for music generation are very promising. (Oord 1) More speculatively, I believe that by combining the musical and speech aspects, WaveNet could learn to sing beautifully.
But will it be able to sing well enough to replace human singers? One thing is for sure: lots of people would lose their jobs in the short term if labor-saving music programs develop too quickly. However, consumers will benefit from the increased production of songs regardless. It’s similar to how the automation of farming has significantly decreased the number of farming jobs available, but losing those jobs was a small price to pay for the vastly increased food production. And in the long run, most former farmers were able to find jobs elsewhere, allowing us to specialize and create whole new ways to work. Historically, automating an industry has not caused long term unemployment. It might decrease the number of people making money from music, but those people will hopefully find jobs elsewhere. I know that might not be reassuring, but that’s how it worked in the past. As technology progresses, no doubt the industry will be disrupted in ways that cost people jobs in the short term, but this makes way for the innovations to come. Newton-Rex, the co-founder of the AI music company Jukedeck, points out that “recorded music’s brilliant, but it’s static. If you’re playing a game, Hans Zimmer isn’t sitting with you composing. Responsive systems like that will be a big part of the music of the future.” (Marshall) This sort of technology could revolutionize not just video games but DJing as well. Even if we have to destroy the music industry as we know it, it would be a small price to pay for what comes next, I hope.
But regardless of technical advances, can a song really be artistic without an artist? A large part of music is the characters, the celebrities, the personalities who write our songs. Even though some computer programs are advanced enough to make songs that convey genuine emotion, that emotion is reflected from the thousands of artists in the training data, or from the programmer, or from the artist running the program. Computers themselves feel nothing, but they don’t have to if we personify them. I think computer-generated songs can develop their own voice, their own “intent,” which is convincing enough to fool our hearts even if it doesn’t fool our minds, and that’s good enough for artistic purposes. Consider a song like “Où Est Passée L'ombre,” composed by Benoît Carré using Flow Composer. The song, whose name translates to “Where Did the Shadow Go,” is haunting not just because of the minor chords or the slow, sombre tempo, but also because it’s sung by the ethereal voice of Alys, a French-Japanese virtual singer and performer. The synthetic, computerized quality of her voice (or rather, its voice) makes the piece creepier by reflecting the inhuman way it was written. Or perhaps the real reason the song made me shiver is that Alys sounds exactly like GLaDOS, the funny and scary AI antagonist of the game *Portal*, who has killed me more than a few times. I’m sure the allusion is accidental, and that my reaction says more about my personal interpretation of the song than about its objective qualities. Even so, science fiction has created plenty of compelling AI characters, such as HAL 9000 or Samantha from *Her*. Just as these fictional characters have real flesh and blood actors behind them, real programmers and music artists could portray compelling digital personas by tinkering with the lyrics generation to produce that output. (Or by cheating and writing lyrics the normal way.)
Of course, a computer-generated song doesn’t have to pretend to be written by a conscious AI. It could pretend to be written by a real human instead, but it is currently more difficult for a program to appear humanlike than to appear computerlike.
Perhaps in the future, the whole song (the melody, the background, the singing voice, the lyrics, and even the character of the singer) can be produced with no human input whatsoever. Regardless, I think the most useful musical applications will still be those that keep a glimmer of genuine humanity, those which, given a little human feeling, can output a lot of song. The computer music field started with extremely deterministic and rigid programs that could only compose for a specific genre, and has moved on to fuzzy, probabilistic, generative algorithms that can sing and speak with various moods and in various styles. The development of these music AIs is very promising, and I am eagerly anticipating what this field will deliver in the years to come.
1. There’s a tendency to dismiss the work these programs do and focus only on the human input they still require, so here’s a back-of-the-envelope calculation of the work required to write a song. A MIDI file of “Dangerous Woman” has a size of 8 KB, or 8,000 bytes, after being losslessly compressed to a zip file. Thus, we can estimate the Shannon entropy of a piece of sheet music for “Dangerous Woman” to be 64,000 bits, as there are 8 bits in a byte. In other words, writing a pop song requires the same amount of work as answering 64,000 yes or no questions, since one bit corresponds to a single black and white decision. (Making a decision with 4 options takes 2 bits, 8 options takes 3 bits, and n options takes log2(n) bits.) Suppose you use a program to make a song and select one out of 1,000 genres. That’s about 10 bits of human decision-making already. If on top of that you generate 16 versions of each 5 seconds and choose the best one, then you are putting 4 bits of information into your song for every 5 seconds. For a 4 minute song, that’s 48 segments times 4 bits, or 192 more bits, for a total of 202 bits. Let’s round that up to 300 bits of human decision-making to account for your decision to use the program in the first place, and other human decisions not accounted for. Let’s also round the entropy down to 60,000 bits on the basis that ZIP’s compression isn’t perfect. In such a case, you contribute 300 bits out of the 60,000 required. You only make 0.5% of the creative decisions, and the other 99.5% is done by the program. Even if my math is so terrible that I’m 100x too low, you’re still only doing half the work. This is, of course, assuming you are only curating. If you have to write lyrics or play piano for 20 seconds, then that adds more bits. The important thing to keep in mind is that creation is 100x harder than curation.
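For anyone who wants to check the arithmetic in this footnote, here is a small Python sketch that redoes it. The numbers are the ones from the footnote itself; the variable names are mine, and the 300 and 60,000 figures are the same rough roundings used above, not measurements.

```python
import math


def bits_for_choice(options):
    """Bits of information in picking one item out of `options` equally likely ones."""
    return math.log2(options)


song_seconds = 4 * 60      # a 4 minute song
segment_seconds = 5        # the user re-rolls the song in 5 second chunks

genre_bits = bits_for_choice(1000)           # picking 1 of 1,000 genres, about 10 bits
segment_bits = bits_for_choice(16)           # picking 1 of 16 versions, exactly 4 bits
segments = song_seconds // segment_seconds   # number of 5 second chunks

curation_bits = genre_bits + segments * segment_bits

# Rounded figures from the footnote (assumptions, not measurements).
human_bits = 300       # curation bits rounded up for unaccounted decisions
song_bits = 60_000     # zipped MIDI size in bits, rounded down

human_share = human_bits / song_bits

print(f"segments: {segments}")
print(f"curation bits before rounding: {curation_bits:.1f}")
print(f"human share of decisions: {human_share:.1%}")
print(f"human share if the estimate is 100x too low: {100 * human_share:.0%}")
```

The rounding from 202 to 300 bits and from 64,000 to 60,000 bits both push the human share upward, so the 0.5% figure is on the generous side for the human.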
Benoît Carré. “Où Est Passée L'ombre.”
Marshall, Alex. “From Jingles to Pop Hits, A.I. Is Music to Some Ears.” The New York Times, 22 Jan. 2017. Accessed 4 June 2017.
Nunzio, Alex Di. “Illiac Suite.” Musicainformatica, 4 Dec. 2011. Accessed 4 June 2017.
Oord, Aaron van den, et al. “WaveNet: A Generative Model for Raw Audio.” 19 Sept. 2016. Accessed 4 June 2017.
Pachet, F. and Roy, P. “Imitative Leadsheet Generation with User Constraints.” 21st European Conference on Artificial Intelligence (ECAI 2014), pages 1077-1078, Prague (Czech Republic), August 2014.
“Watch IBM's Watson Beat AI Make Original Music with Brooklyn R&B Band Phony PPL.” Digital Trends, 22 Mar. 2017. Accessed 4 June 2017.