Unsupervised Transcription of Piano Music
MS Technical Paper
Fei Xiang
Mar. 14, 2015
1. Motivation
Audio signal processing is a very active research area. Among its tasks, automatic piano music transcription is an especially interesting and challenging one, with many practical uses. For instance, in today’s music lessons and tests, we often rely on human listeners to judge whether a piano player performed well, based on whether the notes played were accurate. This process requires manpower and is not always fair or accurate, because people’s judgement is subjective. If a good automatic transcription system can be designed and implemented with high
[…]
To tackle this problem, source-separation techniques must be utilized.
2. Existing Approaches
In this section, we discuss what has been done in the area of unsupervised music transcription. The task has several aspects, and different techniques have been used in an attempt to solve it efficiently and accurately. To give a clear picture of prior work, we categorize the approaches by the technique used.
The classic starting point for unsupervised piano transcription, where the test instrument is not seen during training, is a non-negative factorization of the acoustic signal’s spectrogram [1]. Most research has improved on this baseline in one of two ways: by better modeling the discrete musical structure of the piece being transcribed [2,3], or by better adapting to the timbral properties of the source instrument [4,5].
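The baseline factorization can be illustrated with a small sketch. It is not the method of [1]; it assumes the widely used Lee-Seung multiplicative updates for squared error, and the toy spectrogram, the rank, and all function names are hypothetical choices made for illustration. Columns of W play the role of per-note spectral templates, and rows of H are their activations over time.

```python
import random

def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate non-negative V (freq x time) as W @ H, with W and H
    non-negative, using multiplicative updates for squared error."""
    rng = random.Random(seed)
    n_freq, n_time = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(n_freq)]
    H = [[rng.random() + eps for _ in range(n_time)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * n / (d + eps) for h, n, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * n / (d + eps) for w, n, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H

def squared_error(V, W, H):
    """Sum of squared differences between V and its reconstruction W @ H."""
    R = matmul(W, H)
    return sum((v - r) ** 2 for vr, rr in zip(V, R) for v, r in zip(vr, rr))

# Hypothetical toy "spectrogram": 4 frequency bins, 6 frames, 2 notes.
templates = [[1.0, 0.0], [0.5, 0.0], [0.0, 1.0], [0.0, 0.5]]
activations = [[1, 1, 0, 0, 1, 0], [0, 0, 1, 1, 1, 0]]
V = matmul(templates, activations)

W, H = nmf(V, rank=2)
print("reconstruction error:", squared_error(V, W, H))
```

In a real system V would be a magnitude spectrogram of the recording and the rank would be tied to the number of piano keys. The sketch uses plain lists only to stay dependency-free; an array library would be the natural choice in practice.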
Combining these two approaches is difficult. Hidden Markov or semi-Markov models are the standard approach to modeling discrete musical structure, and they need fast dynamic programming for inference. Combining discrete models with timbral adaptation and source separation would break the conditional independence assumptions that dynamic programming relies on. Previous research that avoids this inference problem typically postpones detailed modeling of the discrete structure of timbre