MusicXML in Practice: Issues in Translation and Analysis

Michael Good

Recordare LLC
PO Box 3459
Los Altos, California 94024 USA

Originally published in Proceedings First International Conference MAX 2002: Musical Application Using XML (Milan, September 19-20, 2002), pp. 47-54.

Copyright © 2002 Michael Good. All rights reserved.

ABSTRACT

Since its introduction in 2000, MusicXML has become the most quickly adopted symbolic music interchange format since MIDI, with support by market and technology leaders in both music notation and music scanning. This paper introduces the key design concepts behind MusicXML, discusses some of the translation issues that have emerged in current commercial applications, and introduces the use of MusicXML together with XML Query for music analysis and information retrieval applications.

Keywords

MusicXML, music notation, music information retrieval, music analysis, XQuery, Finale, Dolet.

1 INTRODUCTION TO MUSICXML

Many people have recognized the possible benefits of using XML for music representation [1, 2, 7]. Similarly, many efforts have been made over the years to come up with a higher-level interchange format for music notation than is provided by Standard MIDI Files [12, 16]. MusicXML combines these two trends, using XML technology to develop a new interchange standard for music notation.

MusicXML has been developed from a commercial perspective as opposed to a research perspective. Recordare’s business model is founded on the commercial opportunities enabled by a standardized, Internet-friendly format for symbolic musical data. We needed to quickly demonstrate the viability of an interchange standard in the context of commercial musical applications. Our major technical risk was that we would develop yet another interchange format that could not or would not be adopted by the market leaders in music application software.

MusicXML Design Techniques

Previous efforts at music interchange standards have tended to fail in one of two ways. In the case of SMDL [14], the design goal was overly ambitious, leading to an overly complex language with very few implementations. In the case of NIFF [6], the design goal was insufficiently general. The graphics-oriented focus of the language made it adequate for scanners and notation programs with graphics-oriented formats. Sequencers, databases, and notation programs with more underlying awareness of musical semantics were ill-served by the graphical focus. A critical mass of application support never developed.

From our industrial perspective, we looked to MIDI and HTML as exemplars for developing a new music interchange standard. MIDI and HTML are both powerful enough to solve a good variety of industrial-strength problems. On the other hand, they are simple enough that people can learn the basics easily and implement them incrementally, smoothly adding additional features over time.

XML 1.0 shares these dual characteristics of power and simplicity. XML’s widespread adoption throughout the information technology industry allows music software to make use of the technology investments made by much larger industries.

Several techniques were used to develop a usable, useful, powerful interchange standard:

  • The initial design of MusicXML was based on two of the most powerful music representation formats from academic research: MuseData [9] and Humdrum [10].
  • The design and implementation of MusicXML was done iteratively, using an evolutionary delivery approach [4]. This ensured that the design elements were both implementable and useful across applications.
  • Initial implementations focused on the market and technical leaders for major music applications, such as Finale and SharpEye Music Reader. By working with the most fully-featured products in their market areas, we could be confident that we were creating an industrial-strength interchange language rather than a research prototype.

MusicXML Application Support

MusicXML’s success is apparent from its adoption, which has been quicker than that of any symbolic music interchange format since MIDI. Figure 1 shows the software products and projects supporting MusicXML as of July 2002 [11].


Figure 1: MusicXML Support as of July 2002

Finale 2003, SharpEye Music Reader, and TaBazar are all shipping with MusicXML support on Windows. Recordare’s Dolet software supports MuseData and Finale 2000 to 2003. Finale 2003 can import SCORE files, allowing a two-step conversion into MusicXML. MusicXML files imported into Finale can be printed to the FreeHand MusicPad Pro electronic music stand, providing a new electronic format for MusicXML scores.

Project XEMO has demonstrated a Java-based MusicXML Notation Viewer, currently in alpha test, running on Windows, Macintosh OS X, Linux, and Solaris systems. Middle C Software has announced support for MusicXML in their future scanning software products. NoteHeads has announced plans to import MusicXML files in version 1.7 of their Igor Engraver notation product. The KGuitar open-source guitarist environment for Linux, FreeBSD, and Solaris systems added MusicXML support in version 0.4.1. The NIFF and MIDI converters have been developed as internal prototypes at Recordare.

MusicXML’s successful adoption contrasts not only with past interchange languages, but also with other XML formats for symbolic music representation proposed in the past several years. Most of these formats cannot represent the full range of music possible in MusicXML. None has any commercial product support as of July 2002, much less support from music software industry leaders.

2 MUSICXML DESIGN ISSUES

MusicXML follows MuseData and other formats in separating the underlying musical representation from the specifics of a particular engraving or music performance. As with MuseData, the three domains are combined within a single format. The “logical domain” of music is found in MusicXML’s elements, while details of the visual and performance domains are found in MusicXML’s attributes. There are also dedicated <print> and <sound> elements for cases where attributes associated with logical domain elements were not sufficient.

The integration of the three domains into a single format speaks to the need to cover an adequate range of music applications in a single notation format. The distinction between elements and attributes facilitates the segmentation of domains both for learning MusicXML and building applications. Distinctions between domains tend to be cleaner in theory than in practice. Given MusicXML’s commercial focus, it made sense not to be overly rigorous about these theoretical distinctions.

To introduce how MusicXML represents musical scores, here is the musical equivalent of C’s “hello, world” program: about the simplest music file we can make, with one instrument, one measure, and one note, a whole note on middle C:


Figure 2: A Musical “Hello, World”

Here is the musical score represented in MusicXML:

<?xml version="1.0" standalone="no"?>
<!DOCTYPE score-partwise PUBLIC 
  "-//Recordare//DTD MusicXML 0.6b Partwise//EN"
  "http://www.musicxml.org/dtds/partwise.dtd">
<score-partwise>
  <part-list>
    <score-part id="P1">
      <part-name>Music</part-name>
    </score-part>
  </part-list>
  <part id="P1">
    <measure number="1">
      <attributes>
        <divisions>1</divisions>
        <key>
          <fifths>0</fifths>
        </key>
        <time>
          <beats>4</beats>
          <beat-type>4</beat-type>
        </time>
        <clef>
          <sign>G</sign>
          <line>2</line>
        </clef>
      </attributes>
      <note>
        <pitch>
          <step>C</step>
          <octave>4</octave>
        </pitch>
        <duration>4</duration>
        <type>whole</type>
      </note>
    </measure>
  </part>
</score-partwise>

For scores of this simplicity, MusicXML’s design roots are clearly apparent. This is basically an XML version of the MuseData representation.

Several of MusicXML’s design elements, including the interchangeability between partwise and timewise formats, have been described previously [5]. Here we will focus on some additional design aspects that have proven to be important for music translation, and that look to be important for future work in musical analysis.

One key design choice is that each aspect of music semantics is represented in a different element. This provides the greatest flexibility for diverse music applications, especially once music information retrieval is included in the application mix. Our example analysis programs below will demonstrate some of the benefits of this design choice.

Another key design element carried over from MuseData is the importance of separately representing what is heard vs. what is notated [13]. Take the issue of note duration. MusicXML follows MIDI and MuseData by putting the denominator of music duration, the number of divisions per quarter note, in a separate, usually unchanging <divisions> element. The whole note is represented both in sound, as a <duration> of 4 divisions, and as a graphical <type> of a whole note. It is useful to have both, since notation programs work more easily with the graphical type, while sequencers work more easily with the duration values. In other cases, such as jazz, sounding duration is different from written duration, so both elements are required for an adequate representation of both musical sound and a musical score.
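The arithmetic relating <duration> and <divisions> can be sketched in a few lines (a minimal illustration; the variable names are ours, not part of MusicXML):

```python
# Sounding length in quarter notes = duration / divisions.
# Values are taken from the "hello, world" example above.
divisions = 1           # from the <divisions> element: divisions per quarter note
duration = 4            # from the whole note's <duration> element
quarters = duration / divisions
print(quarters)         # 4.0: the note sounds for four quarter notes
```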

This type of dual representation of sound and graphics, so crucial to support diverse industrial applications, contrasts with the graphical representations used in NIFF and the WEDELMUSIC XML format [1]. NIFF is a binary format, but if we translate its binary elements directly into an XML document, our middle C whole note would look something like this:

<Notehead Code="note" Shape="2" StaffStep="-2">
   <Duration Numerator="1" Denominator="1"/>
</Notehead>

The StaffStep attribute tells us that the note is two staff steps, or one line, below the staff. But what is its sounding pitch? To determine that, we need to check the clef and key signature, handle any accidentals that preceded the note in this measure, look for any accidentals in a note that may be tied to this one, and interpret any 8va markings. This is a lot of computation for one of the two most basic elements of music notation: what pitch is sounding? Fortunately, the other basic element, the timing of the note, is represented much more directly.
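Even the first of these steps requires its own lookup. Here is a hedged Python sketch of just that step, assuming a treble (G on line 2) clef and ignoring key signature, accidentals, ties, and 8va markings, all of which a real translator must also consult before the sounding pitch is known:

```python
# Map a NIFF-style StaffStep to a written pitch, treble clef only.
# Staff step 0 is taken to be the bottom staff line, which is E4 in
# treble clef (so step -2, one line below the staff, is middle C).
LETTERS = "CDEFGAB"

def staff_step_to_pitch(staff_step: int) -> str:
    # Count diatonic steps above C0: E4 is 2 letter steps plus 4 octaves.
    diatonic = LETTERS.index("E") + 4 * 7 + staff_step
    return f"{LETTERS[diatonic % 7]}{diatonic // 7}"

print(staff_step_to_pitch(-2))  # C4: middle C, one line below the staff
```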

The very indirect nature of pitch representation makes NIFF and other graphical formats unusable for most performance and analysis applications. It even makes for problems in its intended use in the visual transfer between scanning and notation applications. The NIFF importer included in Sibelius 2.11 has bugs that are directly traceable to missing one or more of the multitude of steps needed to accurately determine musical pitch from NIFF data. Graphical formats have a long history in music representation, and are appropriate as internal formats for many applications, but have severe problems when used as the foundation of a general music interchange format.

3 MUSICXML TRANSLATION ISSUES

When we first started building programs to move between MusicXML and other music formats, we called them converters. Conversion implies the centrality of the change from one format to another. We have since realized that a more productive metaphor might be that of translation: the interpretation of one form of human expression into another [15]. Our first software translation products are named after the 16th-century French translator Etienne Dolet.

At Recordare we have produced four MusicXML translators to date: two-way translators for Finale and MuseData, and one-way translators from NIFF and to Standard MIDI Files. Each translation brought up different issues of interpretation that need to be successfully addressed to make an effective interchange format.

MuseData Translation

Our MuseData translator was built together with the initial design of MusicXML. The first version of MusicXML was primarily an adaptation of the MuseData format into XML form, with the addition of a timewise format to simulate Humdrum’s two-dimensional lattice structure within a hierarchical language.

Our design decision was to make MusicXML a superset of MuseData, so that we could do a 100% conversion (as we thought of it then) from MuseData into MusicXML and then back again. We adopted features in entirety even when we were unclear of their utility for a general-purpose translation language. Since the initial design, we have removed some of the documented features that are not used in practice, and included some undocumented features that are indeed used in MuseData files available from CCARH.

Given that MusicXML covers a superset of MuseData features, we encountered no major translation difficulties with this first piece of software. This gave us confidence that MusicXML was indeed strong enough to serve as a basis for music representation without hidden problems that would only be revealed through implementation experience. The translation issues have emerged later, with more programs translating back and forth to MusicXML. These programs may make use of features that are present in MusicXML but not in MuseData, or may be used to translate classical repertoire from later eras than MuseData’s design focus. For instance, translating pieces by Chopin, Mahler, and others with multiple large tuplets causes problems when the number of divisions needed for precise durations leads to notes with durations that cannot fit into a 3-character MuseData field.
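The overflow can be seen with a short computation (a sketch; the specific tuplet sizes are illustrative, not taken from any particular score):

```python
from math import lcm  # Python 3.9+

# The divisions-per-quarter value must make every tuplet note an integer
# number of divisions. With several large tuplets in one piece, the least
# common multiple of their sizes grows quickly.
tuplet_sizes = [7, 11, 13]
divisions = lcm(*tuplet_sizes)  # 1001 divisions per quarter note
quarter = divisions             # duration of a plain quarter note
print(quarter)                  # 1001: too wide for a 3-character MuseData field
```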

NIFF and MIDI Translations

We next added translators for two binary formats: from NIFF to MusicXML and from MusicXML to Standard MIDI Files. By testing translation with NIFF’s highly graphical format and MIDI’s performance-only format, we could determine whether MusicXML really did have the scope to handle a variety of music formats far afield from our MuseData and Humdrum starting points.

Building the NIFF translator is what gave us our more detailed understanding of the problems with highly graphical formats for music interchange. While our prototype works fine for testing purposes, it was clear that building an industrial-strength NIFF translator, while possible, would take a great deal of time and effort. We decided instead to persuade the author of the most commercially important application writing NIFF files (SharpEye Music Reader) to write MusicXML files as well. SharpEye was the first product to support the MusicXML format, and their implementation experience demonstrated that MusicXML could indeed be implemented successfully by third party developers.

Given the existence of MuseData to MIDI translators, we were not surprised when the MusicXML to MIDI translator posed no major challenges. An interesting aspect of both these translators is our use of XML as an intermediate format for both the NIFF and MIDI files. This creates an easier-to-program structure for these binary formats.

Finale Translation

Translating to and from Finale posed the largest challenges for MusicXML to date. As a fully-featured industrial application, it poses the expected challenges of dealing with a program whose feature set exceeds that of the interchange format. In many cases we added features that were necessary to support effective import from SharpEye to Finale (for instance, system and page breaks), but others are still unsupported.

The more interesting issues come from the differences in structure between Finale and MusicXML files. In many of the fundamentals, there is a great deal of similarity. Finale’s underlying frame structure (a single measure on a single staff) has up to four layers, and each layer can have one or two voices. The layers and voices are similar to MusicXML’s <voice> element. Moving between layers and voices is handled in Finale by means very similar to the <backup> and <forward> elements in MusicXML.

When we get to articulations and expressions, things work very differently in Finale and MusicXML. Finale is designed to be open-ended and extensible, so there are few of the built-in abstractions present in MusicXML. These abstractions must instead be inferred from the definition of a musical symbol in the Finale database.

As an example, what Finale structure should translate to a <staccato> element in MusicXML? Is it:

  • An articulation whose font character looks like a staccato dot, or
  • An articulation with a performance definition that shortens the note length?

In practice we have found that the font definition works more reliably. Finale is a notation program, and people generally pay much more attention to appearance than playback within Finale files. But using this definition limits your translation to fonts that you have seen before, so that you know what musical glyphs are associated with each code in a music font.
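The glyph-lookup heuristic can be sketched as a table keyed by font name and character code. All names and codes below are hypothetical placeholders, not actual Finale font data; a real table must be built by inspecting each font as it is encountered:

```python
# Hypothetical glyph table: (font name, character code) -> MusicXML element.
# Only fonts we have seen before can be translated this way.
KNOWN_GLYPHS = {
    ("ExampleMusicFont", 46): "staccato",  # hypothetical code for the dot glyph
    ("ExampleMusicFont", 62): "accent",    # hypothetical code for the accent
}

def classify_articulation(font: str, char_code: int):
    """Return the MusicXML element name, or None for an unseen font/glyph."""
    return KNOWN_GLYPHS.get((font, char_code))

print(classify_articulation("ExampleMusicFont", 46))  # staccato
print(classify_articulation("UnknownFont", 46))       # None: cannot translate
```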

Currently, the major barrier to even better Finale/MusicXML translations is the incomplete documentation for the Finale format provided in the current Finale 2000 plug-in developer’s kit. This is akin to a translator working just by the context of word usage, with only an incomplete set of dictionaries for reference. Coda has suggested that this documentation may be improved in a future version of the developer’s kit.

4 MUSICXML AND MUSICAL ANALYSIS

Music analysis and retrieval using large datasets of symbolic musical data has been hampered by the lack of an adequate, standardized format for symbolic music representation supported by commercial software tools. This gap makes it difficult to acquire and reuse either musical data or musical tools. The tools that are developed for music analysis research do not have the technical underpinnings to scale up to large-scale commercial usage of music information retrieval. The need to use databases to build collections of symbolic music information is well understood [7], but the technology has been lacking.

Building scalable database systems is a costly undertaking. It makes more sense for music applications to leverage the investment of other, better-funded application areas such as electronic commerce, as long as that technology is adequate—not necessarily ideal—for the needs of musical applications.

XML has the potential to finally break through the database barrier through the efforts of the World Wide Web Consortium’s XML Query working group. The group’s mission is “to provide flexible query facilities to extract data from real and virtual documents on the Web, therefore finally providing the needed interaction between the web world and the database world. Ultimately, collections of XML files will be accessed like databases.” [18]

The current focus of the XML Query Working Group is the XQuery 1.0 language. Though this language is still a work in progress, available only in working draft form, there are already a dozen prototype implementations available for evaluation. These come both from major relational database vendors such as Oracle and Microsoft, and from native XML database vendors such as Software AG.

The combination of an XML language for music and an XML query language is not sufficient by itself to break through the database barrier for music information retrieval. The two languages must be able to work together to solve musical problems. Early XQuery working drafts had significant problems in this area, lacking powerful facilities to deal with queries that combine aspects of sequence and hierarchy. These shortcomings have been addressed in the XQuery 1.0 working draft of April 30, 2002, and we have now been able to build our first interesting musical queries using XQuery and MusicXML.

Given XQuery’s importance and scope, it is likely to be some time yet before the language definition is completed, issued as a W3C recommendation, and commercial tools made available for effective development of XQuery applications. Fortunately, for research purposes, many analysis applications can be developed effectively today with existing tools: the XML Document Object Model (DOM) [17] and the XML Path Language 1.0 (XPath) [3].

Musical analysis is not just applicable in musicological research; it can also be useful in music publishing. For instance, as Recordare publishes its editions of classical art songs, it is helpful to show the range of each song. This process can be automated by a musical analysis program working on the MusicXML data. Figure 3 shows a screen shot from a program that generates a distribution graph of the pitch range for any particular part in a piece of music. Here we are computing the range for the voice part of the last song in Schumann’s Frauenliebe und Leben, Op. 42.

Figure 3: Pitch Range Distribution Analysis Program

Figure 4 shows the synopsis produced by clicking on the “Report” button. It focuses on the low and high notes.

Figure 4: Pitch Range Synopsis Report

The program that generates this synopsis report is easy to write in MusicXML. For comparison, we will show two implementations. The first uses the DOM, programmed in Visual Basic 6.0 with Microsoft’s MSXML3 parser. An equivalent program can be built using XQuery. Our example uses the QuiP 2.1.1 prototype program from Software AG, which is based on the April 30 working draft of XQuery 1.0. QuiP and XQuery are both works in progress, so the syntax of a working program is likely to change by the time XQuery becomes a formal recommendation from the World Wide Web Consortium.

DOM Approach

The DOM approach is implemented within a function that takes a MusicXML document and MusicXML part ID as input, and returns the dialog box string as output. After the initial variable declaration and initialization, the variable oNodes is assigned to all the <pitch> elements within the <part> specified by the PartID parameter. The selection is made using XPath 1.0 syntax.

The program then loops through each pitch, calling the MIDINote function to compute the MIDI note value from the different components of the <pitch> element. If the resulting pitch is lower or higher than any seen before, the spelling of the note is saved in a variable, using a separate SpellNote function on the same <pitch> element. The measure containing the extreme pitch is also saved.

After all the pitches are searched, the program returns a string composed from the saved values for the lowest and highest MIDI pitches, along with their musical spellings and the measure where they were first encountered.

Function FindRange _
(ThisXML As DOMDocument30, _
 ByVal PartID As String) As String

Dim oRoot As IXMLDOMElement ' Root of XML document
Dim oNodes As IXMLDOMNodeList ' Pitches to analyze
Dim oElement As IXMLDOMElement ' Current pitch
Dim oMeasure As IXMLDOMElement ' Parent measure
Dim lPitch As Long        ' Current pitch
Dim lMinPitch As Long     ' Lowest MIDI pitch
Dim sMinPitch As String   ' Spelling of low pitch
Dim lMaxPitch As Long     ' Highest MIDI pitch
Dim sMaxPitch As String   ' Spelling of high pitch
Dim sMinMeasure As String ' Measure for low pitch
Dim sMaxMeasure As String ' Measure for high pitch

lMinPitch = 128
lMaxPitch = -1

Set oRoot = ThisXML.documentElement
Set oNodes = _
oRoot.selectNodes( _
       "//part[@id='" & PartID & "']//pitch")

' Search each pitch for the lowest and highest
' values, saving the spelling and measure number.
Do
  Set oElement = oNodes.nextNode
  If oElement Is Nothing Then Exit Do
  lPitch = MIDINote(oElement)
  If lPitch < lMinPitch Then
    lMinPitch = lPitch
    sMinPitch = SpellNote(oElement)
    Set oMeasure = _
      oElement.selectSingleNode _
       ("ancestor::measure")
    sMinMeasure = _
      oMeasure.getAttribute("number")
  End If
  If lPitch > lMaxPitch Then
    lMaxPitch = lPitch
    sMaxPitch = SpellNote(oElement)
    Set oMeasure = _
      oElement.selectSingleNode _
       ("ancestor::measure")
    sMaxMeasure = _
      oMeasure.getAttribute("number")
  End If
Loop

FindRange = "Lowest note is " & sMinPitch & _
  " (MIDI " & lMinPitch & _
  ") in measure " & sMinMeasure & vbCrLf & _
"Highest note is " & sMaxPitch & _
" (MIDI " & lMaxPitch & _
  ") in measure " & sMaxMeasure

End Function

The MIDINote function computes the MIDI note number by reading the <octave>, <step>, and <alter> elements in turn. The CLng function called here casts the string returned by the XML element into a 32-bit integer (the Long type in Visual Basic 6.0).

' Return MIDI note value from a MusicXML pitch
' element, ignoring microtones.

Function MIDINote _
  (ThisPitch As IXMLDOMElement) As Long

  Dim oElement As IXMLDOMElement
  Dim lTemp As Long       ' Temporary pitch

  ' Get octave
  Set oElement = _
    ThisPitch.selectSingleNode("octave")
  lTemp = 12 * (CLng(oElement.Text) + 1)

  ' Get pitch step
  Set oElement = _
    ThisPitch.selectSingleNode("step")
  Select Case oElement.Text
    Case "a", "A": lTemp = lTemp + 9
    Case "b", "B": lTemp = lTemp + 11
    Case "c", "C": lTemp = lTemp + 0
    Case "d", "D": lTemp = lTemp + 2
    Case "e", "E": lTemp = lTemp + 4
    Case "f", "F": lTemp = lTemp + 5
    Case "g", "G": lTemp = lTemp + 7
  End Select

  ' Get alteration if any
  Set oElement = _
    ThisPitch.selectSingleNode("alter")
  If Not oElement Is Nothing Then
    lTemp = lTemp + CLng(oElement.Text)
  End If

  ' Assign and exit
  MIDINote = lTemp

End Function

The SpellNote function is even more straightforward, as the only conversion that needs to be done is to go from the numeric <alter> value to a text symbol for the sharps and flats in the note spelling.

' Spell the pitch as a string, e.g. "C#4"

Function SpellNote _
  (ThisPitch As IXMLDOMElement) As String

  Dim oElement As IXMLDOMElement 
  Dim sSpell As String  ' Temporary string
  Dim sAlter As String  ' Alteration string

  ' Get pitch step
  Set oElement = _
    ThisPitch.selectSingleNode("step")
  sSpell = UCase$(oElement.Text)

  ' Get alteration if any
  Set oElement = _
    ThisPitch.selectSingleNode("alter")
  If Not oElement Is Nothing Then
    Select Case CLng(oElement.Text)
      Case -2: sAlter = "bb"
      Case -1: sAlter = "b"
      Case 0: sAlter = vbNullString
      Case 1: sAlter = "#"
      Case 2: sAlter = "##"
      Case Else
        sAlter = "(" & oElement.Text & ")"
    End Select
    sSpell = sSpell & sAlter
  End If

  ' Get octave
  Set oElement = _
    ThisPitch.selectSingleNode("octave")
  sSpell = sSpell & oElement.Text

  ' Assign and exit
  SpellNote = sSpell

End Function

XQuery Approach

Our XQuery implementation follows a similar approach to the DOM implementation. Since QuiP is a standalone prototype tool for learning XQuery, we have hardcoded the file name and part ID that were parameterized in the DOM example. This example takes a very simple approach to the query, reviewing all the pitches twice in order to locate the minimum and maximum values. Once we have these values, we then find the pitch elements whose MIDI note values match the high and low values. XQuery results are returned in XML format, so we do not need a SpellNote function. We simply output the first <pitch> elements that match each of the extreme values, and then find the number of the measure that contains the first instance of these matching elements.

XQuery uses a subset of XPath 2.0 that does not support the ancestor:: axis, so our query assumes the <measure> element is the grandparent of the <pitch> element. Therefore this query will only work with partwise MusicXML files, not timewise files. We have revised the syntax slightly to better match the XQuery working draft, using the string function where QuiP 2.1.1 used the string-value function.

define function MIDINote(element $thispitch) returns integer
{
  let $step := $thispitch/step
  let $alter :=
    if (empty($thispitch/alter)) then 0
    else if (string($thispitch/alter) = "1") then 1
    else if (string($thispitch/alter) = "-1") then -1
    else 0
  let $octave :=
    integer(string($thispitch/octave))
  let $pitchstep :=
    if (string($step) = "C") then 0
    else if (string($step) = "D") then 2
    else if (string($step) = "E") then 4
    else if (string($step) = "F") then 5
    else if (string($step) = "G") then 7
    else if (string($step) = "A") then 9
    else if (string($step) = "B") then 11
    else 0
  return 12 * ($octave + 1) + $pitchstep + $alter
}

let $doc := document("MusicXML/Frauenliebe8.xml")
let $part := $doc//part[./@id = "P1"]
let $highnote :=
  max(for $pitch in $part//pitch
      return MIDINote($pitch))
let $lownote :=
  min(for $pitch in $part//pitch
      return MIDINote($pitch))

let $highpitch :=
  $part//pitch[MIDINote(.) = $highnote]
let $lowpitch :=
  $part//pitch[MIDINote(.) = $lownote]
let $highmeas :=
  string($highpitch[1]/../../@number)
let $lowmeas :=
  string($lowpitch[1]/../../@number)

return
  <result>
    <low-note>{$lowpitch[1]}
      <measure>{$lowmeas}</measure>
    </low-note>
    <high-note>{$highpitch[1]}
      <measure>{$highmeas}</measure>
    </high-note>
  </result>

This query returns the following result in XML:

<?xml version="1.0"?>
<result>
  <low-note>
    <pitch>
      <step>C</step>
      <alter>1</alter>
      <octave>4</octave>
    </pitch>
    <measure>16</measure>
  </low-note>
  <high-note>
    <pitch>
      <step>D</step>
      <octave>5</octave>
    </pitch>
    <measure>12</measure>
  </high-note>
</result>

Melody retrieval provides a more typical XQuery example, using a FLWR (for-let-where-return) expression. Here we are looking for the instances of the Frere Jacques theme in the key of C. We simplify this query to look just for the pitch step sequence of C, D, E, C. This query also assumes a partwise MusicXML file. It will match instances of the pitch sequence that cross <measure> boundaries, but will not match across <part> boundaries:

<result>
{
let $doc :=
  document("MusicXML/frere-jacques.xml")
let $notes := $doc//note
for $note1 in
      $notes[string(./pitch/step) = "C"],
    $note2 in $notes[. follows $note1][1],
    $note3 in $notes[. follows $note2][1],
    $note4 in $notes[. follows $note3][1]
let $meas1 := $note1/..
let $part1 := $meas1/..
let $part2 := $note2/../..
let $part3 := $note3/../..
let $part4 := $note4/../..
where string($note2/pitch/step) = "D"
  and string($note3/pitch/step) = "E"
  and string($note4/pitch/step) = "C"
  and (string($part1/@id) = string($part2/@id))
  and (string($part2/@id) = string($part3/@id))
  and (string($part3/@id) = string($part4/@id))
return
  <motif>
    {$note1/pitch} {$note2/pitch}
    {$note3/pitch} {$note4/pitch}
    <measure>{$meas1/@number}</measure>
    <part>{$part1/@id}</part>
  </motif>
}
</result>

When run against a simple three-part round of Frere Jacques prepared in Finale and exported to MusicXML, the query returns six instances of the motif, the first of which is shown below:

<?xml version="1.0"?>
<result>
  <motif>
    <pitch>
      <step>C</step>
      <octave>5</octave>
    </pitch>
    <pitch>
      <step>D</step>
      <octave>5</octave>
    </pitch>
    <pitch>
      <step>E</step>
      <octave>5</octave>
    </pitch>
    <pitch>
      <step>C</step>
      <octave>5</octave>
    </pitch>
    <measure number="1" />
    <part id="P1" />
  </motif>
  <!-- Remaining 5 motifs removed
       for brevity -->
</result>

5 CONCLUSION

MusicXML has built on the collective work of the XML and music representation communities to become the most widely adopted symbolic music interchange format since MIDI. As the language develops, it will encounter further challenges in the areas of translation and analysis. Our commercial experience to date bodes well for handling translation issues as MusicXML expands to include more data for tablature, percussion notation, and sequencer applications. Our recent XQuery experience gives us new hope that industry-standard XML database tools, combined with MusicXML-based representations, will provide powerful new tools for problems in musical data analysis and information retrieval.

ACKNOWLEDGEMENTS

Eleanor Selfridge-Field, Walter B. Hewlett, Barry Vercoe, and David Huron provided valuable advice and encouragement, along with their outstanding prior work in music representation. Graham Jones, Ian Carter, Jane Singer, William Will, and Craig Sapp were especially helpful during the Dolet for Finale beta test.

REFERENCES

  1. Bellini, P. and Nesi, P. WEDELMUSIC format: An XML music notation format for emerging applications. In Proc. First International Conference on WEB Delivering of Music (Florence, November 2001), IEEE, 79-86.
  2. Castan, G., Good, M. and Roland, P. Extensible markup language (XML) for music applications: An introduction. In The Virtual Score: Representation, Retrieval, Restoration, ed. W. B. Hewlett and E. Selfridge-Field (Cambridge, MA, 2001), MIT Press, 95-102.
  3. Clark, J. and DeRose, S., eds. XML Path Language (XPath) Version 1.0. World Wide Web Consortium Recommendation, November 16, 1999. http://www.w3.org/TR/1999/REC-xpath-19991116.
  4. Gilb, T. Principles of Software Engineering Management. Reading, MA: Addison-Wesley, 1988.
  5. Good, M. MusicXML for Notation and Analysis. In The Virtual Score, ed. W. B. Hewlett and E. Selfridge-Field (Cambridge, MA, 2001), MIT Press, 113-124.
  6. Grande, C. The Notation Interchange File Format: A Windows-compliant approach. In Beyond MIDI, ed. E. Selfridge-Field (Cambridge, MA, 1997), MIT Press, 491-512.
  7. Haus, G. and Longari, M. Music information description by mark-up languages within DB-Web applications. In Proc. First International Conference on WEB Delivering of Music (Florence, November 2001), IEEE, 71-78.
  8. Haus, G. and Pollastri, E. An audio front end for query-by-humming systems. In Proc. 2nd Annual International Symposium on Music Information Retrieval (Bloomington, IN, October 2001), 65-72.
  9. Hewlett, W. B. MuseData: Multipurpose representation. In Beyond MIDI, ed. E. Selfridge-Field (Cambridge, MA, 1997), MIT Press, 402-447.
  10. Huron, D. Humdrum and Kern: Selective feature encoding. In Beyond MIDI, ed. E. Selfridge-Field (Cambridge, MA, 1997), MIT Press, 375-401.
  11. MusicXML Software page. http://www.musicxml.org/software.html.
  12. Selfridge-Field, E. Beyond MIDI: The Handbook of Musical Codes. Cambridge, MA: MIT Press, 1997.
  13. Selfridge-Field, E., Hewlett, W. B., and Sapp, C. S. Data models for virtual distribution of musical scores. In Proc. First International Conference on WEB Delivering of Music (Florence, November 2001), IEEE, 62-70.
  14. Sloan, Donald. HyTime and Standard Music Description Language: A document-description approach. In Beyond MIDI, ed. E. Selfridge-Field, (Cambridge, MA, 1997), MIT Press, 469-490.
  15. Steiner, G. After Babel: Aspects of language and translation, 3rd edition (Oxford, 1998), Oxford University Press.
  16. The Complete MIDI 1.0 Detailed Specification. Document version 96.1. Los Angeles, 1997, The MIDI Manufacturers Association.
  17. Wood, L. et al., eds. Document Object Model (DOM) Level 1 Specification, Version 1.0. World Wide Web Consortium recommendation, October 1, 1998. http://www.w3.org/TR/REC-DOM-Level-1/.
  18. World Wide Web Consortium XQuery web site: http://www.w3.org/XML/Query.