Music-related Data Sets

(a very small portion of what is available; please contribute)

  • ENST Drum dataset: a large and varied research database for automatic drum transcription and processing:
    • Three professional drummers specialized in different music genres were recorded.
    • Total duration of audio material recorded per drummer is around 75 minutes.
    • Each drummer played his own drum kit.
    • Each sequence used either sticks, rods, brushes or mallets to increase the diversity of drum sounds.
    • The drum kits themselves are varied, ranging from a small, portable kit with 2 toms and 2 cymbals, suitable for jazz and Latin music, to a larger rock drum set with 4 toms and 5 cymbals.
    • Each sequence is recorded on 8 individual audio channels, is filmed from two angles, and is fully annotated.
  • Instrument labels in polyphonic music signals. Audio excerpts from more than 2000 distinct recordings. Predominant instruments: 11 pitched instruments, plus presence or absence of drum kits. 2500 audio excerpts were collected, each lasting between 5 and 30 seconds. In total, there are more than 1100 excerpts per category (Drums and no-Drums).
  • DREANSS: DRum Event ANnotations for Source Separation. Annotations for 22 excerpts of songs taken from different multi-track audio datasets publicly available for research purposes. These multi-track excerpts span several genres, including Rock, Reggae, Electronic, Indie and Metal. The excerpts have a duration of 10 seconds each on average.

  • MTG-QBH: 118 audio files of sung melodies for query by humming.
  • TONAS: a dataset of flamenco a cappella sung melodies with corresponding manual transcriptions

Chords and key
  • Chord and Harmony Music Analysis. The first five albums by Robbie Williams are manually annotated by experts. The annotations include chords and keys.
  • Database of piano chords from the book by Barbancho et al. 2013. Download link.
  • McGill Billboard annotations. Detailed chord labels and structural annotations for a random sample of American pop music between 1958 and 1991, all time-aligned with audio. Version 2.0 contains all of the annotations from the original release plus the additional set used to evaluate audio chord estimation for MIREX 2012. Although the original audio files are not distributed due to copyright restrictions, non-invertible audio features are available upon request. Additionally, train-test submissions for audio chord estimation in MIREX 2013 will be able to train on the raw audio.

  • Tempo annotations from the ISMIR 2004 contest. Total number of instances: 698. Duration: ~30 s. Total duration: ~20940 s. 8 ballroom genres.
  • Beat/bar annotations of the ballroom dataset (same as above). Total number of instances: 698. Duration: ~30 s. Total duration: ~20940 s. 8 ballroom genres.
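
The total duration quoted for the ballroom data follows directly from the number of excerpts and the approximate excerpt length. A minimal sketch of that arithmetic, assuming every excerpt is exactly 30 s (the list only says "~30 s"); the function and constant names are illustrative and not part of any dataset tooling:

```python
# Figures quoted in the list above; the per-excerpt duration is approximate,
# so the result is an estimate rather than a measured total.
NUM_EXCERPTS = 698
EXCERPT_DURATION_S = 30


def total_duration_seconds(num_items, item_duration_s):
    """Estimate the total duration of a collection of equal-length excerpts."""
    return num_items * item_duration_s


total_s = total_duration_seconds(NUM_EXCERPTS, EXCERPT_DURATION_S)
total_hours = total_s / 3600  # seconds to hours

print(f"{total_s} s = {total_hours:.2f} h")  # prints "20940 s = 5.82 h"
```

In other words, the ballroom collection amounts to a little under six hours of audio.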

Structure (contributions mainly by Jordan Smith on the mir community list)
  • SALAMI has over 700 publicly available annotations. (The other half is private and used in MIREX evaluations.) About 200 of these songs are from the Internet Archive and you can download them for free. (Uri Nieto provided a handy script to download them.) For the rest of the songs, you can use the Echo Nest features to identify them and buy them.
  • Billboard has 890 detailed chord annotations for a sampling of songs on the Billboard Hot 100, but the annotations also contain structural labels of the kind you're after. You'll also need to buy the audio yourself for these.
  • RWC has "chorus section" annotations for many of its databases; for the popular music database, these will be what you're after.
  • Isophonics has reference annotations for the entire Beatles catalogue as well as several albums by Queen, Michael Jackson, Carole King and Zweieck. (Additional sets of Beatles annotations, by TUT and UPF, can be found here. Like the Isophonics set, these are derived from Alan Pollack's analyses.)
  • INRIA has detailed structural annotations of the RWC Pop (and soon, Genre) databases, as well as a selection of Eurovision pop songs and the Quaero dataset. These annotations are provided in a very detailed format that has all the information about repetition that you seek, and a number of pre-defined variations of symbols.
  • I've seen the Mazurka dataset used in structural analysis evaluations, but I'm not sure if this data is available anywhere.
  • Ewald Peiszer has annotations for a selection of pop songs:
  • Jordan Smith made some annotations (using the SALAMI format, approximately) of some classical and jazz recordings, all from the Internet Archive and hence with matching audio.
  • Multitrack Audio with Structural Segmentation Ground Truth Annotations: a dataset containing structural segmentation annotations for 104 rock and pop songs, along with the corresponding multitrack audio files (for 51 of those songs) and hyperlinks to commercially available multitrack audio files (for the remaining 53 songs).

Semantic descriptors/automatic classification
  • Artist lists (partly with genre annotations, text weight features, etc.):
  • GTZAN Genre collection. The dataset consists of 1000 audio tracks, each 30 seconds long. It contains 10 genres, each represented by 100 tracks. The tracks are all 22050 Hz mono 16-bit audio files in .wav format.
  • Latin Music Dataset: The Latin Music Database contains 3,160 music pieces in MP3 format classified into 10 different musical genres. The musical genres used are: Tango, Bolero, Bachata, Salsa, Merengue, Axé, Forró, Sertaneja, Gaúcha and Pagode.
  • Music/Speech. The dataset consists of 120 tracks, each 30 seconds long. Each class (music/speech) has 60 examples. The tracks are all 22050 Hz mono 16-bit audio files in .wav format.
  • CAL-500: a dataset of 1700 human-generated musical annotations that describe 500 popular Western musical tracks.
  • Music audio benchmark dataset: a collection of audio files for machine learning and data mining, downloaded from Garageband. The dataset contains 1886 songs, all encoded in MP3 format. The sampling frequency and bitrate of these files are 44,100 Hz and 128 kb/s, respectively.
  • Million song dataset: freely-available collection of audio features and metadata for a million contemporary popular music tracks.
  • Free Music Archive (FMA): audio under Creative Commons from 100k songs (343 days, 1TiB) with a hierarchy of 161 genres, metadata, user data (play counts, favorites, comments), free-form text (description, biography, tags).
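
Several entries above state an exact audio format (GTZAN and Music/Speech: 22050 Hz, mono, 16-bit .wav, 30 seconds per track). After downloading, it can be useful to check that a file actually matches that description. The following is a minimal sketch using only Python's standard `wave` module. The function names, the one-second duration tolerance, and the file path in the usage note are assumptions made for illustration, not part of any dataset's official tooling.

```python
import wave

# Expected format, taken from the GTZAN and Music/Speech descriptions above.
EXPECTED_RATE_HZ = 22050
EXPECTED_CHANNELS = 1           # mono
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit samples = 2 bytes each


def describe_wav(path):
    """Return (sample_rate, channels, sample_width_bytes, duration_seconds)."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        width = wav.getsampwidth()
        duration = wav.getnframes() / rate
    return rate, channels, width, duration


def matches_expected_format(path, expected_duration_s=30.0, tolerance_s=1.0):
    """Check a .wav file against the stated format.

    The tolerance is an assumption: the list only says tracks are
    "30 seconds long", so small deviations are accepted here.
    """
    rate, channels, width, duration = describe_wav(path)
    return (
        rate == EXPECTED_RATE_HZ
        and channels == EXPECTED_CHANNELS
        and width == EXPECTED_SAMPLE_WIDTH_BYTES
        and abs(duration - expected_duration_s) <= tolerance_s
    )
```

For example, `matches_expected_format("some_track.wav")` (a hypothetical path) returns `True` only when the sample rate, channel count, bit depth and approximate duration all agree with the description. The `wave` module reads uncompressed PCM .wav files only, so the MP3-based collections in this list would need a different reader.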

Listening Data / CF:

Different descriptions

Transcription, Alignment or (Score-)Informed Source Separation
  • MAPS, a large database of piano sounds: single notes, chords and excerpts, both synthesized and recorded from a player piano.
  • JGDB, a set of randomized MIDI excerpts covering all MIDI instruments, synthesized with 2 different synths. Also contains versions with tempo changes to test score-to-audio or audio-to-audio alignment.
  • TRIOS, real recordings of trio music covering several instruments. Each instrument is recorded separately and manually aligned to a corresponding MIDI file.

Track popularity
  • Track Popularity Dataset: includes different sources of popularity definition ranging from 2004 to 2014; a mapping between different track/author/album identification spaces that allows all the sources to be used together; information on the remaining, non-popular tracks of albums that contain a popular track; contextual similarity between tracks; and extracted features, ready for MIR use, for both popular and non-popular audio tracks.

See also alternative/complementary lists by Alexander Lerch here and by Alexander Schindler here.