The MusicBrainz project by MetaBrainz has released its latest dataset, MusicBrainz Canonical Metadata. The dataset addresses a number of problems in matching music to the correct entry in the massive MusicBrainz database. Until now it has been difficult to programmatically identify the main (canonical) release of an album or song. The new dataset solves this for anyone interested in building their own music database, tagger, or other music-related application.
The MusicBrainz database aims to collect all the metadata for all music that has ever been published. For popular albums and songs, which have been released many times, it can be hard to answer the question “which one is the main (canonical) entry?” Using the new dataset, a user can enter any release or recording MBID (MusicBrainz identifier), and match it to the canonical entry.
The tables included in the dataset contain all the string metadata necessary to make effective use of it. Artist names, release names and recording names are all present, indexed against the MBIDs. This lowers the barrier to entry for music-based development considerably: anyone can now import the dataset into their favourite datastore and start looking up tracks.
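As a rough illustration of the lookup described above, here is a minimal Python sketch that loads a small CSV into memory and resolves an MBID to its canonical entry. The column names (`mbid`, `canonical_mbid`, `artist_name`, `recording_name`), the sample MBIDs, and the helper functions are all hypothetical, chosen for illustration; the real dataset's file layout and schema should be checked against the MetaBrainz documentation.

```python
import csv
import io

# Hypothetical sample rows. The column names and MBIDs are assumptions
# for illustration, not the dataset's documented schema.
SAMPLE_CSV = """mbid,canonical_mbid,artist_name,recording_name
aaaaaaaa-0000-0000-0000-000000000001,aaaaaaaa-0000-0000-0000-000000000001,Example Artist,Example Song
aaaaaaaa-0000-0000-0000-000000000002,aaaaaaaa-0000-0000-0000-000000000001,Example Artist,Example Song (Remastered)
"""


def load_canonical_index(csv_file):
    """Build two dicts: MBID -> canonical MBID, and MBID -> full row."""
    redirect = {}
    metadata = {}
    for row in csv.DictReader(csv_file):
        redirect[row["mbid"]] = row["canonical_mbid"]
        metadata[row["mbid"]] = row
    return redirect, metadata


def lookup_canonical(mbid, redirect, metadata):
    """Return the metadata row of the canonical entry, or None if unknown."""
    canonical = redirect.get(mbid)
    if canonical is None:
        return None
    return metadata.get(canonical)


redirect, metadata = load_canonical_index(io.StringIO(SAMPLE_CSV))
entry = lookup_canonical(
    "aaaaaaaa-0000-0000-0000-000000000002", redirect, metadata
)
print(entry["artist_name"], "-", entry["recording_name"])
```

In-memory dicts are enough for a sketch like this; for the full dataset, the same two mappings could instead live in indexed tables of whichever datastore you prefer.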
The MetaBrainz Foundation offers a number of different datasets, often under the Creative Commons Zero (CC0) licence. These datasets can be used to build applications and databases, or to train machine learning models. MetaBrainz Foundation datasets power countless projects and work behind the scenes at many of today’s largest tech companies, such as Microsoft, Google, and Amazon. All of them are available on the MetaBrainz datasets page. The MetaBrainz Foundation uses the new MusicBrainz Canonical Metadata dataset itself, primarily in the tagging application MusicBrainz Picard and the social music site ListenBrainz.
I hate to say this, but I suspect that an LLM that has been trained on how to post on HN truncated the link because links on HN are (visually) truncated.
You can find all the MetaBrainz datasets here: https://metabrainz.org/dataset…