The Knowledge Commons

Exploring Open Data: Seattle Mariners Players in Wikidata

The Current State of Seattle Mariners Players Open Data

I am not a big sports buff, but I just may become one now that I see the wealth of sports data that has been, and continues to be, curated by fan communities in open and collaborative ways. What these fans call stats is just data, and their care of these stats is a testament to the idea that public data can meet high standards of accuracy and completeness. The sport with arguably the most committed data nerds is baseball. I grew up in the Puget Sound region of Washington, and going to the Kingdome (and then Safeco Field) to watch Ken Griffey Jr., Edgar Martinez, and Ichiro Suzuki play baseball for the Seattle Mariners is among my core childhood memories. Although I didn’t find any open data sources specific to Seattle Mariners players, their participation in Major League Baseball (MLB) means any MLB players dataset should include them.


Exploring Open Data: Supreme Court Rulings in Wikidata

The Current State of Open Data on U.S. Supreme Court Rulings

The recording of law has predominantly been a “document” practice. Legal process largely runs on documents: statements are submitted in filings, oral arguments are transcribed, and judges’ decisions are written up and summarized. Deconstructing the content of these documents into structured data, especially machine-readable structured data, has been, and remains, a slow, meticulous, multi-faceted process. While several European nations have made huge strides to bring their legal data to current Open, Linked, and Semantic standards, much of U.S. legal digital data is still only accessible via document formats, paywalled services, online dashboards, or confusing data dumps. The Administrative Office of the U.S. Courts manages access to official records through PACER (Public Access to Court Electronic Records), a controversial system criticized for being archaic and restrictive, so much so that activist efforts have been made to release the data to the public (see Carl Malamud and Aaron Swartz).


Exploring Open Data: Notable Dogs in Wikidata

The Current State of Open Data on Notable Dogs

There are two categories of dog data that a dog lover and data nerd might be interested in:

  1. general knowledge about dogs, such as breed details, training and ownership guides, biological and evolutionary science, etc.
  2. information on specific famous or important dogs, or notable instances of the dog species, which is the focus of this week’s Open Data exploration

Unfortunately, dog lovers and data nerds will find the public data available in both categories to be sorely lacking. Resources that provide general dog information do exist, but that data is not always retrievable, especially via APIs, database endpoints, or dataset downloads. For dog breed knowledge, an authoritative source is the American Kennel Club (AKC), which provides an excellent online dashboard, but API data access appears to be available only internally and to club members. Enterprising individuals have scraped AKC breed data from the site into downloadable datasets, but it is unclear how often these are updated. Good structured, semantic breed data is available to the public via Pawsome Authority’s JSON-LD datasets. Some institutions, especially universities, have done research into dog breed visual identification and made those datasets available, but they are image-focused and light on structured metadata. An amazing Open Data initiative on scientific dog information is run by the Dog Aging Project, which provides controlled API access to its biomedical data upon request.


Exploring Open Data: Public Domain Works in Wikidata

The Current State of Open Data on Public Domain Works

A public domain work is essentially any creative material that is not protected by copyright law. The internet has made the content of these works wonderfully findable and accessible by the general public. Surprisingly, retrieving the metadata on these works is an entirely different story, perhaps precisely because there is no intellectual property (IP) at stake. Public domain art archives have made the biggest strides; the Metropolitan Museum of Art and the Art Institute of Chicago have been pioneers in creating rich knowledge structures for their archives and making that data open via robust APIs. Several other art institutions, such as the Rijksmuseum in the Netherlands, Harvard Art Museums, and the Smithsonian, are not far behind. As such, a user or agent wanting to query and build public art knowledge graphs can expect to do so without too much friction. Unfortunately, outside of art, this is much less the case. In the area of literature, Project Gutenberg and Open Library, excellent sources for reading public domain text, do open up their metadata, but queryability, schemas, and comprehensiveness are lacking. Similarly, public domain music can be readily accessed through various internet archives, and music scores through the International Music Score Library Project, but complete and accurate metadata is difficult to retrieve. Fortunately, MusicBrainz, an open encyclopedia for music data, is facilitating the ongoing curation of, and access to, much of this information. Public domain film metadata does not appear to be widely available in clean, open, queryable formats, but as many more films enter the public domain in the coming decades, the hope is that this will change.
