Hindsight is easier than foresight, and this holds for data quality as well. An initial problem is that funding and resources are often not available to concentrate on data quality from the start. Once resources are cross-linked, however, bad data can easily multiply, and cleaning it afterwards is much more complicated. Last year at CAA we proposed a rule system that could help to identify potential errors based on the data itself. In this paper we report how this system has evolved and what experiences we have had in using Linked Open Data (LOD) such as Nomisma.org within the solution we employ in Antike Fundmünzen Europa (AFE), our database for information on finds of ancient coins. There are, however, also errors that are invisible to a rule system because the data is logically correct. In some cases it is possible to use additional sources of information, for example the images or natural-language descriptions attached to the data, which until now have mainly served human readers. Using Natural Language Processing tools and image-analysis algorithms provided by free software such as OpenCV, we are exploring what other solutions exist. Such automated analysis of the data also opens the further possibility of searching for coins based on an iconographic thesaurus. In addition to better data quality, this could provide a new and neutral way of accessing and researching data that potentially reveals new insights.