.. _EDH:

******************************
Epigraphic Database Heidelberg
******************************

.. ---
.. layout: post
.. title: "Epigraphic Database Heidelberg using R"
.. date: 26-11-2019 12:00:00
.. author: jaro
.. categories: Short-reports
.. ---

.. This post is about accessing the "Epigraphic Database Heidelberg" (EDH), which is one of the longest running database projects in digital Latin epigraphy.

The [EDH] database started as early as 1986, and in 1997 the Epigraphic Database Heidelberg website was launched at
`https://edh-www.adw.uni-heidelberg.de `__ where inscriptions, images, bibliographic and geographic records
can be searched and browsed online.


Open Data Repository
====================

Although the [EDH] database can be accessed through a Web browser, it is often more convenient to retrieve
data from the Open Data Repository of the [EDH] through its public Application Programming Interface (API).

For inscriptions, the generic search pattern Uniform Resource Identifier (URI) is:

.. code-block::

    https://edh-www.adw.uni-heidelberg.de/data/api/inscriptions/search?par_1=value&par_2=value&par_n=value

with parameters `par` :math:`1,2,...n`.

.. with parameters `par` *1,2,...n*.

|

Response
========

The response to a query is in JavaScript Object Notation (JSON) format, such as:

.. code-block::

    {
       "total" : 61,
       "limit" : "20",
       "items" : [ ... ]
    }

* (see :ref:`"JSON structure" in Digital Humanities `)

In this case, ``"items"`` has an array as its value, which is where the returned records are located.
The ``"total"`` value is the *total* number of records of the query, and the ``"limit"`` value is the
number of records that appear in the browser after the query.

|

Accessing the EDH database using R
==================================

The [EDH] database [API] can be accessed from ``R`` with a convenience function that produces the generic
search pattern [URI].
The function ``get.edh()`` from the ``sdam`` package gives access to the data, with the available [API]
parameters passed as arguments. The returned [JSON] file is then converted into a list data object with
the function ``fromJSON()`` from the ``rjson`` package.

Currently, ``get.edh()`` retrieves data with the ``search`` parameter either from ``"inscriptions"``
(the default option) or from ``"geography"``. The other two search options of the [EDH] database [API],
which are ``"photos"`` and ``"bibliography"``, may be implemented in this function in the future.

* (see :ref:`R package "sdam" `)

|

Function usage
--------------

.. function:: get.edh

.. code-block:: R

    # arguments supported
    R> get.edh(search = c("inscriptions", "geography")
               , url = "https://edh-www.adw.uni-heidelberg.de/data/api"
               , hd_nr, province, country, findspot_modern
               , findspot_ancient, year_not_before, year_not_after
               , tm_nr, transcription, type, bbox, findspot, pleiades_id
               , geonames_id, offset, limit, maxlimit=4000, addID, printQ)

|

.. note::

   "``R>``" at the beginning of a line means that the code that follows is written in ``R``.
   Comments are preceded by "``#``".

|

Search parameters
-----------------

The following parameter description is from the `[EDH] database [API] `_

Inscriptions and Geography
^^^^^^^^^^^^^^^^^^^^^^^^^^

- `province:` get list of valid values at `province terms `_, in the [EDH] database [API], case insensitive

- `country:` get list of valid values at `country terms `_ in the [EDH] database [API], case insensitive

- `findspot_modern:` add leading and/or trailing truncation by asterisk \*, e.g.
  ``findspot_modern=köln*``, case insensitive

- `findspot_ancient:` add leading and/or trailing truncation by asterisk \*, e.g.
  ``findspot_ancient=aquae*``, case insensitive

- `offset:` clause to specify which row to start retrieving data from, integer

- `limit:` clause to limit the number of results, integer (by default includes all records)

- `bbox:` bounding box with the format ``bbox=minLong, minLat, maxLong, maxLat``.

  The query example:

  .. code-block::

      https://edh-www.adw.uni-heidelberg.de/data/api/inscriptions/search?bbox=11,47,12,48

  has a ``bbox`` value that in [R] is a character vector.

.. hint::

   Quote the arguments of ``get.edh()`` for all parameters that are not integers.
   For example, the query for the last parameter, ``bbox``, with the two search options is written as

   .. code-block:: R

       R> get.edh(search="inscriptions", bbox="11,47,12,48")
       R> get.edh(search="geography", bbox="11,47,12,48")

|

Inscriptions only
^^^^^^^^^^^^^^^^^

- `hd_nr:` HD-No of inscription

- `year_not_before:` integer, BC years are negative integers

- `year_not_after:` integer, BC years are negative integers

- `tm_nr:` Trismegistos database number (?)

- `transcription:` automatic leading and trailing truncation, brackets are ignored

- `type:` of inscription, get list of values at `terms type `_ in the [EDH] database [API], case insensitive

|

Geography only
^^^^^^^^^^^^^^

- `findspot:` level of village, street etc.; add leading and/or trailing truncation by asterisk ``*``, e.g.
  ``findspot_modern=köln*``, case insensitive

- `pleiades_id:` Pleiades identifier of a place; integer value

- `geonames_id:` Geonames identifier of a place; integer value

|

Extra parameters
^^^^^^^^^^^^^^^^

- `maxlimit:` Maximum limit of the query; integer, default 4000

- `addID:` Add identification to the output?

- `printQ:` Print also the query?

.. - `...`
..   additional parameters if needed

|

The function ``get.edh()`` and its wrapper ``get.edhw()``, which is described further below, are available
in the :ref:`R package "sdam" `.

|

.. seealso::

   * :ref:`"sdam" package installation `.

   ..
      * :ref:`The "sdam" package `

|

Examples
========

The examples are made with the ``sdam`` ``R`` package. Since the ``get.edh()`` function transforms the JSON
output with ``rjson::fromJSON()``, you need to have the ``rjson`` package installed as well.

|

Then, to run the examples, load the required libraries.

.. code-block:: R

    R> library("sdam")
    R> require("rjson")  # https://cran.r-project.org/package=rjson

|

The query

.. code-block:: R

    R> get.edh(findspot_modern="madrid")

returns this truncated output:

.. code-block:: R

    #$ID
    #[1] "041220"
    #
    #$commentary
    #[1] " Verschollen. Mögliche Datierung: 99-100."
    #
    #$country
    #[1] "Spain"
    #
    #$diplomatic_text
    #[1] "[ ] / [ ] / [ ] / GER PO[ ]TIF / [ ] / [ ] / [ ] / ["
    #
    #...
    #
    #$findspot_modern
    #[1] "Madrid"
    #
    #$id
    #[1] "HD041220"
    #
    #$language
    #[1] "Latin"
    #
    #...
    #

With ``"inscriptions"``, which is the default option of ``get.edh()`` and of the wrapper function
``get.edhw()``, the ``id`` component of the output list is not in a numeric format.
However, it is often convenient to have a numerical identifier in each record, for example when plotting
the results, so ``get.edh()`` adds by default an ``ID`` with a numerical format at the beginning of the list.
You can prevent this addition by setting the argument ``addID`` to ``FALSE``.

.. code-block:: R

    R> get.edh(findspot_modern="madrid", addID=FALSE)

Further extensions to the [EDH] database [API] may be added in the future, and these will be handled with
similar arguments in the ``get.edh()`` function.

|

Accessing Epigraphic Database Heidelberg: Inscriptions
======================================================

.. function:: get.edhw

.. code-block:: R

    # to perform several queries
    R> get.edhw(hd_nr, ...)

To study temporal uncertainty, for example, we need access to an epigraphic database such as the
Epigraphic Database Heidelberg.
The wrapper function ``get.edhw()`` performs several queries to the Epigraphic Database Heidelberg API
by using the Heidelberg identification numbers ``hd_nr``.

.. note::

   Currently, function ``get.edhw()`` works only for inscriptions.

|

.. index:: EDH-dataset

.. code-block:: R

    # get data API from EDH with a wrapper function
    R> EDH <- get.edhw(hd_nr=1:83821)  # (03-11-2020)
    R> length(EDH)
    #[1] 83821

    # or load it from the package
    R> data("EDH")

This wrapper function basically performs the following loop, which produces a list object with the existing
entries for each inscription, and where the entries have different lengths.

.. code-block:: R

    # grab the data from EDH API and record it in 'EDH'
    R> EDH <- list()
    # 82464 INSCRIPTIONS (20-11-2019)
    R> for(i in seq_len(82464)) {
    +    EDH[[length(EDH)+1L]] <- try(get.edh(hd_nr=i))
    +  }

Beware that retrieving such a large number of records takes a very long time. The retrieval can be done in
parts, with the resulting lists then collated into the ``EDH`` object.

.. note::

   The character ``+`` at the beginning of a line shows the scope of the loop.

|

Output
------

The output depends on each particular case.

.. code-block:: R

    R> is(EDH)
    #[1] "list"   "vector"

|

The first record has 28 `attribute` names

.. code-block:: R

    # check variable names of first entry
    R> attr(EDH[[1]], "names")
    # [1] "ID"                     "commentary"             "country"
    # [4] "depth"                  "diplomatic_text"        "edh_geography_uri"
    # [7] "findspot_ancient"       "findspot_modern"        "height"
    #[10] "id"                     "language"               "last_update"
    #[13] "letter_size"            "literature"             "material"
    #[16] "modern_region"          "not_after"              "not_before"
    #[19] "people"                 "province_label"         "responsible_individual"
    #[22] "transcription"          "trismegistos_uri"       "type_of_inscription"
    #[25] "type_of_monument"       "uri"                    "width"
    #[28] "work_status"

|

while record 21 has 34 items.

.. code-block:: R

    R> attr(EDH[[21]], "names")
    # [1] "ID"                            "commentary"
    # [3] "country"                       "depth"
    # [5] "diplomatic_text"               "edh_geography_uri"
    # [7] "findspot"                      "findspot_ancient"
    # [9] "findspot_modern"               "geography"
    #[11] "height"                        "id"
    #[13] "language"                      "last_update"
    #[15] "letter_size"                   "literature"
    #[17] "material"                      "military"
    #[19] "modern_region"                 "not_after"
    #[21] "not_before"                    "people"
    #[23] "present_location"              "province_label"
    #[25] "responsible_individual"        "social_economic_legal_history"
    #[27] "transcription"                 "trismegistos_uri"
    #[29] "type_of_inscription"           "type_of_monument"
    #[31] "uri"                           "width"
    #[33] "work_status"                   "year_of_find"

|

The ``people`` attribute is itself a list with its own `attribute` names

.. code-block:: R

    R> length(EDH[[1]]$people)
    #[1] 3

    R> attr(EDH[[1]]$people[[1]], "names")
    #[1] "name"      "gender"    "nomen"     "person_id" "cognomen"
    ...
    R> attr(EDH[[1]]$people[[3]], "names")
    #[1] "cognomen"  "praenomen" "person_id" "gender"    "name"      "nomen"

|

* (see :ref:`attributes in EDH dataset `)

|

.. meta::
   :description: Accessing the Epigraphic Database Heidelberg
   :keywords: epigraphic, documentation, dataset, HTTP-request
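
To illustrate the generic search pattern [URI] given under *Open Data Repository*, here is a minimal base ``R`` sketch that assembles a query [URI] from named parameters. The helper name ``edh_query_uri()`` is hypothetical and is not part of the ``sdam`` package. It only shows how the pattern fits together, and ``get.edh()`` may build its queries differently. The sketch assumes that every parameter is a single value.

.. code-block:: R

    # Hypothetical helper (NOT part of 'sdam'): build an EDH API search URI
    edh_query_uri <- function(search = c("inscriptions", "geography"),
                              url = "https://edh-www.adw.uni-heidelberg.de/data/api",
                              ...) {
      search <- match.arg(search)      # default is "inscriptions"
      params <- list(...)              # named query parameters, one value each
      base <- paste(url, search, "search", sep = "/")
      if (length(params) == 0L) return(base)
      # percent-encode values; reserved characters such as ',' are left as they are
      values <- vapply(params,
                       function(p) utils::URLencode(as.character(p)),
                       character(1))
      paste0(base, "?", paste(names(params), values, sep = "=", collapse = "&"))
    }

    # the bounding box query from the "Search parameters" section
    edh_query_uri("inscriptions", bbox = "11,47,12,48")
    #[1] "https://edh-www.adw.uni-heidelberg.de/data/api/inscriptions/search?bbox=11,47,12,48"

Non-integer values such as the bounding box are given as quoted character strings, which matches the hint on quoting the arguments of ``get.edh()``.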