Epigraphic Database Heidelberg

This post is about accessing the “Epigraphic Database Heidelberg” (EDH), one of the longest-running database projects in digital Latin epigraphy. The EDH database started as early as 1986, and in 1997 the Epigraphic Database Heidelberg website was launched at https://edh-www.adw.uni-heidelberg.de, where inscriptions, images, bibliographic and geographic records can be searched and browsed online.

Open Data Repository

Although the EDH database can be accessed through a Web browser, it is often more convenient to retrieve data from the EDH Open Data Repository through its public Application Programming Interface (API).

For inscriptions, the generic search pattern Uniform Resource Identifier (URI) is:

https://edh-www.adw.uni-heidelberg.de/data/api/inscriptions/search?par_1=value&par_2=value&par_n=value

with parameters par_i for i = 1, 2, ..., n.
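As a minimal sketch of how such a URI can be assembled, the following base R helper pastes named parameters onto the search pattern above. The name build_edh_uri() is hypothetical and not part of any package; the values are percent-encoded with utils::URLencode(), and no request is sent.

```r
# Hypothetical helper (not part of the sdam package): assemble an EDH search
# URI from named parameters, following the generic pattern shown above.
build_edh_uri <- function(search = "inscriptions", ...,
                          base = "https://edh-www.adw.uni-heidelberg.de/data/api") {
  params <- list(...)
  uri <- paste0(base, "/", search, "/search")
  if (length(params) == 0L) {
    return(uri)
  }
  # Percent-encode each value so spaces and other special characters are safe
  values <- vapply(params,
                   function(v) utils::URLencode(as.character(v), reserved = TRUE),
                   character(1))
  paste0(uri, "?", paste(names(params), values, sep = "=", collapse = "&"))
}

uri <- build_edh_uri(findspot_modern = "madrid", limit = 5)
print(uri)
```

Each parameter must be given as a single named value; the function only builds the string, so it can be inspected before any query is made.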


Response

The response from a query is in JavaScript Object Notation (JSON) format, such as:

{
   "total" : 61,
   "limit" : "20",
   "items" : [ ... ]
}

In this case, "items" holds an array with the returned records. The "total" value is the total number of records matching the query, and "limit" is the number of records shown at a time in the response to the query.
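To make the relation between "total" and "limit" concrete, the sketch below computes the offset values that would be needed to page through all records of a query. It works on a hand-written list that mimics the parsed response above, and it assumes that the offset counts rows from zero, which is an assumption rather than something stated in this post.

```r
# Hand-written stand-in for a parsed response; no request is made
resp <- list(total = 61, limit = "20", items = list())

# "limit" arrives as a string in the example above, so coerce it first
page_size <- as.integer(resp$limit)
n_pages   <- ceiling(resp$total / page_size)

# Assumption: offset is zero-based (the first record has offset 0)
offsets <- seq(from = 0, by = page_size, length.out = n_pages)
print(offsets)
```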


Accessing the EDH database using R

Accessing the EDH database API using R is possible with a convenience function that produces the generic search pattern URI. The function get.edh() from the sdam package gives access to the data, with the available search parameters passed as arguments. The returned JSON output is then converted into a list object with the function fromJSON() from the rjson package.

Currently, function get.edh() allows getting data with the search parameter either from "inscriptions" (the default option) or else from "geography". The other two search options from the EDH database API, which are "photos" and "bibliography", may be implemented in this function in the future.


Function usage

get.edh()
# arguments supported
R> get.edh(search = c("inscriptions", "geography")
          , url = "https://edh-www.adw.uni-heidelberg.de/data/api"
          , hd_nr, province, country, findspot_modern
          , findspot_ancient, year_not_before, year_not_after
          , tm_nr, transcription, type, bbox, findspot, pleiades_id
          , geonames_id, offset, limit, maxlimit=4000, addID, printQ)

Note

“R>” at the beginning of a line means that the code that follows is written in R. Comments are preceded by “#”.


Search parameters

The following parameter description is from the EDH database API.

Inscriptions and Geography

  • province:

    get the list of valid values at the province terms in the EDH database API; case insensitive

  • country:

    get the list of valid values at the country terms in the EDH database API; case insensitive

  • findspot_modern:

    add leading and/or trailing truncation by asterisk *, e.g. findspot_modern=köln*, case insensitive

  • findspot_ancient:

    add leading and/or trailing truncation by asterisk *, e.g. findspot_ancient=aquae*, case insensitive

  • offset:

    clause to specify which row to start from retrieving data, integer

  • limit:

    clause to limit the number of results, integer (by default includes all records)

  • bbox:

    bounding box with the format bbox=minLong, minLat, maxLong, maxLat.

    The query example:

https://edh-www.adw.uni-heidelberg.de/data/api/inscriptions/search?bbox=11,47,12,48

where the bbox value is, in R, a character vector.

Hint

Just make sure to quote the arguments in get.edh() for the different parameters that are not integers. This means for example that the query for the last parameter with the two search options is written as

R> get.edh(search="inscriptions", bbox="11,47,12,48")
R> get.edh(search="geography", bbox="11,47,12,48")
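If the coordinates are held as numbers, the quoted bbox string can be put together before the query. The helper below is a small sketch; make_bbox() is a hypothetical name, not a function of the sdam package.

```r
# Hypothetical helper: format a bounding box as "minLong,minLat,maxLong,maxLat"
make_bbox <- function(min_long, min_lat, max_long, max_lat) {
  # Basic sanity check on the corner order
  stopifnot(min_long < max_long, min_lat < max_lat)
  paste(c(min_long, min_lat, max_long, max_lat), collapse = ",")
}

bbox <- make_bbox(11, 47, 12, 48)
print(bbox)
# The string can then be passed on, e.g. get.edh(search = "geography", bbox = bbox)
```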

Inscriptions only

  • hd_nr:

    HD-No of inscription

  • year_not_before:

    integer, BC years are negative integers

  • year_not_after:

    integer, BC years are negative integers

  • tm_nr:

    Trismegistos database number (?)

  • transcription:

    automatic leading and trailing truncation, brackets are ignored

  • type:

    type of inscription; get the list of valid values at the type terms in the EDH database API; case insensitive


Geography only

  • findspot:

    level of village, street etc.; add leading and/or trailing truncation by asterisk *, e.g. findspot_modern=köln*, case insensitive

  • pleiades_id:

    Pleiades identifier of a place; integer value

  • geonames_id:

    Geonames identifier of a place; integer value


Extra parameters

  • maxlimit:

    Maximum limit of the query; integer, default 4000

  • addID:

    Add identification to the output?

  • printQ:

    Print also the query?


The function get.edh() and its wrapper get.edhw(), which is described further below, are both available in the R package “sdam”.



Examples

The examples are made with the sdam R package.

Since the get.edh() function needs to transform JSON output using rjson::fromJSON(), you need to have this package installed as well.


Then, to run the examples you need to load the required libraries.

R> library("sdam")
R> require("rjson")  # https://cran.r-project.org/package=rjson

The query

R> get.edh(findspot_modern="madrid")

returns this truncated output:

#$ID
#[1] "041220"
#
#$commentary
#[1] " Verschollen. Mögliche Datierung: 99-100."
#
#$country
#[1] "Spain"
#
#$diplomatic_text
#[1] "[ ] / [ ] / [ ] / GER PO[ ]TIF / [ ] / [ ] / [ ] / ["
#
#...
#
#$findspot_modern
#[1] "Madrid"
#
#$id
#[1] "HD041220"
#
#$language
#[1] "Latin"
#
#...
#

With "inscriptions", which is the default option of get.edh() and of the wrapper function get.edhw(), the id component of the output list is not numeric (for example "HD041220"). Since it is often convenient to have a numerical identifier in each record, get.edh() adds an ID component with only the digits at the beginning of the list.

Having a numerical identifier is useful, for example, for plotting the results, so an ID is added to the output by default. You can prevent this by setting the argument addID to FALSE.

R> get.edh(findspot_modern="madrid", addID=FALSE)

Further extensions to the EDH database API may be added in the future, and this will be handled with similar arguments in the get.edh() function …


Accessing Epigraphic Database Heidelberg: Inscriptions

get.edhw()
# to perform several queries
R> get.edhw(hd_nr, ...)

To study temporal uncertainty, for example, we need access to an epigraphic database such as the EDH. The wrapper function get.edhw() performs several queries against the Epigraphic Database Heidelberg API using the Heidelberg identification numbers given in hd_nr.

Note

Currently, function get.edhw() works only for inscriptions.


# get data API from EDH with a wrapper function
R> EDH <- get.edhw(hd_nr=1:83821)  # (03-11-2020)

R> length(EDH)
#[1] 83821

# or load it from the package
R> data("EDH")

This wrapper function basically performs the following loop, which produces a list object with the existing entries for each inscription, where entries have different lengths.

# grab the data from EDH API and record it in 'EDH'
R> EDH <- list()
# 82464 INSCRIPTIONS (20-11-2019)
R> for(i in seq_len(82464)) {
+    EDH[[length(EDH)+1L]] <- try(get.edh(hd_nr=i))
+    }

Beware that retrieving such a large number of records takes a very long time. The retrieval can be done in parts, and the resulting lists then collated into the EDH object.

Note

Character + in the code shows the scope of the loop.
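Retrieving the records in parts and collating them, as suggested above, could be sketched as follows. To keep the example runnable offline, fetch_one() is a hypothetical stand-in for get.edh(hd_nr = i), and the identifier range and chunk size are deliberately small.

```r
# Hypothetical stand-in for get.edh(hd_nr = i), so the sketch runs offline;
# replace it with the real call when querying the API
fetch_one <- function(i) list(ID = sprintf("%06d", i))

ids <- 1:10        # a small range for illustration
chunk_size <- 4

# Split the identifiers into consecutive chunks of at most chunk_size elements
chunks <- split(ids, ceiling(seq_along(ids) / chunk_size))

# Retrieve each chunk separately; try() lets the loop continue if a record fails
parts <- lapply(chunks, function(chunk) {
  lapply(chunk, function(i) try(fetch_one(i), silent = TRUE))
})

# Collate the partial lists into a single list with one element per record
EDH_part <- unlist(parts, recursive = FALSE, use.names = FALSE)
length(EDH_part)
```

Each chunk could also be saved to disk before moving to the next one, so that an interrupted session does not lose the records already retrieved.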


Output

The output differs from record to record, since entries do not all have the same components.

R> is(EDH)
#[1] "list"   "vector"

The first record has 28 attribute names:

# check variable names of first entry
R> attr(EDH[[1]], "names")
# [1] "ID"                     "commentary"             "country"
# [4] "depth"                  "diplomatic_text"        "edh_geography_uri"
# [7] "findspot_ancient"       "findspot_modern"        "height"
#[10] "id"                     "language"               "last_update"
#[13] "letter_size"            "literature"             "material"
#[16] "modern_region"          "not_after"              "not_before"
#[19] "people"                 "province_label"         "responsible_individual"
#[22] "transcription"          "trismegistos_uri"       "type_of_inscription"
#[25] "type_of_monument"       "uri"                    "width"
#[28] "work_status"

Record 21, in contrast, has 34 items:

R> attr(EDH[[21]], "names")
# [1] "ID"                            "commentary"
# [3] "country"                       "depth"
# [5] "diplomatic_text"               "edh_geography_uri"
# [7] "findspot"                      "findspot_ancient"
# [9] "findspot_modern"               "geography"
#[11] "height"                        "id"
#[13] "language"                      "last_update"
#[15] "letter_size"                   "literature"
#[17] "material"                      "military"
#[19] "modern_region"                 "not_after"
#[21] "not_before"                    "people"
#[23] "present_location"              "province_label"
#[25] "responsible_individual"        "social_economic_legal_history"
#[27] "transcription"                 "trismegistos_uri"
#[29] "type_of_inscription"           "type_of_monument"
#[31] "uri"                           "width"
#[33] "work_status"                   "year_of_find"

The people component is itself a list, whose elements have their own attribute names:

R> length(EDH[[1]]$people)
#[1] 3

R> attr(EDH[[1]]$people[[1]], "names")
#[1] "name"      "gender"    "nomen"     "person_id" "cognomen"

...

R> attr(EDH[[1]]$people[[3]], "names")
#[1] "cognomen"  "praenomen" "person_id" "gender"    "name"      "nomen"
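Because records do not share the same set of names, extracting one component across the whole list needs a fallback for the records that lack it. The sketch below uses two hand-written toy records with made-up values, not real EDH data, and a hypothetical helper get_field() that returns NA when a component is absent.

```r
# Two toy records with made-up values; only the component names follow the
# output shown above
records <- list(
  list(ID = "1", country = "A"),
  list(ID = "2", country = "B", year_of_find = "1900")
)

# Hypothetical helper: return a single-valued component as character, or NA
# if the record does not have it
get_field <- function(x, field) {
  value <- x[[field]]
  if (is.null(value)) NA_character_ else as.character(value)
}

year_of_find <- vapply(records, get_field, character(1), field = "year_of_find")
print(year_of_find)
```

This approach suits single-valued components only; nested components such as people are lists themselves and need their own handling.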