Epigraphic Database Heidelberg¶
This post is about accessing the “Epigraphic Database Heidelberg” (EDH), one of the longest-running database projects in digital Latin epigraphy. The [EDH] database started as early as 1986, and in 1997 the Epigraphic Database Heidelberg website was launched at https://edh-www.adw.uni-heidelberg.de where inscriptions, images, bibliographic and geographic records can be searched and browsed online.
Open Data Repository¶
Although the [EDH] database can be accessed through a Web browser, it is often convenient to get the Open Data Repository of the [EDH] through its public Application Programming Interface (API).
For inscriptions, the generic search pattern Uniform Resource Identifier (URI) is:
https://edh-www.adw.uni-heidelberg.de/data/api/inscriptions/search?par_1=value&par_2=value&par_n=value
with parameters par_1, par_2, ..., par_n.
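The pattern above can be sketched in a few lines of base R. This is only an illustration: `build_edh_uri()` is a hypothetical helper, not a function of the sdam package, and the assumption that "geography" follows the same `/search?` layout as "inscriptions" is taken from the generic pattern rather than verified here.

```r
# Hypothetical helper (not part of sdam): assemble a search URI
# from a named list of parameters.
build_edh_uri <- function(params, search = "inscriptions",
                          base = "https://edh-www.adw.uni-heidelberg.de/data/api") {
  # percent-encode unsafe characters; commas and asterisks are left untouched
  values <- vapply(params,
                   function(v) utils::URLencode(as.character(v)),
                   character(1))
  # join as par_1=value&par_2=value&...
  query <- paste0(names(params), "=", values, collapse = "&")
  paste0(base, "/", search, "/search?", query)
}

uri <- build_edh_uri(list(findspot_modern = "madrid", limit = 5))
print(uri)
```

The resulting string has the same shape as the generic pattern, with one `name=value` pair per list element.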
Response¶
The response to a query is in JavaScript Object Notation (JSON) format, such as:
{
"total" : 61,
"limit" : "20",
"items" : [ ... ]
}
In this case, "items" has an array as its value, where the returned records are located. The "total" value is the total number of records that match the query, and the "limit" value is the number of records that appear in the browser after the query.
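To make the relation between "total" and "limit" concrete, the sketch below works on a hand-written R list that stands in for the parsed response shown above. It assumes, without confirmation from the API description, that row offsets count from 0.

```r
# Hand-written stand-in for the parsed response above
# (the shape a JSON parser would return); "items" is left empty here.
resp <- list(total = 61, limit = "20", items = list())

total <- as.integer(resp$total)
limit <- as.integer(resp$limit)   # "limit" is a string in the example response

# number of requests needed to see every record
n_pages <- ceiling(total / limit)

# starting row of each request (assumption: offsets start at 0)
offsets <- seq(from = 0, by = limit, length.out = n_pages)

print(n_pages)
print(offsets)
```

With 61 records and a limit of 20, four requests are needed, the last one returning a single record.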
Accessing the EDH database using R¶
The [EDH] database [API] can be accessed from R with a convenience function that produces the generic search pattern [URI]. The function get.edh() from the sdam package gives access to the data, with the available parameters passed as arguments. The returned [JSON] file is then converted into a list data object with the function fromJSON() from the rjson package.
Currently, the function get.edh() retrieves data with the search parameter set either to "inscriptions" (the default option) or to "geography". The other two search options of the [EDH] database [API], "photos" and "bibliography", may be implemented in this function in the future.
(see R package “sdam”)
Function usage¶
get.edh()¶
# arguments supported
R> get.edh(search = c("inscriptions", "geography")
, url = "https://edh-www.adw.uni-heidelberg.de/data/api"
, hd_nr, province, country, findspot_modern
, findspot_ancient, year_not_before, year_not_after
, tm_nr, transcription, type, bbox, findspot, pleiades_id
, geonames_id, offset, limit, maxlimit=4000, addID, printQ)
Note
“R>” at the beginning of the line means that the following code is written in R. Comments are preceded by “#”.
Search parameters¶
The following parameter descriptions are taken from the [EDH] database [API].
Inscriptions and Geography¶
province:
get list of valid values at province terms, in the [EDH] database [API], case insensitive
country:
get list of valid values at country terms in the [EDH] database [API], case insensitive
findspot_modern:
add leading and/or trailing truncation by asterisk *, e.g. findspot_modern=köln*, case insensitive
findspot_ancient:
add leading and/or trailing truncation by asterisk *, e.g. findspot_ancient=aquae*, case insensitive
offset:
clause to specify which row to start retrieving data from, integer
limit:
clause to limit the number of results, integer (by default includes all records)
bbox:
bounding box with the format bbox=minLong,minLat,maxLong,maxLat. A query example is
https://edh-www.adw.uni-heidelberg.de/data/api/inscriptions/search?bbox=11,47,12,48
where in [R] the bounding box is given as a character vector.
Hint
Make sure to quote the arguments in get.edh() for the parameters that are not integers. For example, the query for the last parameter with the two search options is written as
R> get.edh(search="inscriptions", bbox="11,47,12,48")
R> get.edh(search="geography", bbox="11,47,12,48")
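When the bounds are available as numbers, the quoted bbox string can be put together in base R. The helper below is hypothetical (it is not part of sdam) and only illustrates the minLong, minLat, maxLong, maxLat order described above.

```r
# Hypothetical helper: turn numeric bounds into the
# "minLong,minLat,maxLong,maxLat" string that bbox expects.
make_bbox <- function(min_long, min_lat, max_long, max_lat) {
  # guard against swapped corners
  stopifnot(min_long <= max_long, min_lat <= max_lat)
  paste(c(min_long, min_lat, max_long, max_lat), collapse = ",")
}

bbox <- make_bbox(11, 47, 12, 48)
print(bbox)

# the string could then be passed on, e.g.
# get.edh(search = "geography", bbox = bbox)
```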
Inscriptions only¶
hd_nr:
HD-No of inscription
year_not_before:
integer, BC years are negative integers
year_not_after:
integer, BC years are negative integers
tm_nr:
Trismegistos database number (?)
transcription:
automatic leading and trailing truncation, brackets are ignored
type:
of inscription, get list of values at terms type in the [EDH] database [API], case insensitive
Geography only¶
findspot:
level of village, street etc.; add leading and/or trailing truncation by asterisk *, e.g. findspot_modern=köln*, case insensitive
pleiades_id:
Pleiades identifier of a place; integer value
geonames_id:
Geonames identifier of a place; integer value
Extra parameters¶
maxlimit:
Maximum limit of the query; integer, default 4000
addID:
Add identification to the output?
printQ:
Print also the query?
Both get.edh() and the wrapper function get.edhw(), which is described further below, are available in the R package “sdam”.
Examples¶
The examples are made with the sdam R package. Since the get.edh() function transforms the JSON output using rjson::fromJSON(), the rjson package needs to be installed as well. To run the examples, first load the required libraries.
R> library("sdam")
R> require("rjson") # https://cran.r-project.org/package=rjson
The query
R> get.edh(findspot_modern="madrid")
returns this truncated output:
#$ID
#[1] "041220"
#
#$commentary
#[1] " Verschollen. Mögliche Datierung: 99-100."
#
#$country
#[1] "Spain"
#
#$diplomatic_text
#[1] "[ ] / [ ] / [ ] / GER PO[ ]TIF / [ ] / [ ] / [ ] / ["
#
#...
#
#$findspot_modern
#[1] "Madrid"
#
#$id
#[1] "HD041220"
#
#$language
#[1] "Latin"
#
#...
#
With "inscriptions", which is the default option of get.edh() and of the wrapper function get.edhw(), the id “component” of the output list does not have a numeric format. Because a numerical identifier in each record is often convenient, for example for plotting the results, get.edh() adds by default an ID with a numerical format at the beginning of the list. You can prevent this addition by setting the argument addID to FALSE.
R> get.edh(findspot_modern="madrid", addID=FALSE)
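In the truncated output above, the id "HD041220" and the added ID "041220" differ only in the "HD" prefix. The lines below sketch that relation in base R; how get.edh() builds the ID internally is not shown in this text, so this is an assumption for illustration only.

```r
# the id component as it appears in the output above
id <- "HD041220"

# drop the "HD" prefix to obtain the digits-only identifier
ID <- sub("^HD", "", id)
print(ID)

# as a number, e.g. for ordering or plotting
hd_number <- as.integer(ID)
print(hd_number)

# and back again: pad to six digits and restore the prefix
print(sprintf("HD%06d", hd_number))
```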
Further extensions to the [EDH] database [API] may be added in the future, and these will be handled with similar arguments in the get.edh() function …
Accessing Epigraphic Database Heidelberg: Inscriptions¶
get.edhw()¶
# to perform several queries
R> get.edhw(hd_nr, ...)
To study temporal uncertainty, for example, we need access to an epigraphic database such as the [EDH]. The wrapper function get.edhw() performs several queries to the Epigraphic Database Heidelberg API by using the Heidelberg number hd_nr as identification number.
Note
Currently, function get.edhw() works only for inscriptions.
# get data API from EDH with a wrapper function
R> EDH <- get.edhw(hd_nr=1:83821) # (03-11-2020)
R> length(EDH)
#[1] 83821
# or load it from the package
R> data("EDH")
This wrapper function basically performs the following loop, which produces a list object with the existing entries for each inscription; the entries have different lengths.
# grab the data from EDH API and record it in 'EDH'
R> EDH <- list()
# 82464 INSCRIPTIONS (20-11-2019)
R> for(i in seq_len(82464)) {
+ EDH[[length(EDH)+1L]] <- try(get.edh(hd_nr=i))
+ }
Beware that retrieving such a large number of records takes a very long time. The retrieval can be done in parts, with the resulting lists then collated into the EDH object.
Note
The character + at the beginning of a line is the R continuation prompt, and here it shows the scope of the loop.
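Retrieving by parts and collating the lists could look like the sketch below. To keep it runnable offline, `fetch_one()` is a hypothetical stand-in for `get.edh(hd_nr = i)`, and the range and chunk size are deliberately tiny.

```r
# Hypothetical stand-in for get.edh(hd_nr = i); returns a one-field record
fetch_one <- function(i) list(ID = sprintf("%06d", i))

ids <- 1:10                                          # small range for illustration
chunks <- split(ids, ceiling(seq_along(ids) / 4))    # parts of at most 4 numbers

# retrieve each part separately; try() keeps going if one request fails
parts <- lapply(chunks, function(chunk) {
  lapply(chunk, function(i) try(fetch_one(i), silent = TRUE))
})

# collate the parts into a single list with one element per record
EDH_small <- do.call(c, unname(parts))
print(length(EDH_small))
```

In practice each part could be saved to disk before moving on, so that an interrupted session does not lose the records already retrieved.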
Output¶
The output depends on each particular case.
R> is(EDH)
#[1] "list" "vector"
The first record has 28 attribute names:
# check variable names of first entry
R> attr(EDH[[1]], "names")
# [1] "ID" "commentary" "country"
# [4] "depth" "diplomatic_text" "edh_geography_uri"
# [7] "findspot_ancient" "findspot_modern" "height"
#[10] "id" "language" "last_update"
#[13] "letter_size" "literature" "material"
#[16] "modern_region" "not_after" "not_before"
#[19] "people" "province_label" "responsible_individual"
#[22] "transcription" "trismegistos_uri" "type_of_inscription"
#[25] "type_of_monument" "uri" "width"
#[28] "work_status"
Record 21, in contrast, has 34 items:
R> attr(EDH[[21]], "names")
# [1] "ID" "commentary"
# [3] "country" "depth"
# [5] "diplomatic_text" "edh_geography_uri"
# [7] "findspot" "findspot_ancient"
# [9] "findspot_modern" "geography"
#[11] "height" "id"
#[13] "language" "last_update"
#[15] "letter_size" "literature"
#[17] "material" "military"
#[19] "modern_region" "not_after"
#[21] "not_before" "people"
#[23] "present_location" "province_label"
#[25] "responsible_individual" "social_economic_legal_history"
#[27] "transcription" "trismegistos_uri"
#[29] "type_of_inscription" "type_of_monument"
#[31] "uri" "width"
#[33] "work_status" "year_of_find"
The people attribute is itself a list, and its entries have their own attribute names, which also differ from person to person:
R> length(EDH[[1]]$people)
#[1] 3
R> attr(EDH[[1]]$people[[1]], "names")
#[1] "name" "gender" "nomen" "person_id" "cognomen"
...
R> attr(EDH[[1]]$people[[3]], "names")
#[1] "cognomen" "praenomen" "person_id" "gender" "name" "nomen"
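Because records differ in which attributes they carry, code that pulls out one attribute has to cope with missing ones. The sketch below uses two made-up records that only mimic the shape of the EDH list (the values are not real data), and `get_field()` is a hypothetical helper.

```r
# Two made-up records mimicking the structure of the EDH list
toy <- list(
  list(ID = "000001", country = "Italy",
       people = list(list(name = "A", gender = "male"), list(name = "B"))),
  list(ID = "000002", findspot_modern = "Madrid")
)

# Hypothetical helper: one value per record, NA where the attribute is absent
get_field <- function(records, field) {
  vapply(records, function(r) {
    v <- r[[field]]
    if (is.null(v)) NA_character_ else as.character(v)
  }, character(1))
}

countries <- get_field(toy, "country")
print(countries)

# number of persons per record (0 when there is no people attribute)
n_people <- lengths(lapply(toy, function(r) r[["people"]]))
print(n_people)
```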