.. _EDH:


******************************
Epigraphic Database Heidelberg
******************************

.. ---
.. layout: post
.. title:  "Epigraphic Database Heidelberg using R"
.. date:   26-11-2019 12:00:00
.. author: jaro
.. categories: Short-reports
.. ---

.. <!--- # Short report: Epigraphic Database Heidelberg using R
.. <span class="smallcaps">Antonio Rivero Ostoic  
.. 18-10-2019</span>
.. Short report:  [R]{.sans-serif}  --->

This post is about accessing the "Epigraphic Database Heidelberg" (EDH),
which is one of the longest running database projects in digital Latin
epigraphy. The [EDH] database started as early as year 1986, and in 1997 the Epigraphic
Database Heidelberg website was launched at
`https:/edh-www.adw.uni-heidelberg.de <https:/edh-www.adw.uni-heidelberg.de>`__ where inscriptions, images,
bibliographic and geographic records can be searched and browsed online.


Open Data Repository
====================

Despite the possibility of accessing the [EDH] database through a Web browser, it is
many times convenient to get the Open Data Repository by the
[EDH] through its
public Application Programming Interface (API).

For inscriptions, the generic search pattern Uniform Resource Identifier
(URI) is:

..  code-block:: 
    
    https://edh-www.adw.uni-heidelberg.de/data/api/inscriptions/search?par_1=value&par_2=value&par_n=value

with parameters `par` :math:`1,2,...n`.


.. with parameters `par` *1,2,...n*.




|

Response
========

The response from a query is in a Java Script Object Notation or JSON format such as:

.. code-block:: 

    {
       "total" : 61,
       "limit" : "20",
       "items" : [ ... ]
    }

* (see :ref:`"JSON structure" in Digital Humanities <json-str>`)

In this case, ``"items"`` has an array as a value where the returned records are located. The ``"total"`` 
and ``"limit"`` values correspond to the *total* number of records of the query, and the *limit* number is 
the amount of records to appear in the browser after the query. 


|



Accessing the EDH database using R
==================================

Accessing the [EDH] database [API] using ``R`` is possible with a convenient function that produces
the generic search pattern [URI]. Hence, the function ``get.edh()`` from the ``sdam`` package allows 
having access to the data with the available parameters that are recorded as arguments. Then the 
returned [JSON] file is converted into a list data object with function ``fromJSON()`` from the 
``rjson`` package. 

Currently, function ``get.edh()`` allows getting data with the ``search`` parameter 
either from ``"inscriptions"`` (the default option) or else from ``"geography"``. 
The other two search options from the [EDH] database [API], which are ``"photos"`` 
and ``"bibliography"``, may be implemented in the future in this function. 


* (see :ref:`R package "sdam" <sdam-pkg>`)


|


Function usage
--------------

.. function:: get.edh

.. code-block:: R

    # arguments supported
    R> get.edh(search = c("inscriptions", "geography")
              , url = "https://edh-www.adw.uni-heidelberg.de/data/api"
              , hd_nr, province, country, findspot_modern
              , findspot_ancient, year_not_before, year_not_after
              , tm_nr, transcription, type, bbox, findspot, pleiades_id
              , geonames_id, offset, limit, maxlimit=4000, addID, printQ)


|

    .. note:: 
    
        "``R>``" at the beginning of the line means that the following code is 
        written in ``R``. Comments are preceded by "``#``". 

|




Search parameters
-----------------

The following parameter description is from the `[EDH] database [API] 
<https://edh-www.adw.uni-heidelberg.de/data/api>`_



Inscriptions and Geography
^^^^^^^^^^^^^^^^^^^^^^^^^^

- `province:`

   get list of valid values at `province terms 
   <https://edh-www.adw.uni-heidelberg.de/data/api/terms/province>`_,
   in the [EDH] database [API], case insensitive

- `country:`

   get list of valid values at `country terms 
   <https://edh-www.adw.uni-heidelberg.de/data/api/terms/country>`_ 
   in the [EDH] database [API], case insensitive

- `findspot_modern:`

   add leading and/or trailing truncation by asterisk \*, e.g.
   ``findspot\_modern=köln\*``, case insensitive

- `findspot_ancient:`

   add leading and/or trailing truncation by asterisk \*, e.g.
   ``findspot\_ancient=aquae\*``, case insensitive

- `offset:`

   clause to specify which row to start from retrieving data, integer

- `limit:`

   clause to limit the number of results, integer (by default includes all records)

- `bbox:`

   bounding box with the format ``bbox=minLong, minLat, maxLong, maxLat``.
   

   
   The query example:

..  code-block:: 
    
    https://edh-www.adw.uni-heidelberg.de/data/api/inscriptions/search?bbox=11,47,12,48


that in [R] is a vector character.



    .. hint::
    
        Just make sure to quote the arguments in ``get.edh()`` for the different
        parameters that are not integers. This means for example that the query
        for the last parameter with the two search options is written as
        
        .. code-block:: R
        
            R> get.edh(search="inscriptions", bbox="11,47,12,48")
            R> get.edh(search="geography", bbox="11,47,12,48")


|


Inscriptions only
^^^^^^^^^^^^^^^^^

- `hd_nr:`

   HD-No of inscription

- `year_not_before:`

   integer, BC years are negative integers

- `year_not_after:`

   integer, BC years are negative integers

- `tm_nr:`

   Trismegistos database number (?)

- `transcription:`

   automatic leading and trailing truncation, brackets are ignored

- `type:`

   of inscription, get list of values at `terms type 
   <https://edh-www.adw.uni-heidelberg.de/data/api/terms/type>`_ 
   in the [EDH] database [API], case insensitive

|


Geography only
^^^^^^^^^^^^^^

- `findspot:`

   level of village, street etc.; add leading and/or trailing
    truncation by asterisk ``\*``, e.g. ``findspot\_modern=köln\*``, case
    insensitive

- `pleiades_id:`

   Pleiades identifier of a place; integer value

- `geonames_id:`

   Geonames identifier of a place; integer value

|


Extra parameters
^^^^^^^^^^^^^^^^

- `maxlimit:`

   Maximum limit of the query; integer, default 4000

- `addID:`

   Add identification to the output?

- `printQ:`

   Print also the query?

..  - `...`
..  
..     additional parameters if needed






|


The two functions we have seen so far, ``get.edh()`` and ``get.edhw()``, 
are available in the :ref:`R package "sdam" <sdam-pkg>`.

|

    .. seealso:: 
    
        * :ref:`"sdam" package installation <sdam-inst>`. 


..    * :ref:`The "sdam" package <sdam-inst>`



|




Examples
========

The examples are made with the ``sdam`` ``R`` package. 


Since  the ``get.edh()`` function needs to transform JSON output using ``rjson::fromJSON()``,
you need to have this package installed as well. 



|



Then, to run the examples you need to load the required libraries. 

.. code-block:: R

    R> library("sdam")
    R> require("rjson")  # https://cran.r-project.org/package=rjson 


|

The query

.. code-block:: R

    R> get.edh(findspot_modern="madrid")

returns this truncated output:

.. code-block:: R

    #$ID
    #[1] "041220"
    #
    #$commentary
    #[1] " Verschollen. Mögliche Datierung: 99-100."
    #
    #$country
    #[1] "Spain"
    #
    #$diplomatic_text
    #[1] "[ ] / [ ] / [ ] / GER PO[ ]TIF / [ ] / [ ] / [ ] / ["
    #
    #...
    #
    #$findspot_modern
    #[1] "Madrid"
    #
    #$id
    #[1] "HD041220"
    #
    #$language
    #[1] "Latin"
    #
    #...
    #

With ``"inscriptions"``, which is the default option of ``get.edh()`` and of the wrapper 
function ``get.edhw()``, the ``id`` "component" of the output list has not a numeric 
format. However, many times is convenient to have a numerical identifier in each record, 
and function ``get.edh()`` adds an ``ID`` with a numerical format at the beginning of the list.


Having a numerical identifier is useful for plotting the results, for example, and an ``ID`` is added 
to the output by default. You can prevent such addition by disabling 
argument ``addID`` with ``FALSE``.

.. code-block:: R

    R> get.edh(findspot_modern="madrid", addID=FALSE)


Further extensions to the [EDH] database [API] may be added in the future, and this will be 
handled with similar arguments in the ``get.edh()`` function ...


|


Accessing Epigraphic Database Heidelberg: Inscriptions
======================================================

.. function:: get.edhw

.. code-block:: R

    # to perform several queries
    R> get.edhw(hd_nr, ...)



To study temporary uncertainty, for example, we need to access to an epigraphic database like the Heidelberg. 
The wrapper function ``get.edhw()`` allows multiple queries by using the Heidelberg number ``hd_nr``.



``get.edhw()`` is a wrapper function to perform several queries from the Epigraphic Database Heidelberg API using 
identification numbers. 


    .. note:: 
    
        Currently, function ``get.edhw()`` works only for inscriptions. 


|



.. index:: EDH-dataset 

.. code-block:: R

    # get data API from EDH with a wrapper function
    R> EDH <- get.edhw(hd_nr=1:83821)  # (03-11-2020)

    R> length(EDH)
    #[1] 83821
    
    # or load it from the package
    R> data("EDH")


This wrapper function basically perform the following loop that 
will produce a list object with the existing entries for each inscription, 
and where entries have different length. 

.. code-block:: R

     # grab the data from EDH API and record it in 'EDH'
     R> EDH <- list()
     # 82464 INSCRIPTIONS (20-11-2019)
     R> for(i in seq_len(82464)) { 
     +    EDH[[length(EDH)+1L]] <- try(get.edh(hd_nr=i))
     +    }

Beware that retrieving such a large number of records will take a very long time, 
and this can be done by parts and then collate the lists into the ``EDH`` object. 


    .. note::
       Character ``+`` in the code shows the scope of the loop. 
       


|


Output
------

The output depends on each particular case. 

.. code-block:: R

    R> is(EDH)
    #[1] "list"   "vector"

|

The first record has 28 `attribute` names

.. code-block:: R

    # check variable names of first entry
    R> attr(EDH[[1]], "names")
    # [1] "ID"                     "commentary"             "country"               
    # [4] "depth"                  "diplomatic_text"        "edh_geography_uri"     
    # [7] "findspot_ancient"       "findspot_modern"        "height"                
    #[10] "id"                     "language"               "last_update"           
    #[13] "letter_size"            "literature"             "material"              
    #[16] "modern_region"          "not_after"              "not_before"            
    #[19] "people"                 "province_label"         "responsible_individual"
    #[22] "transcription"          "trismegistos_uri"       "type_of_inscription"   
    #[25] "type_of_monument"       "uri"                    "width"                 
    #[28] "work_status"


|

While record 21 has 34 items.


.. code-block:: R

    R> attr(EDH[[21]], "names")
    # [1] "ID"                            "commentary"                   
    # [3] "country"                       "depth"                        
    # [5] "diplomatic_text"               "edh_geography_uri"            
    # [7] "findspot"                      "findspot_ancient"             
    # [9] "findspot_modern"               "geography"                    
    #[11] "height"                        "id"                           
    #[13] "language"                      "last_update"                  
    #[15] "letter_size"                   "literature"                   
    #[17] "material"                      "military"                     
    #[19] "modern_region"                 "not_after"                    
    #[21] "not_before"                    "people"                       
    #[23] "present_location"              "province_label"               
    #[25] "responsible_individual"        "social_economic_legal_history"
    #[27] "transcription"                 "trismegistos_uri"             
    #[29] "type_of_inscription"           "type_of_monument"             
    #[31] "uri"                           "width"                        
    #[33] "work_status"                   "year_of_find"


|


Attribute ``people`` is another list with other `attribute` names


.. code-block:: R

    R> length(EDH[[1]]$people)
    #[1] 3

    R> attr(EDH[[1]]$people[[1]], "names")
    #[1] "name"      "gender"    "nomen"     "person_id" "cognomen"

    ...

    R> attr(EDH[[1]]$people[[3]], "names")
    [1] "cognomen"  "praenomen" "person_id" "gender"    "name"      "nomen"
    

|

* (see :ref:`attributes in EDH dataset <edh-attr>`)




|


.. meta::
   :description: Accessing the Epigraphic Database Heidelberg
   :keywords: epigraphic, documentation, dataset, HTTP-request