Extracting EDH Variables


EDH variables

Another wrapper function, this time for the extraction of variables from the EDH dataset is found in edhw(). Use data("rp") for Roman provinces in province.


edhw()

Function usage

# accepted parameter arguments
R> edhw(vars, x = NULL, as = c("list", "df"), type = c("long", "wide", "narrow"),
        split, select, addID, limit, id, na.rm, clean, province, gender, ...)

Parameters

Formal arguments of edhw() are:

  • vars:

    Chosen variables from the EDH dataset (vector)

  • x:

    An optional list object name with fragments of the EDH dataset

  • as:

    Format to return the output. Currently either as a "list" or a data frame "df" object.

  • type:

    Format of the data frame output. Currently either a "long" or a "wide" table (option "narrow" not yet implemented).

  • split:

    Divide the data into groups by id? (logical and optional)

  • select:

    "people" variables to select (vector and optional, data frame type "long" only)

  • addID:

    Add identification to the output? (optional and logical)

  • limit:

    Limit the returned output. Ignored if id is specified (optional, integer or vector)

  • id:

    Select only the hd_nr id(s) (optional, integer or character)

  • na.rm:

    Remove entries with <NA>? (logical and optional)

  • clean:

    Replace entries with <NA>? (logical and optional)

  • province:

    Roman province (character, optional) as in "rp" dataset.

  • gender:

    People gender in EDH (character, optional)


Todo

Implement the "narrow" type option to edhw().


Attributes in EDH dataset

The aim of the edhw() function is to extract attributes or variables of inscriptions. These inscriptions are output values typically produced by the get.edh() or get.edhw() functions.

The records in the EDH dataset have at least one the following items:

"commentary" "fotos" "country" "depth"

"diplomatic_text" "edh_geography_uri" "findspot" "findspot_ancient" "findspot_modern"

"geography" "height" "id" "language" "last_update" "letter_size" "literature"

"material" "military" "modern_region" "not_after" "not_before" "present_location"

"religion" "province_label" "responsible_individual" "social_economic_legal_history"

"transcription" "trismegistos_uri" "type_of_inscription" "type_of_monument" "uri"

"width" "work_status" "year_of_find"


Another output variable is "people" that is a list of persons named in the inscriptions with at least the following items

"person_id" "nomen" "cognomen" "praenomen" "name" "gender" "status" "tribus"

"origo" "occupation" "age: years" "age: months" "age: days"


Relative dating in EDH

We are going to apply the edhw() function to check the relative dating of Roman inscriptions, and for a simple relative dating analysis, we choose chronological data variables in vars:

# make a list for relative variables in 'EDH' (default)
R> edhw(vars=c("not_after", "not_before"))

Since argument x is not specified in the function, the "EDH" dataset in the sdam package is taken if available with a Warning message.


In this case, the boundaries of the timespan of existence are variables "not_after" and "not_before", respectively.



Hint

The above use of function edhw() is wrapping the base lapply function as

# recursively apply a function over the list for  variables
R> lapply(EDH, `[`, c("not_after", "not_before") )

where a pair of backquotes (aka “backticks”) is a way to refer in R to names or combinations of symbols that are otherwise reserved or illegal, or non-syntactic names. Hence, e.g. apply(foo, `[`, c(...) ) is the same as apply(foo, function (x) x[c(...)]).


The structure of such chronological data items is a list object with an id for all entries that is the EDH hd_nr.

R> str(edhw(vars=c("not_after", "not_before")))
#List of 83821
# $ :List of 3
#  ..$ id        : chr "HD000001"
#  ..$ not_after : chr "0130"
#  ..$ not_before: chr "0071"
# $ :List of 3
#  ..$ id        : chr "HD000002"
#  ..$ not_after : chr "0200"
#  ..$ not_before: chr "0051"
# ...

Complete cases

By default, function edhw() do not remove missing data when present in all variables, but is possible to remove missing information by activating the na.rm argument and work with complete cases.

# remove missing data
R> str(edhw(vars=c("not_after", "not_before"), na.rm=TRUE))
#List of 60224
# $ :List of 3
# ...

However, tackling the temporal uncertainty problem is an important type of analysis.


See also

Missing data within temporal uncertainty.