Digital Humanities

The use of digital technologies to pursue research questions in the humanities.

Data for humanities

  1. Document markup languages

    • ConTeXt is a TeX macro package that offers a cleaner interface for controlling the typography of a document while retaining LaTeX’s structure-oriented approach

      • because it separates content from presentation, it can also format XML text, …

    • EAD (Encoded Archival Description)

  2. Citation

    • Arts and Humanities Citation Index (AHCI)

      • Machine-readable bibliographic record formats include MARC, RIS, and BibTeX

  3. Geospatial/geographical data

    • GeoJSON is a geospatial data interchange format based on JavaScript Object Notation (JSON).

    • Leaflet is an open-source JavaScript library for mobile-friendly, cross-browser, interactive maps.

    • A Web Map Service (WMS) is a standard protocol for serving georeferenced map images over the Internet that are generated by a map server using data from a GIS database.

      • See also Web Feature Service (WFS)
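Because GeoJSON is plain JSON, a feature can be built from ordinary data structures. The sketch below assumes Python's standard json module and a hypothetical helper, point_feature, and creates a FeatureCollection holding one Point; GeoJSON (RFC 7946) orders coordinates as longitude, then latitude. The coordinates are approximate values for Florence, Italy.

```python
import json

def point_feature(name, lon, lat):
    """Return a GeoJSON Feature with a Point geometry (hypothetical helper)."""
    return {
        "type": "Feature",
        # GeoJSON coordinate order is [longitude, latitude].
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"name": name},
    }

collection = {
    "type": "FeatureCollection",
    "features": [point_feature("Florence", 11.2558, 43.7696)],
}

# Serialise to GeoJSON text, e.g. to save as a .geojson file.
print(json.dumps(collection, indent=2))
```

A file containing this output can be passed to Leaflet's L.geoJSON layer to draw the point on an interactive map.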


Data formats

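Data can be stored and exchanged in different formats, ranging from structured data formats such as JSON and XML to markup languages for text.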


JSON structure

JSON stands for JavaScript Object Notation and is based on a subset of the JavaScript Programming Language Standard ECMA-262.

JSON is built on two structures: a collection of name/value pairs (an object) and an ordered list of values (an array). A JSON structure looks like:

Object {
        Identifier: Value,
        Identifier: Array [
            Object {
                    Identifier: Value
            }
        ]
}

where each Identifier is a string delimited by double quotes, name/value pairs are separated by commas, and a Value can be a string, a number, true, false, null, an array, or another JSON object, as in the example above.
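As an illustration only, a minimal sketch with Python's standard json module shows how the two structures map onto native types when JSON text is parsed: objects become dictionaries, arrays become lists, and true, false, and null become True, False, and None. The record and its field names are hypothetical.

```python
import json

# A hypothetical record: an object whose "letters" value is an array of objects.
text = """
{
    "archive": "Example Archive",
    "digitised": true,
    "notes": null,
    "letters": [
        {"year": 1848, "sender": "A. Writer"}
    ]
}
"""

record = json.loads(text)  # parse the JSON text into Python objects

print(type(record).__name__)                 # dict
print(type(record["letters"]).__name__)      # list
print(record["digitised"], record["notes"])  # True None
print(record["letters"][0]["year"])          # 1848

# Serialising back restores quoted identifiers and lowercase literals.
print(json.dumps(record["letters"][0]))      # {"year": 1848, "sender": "A. Writer"}
```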

JSON in R

Some CRAN packages for reading JSON files in R are:

  • rjson v0.1.0 released on Jul 30 2007

  • RJSONIO v0.3-1 released on Oct 4 2010

  • jsonlite v0.9.0 released on Dec 3 2013


eXtensible Markup Language

Todo

eXtensible markup language (XML) structure


Lightweight markup languages

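Lightweight markup languages use a simple, human-readable plain-text syntax that can be converted into formats such as HTML; they are widely used for producing documentation on the Web.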


Markdown

Markdown (MD), with file suffixes such as .md and .Rmd, is the default markup language on GitHub and is therefore very popular among developers who use the platform. Its popularity for writing on the Web has, however, come at the cost of consistency and robustness, and today there are several flavours of MD:

  • The basics and syntax of the original “Gruber Markdown” are described on the creator’s webpage

  • CommonMark is a rigorous, unambiguous specification of Gruber’s Markdown, developed by a group including representatives from GitHub, Stack Exchange, and Reddit; it has become a “de facto” standard on the Web.

  • GitHub Flavored Markdown (GFM) is a strict superset of CommonMark that adds GitHub-specific syntax extensions such as tables, task list items, strikethrough, and extended autolinks.

  • Other flavours of Markdown include MultiMarkdown, Markdown Extra, CriticMarkup, and Ghost Markdown, among others.


reStructuredText

reStructuredText (RST) files use the suffix .rst, or .txt since they are plain text. RST uses simple and intuitive constructs to structure complex technical documentation, where “complex” refers to features such as indexes and glossaries.

Markdown popularised simple plain-text conventions for headers and inline (interpreted) text. RST goes a step further with directives and specialized roles; for example, these features allow reStructuredText to render text and math formulae directly into LaTeX.

The directive syntax in RST is

.. directive-type:: arguments
   :option: value

   indented content block

and standard and specialized roles look like this:

*emphasis* with standard markup
:emphasis:`emphasis` with an explicit role
:title:`Title` with a specialized role

where most of the standard inline markup, such as *emphasis* and **strong**, is shared by MD and RST, whereas explicit roles are specific to RST.

To produce documentation, in either HTML or LaTeX, reStructuredText needs a builder: a program that converts the RST source into the desired format.

A popular option is the Python package docutils, which provides a front-end script for each output format:

prompt> ./rst2html.py text.rst > text.html
prompt> ./rst2latex.py text.rst > text.tex

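where text.rst is the RST source and the output is written to text.html or text.tex.

Another alternative is Sphinx, which builds the documentation of a whole project, including API documentation: the RST sources are kept in a source folder, and the chosen builder (such as html or latex) writes its output into a build folder.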

prompt> ./sphinx-build [options] -b html source build/html
prompt> ./sphinx-build [options] -b latex source build/latex
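sphinx-build reads its configuration from a conf.py file in the source folder. A minimal sketch might look like the following; the project metadata and the output file name are hypothetical placeholders, and only the sphinx.ext.autodoc extension, which pulls API documentation from Python docstrings, is enabled.

```python
# conf.py: Sphinx configuration, placed in the "source" folder.
# The metadata values below are hypothetical placeholders.
project = "Digital Humanities Notes"
author = "Your Name"

# autodoc generates API documentation from Python docstrings
# through directives such as ".. automodule::".
extensions = ["sphinx.ext.autodoc"]

# Used by the "latex" builder: (start document, target .tex file,
# title, author, LaTeX theme "manual" or "howto").
latex_documents = [
    ("index", "dh-notes.tex", project, author, "manual"),
]
```

With this file and an index.rst in the source folder, the sphinx-build commands above write HTML and LaTeX output into the build folder.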

TeX and LaTeX

First released in 1978, TeX is a typesetting language designed in particular for complex mathematical formulae. The name TeX also refers to the engine, i.e. the program that does the typesetting.

LaTeX is a generalised set of macros built on top of TeX that lets authors concentrate on the content and structure of a document rather than on its visual layout.

Todo

TODO


Another data format

Todo

TODO Another data format