Digital Humanities¶
The use of digital technologies to pursue research questions in the humanities.
Data for humanities¶
Document markup languages
ConTeXt is a TeX macro package that has a cleaner interface to control the typography of the document while retaining LaTeX’s structure-oriented approach
with separation of content and presentation. It can format XML text, …
EAD (Encoded Archival Description)
…
Citation
Arts and Humanities Citation Index (AHCI)
Machine-readable bibliographic record - MARC, RIS, BibTeX
Geospatial/geographical data
GeoJSON is a geospatial data interchange format based on JavaScript Object Notation (JSON).
Leaflet is an open-source JavaScript library for mobile-friendly, cross-browser, interactive maps.
A Web Map Service (WMS) is a standard protocol for serving georeferenced map images over the Internet that are generated by a map server using data from a GIS database.
See also Web Feature Service (WFS)
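To make the GeoJSON format mentioned above more concrete, here is a minimal Python sketch that builds a GeoJSON FeatureCollection containing a single point and serialises it to text. The helper name `point_feature` and the place used are assumptions made only for illustration; note that GeoJSON coordinates are written longitude first, then latitude.

```python
import json


def point_feature(lon, lat, properties):
    """Return a GeoJSON Feature dict for a single point (longitude first)."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


# Hypothetical place, used only for illustration.
collection = {
    "type": "FeatureCollection",
    "features": [point_feature(12.4964, 41.9028, {"name": "Rome"})],
}

# Serialise the collection to GeoJSON text.
geojson_text = json.dumps(collection, indent=2)
print(geojson_text)
```

The resulting text can be saved with a `.geojson` suffix and loaded by map libraries such as Leaflet.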
Data formats¶
Data can be stored in different formats.
JSON structure¶
JSON stands for JavaScript Object Notation, and it is based on a subset of the JavaScript Programming Language Standard ECMA-262.
JSON is built on two structures, namely a collection of name/value pairs, and an ordered list of values. A JSON structure looks like:
    Object {
        Identifier: Value
        Identifier: Array [
            Object {
                Identifier: Value
            }
        ]
    }
Where an Identifier is delimited by double quotes, and a Value can be a string, a number, one of the literals true, false, or null, an Array, or another JSON Object, as in the example above.
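As a rough illustration of the structure described above, the following Python sketch parses a small JSON text with the standard `json` module. The record and its field names are made up for the example; the point is only to show how objects, arrays, and the literals map onto native types.

```python
import json

# A hypothetical record following the Object/Array layout described above.
text = """
{
  "title": "Digital Humanities",
  "year": 2020,
  "open_access": true,
  "doi": null,
  "authors": [
    {"name": "A. Author"}
  ]
}
"""

record = json.loads(text)

# JSON objects become dicts, arrays become lists,
# and true/false/null become True/False/None.
print(type(record).__name__)
print(type(record["authors"]).__name__)
print(record["open_access"], record["doi"])
print(record["authors"][0]["name"])
```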
JSON in R¶
Some R packages for reading JSON files in CRAN are
rjson, v0.1.0 released on Jul 30 2007
RJSONIO, v0.3-1 released on Oct 4 2010
jsonlite, v0.9.0 released on Dec 3 2013
Lightweight markup languages¶
Lightweight markup languages are for producing documentation on the Web.
Markdown¶
Markdown (MD), with suffixes .md, .Rmd, etc., is currently the markup language for GitHub,
and hence very popular among developers using this platform. However, the popularity of this format for writing
for the web has challenged its consistency and robustness, and today there are several flavours of MD:
Basics and syntax of the “Gruber Markdown” are on the creator’s webpage
CommonMark is a strictly specified version of the Gruber Markdown, written by a group including representatives from GitHub, Stack Exchange, and Reddit, and is therefore today a “de facto” standard on the Web.
GitHub Flavored Markdown or GFM is a superset of CommonMark with GitHub-specific extensions to the syntax.
Other flavours of Markdown include MultiMarkdown, Markdown Extra, CriticMarkup, Ghost Markdown, and others…
reStructuredText¶
reStructuredText (RST) is written with the suffix .rst or .txt, since it is plaintext, and uses simple and intuitive
constructs to structure complex technical documentation. Here “complex” means things like indexing, glossaries, etc.
One significant innovation of Markdown was the use of headers and interpreted text. However, RST goes a step further than MD with the use of directives and specialized roles. For example, these features allow reStructuredText to render text and math formulae directly into LaTeX format.
The directive syntax in RST is
    .. directive-type:: directive block
and an illustration of a standard and specialized role is
    *emphasis* as standard role
    :title:`emphasis` with explicit role
where most of the standard roles are common to interpreted text in MD and RST.
In order to produce documentation, either in HTML or in LaTeX, reStructuredText needs a builder, which is a program that converts the RST source code into the desired format.
A popular builder is the Python package docutils, which provides a front-end script for each output format:
    prompt> ./rst2html.py text.rst > text.html
    prompt> ./rst2latex.py text.rst > text.tex
where the RST sources are kept in a source folder and the generated output goes into a build folder.
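The docutils conversion can also be driven from Python rather than from the command line. The sketch below assumes that the docutils package is installed and uses its `publish_parts` function to turn a short RST fragment into HTML; the two-line source is made up for illustration and contrasts plain emphasis with the explicit `title` role.

```python
from docutils.core import publish_parts

# Two short paragraphs: standard inline emphasis and an explicit role.
source = """\
*emphasis* as standard inline markup

:title:`emphasis` with an explicit role
"""

# publish_parts returns a dict of document fragments;
# the "body" entry holds the HTML rendering of the document body.
parts = publish_parts(source=source, writer_name="html")
html_body = parts["body"]
print(html_body)
```

With the HTML writer, the emphasis is rendered as an `<em>` element and the `title` role as a `<cite>` element.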
An alternative is Sphinx, which sets up the API documentation with these two folders and performs
the transformation afterwards.
    prompt> ./sphinx-build -b html [options] source build
    prompt> ./sphinx-build -b latex [options] source build
TeX and LaTeX¶
First released in 1978, TeX is a format that allows typesetting complex mathematical formulae. TeX is also the engine or program that does the typesetting.
LaTeX is a generalised set of macros built on top of TeX to take care of the content of the document.