Wednesday, June 03, 2009

Cover Pages: Genealogical Data and XML

Genealogical Data and XML

This document provides references for some prominent initiatives which have proposed the use of XML for storing and processing genealogical information. A separate document on Markup Languages for Names and Addresses contains information on abstract models and markup models for person, family, name, and related concepts. See also prosopographical research: "an independent science of social history embracing genealogy, onomastics and demography."


* Genealogy/XML Projects
o GedML: [GEDCOM] Genealogical Data in XML
o GEDCOM (Genealogical Data Communication)
o XGenML
o gdmxml
o Genealogical Information Markup Language (GeniML)
o GENTECH Genealogical Data Model
o Genealogical Data Models in the Unified Modeling Language (GDMUML)
o GenXML
o GRAMPS Project
o FamilyML
* General Resources and References: Mailing Lists, Articles, Papers, News

GedML: [GEDCOM] Genealogical Data in XML

GedML provides "a way of encoding genealogical data sets in XML. It combines the well-established GEDCOM data model with the XML standard for encoding complex information. The result is a representation that can easily be converted to and from GEDCOM, but can be manipulated much more easily using standard tools: notably, by using an XSLT processor such as Saxon."

On 12-May-1999, the software was updated to work with SAXON 4.2.

Software provided by Michael H. Kay (as of 2002-12-27) included four Java classes in source and compiled form:

* GedcomParser: This class implements the SAX2 XMLReader interface, so it pretends to be an XML parser, but actually it is parsing GEDCOM files...
* AnselInputStreamReader: GEDCOM files use a rather unusual character encoding which is not supported by most Java VMs; this class performs the conversion from ANSEL characters to Unicode...
* GedcomOutputter: This is the reverse of GedcomParser; it acts as a SAX2 ContentHandler which serializes a SAX event stream in the form of a GEDCOM file...
* AnselOutputStreamWriter: This is the reverse of AnselInputStreamReader: it converts Unicode characters to ANSEL, and is used to write the output file by the GedcomOutputter...
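As a rough illustration of what a GEDCOM-to-XML front end has to do, the sketch below turns GEDCOM lines into nested XML text, using the level number at the start of each line to decide nesting. It is not Kay's GedcomParser: the class name GedcomSketch, the method toXml, and the ID and REF attribute names are assumptions made for this example, and it builds a string rather than emitting SAX2 events. It also assumes well-formed input and ignores ANSEL decoding.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class GedcomSketch {

    /** Converts GEDCOM text ("level [@id@] TAG [value]" per line) to nested XML text. */
    public static String toXml(String gedcom) {
        StringBuilder out = new StringBuilder();
        Deque<String> open = new ArrayDeque<>(); // tags currently open, innermost first

        for (String raw : gedcom.split("\\R")) {
            String line = raw.trim();
            String[] parts = line.split(" ", 3);
            if (parts.length < 2) continue; // skip blank or malformed lines

            int level = Integer.parseInt(parts[0]);
            String id = null;
            String tag;
            String value = "";

            if (parts[1].startsWith("@")) {
                // Record line such as "0 @I1@ INDI": the cross-reference id precedes the tag.
                id = parts[1].replace("@", "");
                String[] rest = parts.length > 2 ? parts[2].split(" ", 2) : new String[] {""};
                tag = rest[0];
                if (rest.length > 1) value = rest[1];
            } else {
                tag = parts[1];
                if (parts.length > 2) value = parts[2];
            }

            // Close every element that sits at the same or a deeper level.
            while (open.size() > level) {
                out.append("</").append(open.pop()).append(">");
            }

            out.append("<").append(tag);
            if (id != null) {
                out.append(" ID=\"").append(id).append("\"");
            }
            if (value.length() > 2 && value.startsWith("@") && value.endsWith("@")) {
                // A value of the form @F1@ is a pointer to another record.
                out.append(" REF=\"").append(value.replace("@", "")).append("\"");
                value = "";
            }
            out.append(">").append(escape(value));
            open.push(tag);
        }

        // Close whatever is still open at end of input.
        while (!open.isEmpty()) {
            out.append("</").append(open.pop()).append(">");
        }
        return out.toString();
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

A real SAX2 implementation would call startElement, characters, and endElement on a ContentHandler at the points where this sketch appends text.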

Kay also provides stylesheets:

* GedcomToXml.xsl performs an identity transformation; if GedcomParser is used as the input parser, the effect is to convert from GEDCOM encoding to XML...
* XmlToGedcom.xsl also performs an identity transformation, but this time it is configured to use GedcomOutputter to produce the output in GEDCOM format...
* GedcomToHtml.xsl produces an HTML rendition of the GEDCOM file; use this as a starting point to display your GEDCOM files in whatever way you want...
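The identity-transformation idea can be sketched with the standard JAXP API: a Transformer created without a stylesheet copies whatever SAX events its input reader produces to the output, so the choice of reader determines the conversion. This is a minimal sketch, not Kay's stylesheets or his Saxon configuration; the class IdentitySketch and its methods are hypothetical, and an ordinary XML parser stands in for GedcomParser.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class IdentitySketch {

    /** Runs an identity transformation over the events produced by the given reader. */
    public static String copy(XMLReader reader, String input) {
        try {
            // newTransformer() with no stylesheet argument yields an identity transformer.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            StringWriter out = new StringWriter();
            SAXSource source = new SAXSource(reader, new InputSource(new StringReader(input)));
            identity.transform(source, new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("identity transformation failed", e);
        }
    }

    /** Stand-in for a GEDCOM-reading XMLReader: a plain namespace-aware XML parser. */
    public static XMLReader standInReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException("no SAX parser available", e);
        }
    }
}
```

Passing an XMLReader that reads GEDCOM instead of XML to copy would, in principle, give the GEDCOM-to-XML conversion described above; how GedcomParser itself is constructed and configured is not shown here.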

Principal URLs:

* GedML website. Maintained by Michael H. Kay. 2-April-2002 or later.
* Download GedML software. See the file listing. [cache]
* Sample based upon GEDCOM file: kennedy.xml, kennedy.html.
* Contact: Michael H. Kay. Email: home, work.
* Also: GedML Mark 2 (1999)

Earlier references of possible historical value, some URLs broken:

* GedML description. Earlier Website created by Michael H. Kay. [snapshot 2002-12]
* Main GedML Page [local archive copy, 1998-08-21]
* GedML Document Type Definition 1998; [local archive copy]
* Proposed Specification of GedML. By Michael H. Kay. 16 February 1999.
* Rationale for GedML
* Comments on the GEDCOM Future Directions document - an XML proposal. [local archive copy]
* GedML Software - some Java applications

GEDCOM (Genealogical Data Communication)

GEDCOM (GEnealogical Data COMmunication) is designed "to provide a flexible, uniform format for exchanging computerized genealogical data... GEDCOM has evolved over 15 years... Although GEDCOM XML is different from traditional GEDCOM both in syntax and underlying logical structure, it is still considered as an evolution of GEDCOM... An important part of GEDCOM is its ability to link records according to family lineage and other data relationships. XML's standard linkage method, using the ID and IDREF attributes, is equivalent to traditional GEDCOM's linkage method and will be used in its place. In traditional GEDCOM, links are bi-directional. For example, a CHIL tag in the FAM record connects a family to a child, and a FAMC tag in the INDI record connects a child to a family. Also, HUSB and WIFE tags in the FAM record connect to INDI records, and in the opposite direction, FAMS tags in the INDI record handle both spouses' connection to a FAM record. To specify a link in both directions is, of course, redundant and unnecessary. Some programs produce traditional GEDCOM with links in one direction, some the other, and some give both. That makes processing GEDCOM from a variety of sources difficult, and where both directions are specified, they may be inconsistent. In GEDCOM XML, all links are unidirectional and can be specified in only one way... In the past, ANSEL has been specified as the preferred character set for GEDCOM; in GEDCOM XML, the UNICODE character set is used."
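The point about redundant bi-directional links can be illustrated with a short sketch: if links are stored in one direction only, the opposite direction can be derived on demand, so the two can never disagree. The sketch assumes the stored direction is family to child (the role of the CHIL tag) and derives the child-to-family direction (the role of FAMC); the quoted text does not say which direction GEDCOM XML stores, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LinkIndexSketch {

    /**
     * Input: family id -> ids of its children (the single stored direction).
     * Output: child id -> ids of the families that list that child.
     */
    public static Map<String, List<String>> childToFamilies(Map<String, List<String>> familyToChildren) {
        Map<String, List<String>> index = new TreeMap<>();
        for (Map.Entry<String, List<String>> family : familyToChildren.entrySet()) {
            for (String childId : family.getValue()) {
                // Create the list for this child on first sight, then record the family.
                index.computeIfAbsent(childId, k -> new ArrayList<>()).add(family.getKey());
            }
        }
        return index;
    }
}
```

Because the reverse index is computed rather than stored, there is only one place where a link can be wrong, which is the consistency benefit the passage describes.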

GEDCOM was developed by the Family and Church History Department of The Church of Jesus Christ of Latter-day Saints.

