Sunday, December 5, 2010

Summary of Data Elements in GED file

For those who are interested, here is a summary of some of the information in the GEDCOM file that has been tested to date:

Level 0 530
Level 1 4648
Level 2 3824
Level 3 2559
Level 4 1604
Level 5 246
Lines 13411

CONT 1390
EVENT 212
FAMC 144
FAMS 162
NAME 305
NOTE 195
SEX F 114
SEX M 112
SOUR 1648

The Level listed above is the first column of information on each line of the GEDCOM file.

13,411 Lines of data were in the GEDCOM file.
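
As a quick consistency check, the six Level counts listed above add up to the line total. The short Python sketch below only re-adds the numbers from this post; the variable names are made up for illustration.

```python
# Level counts and line total copied from the summary in this post.
level_counts = {0: 530, 1: 4648, 2: 3824, 3: 2559, 4: 1604, 5: 246}
total_lines = 13411

# Every GEDCOM line starts with exactly one level number,
# so the level counts should sum to the number of lines.
level_sum = sum(level_counts.values())
print(level_sum == total_lines)
```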

The second group of numbers is a count of some of the TAGS, the second column of data, in the file.
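
For anyone who wants to produce the same kind of summary for their own file, here is a minimal sketch in Python. It is not the tool used for the numbers in this post. The function name `summarize_gedcom` and the sample lines are made up for illustration, and the sketch assumes that each line starts with a level number and that records such as `0 @I1@ INDI` carry a cross-reference ID between the level and the tag.

```python
from collections import Counter

def summarize_gedcom(lines):
    """Count lines, levels, and tags in an iterable of GEDCOM lines."""
    levels = Counter()
    tags = Counter()
    total = 0
    for raw in lines:
        # Drop a possible byte-order mark and surrounding whitespace.
        parts = raw.lstrip("\ufeff").strip().split()
        if not parts or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        total += 1
        levels[int(parts[0])] += 1
        rest = parts[1:]
        # Assumption: a leading "@...@" token is a cross-reference ID, not the tag.
        if len(rest) > 1 and rest[0].startswith("@"):
            rest = rest[1:]
        if not rest:
            continue
        tag = rest[0]
        # Keep SEX values apart, as in the summary above (SEX F, SEX M).
        if tag == "SEX" and len(rest) > 1:
            tag = f"SEX {rest[1]}"
        tags[tag] += 1
    return total, levels, tags

# Made-up sample lines, not taken from the tested file.
sample = [
    "0 HEAD",
    "0 @I1@ INDI",
    "1 NAME John /Smith/",
    "1 SEX M",
    "1 FAMS @F1@",
    "0 @F1@ FAM",
    "1 HUSB @I1@",
    "0 TRLR",
]
total, levels, tags = summarize_gedcom(sample)
print(total, dict(levels), dict(tags))
```

To run it against a real file, the lines could be read with something like `open("myfile.ged", encoding="utf-8")`, where the file name and the encoding are placeholders that depend on the file being tested.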

5 comments:

lkessler said...

Russ,

How about making that test GEDCOM available to others? I'd like to test with it and see how my numbers compare.

Louis

Russ said...

Louis,

I would prefer not, as this is real data, with personal information in it.

I have been thinking about creating a GEDCOM file so that it could be shared.

Thank you,

Russ

lkessler said...

Russ,

Those statistics you are giving are very interesting, but unless the rest of us have a way of delving into the data to see the "whys and wherefores", they don't help us much.

Why not do it with a publicly available dataset? How about picking something from: http://gedcomlibrary.com/gedcoms/ or find one with certain contents using a google search, e.g. http://www.lkessler.com/gedcom.shtml

Louis

Randy Seaver said...

Are there standard "test" GEDCOMs available that software companies use? Or perhaps one of the FamilySearch Community Trees could be used. They are all historical records.

What is the ideal size for a test GEDCOM file? Seems to me something like 1,000 or 2,000 people would have enough variations to test almost every item in the GEDCOM dictionary.

Russ said...

Randy,

These tests were only meant to illustrate an issue. Perhaps some standard test files could be created and used. This test file was a demo file on the version 11 CD of Family Tree Maker.

Thank you

Russ