This public licence allows you to:
And you should:
A registered UK charity (Reg # pending) run by Bible scholars and computer enthusiasts, as well as members who help decide priorities.
The datasets are based on work by scholars at Tyndale House - an international Biblical Studies research institute in Cambridge, UK (see www.TyndaleHouse.com)
The repository aims to provide reliable and freely usable data for studying the Bible without any denominational or doctrinal bias. Much of the data is derived from other publicly licensed sources, and has been compared with other non-public sources so that differences can be checked by Tyndale scholars. Corrections and proposed updates are welcomed - please send them to STEPBibleATgmail.com for checking.
The data is available as downloadable tab-separated text files (see notes on the data format below). The following datasets are already posted:
TTESV - Tyndale Translation tags for ESV
Tags for Greek & Hebrew Extended Strongs (compatible with original Strongs) for the translated text of the ESV.
TOTHT - Tyndale OT Hebrew Tagged text
The Leningrad codex based on Westminster via OpenScriptures, with full morphological and semantic tags for all words, prefixes and suffixes. Semantic tags use the extended Strongs linked to BDB by OS, are backwardly compatible with simple Strongs tags and include all affixes (as defined in TBESH). Morphological tags are from ETCBC converted to the format of OS (similar to Westminster) with different morphology for Ketiv/Qere when needed.
TAGNT - Translators Amalgamated Greek NT
Greek text created from the SBLGNT+apparatus, following the decisions made by NA28, listing the major editions that also use that form (SBL, Treg, TR, Byz, WH, NA28). Variants are being added from major editions plus the 1st 4 centuries of MSS (from Bunning). All words are tagged lexically (extended Strongs linked to LSJ) and morphologically (Robinson based on Tauber plus a few missing details) plus context-sensitive meanings for words with more than one meaning. For copyright reasons, any words, variants or punctuation that occur only in NA27 and/or in NA28 are omitted, so that this data cannot be used to reconstruct those texts.
TBESH - Tyndale Brief lexicon of Extended Strongs for Hebrew
Abridged BDB linked to extended Strongs (compatible with OpenScriptures and backwardly compatible with original Strongs)
TBESG - Tyndale Brief lexicon of Extended Strongs for Greek
Brief definitions for all Greek Bible words (NT, LXX, Apoc, & variants) using corrected Abbott-Smith when available, completed with other similar definitions. Backwardly compatible with original Strongs.
TFLSJ - Tyndale Formatted full LSJ Bible lexicon
Full LSJ entries for all Bible words (NT, LXX, Apoc & variants), formatted for easy reading (all bibliographic data hidden as hover-text) linked to extended Strongs (backwardly compatible with original Strongs).
TIPNR - Tyndale Individualised Proper Names with all References
Every name in the Bible, linked to all Hebrew & Greek forms of that name and separated into individual people & places. Each form of the names for each individual includes exhaustive refs for where that individual is named with data of their spouses, siblings and offspring or the places’ geolocation (based on OpenBible).
TVTMS - Tyndale Versification Traditions with Methodology for Standardisation: Eng+Heb+Lat+Grk+Others
All the versification differences in the OT traditional texts in Hebrew, Latin and Greek, and NT early versification, compared with English standard (defined by NRSV which is virtually identical to KJV). Bible translations have an almost infinite variety of versifications because they may follow (for example) Latin in several sections, Hebrew in a few and English most of the time. The Methodology provides simple rules for every section, such as “if this chapter has 29 verses, it is using Greek versification”. Using this, a whole Bible can be reversified according to English or traditional Hebrew or Greek or Latin versification, or compared with Bibles using that versification.
TEHMC - Tyndale Expansion of Hebrew Morphology Codes
Hebrew morphology codes with expanded explanations in terms of parsing, meaning and example. The codes are based on OpenScriptures, which is similar to the Westminster code system used in BibleWorks and other commercial software. They include extra codes which occur in STEPBible data and which distinguish sequential perfectives, gentilics, gender/location for personal pronouns, and non-Jussive/Cohortative as well as Jussive/Cohortative & possibly-Jussive/Cohortative forms.
TEGMC - Tyndale Expansion of Greek Morphology Codes
Greek morphology codes with expanded explanations in terms of parsing, meaning and example. The codes are based on Robinson, developed for the Majority text and used in most open-source texts. They include extra codes which occur in STEPBible data and which distinguish persons in possessive and reflexive pronouns, 2nd forms of verbs, and deponent forms from ambiguous passive/middle forms.
The following datasets are still being finished and/or checked. If you see data that you need which isn’t yet available, please contact us and perhaps you can become part of the checking process.
TOTGT - Tyndale OT Greek Tagged text
LXX text with later Ecclesiastical variants. The base text is Rahlfs with variants from the Apostolic Bible (based on Sixtine, Aldine and Complutensian texts). Both have been tagged to LSJ (compatible with extended Strongs) and most of the morphology has been tagged (based on CCAT) but variant tagging needs completing.
TFBDB - Tyndale Formatted full BDB lexicon
Full BDB formatted for easy reading (all bibliographic data hidden as hover-text) linked to extended Strongs (compatible with OpenScriptures and backwardly compatible with original Strongs)
TOTMM - Tyndale OT Manuscripts and Meanings
Translation, Hebrew form and witnesses for each variant that affects the meaning of the text, as determined by Barthélemy’s UBS committee. Also, alternate meanings found in standard translations. Shown as alternate renderings of a base text (ESV 2011).
TNTMM - Tyndale NT Manuscripts and Meanings
Translation, Greek form and witnesses up to 400 AD for each variant that affects the meaning of the text, as determined by the UBS apparatus. Also, alternate meanings found in standard translations. Shown as alternate renderings of a base text (ESV 2011).
Data is in plain Unicode text (UTF-8) with fields separated by tabs, so that it can be loaded into any text editor or spreadsheet.
To open in a spreadsheet (e.g. Excel): In GitHub, click on the file, then “Download”, then Save (Ctrl+S) to your drive. In Excel, “Browse” for it using “All Files” (not “All Excel Files”) and open it. When asked, select “Unicode UTF8”, “Delimited”, “Tab”, “General”.
By default, datasets are one-line records, so a Record ends with a NewLine, and each line has identical fields.
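One-line records can also be read programmatically. The following is a minimal Python sketch, assuming only what is stated above (UTF-8 text, one record per line, tab-separated fields). The function name `read_records` and the sample rows are hypothetical illustrations, not taken from any dataset.

```python
import csv
import io


def read_records(stream):
    """Yield each one-line record as a list of its tab-separated fields."""
    # QUOTE_NONE treats quotation marks as ordinary text rather than as
    # field delimiters, which suits plain tab-separated data.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if fields:  # csv.reader returns an empty list for a blank line
            yield fields


# Hypothetical sample data standing in for a downloaded file.
sample = "G0025\tἀγαπάω\tto love\nG0026\tἀγάπη\tlove\n"
records = list(read_records(io.StringIO(sample)))
```

For a downloaded file, the same function can be given a file object opened with `open(path, encoding="utf-8", newline="")`.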
Some datasets have multi-line records. Records are separated by a line starting with “$”. The first line is a Header with fields that apply to each subsequent subRecord line. SubRecord lines all start with a tab.
For example, in the ProperNames dataset, the first line is a header with information about the type (individual, place, title etc) and other data. These details apply to each of the subsequent subRecords which contain fields for the specific tag, Hebrew/Greek, translation, and the list of references. So the Header effectively contains fields which belong to each of its subRecords and would be identical for each of them if they were included on each line.
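The multi-line layout described above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference parser: it assumes a separator line starts with “$”, the next non-blank line is the Header, and every subRecord line starts with a tab. Any other line is ignored. The function name `parse_multiline_records` and the sample data are hypothetical.

```python
def parse_multiline_records(lines):
    """Group lines into records of one header plus its subRecords."""
    records = []
    current = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("$"):
            current = None  # separator: the next non-blank line is a Header
            continue
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("\t"):
            # subRecord line: drop the leading tab, then split the fields
            if current is not None:
                current["subrecords"].append(line[1:].split("\t"))
            continue
        if current is None:
            current = {"header": line.split("\t"), "subrecords": []}
            records.append(current)
        # Any further non-tab line inside a record is ignored in this sketch.
    return records


# Hypothetical sample; the field contents are placeholders.
sample = [
    "$==========\n",
    "Aaron\tindividual\n",
    "\ttag1\tform1\tAaron\tExo.4.14\n",
    "\ttag2\tform2\tAaron\tLuk.1.5\n",
    "$==========\n",
    "Abana\tplace\n",
    "\ttag3\tform3\tAbana\t2Ki.5.12\n",
]
records = parse_multiline_records(sample)

# Repeat the Header fields on every subRecord to get flat one-line rows.
rows = [rec["header"] + sub for rec in records for sub in rec["subrecords"]]
```

The final step copies the Header fields onto each subRecord, which produces the flat one-line form that the Header stands in for.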
Glyphs NOT used for Greek include:
; ‘ ᾿ ` ῾ ’ ‘ ‛ ′ ΄ ʹ̛̀́̓̒̓̔̕ ʹ ʻ ʼ ʽ ʾ ʿ ˈ ˊ ˋ ‘ ` ´ o ά ὰ ᾷ ἀ Ἀ ἁ Ἁ ἄ Ἄ ἅ ἂ ἃ ᾶ ᾳ ἆ έ ὲ ἐ Ἐ ἑ Ἑ ἔ Ἔ ἕ Ἕ ἓ ή ὴ ῇ ἠ Ἠ ἡ Ἡ ἤ Ἤ ἢ ἥ Ἢ ἣ ᾗ ῆ ῃ ῄ ἦ Ἦ ᾖ ἧ ᾐ ᾑ ᾔ i ί ὶ ϊ ΐ ῒ ἰ Ἰ ἱ Ἱ ἴ Ἴ ἵ Ἵ ἳ ῖ ἶ ἷ ό ὸ ὀ Ὀ ὁ Ὁ ὄ Ὄ ὅ ὂ Ὅ ὃ Ὃ ῥ Ῥ ύ ὺ ϋ ΰ ῢ ὐ ὑ Ὑ ὔ ὕ ὒ ὓ ῦ ὖ ὗ ώ ὼ ῷ ὠ ὡ Ὡ ὤ Ὤ ὢ ὥ Ὥ ᾦ ᾧ ᾯ ῶ ῳ ῴ ὦ Ὦ ὧ Ὧ ᾠ ϛ
Please report all errors at STEPBible.FeedbackATgmail.com. See Current reported errors.