VTT File Format
Format Information
- Format version: 2010.0
- Build dates: 02.01.2010 ~ current
- What is new:
- Meta data section:
- Add default tags file for quick loading
- Add file (save) history: VTT format version, user, time stamp
VTT opens and saves tagged text files in the VTT file format. VTT can open (display) two types of files:
- Pure text
- Files in valid VTT format
I. Meta Data
Starting with this version, meta data are stored in the VTT file itself, using preserved tags and a fixed format:
- The default tags file for quick loading
- The file (saving) history: VTT file format version, user name, time stamp
- Separated header for Meta Data Content:
#<---------------------------------------------------------------------->
#<Meta Data>
#<Tags file|confirmation|path of default tags file>
#<File History|VTT file format version|User name|Time stamp>
#<---------------------------------------------------------------------->
...
- Preserved tags and format:
- TAGS_FILE:
| Field | Description | Java Type | Example |
|---|---|---|---|
| 1 | Preserved tag | String | TAGS_FILE |
| 2 | Confirmation flag | boolean | true, false |
| 3 | Path of default tags file | String | /usr/vtt/data/tags.data |
- FILE_SAVE:
| Field | Description | Java Type | Example |
|---|---|---|---|
| 1 | Preserved tag | String | FILE_SAVE |
| 2 | VTT file format version | String | VTT.2010.0 |
| 3 | User name | String | VTT Guest |
| 4 | Time stamp | String | 2/4/10 11:57:43 AM (mm/dd/yy hh:mm:ss) |
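The two preserved tags above can be read with a few lines of code. The following is a minimal sketch, not VTT's own implementation. It assumes that meta data lines use "|" as the field separator (as the header line suggests), that fields may be trimmed, and that the time stamp uses a 12-hour clock with AM/PM as in the documented example. The names `TagsFile`, `FileSave` and `parse_meta_line` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TagsFile:
    confirmation: bool   # Field 2: confirmation flag
    path: str            # Field 3: path of the default tags file


@dataclass
class FileSave:
    version: str         # Field 2: VTT file format version
    user: str            # Field 3: user name
    timestamp: datetime  # Field 4: time stamp


def parse_meta_line(line):
    """Parse one Meta Data line; return None for comments and blank lines."""
    if not line.strip() or line.startswith("#"):
        return None
    # Assumption: fields are separated by "|" and may be trimmed.
    fields = [f.strip() for f in line.split("|")]
    if fields[0] == "TAGS_FILE" and len(fields) == 3:
        if fields[1] not in ("true", "false"):
            raise ValueError(f"bad confirmation flag: {fields[1]!r}")
        return TagsFile(fields[1] == "true", fields[2])
    if fields[0] == "FILE_SAVE" and len(fields) == 4:
        # Assumption: mm/dd/yy hh:mm:ss on a 12-hour clock followed by AM/PM.
        stamp = datetime.strptime(fields[3], "%m/%d/%y %I:%M:%S %p")
        return FileSave(fields[1], fields[2], stamp)
    raise ValueError(f"unrecognized meta data line: {line!r}")
```

For example, `parse_meta_line("FILE_SAVE|VTT.2010.0|VTT Guest|2/4/10 11:57:43 AM")` yields a `FileSave` record whose time stamp is 4 February 2010, 11:57:43.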
II. Text Content
- The original untagged text, in UTF-8
- A line that starts with # is treated as a comment and ignored
- Separated header for Text Content:
#<---------------------------------------------------------------------->
#<Text Content>
#<---------------------------------------------------------------------->
...
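Because every section is introduced by a header line of the form #<Section Title>, a reader can split a VTT file into its sections by watching for those titles. The sketch below is one way to do that, under stated assumptions: the four titles are the ones used in this document's headers (Meta Data, Text Content, Tags Configuration, Markups Information), all other lines starting with # are dropped as comments, and lines that appear before the first title are discarded. `split_sections` is a hypothetical helper, not part of VTT.

```python
SECTION_TITLES = (
    "Meta Data",
    "Text Content",
    "Tags Configuration",
    "Markups Information",
)


def split_sections(vtt_text):
    """Group the lines of a VTT file by section, keyed by section title."""
    sections = {title: [] for title in SECTION_TITLES}
    current = None
    for line in vtt_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#<") and stripped.endswith(">"):
            title = stripped[2:-1]
            if title in sections:
                current = title  # a section title line starts a new section
            continue  # separator and field-name header lines are skipped
        if line.startswith("#"):
            continue  # any other line starting with # is a comment
        if current is not None:
            sections[current].append(line)
    return sections
```

A file without any section header (a pure text file) produces four empty lists here; a caller would need to handle that case separately.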
III. Tags Configuration
- Each line represents a Tag configuration used in VTT
- Each line must contain all 14 fields, each in the correct format and with a legal value
- A line that starts with # is treated as a comment and ignored
- The first tag is reserved (reserved tag) by VTT
- Its name is pre-defined as Text/Clear
- Its Bold|Italic|Underline properties are used for clearing markup
- Its Display property is not used
- Its foreground and background colors are used as the highlight colors
- A tag is uniquely identified by its name and category in VTT
- Separated header and reserved (the 1st) tag for Tags Configuration:
#<---------------------------------------------------------------------->
#<Tags Configuration>
#<Name|Category|Bold|Italic|Underline|Display|FR|FG|FB|BR|BG|BB|FontFamily|FontSize>
#<---------------------------------------------------------------------->
Text/Clear||false|false|false|true|255|255|255|0|51|153|Monospaced|12
#<---------------------------------------------------------------------->
...
- Tag Fields Format:
| Field | Description | Java Type | Example |
|---|---|---|---|
| 1 | Name | String | Text/Clear |
| 2 | Category | String | |
| 3 | Bold | boolean | true, false |
| 4 | Italic | boolean | true, false |
| 5 | Underline | boolean | true, false |
| 6 | Display | boolean | true, false |
| 7 | Foreground-Red | int | 0 ~ 255 |
| 8 | Foreground-Green | int | 0 ~ 255 |
| 9 | Foreground-Blue | int | 0 ~ 255 |
| 10 | Background-Red | int | 0 ~ 255 |
| 11 | Background-Green | int | 0 ~ 255 |
| 12 | Background-Blue | int | 0 ~ 255 |
| 13 | Font family | String | Dialog, DialogInput, Monospaced, SansSerif, Serif |
| 14 | Font size | String | 8, 10, 12, 14, ..., +0, +2, -2 |
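The 14-field rule lends itself to a small validating parser. The following is a sketch under assumptions rather than VTT's actual loader: booleans are taken to be the literal strings true and false, color components are checked against 0 ~ 255, and the font size is kept as a string because signed values such as +2 are legal. `Tag` and `parse_tag_line` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Tag:
    name: str
    category: str
    bold: bool
    italic: bool
    underline: bool
    display: bool
    foreground: tuple  # (red, green, blue), each 0 ~ 255
    background: tuple  # (red, green, blue), each 0 ~ 255
    font_family: str
    font_size: str     # kept as a string: values such as "12" and "+2" are legal


def _to_bool(value):
    # Assumption: only the lowercase literals "true" and "false" are legal.
    if value not in ("true", "false"):
        raise ValueError(f"expected true or false, got {value!r}")
    return value == "true"


def _to_color(value):
    number = int(value)
    if not 0 <= number <= 255:
        raise ValueError(f"color component out of range: {number}")
    return number


def parse_tag_line(line):
    """Parse one Tags Configuration line; return None for comments and blanks."""
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\r\n").split("|")
    if len(fields) != 14:
        raise ValueError(f"expected 14 fields, got {len(fields)}")
    return Tag(
        name=fields[0],
        category=fields[1],
        bold=_to_bool(fields[2]),
        italic=_to_bool(fields[3]),
        underline=_to_bool(fields[4]),
        display=_to_bool(fields[5]),
        foreground=tuple(_to_color(f) for f in fields[6:9]),
        background=tuple(_to_color(f) for f in fields[9:12]),
        font_family=fields[12],
        font_size=fields[13],
    )
```

Applied to the reserved tag line `Text/Clear||false|false|false|true|255|255|255|0|51|153|Monospaced|12`, this gives an empty category, a foreground of (255, 255, 255) and a background of (0, 51, 153).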
IV. Markups Information
- Each line represents one markup used in VTT
- A line that starts with # is treated as a comment and ignored
- Each line must contain the correct format/data for Fields 1 ~ 5
- Field 1 (Offset) and field 2 (Length): must be integers (< 2147483647)
- The combination of field 3 (tag name) and field 4 (tag category): must exist in the tags list
- Field 5 (Annotation): can be empty
- Leading and trailing spaces are trimmed in Fields 1 ~ 5
- Field 6 is the tagged text; it is intended for NLP purposes and is ignored by VTT
- Fields 6 and above are ignored by VTT and can be used for other NLP purposes
- The character "|" is not allowed in the first 5 fields
- No two lines should have the same offset and length (a word can be marked up with only one tag)
- All lines are sorted by offset (smaller first) and then by length (larger first)
- Separated header for markups information:
#<---------------------------------------------------------------------->
#<Markups Information>
#<Offset|Length|TagName|TagCategory|Annotation|TagText>
#<---------------------------------------------------------------------->
...
| Field | Description | Java Type |
|---|---|---|
| 1 | Offset | int |
| 2 | Length | int |
| 3 | Tag name | String |
| 4 | Tag category | String |
| 5 | Annotation | String |
| 6 | Tagged text | Not used |
| 7+ | Other NLP fields | Not used |
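The markup rules (trimmed fields 1 ~ 5, a known tag name and category, unique offset and length pairs, and the sort order) can be checked as sketched below. This is an illustration under assumptions, not VTT's own code: offsets and lengths are additionally assumed to be non-negative, and the caller is assumed to supply the tags list as a set of (name, category) pairs. `Markup`, `parse_markup_line` and `check_order` are hypothetical names.

```python
from dataclasses import dataclass

MAX_INT = 2147483647  # upper bound stated for Offset and Length


@dataclass
class Markup:
    offset: int
    length: int
    tag_name: str
    tag_category: str
    annotation: str


def parse_markup_line(line, known_tags):
    """Parse one markup line; known_tags is a set of (name, category) pairs."""
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.split("|")
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 fields, got {len(fields)}")
    # Only fields 1 ~ 5 are trimmed and used; fields 6 and above are ignored.
    head = [f.strip() for f in fields[:5]]
    offset, length = int(head[0]), int(head[1])
    # Assumption: negative values are rejected as well as values >= MAX_INT.
    if not (0 <= offset < MAX_INT and 0 <= length < MAX_INT):
        raise ValueError(f"offset or length out of range: {offset}, {length}")
    if (head[2], head[3]) not in known_tags:
        raise ValueError(f"unknown tag: {head[2]!r} / {head[3]!r}")
    return Markup(offset, length, head[2], head[3], head[4])


def check_order(markups):
    """Raise ValueError if (offset, length) pairs repeat or lines are unsorted."""
    # Sort key: offset ascending, then length descending.
    keys = [(m.offset, -m.length) for m in markups]
    if len(set(keys)) != len(keys):
        raise ValueError("two markups share the same offset and length")
    if keys != sorted(keys):
        raise ValueError("markups are not sorted by offset, then by larger length")
```

With a tags list containing a hypothetical tag `("Noun", "POS")`, the line ` 0 | 5 |Noun|POS|| Hello ` parses to offset 0, length 5 and an empty annotation; the tagged text in field 6 is dropped.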