6.1. File Layout Markup Language Specification#
- Authors:
Benjamin Fang
- Version:
1.2.0
- Create Date:
20230401
- Update Date:
20240130
6.1.1. Introduction#
File Layout Markup Language (FLML) is a markup language for describing the layout/structure of binary or plaintext files. It is simple and accurate in representing the layout of a file.
For example, suppose we have a binary file whose layout is:
3 integers; 1 char, whose value is 255; 1 integer, which has a value "X"; and a float repeated "X" times.
The corresponding FLML is as follows:
[3] <int> (dsp="three integers")
[1] <char; =255> (dsp="one char, whose value equals to 255")
[1] <int; $x> (dsp="one int, the value is assigned to a variable $x")
[$x] <float> (dsp="$x float number")
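To make the layout concrete, here is a minimal Python sketch that reads a byte string with this layout using the standard `struct` module. It assumes little-endian data, 4-byte ints and floats, and an unsigned 1-byte char; the FLML above does not state any of these, and the name `read_layout` is hypothetical.

```python
import struct

def read_layout(data: bytes):
    """Read 3 ints, 1 char (must be 255), 1 int x, then x floats.

    Assumes little-endian data, 4-byte int/float, unsigned 1-byte char.
    """
    offset = 0
    three_ints = struct.unpack_from("<3i", data, offset)   # [3] <int>
    offset += struct.calcsize("<3i")
    (marker,) = struct.unpack_from("<B", data, offset)     # [1] <char; =255>
    offset += 1
    if marker != 255:
        raise ValueError(f"expected char value 255, got {marker}")
    (x,) = struct.unpack_from("<i", data, offset)          # [1] <int; $x>
    offset += 4
    floats = struct.unpack_from(f"<{x}f", data, offset)    # [$x] <float>
    return three_ints, marker, x, floats

# Build a small sample in memory: x = 2, followed by two floats.
sample = struct.pack("<3iBi2f", 1, 2, 3, 255, 2, 0.5, 1.5)
result = read_layout(sample)
```

The point of the sketch is that the value read for `$x` decides how many floats follow, which is exactly what the last FLML sentence expresses.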
Here is another example, this time for a plaintext file:
ID gender name point
12 0 ben 2.32
8 1 lewis 5.6
The corresponding FLML is:
[4] <string; =["ID", "gender", "name", "point"]> (sep="\s"; end="\n")
[$+] {
[1] <string> (dtype=int; end="\s")
[1] <string; ={"0", "1"}> (dtype=bool; end="\s")
[1] <string; ="[a-zA-Z]+"> (re=True; dsp="names"; end="\s")
[1] <string> (dtype=float)
} (end="\n")
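As a rough illustration of what this description implies for a reader of the file, the following Python sketch parses the sample table and applies the same checks. The function name `parse_table` is hypothetical, and mapping "0"/"1" to a Python bool is an assumption about how `dtype=bool` would be interpreted.

```python
import re

SAMPLE = "ID gender name point\n12 0 ben 2.32\n8 1 lewis 5.6\n"

def parse_table(text: str):
    """Parse the table as the FLML above describes it (a sketch)."""
    lines = text.strip().split("\n")
    # [4] <string; =["ID", "gender", "name", "point"]>
    if lines[0].split() != ["ID", "gender", "name", "point"]:
        raise ValueError("unexpected header line")
    rows = []
    for line in lines[1:]:
        id_text, gender, name, point = line.split()
        if gender not in {"0", "1"}:                      # choices {"0", "1"}
            raise ValueError(f"bad gender value: {gender}")
        if re.fullmatch(r"[a-zA-Z]+", name) is None:      # re=True pattern
            raise ValueError(f"bad name: {name}")
        rows.append((int(id_text), gender == "1", name, float(point)))
    if not rows:                                          # [$+]: one or more rows
        raise ValueError("expected at least one data row")
    return rows

rows = parse_table(SAMPLE)
```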
An FLML file consists of sentences, and each sentence describes a
block. A block is made of one or more identical elements. Generally,
an FLML sentence has three components, which are
enclosed by square brackets, angled brackets, and round parentheses respectively.
These three components are the element number component,
the element unit component, and the element label component.
The element number component describes the number of element units.
The element unit is the unit of the block, and the element unit
can be a complex unit. A complex element unit is enclosed by curly brackets.
For example:
[3] {
[1] <int>
[2] <float>
}
The unit of the outermost sentence has one int and two floats. The
complex element unit is
described by the simple element units inside the curly brackets.
In a modified BNF grammar notation, an FLML description can be defined as:
flml-description ::= flml-sentences +
flml-sentences ::= "[" square-bracket-part "]" ( "<" angled-bracket-part ">" | "{" flml-sentences "}" ) "(" round-parenthese-part ")"
6.1.2. Terminologies and Concepts#
- File
A binary or plaintext file.
- Block
A consistent piece of data in a file.
Element length
Element type
Element label
- statement
A statement in FLML is an expression ended by “;”. If the statement is the last one in a sentence part, the “;” can be omitted.
- sentence
An FLML sentence looks like
[statement]<statement>(statement) or [statement]{sentences}(statement). A sentence has three sentence parts. The first is called the “square bracket part”, which includes the “[]” markers and the statements they contain. The second is called the “angled bracket part” or “curly bracket part”. The last is called the “round parenthesis part”.
- block
A block is the unit from which further data structures are built. For instance, in
[8] <int> () (example A), “int” is the block, which is enclosed by “<>”. The main function of the “angled bracket part” and the “curly bracket part” is to contain a block.
- simple block and complex block
Blocks can be divided into two types: simple blocks and complex blocks. A simple block is a basic data type that has been defined in this language and cannot consist of other blocks. For example, “int”, “float”, and “char” are all simple blocks. A simple block is enclosed by “<>”. A complex block, on the other hand, is made up of simple blocks. For example,
[3]{[1]<int>() [1]<float>()}() (example B). The complex block in this example consists of one int and one float. A complex block is enclosed by “{}”.
- block type
There are many kinds of simple block types; each type represents both the data type and the data size. For example, a “uint64” simple block means that the data is an integer and that it consumes 64 bits.
- block size
For a given block, whether it is a simple block or a complex block, its size is fixed. This is the size of the block, or block size. In example B above, the block size of “{[1]<int> [1]<float>}” is 8 bytes (assuming that int and float are 4 bytes each).
- block multiplier
There is a number or variable in “[]” to indicate the number of blocks. In example A above, “[8]” means there are 8 “<int>” blocks. The number “8” here is a block multiplier, which represents how many times the block is repeated.
- segment, segment length, elements of segment
A block multiplied by the multiplier of the same sentence makes a segment. In example A,
[8]<int>() makes a segment, which has 8 ints, so its size is 32 bytes. The blocks that make up a segment are also called the elements of the segment. The multiplier is also termed the length of the segment, or segment length.
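The relation between block size, multiplier, and segment size can be sketched in a few lines of Python. This is only an illustration: the `Sentence` class and the helper names are hypothetical, and the 4-byte float and 1-byte char sizes are assumptions (the text only supposes that int is 4 bytes).

```python
from dataclasses import dataclass
from typing import List, Union

# Assumed sizes in bytes; only the 4-byte int comes from the text.
TYPE_SIZES = {"int": 4, "float": 4, "char": 1}

@dataclass
class Sentence:
    multiplier: int
    # Either a simple block type name or a list of nested sentences.
    block: Union[str, List["Sentence"]]

def block_size(block) -> int:
    """Size of one block: a simple type, or the sum of its nested segments."""
    if isinstance(block, str):
        return TYPE_SIZES[block]
    return sum(segment_size(s) for s in block)

def segment_size(sentence: Sentence) -> int:
    """Segment size = multiplier * block size."""
    return sentence.multiplier * block_size(sentence.block)

example_a = Sentence(8, "int")                                      # [8] <int> ()
example_b = Sentence(3, [Sentence(1, "int"), Sentence(1, "float")]) # example B
```

Under these assumptions, example A gives a 32-byte segment, and the complex block of example B is 8 bytes, repeated 3 times for a 24-byte segment.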
6.1.3. Variables and Data Types#
There are two data types in FLML: scalar and array. A scalar can refer to a number, a file, an iterator, or an order element. An array is a collection of scalars.
A scalar variable, which represents a scalar, starts with “$”, and an array variable starts with “@”.
Here are some examples.
$sca = 3;
@arr = [1, 2, 3, 4, 5];
$sca = @arr[0]; // $sca equal 1
@arr[:2] = [7, 8];
Arrays can be indexed and sliced: “@arr[0]” refers to the first element of the array, while “@arr[:3]” refers to the range from the first to the third element.
In BNF:
variable ::= "$" [a-zA-Z*+?] + [0-9]* | "@" [a-zA-Z]+ [0-9]*
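For readers familiar with Python, the indexing and slicing examples above read much like Python list operations. The sketch below mirrors them, assuming that FLML slices are half-open in the same way (which matches “@arr[:3]” covering the first to the third element); this correspondence is an assumption, not something the specification states.

```python
arr = [1, 2, 3, 4, 5]      # @arr = [1, 2, 3, 4, 5];
sca = arr[0]               # $sca = @arr[0];
arr[:2] = [7, 8]           # @arr[:2] = [7, 8];
first_three = arr[:3]      # @arr[:3], the first to the third element
```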
6.1.4. Operators and Expressions#
The operators of FLML include + - * / : ~ ^ =. The “+ - * /” operators have their
usual algebraic meaning. For example:
$foo = 1 + 3; // $foo equal 4
$foo = 4 - 3; // $foo equal 1
$foo = 4 * 3; // $foo equal 12
$foo = 5 / 2; // $foo equal 2.5
Five of these operators, “+ : ~ ^ =”, have special meanings in certain contexts.
An FLML expression consists of variables and operators, and an expression ended with a “;” makes a statement.
In BNF:
statement ::= expression ";"
expression ::= (operator)? (variable | number) (operator expression)?
variable ::= "$" [a-zA-Z*+?] + [0-9]* | "@" [a-zA-Z]+ [0-9]*
number ::= [0-9]+ ("." [0-9]+)?
operator ::= [+-*/:~^]
6.1.5. FLML sentences#
Square bracket part#
square-bracket-part is the first part of an FLML sentence, and is mainly used to describe the number of blocks.
This part is made of statements enclosed by “[]”. It has four types of statements.
A statement indicating the number of blocks
This statement is an expression whose value is the number of blocks. In the terminology above, this value is the multiplier of the block, or the length of the segment.
For example:
[3] <byte> ()
[%let $num = 5] <> ()
[$num * 2] <float> ()
In the first sentence of the example above, the block is “byte” and the multiplier is 3, which makes a segment of 3 bytes. The second sentence defines a variable whose value is 5. In the third sentence, the statement in the square bracket part is an expression with the value 10, so the multiplier is 10 and the segment consists of 10 floats.
Iteration operator and iteration statement.
Along with the multiplier, there can be an iteration statement, which is made of “~” followed by a variable.
For example:
[3; ~$i] {
[$i] <float> ()
[2] <int> ()
} ()
In the example, “~$i” is an iteration statement. $i takes the values 0, 1, and 2 in turn, one for each element. The block of the sentence is a complex block, which is described by two sentences. The segment has 3 blocks: the first is made of 0 floats and 2 integers, the second of 1 float and 2 integers, and the third of 2 floats and 2 integers.
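The expansion described above can be sketched in Python. The function name `expand_iteration` is hypothetical; it simply lists the element types of each block as $i runs over the multiplier.

```python
def expand_iteration(multiplier: int):
    """Expand [multiplier; ~$i] { [$i] <float> () [2] <int> () } () block by block."""
    blocks = []
    for i in range(multiplier):          # $i = 0, 1, ..., multiplier - 1
        blocks.append(["float"] * i + ["int"] * 2)
    return blocks

blocks = expand_iteration(3)
```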
Order collecting operation and order collecting statement.
Sometimes the order of a sequence is important, and following segments may be aligned with that order.
For example:
[10; ^@myorder] <string> ()
[10] <int> (alignwith=@myorder)
[10; ~$i] {
[1] <float> (order=@myorder[$i])
}
Statement of an FLML operation
This kind of statement is an FLML operation, such as declaring a variable, branching, looping, and so on.
For example:
[%let $var = 3]
[%if $var == 2] {
[1] <int>
}
Note
Multiple FLML statements can be written within one square bracket.
In modified BNF, it can be described as:
square-bracket-part ::= (expression (";" "~"variable)? (";" "^"variable)?) | other-statement
Angled bracket part#
angled-bracket-part is mainly used to provide block information. It can also contain
some additional statements.
A string representing the block type.
For example:
[1] <float> // block type is float
[1] <uint32> // block type is an unsigned int, whose size is 4 bytes
A statement that contains only a variable.
For example:
[1] <int; $int_value> // the value of this block is stored in $int_value
[3] <float; @float_values> // this segment has 3 floats, whose values are stored in @float_values
If the length of the segment is one, the data type of the variable should be scalar; otherwise, it should be an array.
One operator can be applied to this variable: the accumulating operator “+”.
“+” keeps the value already stored in the variable and adds the new value to it.
For example:
[10] {
[1] <int; +$sum>
}
This adds 10 values to $sum.
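A reader implementing this could treat “+$sum” as a running total. The Python sketch below reads 10 ints from an in-memory buffer and accumulates them; it assumes little-endian 4-byte ints and that $sum starts at 0, neither of which is stated by the example.

```python
import struct

# Sample data standing in for the file: the ints 1..10 (assumed little-endian, 4 bytes).
data = struct.pack("<10i", *range(1, 11))

total = 0                                   # plays the role of $sum, assumed to start at 0
for (value,) in struct.iter_unpack("<i", data):
    total += value                          # [1] <int; +$sum>
```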
Assign a value to the block
We can assign one or more values to a segment.
For example:
[1] <int; =2>
[4] <int; =[1,2,3,4]>
[%let $a = 5]
[%let @b = [1, 2, 3]]
[1] <int; =$a>
[3] <int; =@b>
A choice of block values.
For example:
[8] <char; =0> (dsp="this segment has 8 blocks, and the value of each block is 0")
[4] <int; ={0, 1}> (dsp="this segment has 4 ints, and the value of each block should be either 0 or 1")
In modified BNF:
angled-bracket-part ::= block-type (";" variable | ";" "+"variable)? (";" ("=" | "=:") variable)? (";" "=" (choices | range | value_list))?
choices ::= "{" elements "}"
range ::= "(" ("(" | "[") range-start "," range-end ("]" | ")" ) ")"
value_list ::= "[" elements "]"
elements ::= variable ("," variable)*
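One way to read the value, value-list, and choices forms is as checks on the data of a segment. The following Python sketch illustrates that reading; the function `check_segment` is hypothetical, and representing choices as a Python set and a value list as a Python list is a convention of this sketch only.

```python
def check_segment(values, expected) -> bool:
    """Check the values of a segment against an "=" statement.

    expected is a set for choices ({0, 1}), a list for a value list
    ([1, 2, 3, 4]), or a single value applied to every block (=0).
    """
    if isinstance(expected, (set, frozenset)):
        return all(v in expected for v in values)
    if isinstance(expected, list):
        return list(values) == expected
    return all(v == expected for v in values)

all_zero = check_segment([0] * 8, 0)                      # [8] <char; =0>
valid_choice = check_segment([0, 1, 1, 0], {0, 1})        # [4] <int; ={0, 1}>
valid_list = check_segment([1, 2, 3, 4], [1, 2, 3, 4])    # [4] <int; =[1,2,3,4]>
invalid_choice = check_segment([0, 2, 1, 0], {0, 1})
```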
curly-bracket-part#
When the block is not a simple block type such as int or float, but is instead
made of other segments, curly brackets are used to contain those segments. The
other application of curly-bracket-part is for complex statements like [%if 1]{}().
Used when the block is a segment.
For example:
[6] {
[2] <bit> ()
[3] <int> ()
} (dsp="the block is a segment made of 2 bits and 3 ints")
Used when a complex statement is introduced.
For example:
[%for $i = 0; $i < 10; $i++] {
[$i + 1] <int> ()
} (dsp="$i changed from 0 to 9")
This example can also be written another way:
[10; ~$i] {
[$i + 1] <int> ()
} ()
round-parenthesis-part#
round-parenthesis-part contains labels that describe the segment or block.
For example:
[1] <char; =2> (dsp="this is an example"; value="1 for foo, 2 for bar"; name="example-segment")
Labels are predefined by FLML; users can also define their own labels with [%deflabel mylabel "this is my label"]<>().
In modified BNF:
description ::= label-name "=" '"' value '"' (";" label-name "=" '"' value '"') *
label-name ::= [a-zA-Z] +
value ::= [a-zA-Z\s] +
6.1.6. Declaration of new variables#
“%let” can be used to declare a new variable. For example:
[%let $a = 3]
The newly declared variable can be initialized as in the example.
A variable is also declared automatically the first time it appears. For example:
[1] <int; $bar>
The variable “$bar” is declared and the value of the block is assigned to it.
6.1.7. Branch#
Branches in FLML use the keywords %if, %elif, and %else.
The usage is:
[%if expression] {
sentences
} ()
[%elif expression] {
sentences
} ()
[%else] {
sentences
} ()
6.1.8. Loop#
The “for” loop
The usage of the for statement is:
[%for expression_a; expression_b; expression_c] {
sentences
} ()
The for loop works just like the one in C.
For example:
[%let $sum = 0] <> ()
[%for $i = 0; $i < 10; $i ++] {
[$sum += $i] <> ()
} ()
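For comparison, the same loop written in Python is shown below, with the three parts of the FLML for statement spelled out as initialization, condition, and step. The variable names are chosen for this sketch only.

```python
total = 0          # [%let $sum = 0] <> ()
i = 0              # expression_a: $i = 0
while i < 10:      # expression_b: $i < 10
    total += i     # [$sum += $i] <> ()
    i += 1         # expression_c: $i ++
```

After the loop, `total` holds 0 + 1 + ... + 9 = 45.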
The “while” loop
The usage of the while loop is:
[%while expression] {
sentences
} ()
6.1.9. Function#
The way to define a function:
[%deffunc $funname (arguments) returns] {
sentences
} ()
Here is an example:
[%deffunc $myadd ($a, $b) $c] {
[$c = $a + $b] <> ()
[%return $c] <> ()
} ()
The [%return] can be omitted.
6.1.10. Comment#
C-style comments.
Comments in C style are accepted.
Here is an example:
[1] <int> () //here is a comment
//[3] <int> ()
/*
[3] {
[5] {
[5] <float> ()
} ()
} ()
*/
Segment comment.
“#” can be used for a segment comment, which comments out a whole segment.
For example:
[# 10] {
[1] <int> ()
[1] <float> ()
} ()
6.1.11. Omission#
An FLML sentence must have a square bracket part. The angled bracket part and the round parenthesis part can be omitted if they have no content.
Examples:
[%let $sum = 0]
[%for $i = 0; $i < 10; $i++] {
[$sum += $i]
}
6.1.12. “ “ and ‘ ‘ in FLML#
“” and ‘’ can be used to enclose a string. The difference between them is that variables within “” are expanded, while variables within ‘’ are not. Specifiers like “\n” and “\t” refer to a new line and a tab respectively.
For example:
[%let $var = 3; %let @arr = [1, 2, 3]]
[%mesg "\$var is $var"] //the mesg is: $var is 3
[%mesg 'this is @arr'] // the message is: this is @arr
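The difference between the two quote styles can be imitated in Python. The sketch below expands scalar variables in a "double-quoted" template and treats “\$” as a literal dollar sign; the function `expand` and its regular expression are illustrative assumptions, and array expansion is not covered.

```python
import re

def expand(template: str, variables: dict) -> str:
    """Expand $name in a double-quoted string; a leading backslash keeps it literal."""
    def repl(match):
        if match.group(0).startswith("\\"):
            return match.group(0)[1:]           # drop the backslash, keep "$name"
        return str(variables[match.group(1)])   # substitute the variable's value
    return re.sub(r"\\?\$([A-Za-z_]\w*)", repl, template)

variables = {"var": 3}
double_quoted = expand(r"\$var is $var", variables)   # variables are expanded
single_quoted = "this is @arr"                        # left untouched, as with ' '
```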
6.1.13. Appendix#
Key words#
All keywords of FLML begin with “%”.
%let
Declares a variable and initializes it.
[%let $var = 12]
[%let @arr = [1, 2, 3]]
%if %elif %else
These three keywords are used for branching.
[1] <int; $var>
[%if $var > 10] {
[10] <int>
}
[%elif $var == 10] {
[5] <int>
}
[%else] {
[1] <int>
}
%for
Constructs a for loop sentence.
[%let $var = 10]
[%for ($i = 0; $i < 10; $i += 1)] {
[$var]
}
If there are no other statements, the parentheses of “%for” can be omitted.
%while
Constructs a while loop sentence.
[%let $var = 10; %let $summ = 0]
[%while $var > 0] {
[1] <int; +$summ>
[$var -= 1]
}
%break %continue
These keywords are used in loops.
%assert
Asserts a statement.
[%assert $var == 0]
%error
Gives error information.
[%error "this is an error"]
%mesg
Gives a message.
[%mesg "this is a message"]
%deffunc %return
When “%deffunc” is used to define a function, all “[]” can be omitted. The arguments of the function are put in parentheses and separated by commas. The variable to be returned follows the arguments. The “%return” statement can be omitted. A function should be defined before it is referred to. You can declare the function first and define it later, as in C.
[%deffunc $myfunc ($var_a, $var_b) $data_out]
[%let $a = 13; %let $b = 14; %let $c = $myfunc($a, $b)]
[%mesg "the value of \$c is $c"]
[$c]<float>
[%deffunc $myfunc ($var_a, $var_b) $data_out] {
%let $c = $var_a + $var_b;
$data_out = $c;
%return $data_out; // can be omitted
}
%info
Gives information. Generally, it is used to provide information about the whole file.
[%info](dsp="a binary file"; filetype="binary"; endianness="little")
%file
Declares a variable which refers to a file.
[%file $file_var "file description" "file_name"]
The “file name” can be omitted.
%parse
Parses an array.
[100]<byte; @data_a>
[%let @data_b = %transform(@data_a)]
[%parse @data_b] {
sentences
}
The original data in the file may need some transformation before it has actual meaning. This is when “%parse” is useful.
%deflabel
Used by the user to define a new label.
[%deflabel newlabel "this is a new label used to express new attribute"]
Block type#
integer
The block type of integer include:
<int8> <uint8> <char> <int16> <uint16> <short> <int32> <uint32> <int> <int64> <uint64> <long>
float
<float> <float32> <float64> <double>
bytes
<byte>
bit
<bit>
Plaintext.
<char> <string> <ascii>
<ascii> is used to represent an ASCII code; the block/unit consumes 1 byte.
Built-in functions#
$abs
%let $a = -2; %let $b = $abs($a); // $b equal 2
$floor
%let $a = $floor(10 / 3); // $a equal 3
$ceil
%let $a = $ceil(10 / 3); // $a equal 4
$mod
%let $a = $mod(10, 3); // $a equal 1
$sum
%let @arr = [1, 2, 3]; %let $ss = $sum(@arr); // $ss equal 6
$append
%let @arr = [1, 2, 3]; %let $a = 4; $append(@arr, $a); // @arr is [1, 2, 3, 4]
$pop
%let @arr = [1, 2, 3]; %let $a = $pop(@arr); // @arr is [1, 2], $a equal 3
$length
%let @arr = [1, 2, 3]; %let $a = $length(@arr); // $a is 3
$getorder
Gets the order of a file or array.
%file $test_file "a test file"
%let @order = $getorder($test_file); // @order represents the order of the file.
$filelinenum
Returns the number of lines in a plaintext file.
$filesize
Returns the size of the file.
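The arithmetic and array built-ins listed above have close counterparts in Python, which can serve as a reference for the expected results. The mapping below is an informal comparison, not a statement about how FLML implements these functions; the file-related functions are left out.

```python
import math

abs_value = abs(-2)                # $abs($a) with $a = -2
floor_value = math.floor(10 / 3)   # $floor(10 / 3)
ceil_value = math.ceil(10 / 3)     # $ceil(10 / 3)
mod_value = 10 % 3                 # $mod(10, 3)

arr = [1, 2, 3]
sum_value = sum(arr)               # $sum(@arr)
length_value = len(arr)            # $length(@arr)

appended = [1, 2, 3]
appended.append(4)                 # $append(@arr, $a) with $a = 4

popped_from = [1, 2, 3]
popped = popped_from.pop()         # $pop(@arr) removes and returns the last element
```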
Standard labels#
dsp
Description of the segment. This label is for general purposes and has no restrictions. The value is a string.
dsp="string"
ele-dsp
Description of the elements of the segment. The value is a string.
ele-dsp="string"
value-dsp value
Description of the meaning of each value.
[1] <char; ={0, 1, 2}> (value-dsp="description of value"; value={0: "dsp one", 1: "dsp two", 2: "dsp three"})
NA
Value to indicate NA.
name id
Name of the segment.
filetype
File type; the value is “binary” or “plaintext”.
endianness
Endianness of the file; the value is “little” or “big”.
alignwith order
The order to which the block refers.
[%file $myfile "my file"]
[%let $filelen = $filelinenum($myfile)]
[%let @order = $getorder($myfile)]
[1] <int> (order=@order[0])
[$filelen] <float> (alignwith=@order)
value-alignwith
relatedto
datatype
Used in plaintext descriptions; represents the data type of the block.
sep
Used in plaintext descriptions; the separator between elements of the segment.
end
Used in plaintext descriptions; represents the end of the segment.
encode
Used in plaintext descriptions; represents the encoding of the plaintext.
re
Used in plaintext descriptions; indicates whether a regular expression is used or not.
role
bitorder
BitBlockOri
Special variables#
$*
This variable refers to the range [0, infinity).
$+
This variable refers to the range [1, infinity).
$?
This variable refers to a value that is 0 or 1.
$NA $NONE $UNKNOW
These variables mean that the value is not known.
$WHITESPACE
Refers to “\s” or “\t”.
$EOF
Refers to End Of File.
$NEWLINE
Refers to “\n”.
$TAB
Refers to “\t”.
@CMDARGS
Refers to an array that stores the command-line arguments. This is defined for future use.
$INF $INF_POS $INF_NEG
Refer to an infinite value.
$TRUE
Refers to true.
$FALSE
Refers to false.
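When used as multipliers, the special variables $*, $+, and $? listed above behave much like the quantifiers of the same name in regular expressions. A small Python sketch of that interpretation follows; the table and the function `count_allowed` are hypothetical names used only for illustration.

```python
import math

# Allowed repetition counts for each special multiplier (inclusive lower bound).
MULTIPLIER_RANGES = {
    "$*": (0, math.inf),   # zero or more
    "$+": (1, math.inf),   # one or more
    "$?": (0, 1),          # zero or one
}

def count_allowed(multiplier: str, count: int) -> bool:
    """Return True if a segment with `count` blocks satisfies the multiplier."""
    low, high = MULTIPLIER_RANGES[multiplier]
    return low <= count <= high
```

For instance, the plaintext example in the introduction uses [$+] for the data rows, so a file with a header but no rows would not match under this reading.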