6.1. File Layout Markup Language Specification

Authors:

Benjamin Fang

Version:

1.2.0

Create Date:

20230401

Update Date:

20240130

6.1.1. Introduction

File Layout Markup Language (FLML) is a markup language for describing the layout, or structure, of binary and plaintext files. It aims to represent a file's layout simply and accurately.

For example, consider a binary file with the following layout:

three integers; one char whose value is 255; one integer whose value is X; and X floats.

The corresponding FLML is:

[3] <int> (dsp="three integers")
[1] <char; =255> (dsp="one char, whose value equals 255")
[1] <int; $x> (dsp="one int, whose value is assigned to the variable $x")
[$x] <float> (dsp="$x floats")
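To make the mapping concrete, the following Python sketch decodes a file laid out as the four sentences above describe. It assumes little-endian byte order, a 4-byte signed int, a 1-byte unsigned char and a 4-byte float. The example itself does not fix these sizes, and the function name `read_example` is hypothetical.

```python
import struct

def read_example(data: bytes) -> dict:
    """Decode [3]<int> [1]<char; =255> [1]<int; $x> [$x]<float> (hypothetical helper).

    Assumes little-endian data, 4-byte int, 1-byte unsigned char, 4-byte float.
    """
    offset = 0
    # [3] <int>: three signed 32-bit integers
    ints = list(struct.unpack_from("<3i", data, offset))
    offset += 12
    # [1] <char; =255>: one byte that must equal 255
    (ch,) = struct.unpack_from("<B", data, offset)
    offset += 1
    if ch != 255:
        raise ValueError(f"expected 255, got {ch}")
    # [1] <int; $x>: the value is stored in the variable $x
    (x,) = struct.unpack_from("<i", data, offset)
    offset += 4
    # [$x] <float>: the number of floats comes from $x
    floats = list(struct.unpack_from(f"<{x}f", data, offset))
    offset += 4 * x
    return {"ints": ints, "char": ch, "x": x, "floats": floats, "size": offset}

sample = struct.pack("<3iBi2f", 1, 2, 3, 255, 2, 0.5, 1.5)
print(read_example(sample))
```

The key point is that the count for the last segment is only known after the fourth block has been read, which is exactly what the variable $x expresses.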

Here is another example, for a plaintext file:

ID gender name point
12  0   ben 2.32
8   1   lewis   5.6

The corresponding FLML is:

[4] <string; =["ID", "gender", "name", "point"]> (sep="\s"; end="\n")
[$+] {
    [1] <string> (dtype=int; end="\s")
    [1] <string; ={"0", "1"}> (dtype=bool; end="\s")
    [1] <string; ="[a-zA-Z]+"> (re=True; dsp="names"; end="\s")
    [1] <string> (dtype=float)
} (end="\n")
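A reader for this plaintext layout could look like the Python sketch below. It assumes that fields are separated by one or more whitespace characters and that rows end with a newline. Each column is converted according to its dtype label. The function name `parse_table` is hypothetical.

```python
import re

def parse_table(text):
    """Check and parse text laid out as in the FLML above (hypothetical helper).

    Assumes fields are separated by runs of whitespace and rows end with a newline.
    """
    lines = text.strip("\n").split("\n")
    # [4] <string; =["ID", "gender", "name", "point"]>: the fixed header row
    header = lines[0].split()
    if header != ["ID", "gender", "name", "point"]:
        raise ValueError(f"unexpected header: {header}")
    # [$+] {...}: one or more data rows
    if len(lines) < 2:
        raise ValueError("expected at least one data row")
    rows = []
    for line in lines[1:]:
        id_, gender, name, point = line.split()
        if gender not in {"0", "1"}:                  # ={"0", "1"}
            raise ValueError(f"bad gender: {gender}")
        if not re.fullmatch(r"[a-zA-Z]+", name):      # ="[a-zA-Z]+" with re=True
            raise ValueError(f"bad name: {name}")
        rows.append({"ID": int(id_),                  # dtype=int
                     "gender": gender == "1",         # dtype=bool
                     "name": name,
                     "point": float(point)})          # dtype=float
    return rows

sample = "ID gender name point\n12  0   ben 2.32\n8   1   lewis   5.6\n"
for row in parse_table(sample):
    print(row)
```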

An FLML file consists of sentences, and each sentence describes a block. A block is made of one or more identical elements. Generally, an FLML sentence has three components, enclosed by square brackets, angle brackets and round parentheses respectively: the element number component, the element unit component and the element label component.

The element number component describes the number of element units. The element unit is the unit of the block, and it can be a complex unit, which is enclosed by curly brackets. For example:

[3] {
    [1] <int>
    [2] <float>
}

The unit of the outermost sentence consists of one int and two floats, and the sentence repeats that unit three times. A complex element unit is described by the simple element units inside its curly brackets.

Using a modified BNF notation, the structure can be defined as:

flml-description ::= flml-sentence +
flml-sentence    ::= "[" square-bracket-part "]" ( "<" angle-bracket-part ">" | "{" flml-description "}" ) "(" round-parenthesis-part ")"
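As a rough illustration of the three-part structure, the Python sketch below splits a single simple sentence (one with an angle bracket part) into its square, angle and round parts using a regular expression. This is a simplification. It does not handle nested curly-bracket blocks, omitted parts, or brackets that appear inside quoted strings.

```python
import re

# One simple sentence: "[" square "]" "<" angle ">" "(" round ")".
# Nested {...} blocks, omitted parts and brackets inside strings are not handled.
SENTENCE = re.compile(
    r"\[(?P<square>[^\]]*)\]\s*<(?P<angle>[^>]*)>\s*\((?P<round>[^)]*)\)"
)

def split_sentence(text):
    """Return the (square, angle, round) parts of a simple FLML sentence."""
    m = SENTENCE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not a simple FLML sentence: {text!r}")
    return tuple(m.group(name).strip() for name in ("square", "angle", "round"))

print(split_sentence('[1] <char; =255> (dsp="one char")'))
```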

6.1.2. Terminology and Concepts

File

A binary or plaintext file.

Block

A contiguous piece of data in a file.

Element length

Element type

Element label

statement

A statement in FLML is an expression terminated by “;”. If the statement is the last one in a sentence part, the “;” can be omitted.

sentence

An FLML sentence looks like [statement]<statement>(statement) or [statement]{sentences}(statement). A sentence has three parts. The first is called the “square bracket part”, which includes the “[]” markers and the statements they contain. The second is called the “angle bracket part” or “curly bracket part”. The last is called the “round parenthesis part”.

block

A block is the unit from which larger data structures are built. For instance, in [8] <int> () (example A), “int” is the block, enclosed by “<>”. The main function of the “angle bracket part” and the “curly bracket part” is to contain the block.

simple block and complex block

Blocks are divided into two types: simple blocks and complex blocks. A simple block is a basic data type defined by the language and cannot contain other blocks; “int”, “float” and “char” are all simple blocks. A simple block is enclosed by “<>”. A complex block, on the other hand, is made up of other blocks and is enclosed by “{}”. For example, in [3]{[1]<int>() [1]<float>()}() (example B), the complex block consists of one int and one float.

block type

There are many simple block types, and each type represents both its data type and its data size. For example, a “uint64” simple block means that the data is an unsigned integer occupying 64 bits.

block size

For a given block, whether simple or complex, the size can be determined; this is called the block size. In example B above, the block size of “{[1]<int> [1]<float>}” is 8 bytes (assuming that int and float are each 4 bytes).

block multiplier

A number or variable in “[]” indicates how many blocks there are. In example A above, “[8]” means there are 8 “<int>” blocks. The number 8 here is the block multiplier, which represents how many times the block is repeated.

segment, segment length, elements of segment

The block multiplied by the multiplier of the same sentence makes a segment. In example A, [8]<int>() makes a segment of 8 ints, so its size is 32 bytes. The blocks that make up a segment are also called the elements of the segment, and the multiplier is also called the segment length.
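The size rules above can be sketched in Python. In this sketch, a simple block is a type name and a complex block is a list of (multiplier, block) sentences. The byte sizes in `SIZES` are assumptions, with int and float taken as 4 bytes as in the example above.

```python
# Assumed sizes in bytes for a few simple block types.
SIZES = {"char": 1, "byte": 1, "int": 4, "float": 4, "double": 8}

def block_size(block):
    """Size of a simple block (type name) or a complex block (list of (multiplier, block))."""
    if isinstance(block, str):
        return SIZES[block]
    return sum(segment_size(mult, inner) for mult, inner in block)

def segment_size(multiplier, block):
    """A segment is its block repeated `multiplier` times."""
    return multiplier * block_size(block)

example_a = (8, "int")                          # [8] <int> ()
example_b = (3, [(1, "int"), (1, "float")])     # [3] {[1]<int>() [1]<float>()} ()

print(block_size(example_b[1]))   # size of the complex block
print(segment_size(*example_a))   # size of the segment in example A
print(segment_size(*example_b))   # size of the segment in example B
```

Because `block_size` and `segment_size` call each other, nested complex blocks are handled by the same two rules.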

6.1.3. Variables and Data Types

There are two data types in FLML: scalar and array. A scalar can refer to a number, a file, an iterator, or an order element. An array is a collection of scalars.

A scalar variable, which represents a scalar, starts with “$”, and an array variable starts with “@”.

Here are some examples.

$sca = 3;
@arr = [1, 2, 3, 4, 5];
$sca = @arr[0]; // $sca equals 1
@arr[:2] = [7, 8];

Arrays can be indexed and sliced: “@arr[0]” refers to the first element of the array, while “@arr[:3]” refers to the range from the first to the third element.
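Suppose FLML indexing and slicing follow the same zero-based, end-exclusive convention as Python. That is an assumption, but it is consistent with “@arr[:3]” covering the first three elements. Under it, the statements above correspond to the following Python:

```python
arr = [1, 2, 3, 4, 5]     # @arr = [1, 2, 3, 4, 5];
sca = arr[0]              # $sca = @arr[0];   first element
arr[:2] = [7, 8]          # @arr[:2] = [7, 8]; replaces the first two elements
first_three = arr[:3]     # @arr[:3]: first to third element
print(sca, arr, first_three)
```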

In BNF:

variable ::= "$" [a-zA-Z*+?] + [0-9]* | "@" [a-zA-Z]+ [0-9]*

6.1.4. Operators and Expressions

The operators of FLML are + - * / : ~ ^ =. The operators “+ - * /” have their usual algebraic meaning. For example:

$foo = 1 + 3; // $foo equals 4
$foo = 4 - 3; // $foo equals 1
$foo = 4 * 3; // $foo equals 12
$foo = 5 / 2; // $foo equals 2.5

Five of these operators, “+ : ~ ^ =”, also have special meanings in certain contexts.

An FLML expression consists of variables, numbers and operators, and an expression terminated by “;” makes a statement.

In BNF:

statement  ::= expression ";"
expression ::= (operator)? (variable | number) (operator expression)?
variable   ::= "$" [a-zA-Z*+?] + [0-9]* | "@" [a-zA-Z]+ [0-9]*
number     ::= [0-9]+ ("." [0-9]+)?
operator   ::= "+" | "-" | "*" | "/" | ":" | "~" | "^" | "="

6.1.5. FLML sentences

Square bracket part

The square-bracket-part is the first part of an FLML sentence and is mainly used to describe the number of blocks. It consists of statements enclosed by “[]”. There are four types of statements in this part.

  1. A statement indicating the number of blocks

This statement is an expression whose value is the number of blocks. In the terminology above, this value is the block multiplier, or the segment length.

For example:

[3] <byte> ()
[%let $num = 5] <> ()
[$num * 2] <float> ()

In the first sentence above, the block is “byte” and the multiplier is 3, which makes a segment of 3 bytes. The second sentence defines a variable whose value is 5. In the third sentence, the statement in the square bracket part is an expression whose value is 10, so the multiplier is 10 and the segment consists of 10 floats.

  2. Iteration operator and iteration statement

Along with the multiplier, there can be an iteration statement, which consists of “~” followed by a variable.

For example:

[3; ~$i] {
    [$i] <float> ()
    [2] <int> ()

} ()

In the example, “~$i” is an iteration statement: $i takes the values 0, 1 and 2, one for each element of the segment. The block of the sentence is a complex block described by two sentences. The segment has 3 blocks: the first is made of 0 floats and 2 integers, the second of 1 float and 2 integers, and the third of 2 floats and 2 integers.

  3. Order collecting operator and order collecting statement

Sometimes the order of a sequence is important, and the following segments may need to be aligned with that order.

For example:

[10; ^@myorder] <string> ()
[10] <int> (alignwith=@myorder)
[10; ~$i] {
    [1] <float> (order=@myorder[$i])
}

  4. Statements of FLML operations

This kind of statement performs an FLML operation, such as declaring a variable, branching, or looping.

For example:

[%let $var = 3]
[%if $var == 2] {
    [1] <int>
}

Note

Multiple FLML statements can be written within one square bracket.

In modified BNF, it can be described as:

square-bracket-part ::= expression (";" "~" variable)? (";" "^" variable)? | other-statement
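The expansion of the iteration statement in the “~$i” example can be sketched in Python. The function names are hypothetical, and the 4-byte sizes for int and float are assumptions.

```python
# Assumed element sizes in bytes; the text above does not fix them.
SIZES = {"float": 4, "int": 4}

def expand_iteration(count):
    """Expand [count; ~$i] { [$i] <float> () [2] <int> () } element by element.

    $i takes the values 0, 1, ..., count - 1, one per repetition of the complex block.
    Each element is returned as a list of (block type, multiplier) pairs.
    """
    return [[("float", i), ("int", 2)] for i in range(count)]

def total_size(layouts):
    """Total bytes of all elements, using the assumed SIZES table."""
    return sum(SIZES[t] * n for layout in layouts for t, n in layout)

layouts = expand_iteration(3)
for i, layout in enumerate(layouts):
    print(f"element {i}: {layout}")
print("total bytes:", total_size(layouts))
```

Unlike a fixed multiplier, the iteration variable lets each repetition of the complex block have a different size, so the segment size has to be summed element by element.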

Angle bracket part

The angle-bracket-part is mainly used to provide block information. It can also contain some additional statements.

  1. A string representing the block type

For example:

[1] <float>  // block type is float
[1] <uint32> // block type is uint32, an unsigned integer whose size is 4 bytes

  2. A statement containing only a variable

For example:

[1] <int; $int_value>  // the value of this block is stored in $int_value
[3] <float; @float_values> // this segment has 3 floats, whose values are stored in @float_values

If the segment length is one, the variable should be a scalar; otherwise, it should be an array.

One operator can be applied to this variable: the accumulating operator “+”.

“+” keeps the value already stored in the variable and adds the new value to it.

For example:

[10] {
    [1] <int; +$sum>

}

This adds the 10 values to $sum.

  3. Assigning values to the block

One or more values can be assigned to a segment.

For example:

[1] <int; =2>
[4] <int; =[1,2,3,4]>
[%let $a = 5]
[%let @b = [1, 2, 3]]
[1] <int; =$a>
[3] <int; =@b>

  4. A choice of values for the block

For example:

[8] <char; =0> (dsp="this segment has 8 blocks, and the value of each block is 0")
[4] <int; ={0, 1}> (dsp="this segment has 4 ints, and the value of each block must be either 0 or 1")

In modified BNF:

angle-bracket-part ::= block-type (";" "+"? variable)? (";" ("=" | "=:") variable)? (";" "=" (choices | range | value-list))?
choices            ::= "{" elements "}"
range              ::= "(" ("(" | "[") range-start "," range-end ("]" | ")") ")"
value-list         ::= "[" elements "]"
elements           ::= variable ("," variable)*
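The effect of the angle-bracket statements on decoded values can be sketched in Python. This covers storing into a scalar or array, accumulating with “+”, and checking the “=” and “={...}” constraints. The helper `apply_angle_part` and its parameters are hypothetical. Treating an accumulator that has not yet been assigned as 0 is an assumption.

```python
def apply_angle_part(values, env, var=None, accumulate=False,
                     expected=None, choices=None):
    """Apply angle-bracket statements to one segment's decoded values (hypothetical).

    values:     decoded elements of the segment
    env:        FLML variables, keyed by name including the sigil ("$x", "@arr")
    var:        variable that receives the value(s), e.g. "$x" or "@arr"
    accumulate: True for "+$name", which adds to the value already stored
    expected:   "=" constraint: one value for every element, or a list of values
    choices:    "={...}" constraint: the set of allowed values
    """
    if expected is not None:
        want = expected if isinstance(expected, list) else [expected] * len(values)
        if list(values) != want:
            raise ValueError(f"expected {want}, got {values}")
    if choices is not None and any(v not in choices for v in values):
        raise ValueError(f"{values} contains a value outside {choices}")
    if var is not None:
        if accumulate:
            env[var] = env.get(var, 0) + sum(values)  # assumed to start from 0
        elif len(values) == 1:
            env[var] = values[0]                      # segment length 1: scalar
        else:
            env[var] = list(values)                   # longer segment: array
    return env

env = {}
apply_angle_part([7], env, var="$int_value")                  # [1] <int; $int_value>
apply_angle_part([0.5, 1.5, 2.5], env, var="@float_values")   # [3] <float; @float_values>
for v in range(1, 11):                                        # [10] { [1] <int; +$sum> }
    apply_angle_part([v], env, var="$sum", accumulate=True)
apply_angle_part([1, 2, 3, 4], env, expected=[1, 2, 3, 4])    # [4] <int; =[1,2,3,4]>
apply_angle_part([0, 1, 1, 0], env, choices={0, 1})           # [4] <int; ={0, 1}>
print(env)
```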

Curly bracket part

When the block is not a simple type such as int or float, but is instead made of other segments, curly brackets are used to contain those segments. Curly brackets are also used for complex statements such as [%if 1]{}().

  1. Used when the block is made of segments

For example:

[6] {
    [2] <bit> ()
    [3] <int> ()
} (dsp="the block is a segment of 2 bits and 3 ints")

  2. Used when a complex statement is introduced

For example:

[%for $i = 0; $i < 10; $i++] {
    [$i + 1] <int> ()
} (dsp="$i changes from 0 to 9")

This example can also be written as:

[10; ~$i] {
    [$i + 1] <int> ()
} ()

Round parenthesis part

The round-parenthesis-part contains labels that describe the segment or block.

For example:

[1] <char; =2> (dsp="this is an example"; value="1 for foo, 2 for bar"; name="example-segment")

Labels are predefined by FLML, but users can also define their own labels with [%deflabel mylabel "this is my label"]<>().

In modified BNF:

description     ::= label-name "=" '"' value '"' (";" label-name "=" '"' value '"') *
label-name      ::= [a-zA-Z] +
value           ::= [a-zA-Z\s] +

6.1.6. Declaration of new variables

“%let” declares a new variable. For example:

[%let $a = 3]

A newly declared variable can be initialized, as in the example above.

A variable is also declared automatically the first time it appears. For example:

[1] <int; $bar>

The variable “$bar” is declared, and the value of the block is assigned to it.

6.1.7. Branch

Branches in FLML use the keywords %if, %elif and %else.

The usage is:

[%if expression] {
    sentences
} ()

[%elif expression] {
    sentences
} ()

[%else] {
    sentences
} ()

6.1.8. Loop

  1. The “for” loop

The usage of the for statement is:

[%for expression_a; expression_b; expression_c] {
    sentences
} ()

The for loop works just like the for loop in C.

For example:

[%let $sum = 0] <> ()
[%for $i = 0; $i < 10; $i ++] {
    [$sum += $i] <> ()
} ()

  2. The “while” loop

The usage of the while loop is:

[%while expression] {
    sentences
} ()
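The for loop is described as working like C's. On that reading, the summation example above corresponds to the following Python, where the comments map each part of the “%for” header to its role. This only illustrates the stated semantics; it is not an FLML interpreter.

```python
total = 0          # [%let $sum = 0] <> ()
i = 0              # expression_a: initialization, $i = 0
while i < 10:      # expression_b: condition, $i < 10
    total += i     # loop body: [$sum += $i] <> ()
    i += 1         # expression_c: step, $i ++
print(total)
```

The same shape also shows the “%while” form: the condition is checked before each pass, and the body runs until the condition becomes false.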

6.1.9. Function

A function is defined as follows:

[%deffunc $funname (arguments) returns] {
    sentences
} ()

Here is an example:

[%deffunc $myadd ($a, $b) $c] {

    [$c = $a + $b] <> ()
    [%return $c] <> ()

} ()

The [%return] statement can be omitted.

6.1.10. Comment

  1. C-style comments

C-style comments are accepted.

Here is an example:

[1] <int> () //here is a comment

//[3] <int> ()

/*
    [3] {
        [5] {
            [5] <float> ()
        } ()
    } ()
*/

  2. Segment comments

“#” can be used to comment out a whole segment.

For example:

[# 10] {
    [1] <int> ()
    [1] <float> ()
} ()

6.1.11. Omission

An FLML sentence must have a square bracket part. The angle bracket part and the round parenthesis part can be omitted if they are empty.

Examples:

[%let $sum = 0]
[%for $i = 0; $i < 10; $i++] {
    [$sum += $i]
}

6.1.12. Double and single quotes in FLML

Double quotes (“”) and single quotes (‘’) can both be used to delimit a string. The difference is that variables inside double quotes are expanded, while those inside single quotes are not. Escape sequences such as “\n” and “\t” denote a newline and a tab respectively.

For example:

[%let $var = 3; %let @arr = [1, 2, 3]]
[%mesg "\$var is $var"] //the mesg is: $var is 3
[%mesg 'this is @arr'] // the message is: this is @arr
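A minimal Python sketch of this quoting rule follows. Inside double quotes, “$name” and “@name” are replaced by their values, “\$” and “\@” produce a literal sigil, and “\n” and “\t” become a newline and a tab. Single-quoted text is returned unchanged. The text above does not specify how arrays are rendered or whether escapes apply inside single quotes. The sketch assumes Python's `str()` for values and leaves single-quoted text untouched.

```python
import re

ESCAPES = {r"\n": "\n", r"\t": "\t", r"\$": "$", r"\@": "@"}
TOKEN = re.compile(r"\\[nt$@]|[$@][A-Za-z_][A-Za-z0-9_]*")

def expand(text, variables, double_quoted=True):
    """Expand variables and escapes in a double-quoted FLML string (sketch)."""
    if not double_quoted:
        return text                      # '...': no expansion
    def replace(m):
        token = m.group(0)
        if token in ESCAPES:
            return ESCAPES[token]
        return str(variables[token])     # "$var" or "@arr" looked up by full name
    return TOKEN.sub(replace, text)

variables = {"$var": 3, "@arr": [1, 2, 3]}
print(expand(r"\$var is $var", variables))                       # [%mesg "\$var is $var"]
print(expand("this is @arr", variables, double_quoted=False))    # [%mesg 'this is @arr']
```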

6.1.13. Appendix

Keywords

All FLML keywords begin with “%”.

  • %let

    Declares a variable and initializes it.

    [%let $var = 12]
    [%let @arr = [1, 2, 3]]
    
  • %if %elif %else

    These three keywords are used for branching.

    [1] <int; $var>
    [%if $var > 10] {
        [10] <int>
    }
    [%elif $var == 10] {
        [5] <int>
    }
    
    [%else] {
        [1] <int>
    }
    
  • %for

    Constructs a for loop sentence.

    [%let $var = 10]
    [%for ($i = 0;$i < 10; $i += 1)] {
        [$var]
    }
    

    If there are no other statements, the parentheses of “%for” can be omitted.

  • %while

    Constructs a while loop sentence.

    [%let $var = 10; %let $summ = 0]
    [%while $var > 0] {
        [1] <int; +$summ>
        [$var -= 1]
    }
    
  • %break %continue

    These keywords are used in loops.

  • %assert

    Asserts a statement.

    [%assert $var == 0]
    
  • %error

    Gives error information.

    [%error "this is an error"]
    
  • %mesg

    Gives a message.

    [%mesg "this is a message"]

  • %deffunc %return

    When “%deffunc” is used to define a function, all “[]” inside the function body can be omitted. The arguments of the function are placed in parentheses and separated by commas, followed by the variable to be returned. The “%return” statement can be omitted. A function must be defined before it is referenced, but, as in C, it can be declared first and defined later.

    [%deffunc $myfunc ($var_a, $var_b) $data_out]
    
    [%let $a = 13; %let $b = 14; %let $c = $myfunc($a, $b)]
    [%mesg "the value of \$c is $c"]
    [$c]<float>
    
    [%deffunc $myfunc ($var_a, $var_b) $data_out] {
    
        %let $c = $var_a + $var_b;
        $data_out = $c;
        %return $data_out; // can be omitted
    }
    
  • %info

    Gives information; generally used to provide information about the whole file.

    [%info](dsp="a binary file"; filetype="binary"; endianness="little")
    
  • %file

    Declares a variable that refers to a file.

    [%file $file_var "file description" "file_name"]
    

    The “file_name” argument can be omitted.

  • %parse

    Parses an array.

    [100]<byte; @data_a>
    [%let @data_b = %transform(@data_a)]
    
    [%parse @data_b] {
    
        sentences
    
    }
    

    The original data in a file may need to be transformed before it has actual meaning. In that case, “%parse” is used to describe the transformed data.

  • %deflabel

    Defines a new user-defined label.

    [%deflabel newlabel "this is a new label used to express new attribute"]
    

Block types

  • integer

    The integer block types include:

    <int8> <uint8> <char>
    <int16> <uint16> <short>
    <int32> <uint32> <int>
    <int64> <uint64> <long>
    
  • float

    <float> <float32> <float64> <double>
    
  • bytes

    <byte>
    
  • bit

    <bit>
    
  • plaintext

    <char> <string> <ascii>
    

    <ascii> represents an ASCII code; each block/unit occupies 1 byte.
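One way to pin these types to byte sizes is a lookup into Python's `struct` format codes, shown below. The mapping is an assumption. It gives each type the size of the fixed-width types it is grouped with above, for example <long> as 8 bytes and <float> as 4. It also treats <char> as an unsigned byte and leaves out <bit> and <string>, which have no fixed byte size.

```python
import struct

# Assumed mapping from FLML simple block types to struct format codes.
STRUCT_CODES = {
    "int8": "b",  "uint8": "B",  "char": "B",
    "int16": "h", "uint16": "H", "short": "h",
    "int32": "i", "uint32": "I", "int": "i",
    "int64": "q", "uint64": "Q", "long": "q",
    "float": "f", "float32": "f", "float64": "d", "double": "d",
    "byte": "B",  "ascii": "B",
}

def type_size(block_type):
    """Size in bytes of a simple block type, using standard sizes ('<' prefix)."""
    return struct.calcsize("<" + STRUCT_CODES[block_type])

for name in ("char", "short", "int", "long", "float", "double", "ascii"):
    print(name, type_size(name))
```

Using the “<” prefix asks `struct` for standard sizes with no alignment padding, so the result does not depend on the platform's native C type sizes.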

Built-in functions

  • $abs

    %let $a = -2;
    %let $b = $abs($a); // $b equals 2
    
  • $floor

    %let $a = $floor(10 / 3); // $a equals 3
    
  • $ceil

    %let $a = $ceil(10 / 3); // $a equals 4
    
  • $mod

    %let $a = $mod(10, 3); // $a equals 1
    
  • $sum

    %let @arr = [1, 2, 3];
    %let $ss = $sum(@arr); // $ss equals 6
    
  • $append

    %let @arr = [1, 2, 3];
    %let $a = 4;
    $append(@arr, $a); // @arr is [1, 2, 3, 4]
    
  • $pop

    %let @arr = [1, 2, 3];
    %let $a = $pop(@arr); // @arr is [1, 2], $a equals 3
    
  • $length

    %let @arr = [1, 2, 3];
    %let $a = $length(@arr); // $a is 3
    
  • $getorder

    Gets the order of a file or array.

    %file $test_file "a test file"
    %let @order = $getorder($test_file); // @order represents the order of the file
    
  • $filelinenum

    Returns the number of lines in a plaintext file.

  • $filesize

    Returns the size of a file.
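For readers who know Python, the built-ins above correspond roughly to the following standard operations. This is an illustrative mapping, not an official binding. In particular, Python's `%` follows the sign of the divisor for negative operands, and the behavior of `$mod` for negative numbers is not specified here.

```python
import math

a = abs(-2)                  # $abs(-2)
b = math.floor(10 / 3)       # $floor(10 / 3)
c = math.ceil(10 / 3)        # $ceil(10 / 3)
d = 10 % 3                   # $mod(10, 3)
arr = [1, 2, 3]
s = sum(arr)                 # $sum(@arr)
arr.append(4)                # $append(@arr, 4): @arr becomes [1, 2, 3, 4]
popped = arr.pop()           # $pop(@arr): removes and returns the last element
n = len(arr)                 # $length(@arr)
print(a, b, c, d, s, popped, n)
```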

Standard labels

  • dsp

    Description of the segment. This label is for general purposes and has no restrictions. The value is a string.

    dsp="string"
    
  • ele-dsp

    Description of the elements of the segment. The value is a string.

    ele-dsp="string"
    
  • value-dsp value

    Describes the meaning of each value.

    [1] <char; ={0, 1, 2}> (value-dsp="description of value"; value={0: "dsp one", 1: "dsp two", 2: "dsp three"})
    
  • NA

    Value used to indicate NA.

  • name id

    Name of the segment.

  • filetype

    File type; the value is “binary” or “plaintext”.

  • endianness

    Endianness of the file; the value is “little” or “big”.

  • alignwith order

    The order that the block refers to.

    [%file $myfile "my file"]
    [%let $filelen = $filelinenum($myfile)]
    [%let @order = $getorder($myfile)]
    [1] <int> (order=@order[0])
    [$filelen] <float> (alignwith=@order)
    
  • value-alignwith

  • relatedto

  • datatype

    Used in plaintext descriptions; represents the data type of the block.

  • sep

    Used in plaintext descriptions; the separator between elements of a segment.

  • end

    Used in plaintext descriptions; represents the end of a segment.

  • encode

    Used in plaintext descriptions; represents the encoding of the plaintext.

  • re

    Used in plaintext descriptions; indicates whether a regular expression is used.

  • role

  • bitorder

  • BitBlockOri

Special variables

  • $*

    This variable refers to the range [0, infinity).

  • $+

    This variable refers to the range [1, infinity).

  • $?

    This variable refers to a value that is 0 or 1.

  • $NA $NONE $UNKNOW

    These variables mean that the value is unknown.

  • $WHITESPACE

    Refers to “\s” or “\t”.

  • $EOF

    Refers to the end of file.

  • $NEWLINE

    Refers to “\n”.

  • $TAB

    Refers to “\t”.

  • @CMDARGS

    Refers to an array that stores command-line arguments. This is reserved for future use.

  • $INF $INF_POS $INF_NEG

    Refer to an infinite value.

  • $TRUE

    Refers to true.

  • $FALSE

    Refers to false.
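The count variables $*, $+ and $? correspond to the regular-expression quantifiers *, + and ?. The Python sketch below checks a repetition count against them. `math.inf` stands in for $INF, and the helper name `count_allowed` is hypothetical.

```python
import math

# Inclusive (minimum, maximum) repetition counts for the special count variables.
COUNT_RANGES = {"$*": (0, math.inf), "$+": (1, math.inf), "$?": (0, 1)}

def count_allowed(multiplier, count):
    """Return True if `count` repetitions satisfy a [$*], [$+] or [$?] multiplier."""
    low, high = COUNT_RANGES[multiplier]
    return low <= count <= high

print(count_allowed("$+", 0), count_allowed("$+", 2), count_allowed("$?", 2))
```

For example, the [$+] multiplier in the plaintext example in the introduction accepts any number of data rows as long as there is at least one.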