Process user relocations
========================

Introduction
------------

The ELF file format can be used for two purposes: as executable files (linker output) and as linkable files (object files).

The file format is nearly identical for different CPU architectures. However, the so-called relocations differ between CPU architectures. It is also not possible to define relocation types which are not used by a certain CPU.

There are however some use cases where it may be desirable to use relocation types which are not defined for a certain CPU:

- There are multi-core microcontrollers using multiple CPU cores with different CPU architectures accessing the same memory. In such cases it makes sense to link object files of different CPUs together.

- Some file formats or APIs might require an address to be stored in a non-standard way. In the case of an API the address could be calculated at runtime; however, this is less efficient than calculating these bytes at link time.

- An existing linker tool chain (e.g. x86) may be used for some special CPU (for example a CPU which has been designed by students for educational purposes) or even for some file format that does not represent code at all.

This document describes a data format that allows such relocations to be added to ELF object files.

There are two different ways to process these relocations:

- The linker links the object files to an executable without processing such relocations; a post-processing tool processes these relocations after linking.

- A special linker allows linking object files that use this relocation format. This linker should be able to link object files that officially have different architectures (e.g. link 64-bit SPARC (big-endian) and 32-bit x86 (little-endian) object files together).

Basic file format specification
-------------------------------

The object (and linker output) files using this extension have two special sections: ".customreloc" and ".cusrelocinfo".
In a linker output there must be exactly one ".cusrelocinfo" section; in an object file there may be multiple of these sections. There may be multiple ".customreloc" sections.

For CPUs that do not use one byte per memory location, the endianness and the way of storage shall not depend on the ELF file format. A memory word 0xXYZ of a 12-bit computer might for example be stored using two bytes as 0x0X 0xYZ or as 0xYZ 0x0X. However, if it is stored as 0x0X 0xYZ in a big-endian ELF file, it shall also be stored as 0x0X 0xYZ (and not as 0xYZ 0x0X) in a little-endian ELF file. This allows linking code stored in different object files together (e.g. code for that 12-bit machine stored in a 64-bit big-endian object file and code for the same machine stored in a 32-bit little-endian object file).

Addresses in the ELF file are not necessarily mapped 1:1 to machine addresses. Within one section the "virtual address" stored in the ELF file (which is also the address used for relocations) must increase by 1 per byte: if the n-th byte of a section has the virtual address X, then the (n+m)-th byte must have the virtual address (X+m).

In the case of the 12-bit machine in the example above, the "virtual address" 0x12300010 might actually represent the machine address 0x123410. In this case the "virtual address" 0x12300020 represents the machine address 0x123430.

ELF relocation limitations
--------------------------

In the case of a post-processing tool the linker is responsible for processing all ELF relocations; there are no limits on using ELF relocations in the object files.

In the case of a linker that is able to process custom relocations directly, the following limitations on ELF relocations apply:

- Real data and code sections (PROGBITS sections which really represent memory on the machine) must NOT contain any relocations. Instead ALL "real" relocations must be done using "custom relocations".

- ".cusrelocinfo" must also NOT contain any relocations.
- Other PROGBITS sections (".customreloc", debugging info and similar) must only contain "absolute 32-bit" relocations (such as R_386_32) in the case of 32-bit ELF object files. This way it is guaranteed that a linker knows the relocation type even when it does not understand the value in the "e_machine" field of the ELF file.

- In the case of 64-bit ELF files both "absolute 32-bit" relocations and "absolute 64-bit" relocations are allowed.

The ".customreloc" section
--------------------------

The ".customreloc" section is a PROGBITS section which does not represent memory on the machine (the "sh_flags" field is zero). The section contains information about the "custom relocations" as well as some additional data.

The ".customreloc" section contains entries of the following form:

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      (Optional padding)                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            0xE1A5             |0|L|P|D| Code  |    Length     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
/                Additional data (Length bytes)                 /
/                                                               /
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |      (Optional padding)       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The fields have the following meaning:

0xE1A5
    These two bytes represent the value 0xE1A5. The endianness of the "additional data" is derived from the endianness of this value; a section may contain big-endian and little-endian entries at the same time!

0
    This bit is always zero; because the topmost bit of 0xA5 is one, this makes it possible to distinguish between big- and little-endian entries.

L
    This bit is set if this entry must be understood by a linker that links such object files directly, or by other tools that process the object file. If the bit is clear the linker may ignore the entry. (*)

P
    This bit is set if a post-processing tool processing linker output files must be able to understand this entry.
D
    This bit is clear in the object file and in the linker output file that is to be processed by the post-processing tool. The post-processing tool will set this bit to indicate that the entry has already been processed. This is useful when the post-processing tool is called twice for some reason. Linkers that directly process object files in the format described here shall set this bit in ALL entries.

Code
    Type of the entry as described below.

Length
    Length of the "additional data" measured in bytes.

Padding
    Dummy bytes (should contain neither 0xE1 nor 0xA5) making the length of the entry a multiple of 4 bytes. All entries must be a multiple of 4 bytes long. Note that a linker may insert padding bytes between two entries coming from different object files. Padding bytes at the start of the ".customreloc" section may be useful for 64-bit relocation (CODE = 2) entries if 64-bit values need to be aligned.

(*) Note: A linker which is able to link ELF files of any architecture (e_machine field of the ELF header) and which exclusively supports user-defined relocations as described in this document will pay attention to the "L" bit. However, a linker which accepts only ELF files of one certain architecture and which processes all relocations "normally", but which is also capable of processing user-defined relocations as described in this document, will probably pay attention to the "P" bit instead.

Currently the following codes are defined:

CODE = 0: Required version, NOP
-------------------------------

This entry means that a certain tool version is required to process the file.

If there is no "additional data" (the length field is zero) and the entry is used as a requirement, exactly one of the two bits "L" or "P" must be set. If "L" is set this means that the entire file can only be understood by a post-processing tool; if "P" is set the entire file can only be understood by a linker.

If both the "P" and "L" bits are zero the entry is a NOP; it may be used to pad the start of the next entry or the section length to a certain alignment.
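The header layout and the padding rule described above can be sketched in Python as follows. This is only an illustration under stated assumptions: it assumes that the diagram is read most-significant bit first and that the whole 32-bit header word is stored in the byte order of the entry. The names `EntryHeader`, `parse_header` and `entry_size` are hypothetical and not part of the format.

```python
import struct
from typing import NamedTuple, Optional

MAGIC = 0xE1A5  # value of the first 16 bits of every entry header


class EntryHeader(NamedTuple):
    little_endian: bool
    l_bit: bool
    p_bit: bool
    d_bit: bool
    code: int
    length: int


def parse_header(raw: bytes) -> Optional[EntryHeader]:
    """Try both byte orders on a 4-byte header; return None if neither fits.

    Assumption: the header is one 32-bit word stored in the entry's own
    byte order, laid out as in the diagram (magic in bits 31..16).
    """
    if len(raw) != 4:
        return None
    for fmt, little in ((">I", False), ("<I", True)):
        (word,) = struct.unpack(fmt, raw)
        # Bit 15 is the always-zero bit; it rules out the wrong byte order.
        if word >> 16 == MAGIC and not word & 0x8000:
            return EntryHeader(
                little_endian=little,
                l_bit=bool(word & 0x4000),
                p_bit=bool(word & 0x2000),
                d_bit=bool(word & 0x1000),
                code=(word >> 8) & 0xF,
                length=word & 0xFF,
            )
    return None


def entry_size(length: int) -> int:
    """Header plus additional data, rounded up to a multiple of 4 bytes."""
    return (4 + length + 3) & ~3
```

Under these assumptions, the bytes E1 A5 61 08 would be read as a big-endian entry with L=1, P=1, D=0, CODE = 1 and a length of 8, while 08 61 A5 E1 would be the same entry in little-endian form.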
CODE = 1: 32-bit relocation
---------------------------

Typical flags: L=1, P=1
Length field: variable; a multiple of 4; at least 8

This entry contains information about a custom relocation which is stored in the form of 32-bit words. The entry is used in 32-bit ELF files.

The data has the following form:

- The first word is an address in the ".cusrelocinfo" section. In the case of an object file this word is relocated using a 32-bit absolute relocation whose target is an address in the ".cusrelocinfo" section. In the case of a linker output this is an address inside the ".cusrelocinfo" section (an offset into that section plus the "virtual address" of that section). The entry points to the "description" of the relocation type.

- The second word points to the byte or word to be relocated. It is an address in some "regular" PROGBITS section which represents memory on the machine. The variable "a" (see relocation definitions) is initialized with this value.

- The following words are additional "arguments" to the relocation described in the ".cusrelocinfo" section. They initialize the variables "b", "c" and so on.

CODE = 2: 64-bit relocation
---------------------------

Typical flags: L=1, P=1
Length field: variable; a multiple of 8; at least 16

This is a variant of "CODE = 1" which is typically used in 64-bit ELF files. The difference is that scripts work with 64-bit integers instead of 32-bit integers when using this relocation type. The "additional data" consists of 64-bit words, not 32-bit words.

The entry is not necessarily supported for 32-bit ELF files.

CODE = 3: Machine type
----------------------

Typical flags: L=0, P=0
Length field: variable; typically non-zero

This optional entry contains the name of the machine or architecture this file is used for. The name is an ASCII string which is not NUL-terminated.
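The word layout of the CODE = 1 and CODE = 2 entries described above can be sketched in Python as follows. The function name `decode_relocation` is hypothetical; the sketch assumes that the caller has already determined the byte order of the entry from its header and has extracted the "additional data" bytes.

```python
import struct
from string import ascii_lowercase


def decode_relocation(data: bytes, code: int, little_endian: bool):
    """Split the additional data of a CODE = 1 or CODE = 2 entry into words.

    Returns the ".cusrelocinfo" address (first word) and a dict that maps
    the variable names "a", "b", ... to the remaining words.
    """
    word_size = {1: 4, 2: 8}[code]  # 32-bit or 64-bit words
    # At least two words: the ".cusrelocinfo" address and the variable "a".
    if len(data) < 2 * word_size or len(data) % word_size:
        raise ValueError("bad length for a relocation entry")
    count = len(data) // word_size
    order = "<" if little_endian else ">"
    letter = "I" if word_size == 4 else "Q"
    words = struct.unpack(order + letter * count, data)
    info_address = words[0]
    # zip() stops at "z"; this sketch silently drops any further words.
    variables = dict(zip(ascii_lowercase, words[1:]))
    return info_address, variables
```

For a little-endian CODE = 1 entry holding the three words 0x1000, 0x2000 and 0x3000, this returns 0x1000 as the ".cusrelocinfo" address and the variables a=0x2000 and b=0x3000.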
CODE = 4: 32-bit direct linking allowed
---------------------------------------

Typical flags: L=0, P=0
Length field: 0

In 32-bit object files the presence of this entry indicates that the object file can be linked using a linker that directly understands this file format.

The entry should be ignored by post-processing tools in linker output files. The entry does not occur in 64-bit files.

CODE = 5: 64-bit direct linking allowed
---------------------------------------

Typical flags: L=1, P=0
Length field: 12

In 64-bit object files the presence of this entry indicates that the object file can be linked using a linker that directly understands this file format.

The first 4 bytes are a dummy 32-bit word which is relocated using an absolute 32-bit relocation (such as R_X86_64_32). The relocation target may be any symbol. If the entire object file does not contain 32-bit relocations the first word may be left "unrelocated".

The next 8 bytes are a relocated dummy 64-bit word...

A linker that does not understand the value in the "e_machine" field of the ELF file learns the values used for the "64-bit absolute" and "32-bit absolute" relocations in the "ELF64_R_TYPE" field by checking which kinds of relocation are applied to the two words in this entry...

The entry should be ignored by post-processing tools in linker output files. The entry does not occur in 32-bit files.

The ".cusrelocinfo" section
---------------------------

The ".cusrelocinfo" section is a PROGBITS section which does not represent memory on the machine (the "sh_flags" field is zero). The section does not contain relocations.

The section contains relocation instructions that describe how a relocation is processed. These instructions are NUL-terminated strings.
They can be empty strings (which means: do nothing) or they can consist of one or more statements of the following form:

    variable=integerValue;
    ?booleanValue"Error message";
    *integerValue=integerValue;

These statements are simply concatenated. Note that with the exception of the error message in the second type of statement there are no non-printable characters in a relocation instruction!

The first statement assigns a variable, like in a programming language. Variables which have been assigned a value can be used in the following statements.

The second statement will print an error message if the value is false. The error message may contain semicolons. A quotation mark inside the message is written as two quotation marks. Example: "The section ""xyz"" is full."

The third statement performs the actual relocation: the byte at the virtual address given by the left value will be overwritten by the low 8 bits of the right value.

There are 26 variables named "a"-"z" (lower-case). The first variables are initialized with the words from the data in the ".customreloc" section:

    "a" is initialized with the second word (relocation position)
    "b" with the third word (typically the target)
    "c" with the fourth word (if any)
    "d" with the fifth word (if any)
    ...

Other variables are uninitialized.

For 32-bit relocations (CODE = 1) these variables represent unsigned 32-bit integers; for 64-bit relocations (CODE = 2) they represent unsigned 64-bit integers.

Values use many operators known from C/C++, but they strictly distinguish between boolean and integer values (just like Java or C#). Integer values are 32- or 64-bit unsigned, depending on the CODE in the ".customreloc" section.

Unlike in C/C++, space characters are not allowed in expressions!

Integer constant values are represented by decimal numbers.
Unlike in C/C++, brackets must be used when combining different operators:

    C/C++:      a+b*c+d
    Relocation: a+(b*c)+d

The following operators known from the C language take two integer arguments and result in an integer value:

    +, -, *, /, %, |, &, ^, >>, <<

The following operators take two integer arguments and result in a boolean value:

    ==, !=, <=, >=, <, >

The following operators take two boolean arguments and result in a boolean value (note that "==" and "!=" can be used for both integers and booleans):

    ==, !=, &&, ||

The "a?b:c" operator takes a boolean as its first argument; the two remaining arguments must be of the same type.

The "*address" operator can be used to read out some virtual address; the result is the value of the byte (0-255). If there is no such address in the file, the result is the value of the expression (0-1), i.e. all bits set (e.g. 0xFFFFFFFF for 32-bit relocations). Note that the "use brackets" rule also applies to this operator:

    a+(*(b+c))

This operator is not supported by every tool and it should only be used if it cannot be avoided. Reading out bytes which are relocated (either in the same entry or in other entries) is not allowed and may lead to wrong results. If some byte contains both bits that shall be changed and bits which shall stay fixed, store the fixed bits in an additional variable in the entry:

    Wrong:   *a=(*a)+(b<<4);
    Correct: *a=c+(b<<4);

Examples
--------

The following example shows the "32-bit absolute little-endian" relocation:

    *a=b;*(a+1)=b>>8;*(a+2)=b>>16;*(a+3)=b>>24;

The following example shows the "32-bit PC-relative little-endian" relocation for x86 (the address is relative to the address following the memory word):

    c=b-a-4;*a=c;*(a+1)=c>>8;*(a+2)=c>>16;*(a+3)=c>>24;

The following example shows how the distance between two external symbols can be written to a 32-bit word:

    d=b-c;*a=d;*(a+1)=d>>8;*(a+2)=d>>16;*(a+3)=d>>24;

The following example shows a relocation with a restriction:

    c=b-a-1;?(c>(0-129))||(c<128)"The relocation is too far away!";*a=c;
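To illustrate the semantics of these instruction strings, the "32-bit absolute little-endian" relocation and the relocation with a restriction can be translated by hand into Python. This is a sketch only, not an interpreter for the instruction language. The names `store_byte`, `abs32_le` and `rel8` are hypothetical, and the section is modelled as a `bytearray` whose first byte has the virtual address `base`.

```python
MASK32 = 0xFFFFFFFF  # CODE = 1 values are unsigned 32-bit integers


def store_byte(memory: bytearray, base: int, address: int, value: int) -> None:
    """The "*address=value;" statement: only the low 8 bits are written."""
    offset = address - base
    if not 0 <= offset < len(memory):
        raise IndexError("virtual address outside of the section")
    memory[offset] = value & 0xFF


def abs32_le(memory: bytearray, base: int, a: int, b: int) -> None:
    """*a=b;*(a+1)=b>>8;*(a+2)=b>>16;*(a+3)=b>>24;"""
    for i in range(4):
        store_byte(memory, base, a + i, (b & MASK32) >> (8 * i))


def rel8(memory: bytearray, base: int, a: int, b: int) -> None:
    """c=b-a-1;?(c>(0-129))||(c<128)"The relocation is too far away!";*a=c;"""
    c = (b - a - 1) & MASK32  # unsigned wrap-around, as in the instruction
    # (0-129) wraps to 0xFFFFFF7F, so the test accepts -128..127 only.
    if not (c > ((0 - 129) & MASK32) or c < 128):
        raise ValueError("The relocation is too far away!")
    store_byte(memory, base, a, c)
```

The comparison `c>(0-129)` only works because all values are unsigned: a small negative distance wraps around to a value just below 2^32 and therefore lies above the wrapped constant.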