This post will not show you every detail of the PE format. However, by the time you finish reading, you should have enough knowledge to start cracking a PE file. The complete details of the PE format can be found here; if you want some more focused insights from me, please keep reading.
Before we dig into the topic, let's prepare ourselves to read the byte stream correctly. As we all know, Intel x86 CPUs use little-endian byte order, a convention that has had a great influence on the PC market. So bear in mind that all the byte sequences I'm going to show you are in little-endian order: within each multi-byte data unit, the least significant byte comes first, and each byte is more significant than the one before it as the offset grows. Hopefully the illustration above will help you avoid some confusion later.
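As a small sketch of what little-endian means in practice, the helpers below (read_le16 and read_le32 are names made up for this example) assemble a WORD and a DWORD from raw bytes, putting the byte at the lowest offset in the least significant position. Decoding byte by byte like this works whatever the host CPU's own byte order is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode a little-endian WORD (16 bits).
 * The byte at the lowest offset is the least significant one. */
static uint16_t read_le16(const uint8_t *p)
{
    assert(p != NULL);
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical helper: decode a little-endian DWORD (32 bits). */
static uint32_t read_le32(const uint8_t *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

For instance, the bytes 78 56 34 12 decode to the DWORD 0x12345678, and the first two bytes of every PE file, 4D 5A ("MZ"), decode to the WORD 0x5A4D.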
Each PE file has a header, and you can dig a lot of crucial information out of this small part. The header definitions below serve as references for when you need them, so there is no need to memorize them all.
First of all, there is the DOS header.
The structure representing the DOS header is _IMAGE_DOS_HEADER, which takes up 64 bytes of space. It has many fields, but what we really care about is the last one, a 4-byte field at offset 0x3C declared as LONG e_lfanew;. Its job is to give the file offset of the PE header.
Note that most of the content in a PE file's header exists so that the operating system's program loader can do its job correctly.
Here is its definition:
struct _IMAGE_DOS_HEADER { // DOS .EXE header
WORD e_magic; // Magic number
WORD e_cblp; // Bytes on last page of file
WORD e_cp; // Pages in file
WORD e_crlc; // Relocations
WORD e_cparhdr; // Size of header in paragraphs
WORD e_minalloc; // Minimum extra paragraphs needed
WORD e_maxalloc; // Maximum extra paragraphs needed
WORD e_ss; // Initial (relative) SS value
WORD e_sp; // Initial SP value
WORD e_csum; // Checksum
WORD e_ip; // Initial IP value
WORD e_cs; // Initial (relative) CS value
WORD e_lfarlc; // File address of relocation table
WORD e_ovno; // Overlay number
WORD e_res[4]; // Reserved words
WORD e_oemid; // OEM identifier (for e_oeminfo)
WORD e_oeminfo; // OEM information; e_oemid specific
WORD e_res2[10]; // Reserved words
LONG e_lfanew; // File address of new exe header
};
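Here is a minimal sketch, assuming the whole file has already been read into a byte buffer, of how one might pull e_lfanew out of the DOS header. get_e_lfanew and the macro names are hypothetical; the offsets come from the structure above (e_magic at offset 0, e_lfanew at offset 0x3C).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DOS_HEADER_SIZE  64       /* sizeof(_IMAGE_DOS_HEADER) */
#define DOS_MAGIC        0x5A4Du  /* "MZ" read as a little-endian WORD */
#define E_LFANEW_OFFSET  0x3C     /* e_lfanew: the last 4 bytes of the header */

/* Hypothetical helper: validate e_magic and read e_lfanew from a FileBuffer.
 * e_lfanew is declared as a LONG, but a valid value is a small positive
 * offset, so it is returned as uint32_t here.
 * Returns false if the buffer is too short or does not start with "MZ". */
static bool get_e_lfanew(const uint8_t *buf, size_t len, uint32_t *e_lfanew)
{
    assert(buf != NULL && e_lfanew != NULL);
    if (len < DOS_HEADER_SIZE)
        return false;

    uint16_t magic = (uint16_t)(buf[0] | (buf[1] << 8));
    if (magic != DOS_MAGIC)
        return false;

    const uint8_t *p = buf + E_LFANEW_OFFSET;
    *e_lfanew = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return true;
}
```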
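The PE header (also called the NT header) is what e_lfanew points to. After a 4-byte signature, it consists of the Standard NT Header (AKA FileHeader below) and the Optional NT Header, all of which are members of the C structure _IMAGE_NT_HEADERS.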
The declaration shows the layout:
struct _IMAGE_NT_HEADERS{
DWORD Signature; //"PE\0\0" if valid
_IMAGE_FILE_HEADER FileHeader;
_IMAGE_OPTIONAL_HEADER OptionalHeader;
};
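Assuming e_lfanew has already been read, a sketch like the following can check the signature and compute where the two following headers start. has_pe_signature, file_header_offset and optional_header_offset are hypothetical names. Note that "PE\0\0" read as a little-endian DWORD is 0x00004550.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NT_SIGNATURE        0x00004550u /* "PE\0\0" as a little-endian DWORD */
#define SIZEOF_SIGNATURE    4
#define SIZEOF_FILE_HEADER  20          /* sizeof(_IMAGE_FILE_HEADER) */

/* Hypothetical helper: do the 4 bytes at nt_offset spell "PE\0\0"? */
static bool has_pe_signature(const uint8_t *buf, size_t len, uint32_t nt_offset)
{
    assert(buf != NULL);
    if (nt_offset > len || len - nt_offset < SIZEOF_SIGNATURE)
        return false;
    const uint8_t *p = buf + nt_offset;
    uint32_t sig = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                   ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return sig == NT_SIGNATURE;
}

/* FileHeader starts right after the signature. */
static uint32_t file_header_offset(uint32_t nt_offset)
{
    return nt_offset + SIZEOF_SIGNATURE;
}

/* OptionalHeader starts right after the 20-byte FileHeader. */
static uint32_t optional_header_offset(uint32_t nt_offset)
{
    return nt_offset + SIZEOF_SIGNATURE + SIZEOF_FILE_HEADER;
}
```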
_IMAGE_FILE_HEADER
This structure occupies 20 bytes.
struct _IMAGE_FILE_HEADER{
WORD Machine; // 0x0: unknown (any platform); 0x14C: Intel i386 and later;
WORD NumberOfSections;
DWORD TimeDateStamp; // Set on finishing linking
DWORD PointerToSymbolTable;
DWORD NumberOfSymbols;
WORD SizeOfOptionalHeader;
WORD Characteristics; // Each bit represents a different property
};
The most frequently consulted member is Characteristics, and the meaning of each of its bits can be found on the official documentation page.
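To make the 20-byte layout concrete, here is a sketch of a fixed-width mirror of _IMAGE_FILE_HEADER plus a couple of Characteristics checks. The FileHeader type and the helper functions are hypothetical; the flag values are the ones the PE specification defines for these bits (winnt.h uses the same names). The layout check assumes a typical ABI that adds no padding between naturally aligned fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed-width mirror of _IMAGE_FILE_HEADER (WORD = uint16_t, DWORD = uint32_t). */
typedef struct {
    uint16_t Machine;
    uint16_t NumberOfSections;
    uint32_t TimeDateStamp;
    uint32_t PointerToSymbolTable;
    uint32_t NumberOfSymbols;
    uint16_t SizeOfOptionalHeader;
    uint16_t Characteristics;
} FileHeader;

static_assert(sizeof(FileHeader) == 20, "_IMAGE_FILE_HEADER must be 20 bytes");

/* A few Characteristics bits, values as in the PE specification. */
#define IMAGE_FILE_EXECUTABLE_IMAGE  0x0002
#define IMAGE_FILE_32BIT_MACHINE     0x0100
#define IMAGE_FILE_DLL               0x2000

/* Hypothetical helpers: test single bits of Characteristics. */
static int is_executable_image(const FileHeader *fh)
{
    assert(fh != NULL);
    return (fh->Characteristics & IMAGE_FILE_EXECUTABLE_IMAGE) != 0;
}

static int is_dll(const FileHeader *fh)
{
    assert(fh != NULL);
    return (fh->Characteristics & IMAGE_FILE_DLL) != 0;
}
```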
_IMAGE_OPTIONAL_HEADER
This structure immediately follows the former one (_IMAGE_FILE_HEADER) and is much larger: 224 bytes in a 32-bit PE, the largest of the PE headers. (Since the DataDirectory array may actually contain more or fewer than 16 elements, the size of the Optional Header varies, which is why FileHeader.SizeOfOptionalHeader records it.)
The declaration is the following:
struct _IMAGE_OPTIONAL_HEADER{
WORD Magic; // 0x0107: ROM image, 0x010B: 32-bit PE, 0x020B: 64-bit PE
BYTE MajorLinkerVersion;
BYTE MinorLinkerVersion;
DWORD SizeOfCode;
DWORD SizeOfInitializedData;
DWORD SizeOfUninitializedData;
DWORD AddressOfEntryPoint; // A relative virtual address of Program Entry Point
DWORD BaseOfCode; // RVA of the start of the code
DWORD BaseOfData; // RVA of the start of the data
DWORD ImageBase; // Preferred base address in memory
DWORD SectionAlignment; // Alignment of sections in memory, usually 1000H
DWORD FileAlignment; // Alignment of sections in the file, usually 200H
WORD MajorOperatingSystemVersion;
WORD MinorOperatingSystemVersion;
WORD MajorImageVersion;
WORD MinorImageVersion;
WORD MajorSubsystemVersion;
WORD MinorSubsystemVersion;
DWORD Win32VersionValue; // Always 0
DWORD SizeOfImage; // Size of memory occupied after loading into memory
DWORD SizeOfHeaders; // DOS header + DOS stub + PE header + section headers, rounded up to FileAlignment
DWORD CheckSum;
WORD Subsystem;
WORD DllCharacteristics; // DLL/loader flags (e.g. ASLR, DEP); often 0 in older binaries
DWORD SizeOfStackReserve; // Default reserved stack size
DWORD SizeOfStackCommit; // Committed stack size
DWORD SizeOfHeapReserve; // Default reserved heap size
DWORD SizeOfHeapCommit; // Committed heap size
DWORD LoaderFlags; // Always 0
DWORD NumberOfRvaAndSizes; // Number of members in the next array
_IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];// #define IMAGE_NUMBEROF_DIRECTORY_ENTRIES 16
};
All the commented members are important. Note that the size-related fields above follow the alignment rules: for example, SizeOfImage is a multiple of SectionAlignment, and SizeOfHeaders is a multiple of FileAlignment.
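As a sanity check on the 224-byte figure, the sketch below mirrors the 32-bit Optional Header with fixed-width types and lets the compiler verify the size and a few offsets. OptionalHeader32 and DataDir are hypothetical names, and the layout assumes a typical ABI where naturally aligned fields get no padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUMBEROF_DIRECTORY_ENTRIES 16

typedef struct {
    uint32_t VirtualAddress;
    uint32_t Size;
} DataDir;

typedef struct {
    uint16_t Magic;                 /* 0x010B for a 32-bit PE */
    uint8_t  MajorLinkerVersion;
    uint8_t  MinorLinkerVersion;
    uint32_t SizeOfCode;
    uint32_t SizeOfInitializedData;
    uint32_t SizeOfUninitializedData;
    uint32_t AddressOfEntryPoint;
    uint32_t BaseOfCode;
    uint32_t BaseOfData;
    uint32_t ImageBase;
    uint32_t SectionAlignment;
    uint32_t FileAlignment;
    uint16_t MajorOperatingSystemVersion;
    uint16_t MinorOperatingSystemVersion;
    uint16_t MajorImageVersion;
    uint16_t MinorImageVersion;
    uint16_t MajorSubsystemVersion;
    uint16_t MinorSubsystemVersion;
    uint32_t Win32VersionValue;
    uint32_t SizeOfImage;
    uint32_t SizeOfHeaders;
    uint32_t CheckSum;
    uint16_t Subsystem;
    uint16_t DllCharacteristics;    /* a WORD: with a DWORD the total would not be 224 */
    uint32_t SizeOfStackReserve;
    uint32_t SizeOfStackCommit;
    uint32_t SizeOfHeapReserve;
    uint32_t SizeOfHeapCommit;
    uint32_t LoaderFlags;
    uint32_t NumberOfRvaAndSizes;
    DataDir  DataDirectory[NUMBEROF_DIRECTORY_ENTRIES];
} OptionalHeader32;

/* 96 bytes of fixed fields + 16 * 8 bytes of data directories = 224. */
static_assert(sizeof(OptionalHeader32) == 224, "32-bit Optional Header is 224 bytes");
static_assert(offsetof(OptionalHeader32, DataDirectory) == 96, "DataDirectory starts at 96");
```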
Each 32-bit program has a virtual address space ranging from 0x0 to 0xFFFFFFFF, and ImageBase specifies the preferred address at which the PE should be loaded.
The FileBuffer is how a PE is laid out on the hard drive. Executing a PE, however, requires an ImageBuffer in memory that holds all the machine code and other data. The ImageBuffer is a "stretched" copy of the FileBuffer, and the stretching is done by the PE loader of the OS.
Note that ImageBase is an absolute virtual address, not an RVA, whereas AddressOfEntryPoint is an RVA. After the FileBuffer has been loaded, the PE loader sets the EIP register to ImageBase + AddressOfEntryPoint, which makes the program start from its entry point.
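A tiny sketch of that address computation, assuming a 32-bit image loaded at its preferred ImageBase (entry_point_va is a hypothetical helper):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: VA of the entry point of a 32-bit image.
 * ImageBase is an absolute VA; AddressOfEntryPoint is an RVA relative to it.
 * Assumes the image sits at its preferred ImageBase; a relocated image
 * would use its actual load address instead. */
static uint32_t entry_point_va(uint32_t image_base, uint32_t address_of_entry_point)
{
    /* The sum must stay inside the 32-bit address space. */
    assert(address_of_entry_point <= UINT32_MAX - image_base);
    return image_base + address_of_entry_point;
}
```

With, say, an ImageBase of 0x00400000 and an AddressOfEntryPoint of 0x00001000, execution would start at 0x00401000.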
Note that an alignment value specifies the minimal storage unit of whatever it applies to. For instance, SectionAlignment specifies the unit for sections in the ImageBuffer, and FileAlignment the unit for sections in the FileBuffer. The two values are often different (typically 0x1000 and 0x200 respectively).
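Rounding a size up to an alignment unit is a one-liner. The hypothetical align_up below assumes the alignment is a power of two, which holds for the usual values such as 0x200 and 0x1000.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round size up to the next multiple of alignment.
 * Because alignment is a power of two, clearing the low bits does the rounding. */
static uint32_t align_up(uint32_t size, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}
```

For example, a section holding 0x1234 bytes of data would occupy align_up(0x1234, 0x200) = 0x1400 bytes in a FileBuffer with a FileAlignment of 0x200, but align_up(0x1234, 0x1000) = 0x2000 bytes in an ImageBuffer with a SectionAlignment of 0x1000. That difference is exactly why the ImageBuffer is a stretched FileBuffer.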
For better organization, higher efficiency and other benefits, the PE format uses a Section mechanism: resources such as code and data are grouped according to their attributes, and each group is placed together.
One of the benefits the Section mechanism brings is security. Because code and data are kept apart, we are, to some extent, protected from accidentally overwriting our code or executing our data.
Each Section is described by a Section Header, which records its location, access privileges, attributes and so on.
The Section Table can be found at offset _IMAGE_DOS_HEADER.e_lfanew + IMAGE_SIZEOF_SIGNATURE + IMAGE_SIZEOF_FILE_HEADER + _IMAGE_FILE_HEADER.SizeOfOptionalHeader, relative to the beginning of the binary file (for the FileBuffer) or to ImageBase (for the ImageBuffer). That is to say, the Section Table is placed right after the Optional NT header.
Both IMAGE_SIZEOF_SIGNATURE and IMAGE_SIZEOF_FILE_HEADER are predefined macros (4 and 20 bytes respectively). From the formula above we can also work out how all the headers mentioned so far are laid out one after another. Try sketching the layout on paper yourself.
Note that there is no C structure corresponding to the Section Table; the term just makes things easier to describe.
The Section Table consists of Section Headers (NumberOfSections of them) placed one after another.
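The Section Table formula translates directly into code. In this sketch the helper names are hypothetical, and the sizes are spelled out as local constants (4-byte signature, 20-byte FileHeader, 40-byte Section Header) rather than relying on any SDK macros.

```c
#include <assert.h>
#include <stdint.h>

#define SIZEOF_SIGNATURE       4   /* "PE\0\0" */
#define SIZEOF_FILE_HEADER     20  /* sizeof(_IMAGE_FILE_HEADER) */
#define SIZEOF_SECTION_HEADER  40  /* sizeof(_IMAGE_SECTION_HEADER) */

/* Hypothetical helper: offset of the Section Table (= the first Section
 * Header), from the start of the file or from ImageBase. */
static uint32_t section_table_offset(uint32_t e_lfanew, uint16_t size_of_optional_header)
{
    return e_lfanew + SIZEOF_SIGNATURE + SIZEOF_FILE_HEADER + size_of_optional_header;
}

/* Hypothetical helper: offset of the index-th Section Header (0-based).
 * The headers are packed back to back, SIZEOF_SECTION_HEADER bytes each. */
static uint32_t section_header_offset(uint32_t e_lfanew, uint16_t size_of_optional_header,
                                      uint16_t number_of_sections, uint16_t index)
{
    assert(index < number_of_sections);
    return section_table_offset(e_lfanew, size_of_optional_header)
         + (uint32_t)index * SIZEOF_SECTION_HEADER;
}
```

For example, with e_lfanew = 0xE8 and SizeOfOptionalHeader = 0xE0 (224), the first Section Header would sit at 0x1E0 and the second at 0x208.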
The definition of Section Header looks like this:
#define IMAGE_SIZEOF_SHORT_NAME 8
struct _IMAGE_SECTION_HEADER {
UCHAR Name[IMAGE_SIZEOF_SHORT_NAME];
union {
ULONG PhysicalAddress;
ULONG VirtualSize;
} Misc;
ULONG VirtualAddress;
ULONG SizeOfRawData;
ULONG PointerToRawData;
ULONG PointerToRelocations;
ULONG PointerToLinenumbers;
USHORT NumberOfRelocations;
USHORT NumberOfLinenumbers;
ULONG Characteristics;
};
The five fields VirtualAddress, VirtualSize (stored in the Misc union), PointerToRawData, SizeOfRawData and Characteristics are especially important for us hackers. Read on to the next section, RVA and RAW, where four of these fields are explained.
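Here is a sketch of reading two of those fields, assuming the 40-byte layout above mapped onto fixed-width types. SectionHeader, section_name and section_perms are hypothetical, and the Misc union is flattened to its VirtualSize member. Two practical details show up: Name is not NUL-terminated when it uses all 8 bytes, and the top three bits of Characteristics hold the executable, readable and writable permissions, which is where the security benefit of sections becomes concrete.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IMAGE_SIZEOF_SHORT_NAME 8
#define IMAGE_SCN_MEM_EXECUTE   0x20000000u
#define IMAGE_SCN_MEM_READ      0x40000000u
#define IMAGE_SCN_MEM_WRITE     0x80000000u

typedef struct {
    uint8_t  Name[IMAGE_SIZEOF_SHORT_NAME];
    uint32_t VirtualSize;            /* Misc.VirtualSize */
    uint32_t VirtualAddress;
    uint32_t SizeOfRawData;
    uint32_t PointerToRawData;
    uint32_t PointerToRelocations;
    uint32_t PointerToLinenumbers;
    uint16_t NumberOfRelocations;
    uint16_t NumberOfLinenumbers;
    uint32_t Characteristics;
} SectionHeader;

static_assert(sizeof(SectionHeader) == 40, "_IMAGE_SECTION_HEADER is 40 bytes");

/* Copy Name into a 9-byte buffer so it is always a valid C string. */
static void section_name(const SectionHeader *sh, char out[IMAGE_SIZEOF_SHORT_NAME + 1])
{
    assert(sh != NULL && out != NULL);
    memcpy(out, sh->Name, IMAGE_SIZEOF_SHORT_NAME);
    out[IMAGE_SIZEOF_SHORT_NAME] = '\0';
}

/* Render the memory-permission bits of Characteristics, e.g. "R-X". */
static void section_perms(const SectionHeader *sh, char out[4])
{
    assert(sh != NULL && out != NULL);
    out[0] = (sh->Characteristics & IMAGE_SCN_MEM_READ)    ? 'R' : '-';
    out[1] = (sh->Characteristics & IMAGE_SCN_MEM_WRITE)   ? 'W' : '-';
    out[2] = (sh->Characteristics & IMAGE_SCN_MEM_EXECUTE) ? 'X' : '-';
    out[3] = '\0';
}
```

A typical .text section with Characteristics 0x60000020 would be reported as "R-X": readable and executable, but not writable.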
RVA and RAW are two terms crucial for understanding how the PE FileBuffer is mapped to the ImageBuffer.
An RVA (Relative Virtual Address) is the offset of an address from ImageBase. A RAW is the offset of a location from the beginning of the file, i.e. a plain file offset. The VirtualAddress field of _IMAGE_SECTION_HEADER is itself an RVA.
So, look at the definition of _IMAGE_SECTION_HEADER again. The VirtualAddress field is the RVA of the Section in the ImageBuffer, and PointerToRawData is the RAW of the Section in the FileBuffer. SizeOfRawData and VirtualSize, the Section's sizes in the FileBuffer and in the ImageBuffer respectively, are then self-evident.
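To see the four fields working together, here is a simplified sketch of the stretching the PE loader performs: copy the headers, then copy each section's raw bytes from PointerToRawData in the FileBuffer to VirtualAddress in the ImageBuffer. SectionInfo and stretch_file_buffer are hypothetical. A real loader also handles relocations, imports and memory protection, and validates every offset against the file size; none of that is attempted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical subset of _IMAGE_SECTION_HEADER used by this sketch. */
typedef struct {
    uint32_t VirtualSize;       /* size of the section in the ImageBuffer */
    uint32_t VirtualAddress;    /* RVA of the section in the ImageBuffer */
    uint32_t SizeOfRawData;     /* size of the section in the FileBuffer */
    uint32_t PointerToRawData;  /* RAW of the section in the FileBuffer */
} SectionInfo;

/* Builds a zero-filled ImageBuffer of SizeOfImage bytes, copies the headers,
 * then copies every section to its VirtualAddress. Assumes file_buf really
 * holds all the bytes the headers describe. Caller frees the result. */
static uint8_t *stretch_file_buffer(const uint8_t *file_buf,
                                    uint32_t size_of_image,
                                    uint32_t size_of_headers,
                                    const SectionInfo *sections,
                                    uint16_t number_of_sections)
{
    assert(file_buf != NULL && sections != NULL);
    assert(size_of_headers <= size_of_image);

    uint8_t *image = calloc(size_of_image, 1);
    if (image == NULL)
        return NULL;

    /* Headers keep the same offsets in both buffers. */
    memcpy(image, file_buf, size_of_headers);

    for (uint16_t i = 0; i < number_of_sections; i++) {
        const SectionInfo *s = &sections[i];
        /* Copy at most VirtualSize bytes: the rest of SizeOfRawData is file
         * padding. Memory beyond the raw data simply stays zero. */
        uint32_t n = s->SizeOfRawData;
        if (s->VirtualSize != 0 && s->VirtualSize < n)
            n = s->VirtualSize;
        assert(s->VirtualAddress <= size_of_image &&
               n <= size_of_image - s->VirtualAddress);
        memcpy(image + s->VirtualAddress, file_buf + s->PointerToRawData, n);
    }
    return image;
}
```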
Although the layout of the sections as a whole usually changes when the FileBuffer is transformed into the ImageBuffer, the layout inside each section does not: every section is moved as one contiguous block. So we arrive at a formula: RAW - PointerToRawData = RVA - VirtualAddress.
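Rearranging that formula gives both directions of the conversion, plus the step from RVA to VA. These only hold for a location that actually lies inside the section whose header fields are plugged in, so in practice you first find the section whose range contains the address:

```latex
\begin{aligned}
\text{RAW} - \text{PointerToRawData} &= \text{RVA} - \text{VirtualAddress} \\
\Rightarrow\quad \text{RVA} &= \text{RAW} - \text{PointerToRawData} + \text{VirtualAddress} \\
\text{RAW} &= \text{RVA} - \text{VirtualAddress} + \text{PointerToRawData} \\
\text{VA} &= \text{RVA} + \text{ImageBase}
\end{aligned}
```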
For example, say we have an ImageBase of 0x01000000 and a .text section at 0x01001000 in the ImageBuffer (that is, VirtualAddress = 0x01001000 - 0x01000000 = 0x00001000). The corresponding .text section's PointerToRawData is 0x00000400. Now there is a structure at file offset (RAW) 0x00001124 inside this .text section in the FileBuffer. How do you find its location in the ImageBuffer?
Easy! Apply that equation to this problem: RVA = RAW (0x00001124) - PointerToRawData (0x00000400) + VirtualAddress (0x00001000) = 0x00001d24. And from the RVA we get its VA = RVA (0x00001d24) + ImageBase (0x01000000) = 0x01001d24. Now we have derived all the location information with the help of that simple equation.
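The same calculation as a sketch in C: raw_to_rva searches a section list for the section whose raw data contains the given RAW, then applies RVA = RAW - PointerToRawData + VirtualAddress. SectionSpan is a hypothetical subset of _IMAGE_SECTION_HEADER.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of _IMAGE_SECTION_HEADER used for the conversion. */
typedef struct {
    uint32_t VirtualAddress;
    uint32_t SizeOfRawData;
    uint32_t PointerToRawData;
} SectionSpan;

/* Hypothetical helper: RAW -> RVA. Returns false if no section's raw data
 * contains `raw` (e.g. it lies in the headers or past the last section). */
static bool raw_to_rva(const SectionSpan *sections, size_t count,
                       uint32_t raw, uint32_t *rva)
{
    assert(sections != NULL && rva != NULL);
    for (size_t i = 0; i < count; i++) {
        const SectionSpan *s = &sections[i];
        /* Is raw inside [PointerToRawData, PointerToRawData + SizeOfRawData)? */
        if (raw >= s->PointerToRawData &&
            raw - s->PointerToRawData < s->SizeOfRawData) {
            *rva = raw - s->PointerToRawData + s->VirtualAddress;
            return true;
        }
    }
    return false;
}
```

The example does not give the SizeOfRawData of .text, so assume 0x1000 for illustration. With a single .text entry (VirtualAddress 0x1000, PointerToRawData 0x400), raw_to_rva(sections, 1, 0x1124, &rva) yields 0x1D24, and adding the ImageBase 0x01000000 gives 0x01001D24, matching the hand calculation.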
We have played around a little with part of the PE header. We have learnt about the DOS header (_IMAGE_DOS_HEADER), the NT header (_IMAGE_NT_HEADERS) and Sections (_IMAGE_SECTION_HEADER). Hopefully the two formulas introduced in the Section Table and RVA and RAW sections can help you draw a schematic diagram of part of the FileBuffer and its corresponding ImageBuffer.
In the next post, I will talk about the IAT, the EAT and DLLs.