Overview
Let’s now turn our attention to a fundamental concept: how locations within the file are addressed and how these relate to memory addresses once the file is loaded. We’ll explore the distinction between Relative Virtual Addresses (RVAs) and Virtual Addresses (VAs).
Understanding this difference is absolutely critical for developing a reflective loader. Why? Because a reflective loader must manually replicate the process of mapping the PE file from its raw, disk-based format into a usable, executable image in the process’s virtual memory. This involves interpreting the addresses stored within the PE headers (which are predominantly RVAs) and correctly calculating the final memory locations (VAs) where code and data should reside. Without understanding RVA-to-VA translation, accurately reconstructing the module in memory – the core task of any loader, including a reflective one – is impossible.
Understanding PE Addressing
RVA
As we examined the various PE headers (like the Optional Header and Section Headers), we noted numerous fields containing
address information, such as AddressOfEntryPoint
, the VirtualAddress
in IMAGE_SECTION_HEADER
entries, and the VirtualAddress
fields within the DataDirectory
array. A crucial characteristic of the PE format is that most of these addresses are not
absolute memory locations. Instead, they are specified as Relative Virtual Addresses (RVAs), the keyword here being relative.
An RVA is an offset, measured in bytes, relative to the starting memory address where the PE file is loaded.
This starting address is known as the ImageBase. Think of the loaded module as a contiguous block of memory;
the RVA tells you how far into that block a particular piece of data or code is located, starting from the very first byte
(the ImageBase
).
The primary reason for using RVAs is to achieve relocatability. PE files (especially DLLs, but also EXEs) are typically compiled with a preferred ImageBase
address specified in the Optional Header. This is the ideal memory location where the module would like to be loaded. However, when the Windows loader (or our manual reflective loader) attempts to load the module, this preferred address range might already be occupied by the main executable, another DLL, or a different memory allocation. In such cases, the loader must place, or relocate, the module at a different base address in the process’s virtual address space.
If the addresses stored within the PE file were absolute, they would all point to incorrect locations if the
module were loaded at any address other than its preferred ImageBase
. By using RVAs,
the internal references within the PE file remain valid regardless of where the module is actually loaded in memory.
The RVA represents a constant offset from whatever base address is ultimately chosen.
VA
In contrast to an RVA, a Virtual Address (VA), often referred to as an absolute address, represents the final, actual memory address within the process’s virtual address space where a specific piece of code or data resides after the module has been loaded.
The relationship between these addresses is straightforward and essential to grasp:
VA = ActualImageBase + RVA
Where:
VA
is the final Virtual Address in memory.ActualImageBase
is the actual base memory address where the loader mapped the beginning (the first byte) of the PE file. This might be the preferredImageBase
, or it might be a different address assigned by the loader due to relocation.RVA
is the Relative Virtual Address as stored within the PE file structure.
Consider an example: A DLL has a preferred ImageBase
of 0x10000000
and its .text
section header specifies a VirtualAddress
(which is an RVA) of 0x1000
.
- If the loader successfully maps the DLL at its preferred
ImageBase
of0x10000000
, the.text
section will begin at the VA0x10000000 + 0x1000 = 0x10001000
. - However, if that address space is occupied and the loader must relocate the DLL, perhaps mapping it at
0x11500000
instead, the.text
section will then begin at the VA0x11500000 + 0x1000 = 0x11501000
.
Notice that the RVA (0x1000
) stored within the PE file’s section header remains constant; only the ActualImageBase
changes, resulting in a different final VA for the section’s start. (The potential need to adjust internal code references due to such base address changes is handled by base relocations, a topic we will cover later).
Mapping File Content to Memory
The IMAGE_SECTION_HEADER
structures provide the crucial link needed to map the raw data from the PE file on disk into the correct locations in virtual memory. Each section header contains the necessary information:
PointerToRawData
: This field specifies the file offset – the starting position (byte index from the beginning of the file) of the section’s content within the PE file on disk.SizeOfRawData
: This indicates the size, in bytes, of this section’s data as it exists in the file.VirtualAddress
: As discussed, this is the RVA specifying where the beginning of this section should be placed in memory, relative to theActualImageBase
.
Therefore, for each section defined in the PE file’s section table, the loader performs the following conceptual steps:
- It reads the
PointerToRawData
,SizeOfRawData
, andVirtualAddress
(RVA) values from the section’sIMAGE_SECTION_HEADER
. - It calculates the target destination VA in memory using the formula:
DestinationVA = ActualImageBase + VirtualAddress
(RVA). - It reads
SizeOfRawData
bytes directly from the PE file on disk, starting at the file offset specified byPointerToRawData
. - It writes these bytes into the process’s allocated virtual memory, starting at the calculated
DestinationVA
.
This process is repeated for every section, effectively copying the relevant parts of the PE file from disk and arranging them correctly in memory according to the layout defined by the headers.
Conclusion
With this foundational understanding of PE structure and addressing mechanisms, let’s now do a lab in which we can explore these values directly using an application called PEBear (Lab 2.1), after which we’ll create our very own PE Header parser (Lab 2.2), which is going to be a core part of the logic of our final reflective loader.