Use Python to create antivirus software (in progress)

Careful reverse engineering helps you understand what a malware binary gives the attacker once the target has been compromised, and how the attacker can hide on the infected computer and keep attacking it. This article combines descriptions with examples: each section introduces a static analysis technique and then explains how it applies in real analysis.

The sample discussed here, ircbot.exe, is designed to stay resident on the target computer while connected to an IRC server. Once ircbot.exe has taken hold of the target, the attacker can control the machine over IRC and issue commands, such as turning on the webcam to secretly capture video, extracting the target's geographic location and desktop screenshots, and pulling files off the target machine.

01 Microsoft Windows portable executable file format

To perform static analysis of malware, you need to understand the Windows PE file format, which describes the structure of today’s Windows program files such as .exe, .dll, and .sys, and defines the way they store data. The PE file contains data such as x86 instructions, images, and text, as well as metadata required for the operation of the program.

The PE format was originally designed to do the following.

1) Tell Windows how to load the program into memory

The PE format describes which blocks of the file should be loaded into memory and where. It also tells Windows where in the program code to start executing, and which dynamic link libraries should be loaded into memory.

2) Provide the running program with media (or resources) that may be used during execution

These resources can include strings, such as those shown in GUI dialogs or printed to the console, as well as images and videos.

3) Provide security data, such as digital code signatures

Windows uses this security data to ensure that the code comes from a trusted source.

The PE format

The PE file format includes a series of headers that tell the operating system how to load the program into memory. It also includes a series of sections that contain the actual program data. Windows loads these sections into memory so that their offsets in memory correspond to their positions on disk.

Let’s start from the PE header to explore this file structure in more detail. We will skip the discussion of the DOS header, which is a legacy of the Microsoft DOS operating system in the 1980s and only exists for compatibility reasons.

  1. PE header

Following the DOS header is the PE header, which defines the general attributes of the program, such as binary code, images, compressed data, and other program attributes. It also tells us whether the program is designed for 32-bit or 64-bit systems.

The PE header provides basic but useful contextual information for malware analysts. For example, the header includes a timestamp field, which can reveal when the malware author compiled the file. Malware authors usually replace this field with a forged value, but sometimes they forget to, and the real compile time is left in the file.
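To make these fields concrete, here is a minimal sketch that reads the target architecture and the compile timestamp using only Python's standard struct module. The function name read_file_header and the dictionary keys are illustrative choices, not part of any library, and the sketch assumes a well-formed file. Keep in mind that the timestamp it reports may be forged.

```python
import struct
from datetime import datetime, timezone

# Two common values of the Machine field; other values exist.
MACHINE_TYPES = {0x014C: "x86 (32-bit)", 0x8664: "x64 (64-bit)"}


def read_file_header(data: bytes) -> dict:
    """Parse the PE (COFF) file header from the raw bytes of a PE file."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature: not a PE file")
    # The DOS header stores the offset of the PE signature at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # The 20-byte file header follows the 4-byte signature.
    (machine, number_of_sections, timestamp, _symtab, _nsyms,
     optional_header_size, _characteristics) = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4)
    return {
        "machine": MACHINE_TYPES.get(machine, hex(machine)),
        "number_of_sections": number_of_sections,
        # Seconds since 1970-01-01 UTC, as written by the linker.
        "compiled_utc": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "optional_header_size": optional_header_size,
    }
```

Given the bytes of a sample read from disk in binary mode, read_file_header(data)["compiled_utc"] gives the claimed compile time as a timezone-aware datetime.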

  2. Optional header

Despite its name, the optional header is present in practically every PE executable today. It defines the location of the program entry point in the PE file, that is, the first instruction that runs after the program is loaded.

It also defines the size of the data that Windows loads into memory when it loads the PE file, the Windows subsystem the program targets (such as Windows GUI or Windows command line), and other high-level details about the program. Because the entry point tells reverse engineers where to start their work, this header information is very valuable to them.
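As a rough illustration, the sketch below pulls the entry point, image size, and subsystem out of the optional header with the standard struct module. The function name read_optional_header and the returned keys are hypothetical, signature checks are omitted for brevity, and only the PE32 and PE32+ variants are handled.

```python
import struct

# A few common Subsystem values; the field can hold others.
SUBSYSTEMS = {1: "native", 2: "Windows GUI", 3: "Windows command line"}


def read_optional_header(data: bytes) -> dict:
    """Read a handful of optional-header fields from raw PE bytes."""
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    # Skip the 4-byte "PE\0\0" signature and the 20-byte file header.
    opt = pe_offset + 4 + 20
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic == 0x10B:    # PE32: 32-bit ImageBase at offset 28
        (image_base,) = struct.unpack_from("<I", data, opt + 28)
    elif magic == 0x20B:  # PE32+: 64-bit ImageBase at offset 24
        (image_base,) = struct.unpack_from("<Q", data, opt + 24)
    else:
        raise ValueError(f"unexpected optional header magic {magic:#x}")
    # These three fields sit at the same offsets in both variants.
    (entry_rva,) = struct.unpack_from("<I", data, opt + 16)
    (size_of_image,) = struct.unpack_from("<I", data, opt + 56)
    (subsystem,) = struct.unpack_from("<H", data, opt + 68)
    return {
        "is_64bit": magic == 0x20B,
        "entry_point_rva": entry_rva,
        # Where the first instruction lands if loaded at the preferred base.
        "entry_point_va": image_base + entry_rva,
        "size_of_image": size_of_image,
        "subsystem": SUBSYSTEMS.get(subsystem, str(subsystem)),
    }
```

The entry point is stored as an offset relative to the image base (an RVA), which is why the sketch adds the two to get the address a disassembler would show.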

  3. Section header

The section headers describe the data sections contained in the PE file. A section is a piece of data that is either mapped into memory when the operating system loads the program, or that contains instructions on how to load the program into memory.

In other words, a section is a sequence of bytes on a disk, which either becomes a string of consecutive bytes in memory, or it informs the operating system about certain aspects of the loading process.

The section header also tells Windows which permissions should be granted to the section, such as whether it should be readable, writable, or executable while the program runs. For example, the .text section containing x86 code is usually marked as readable and executable, but not writable, to prevent the program code from accidentally modifying itself during execution.

This article covers several sections, such as .text and .rsrc, which are mapped into memory when the PE file is executed. Other special sections, such as the .reloc section, are not mapped into memory, and they are discussed as well.
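The permission flags can be decoded with a few lines of Python. The following is a minimal sketch using only the standard struct module; the names read_sections and permissions are illustrative, and the code assumes a well-formed file whose section table directly follows the optional header.

```python
import struct

# Memory permission bits in the section Characteristics field.
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000


def permissions(characteristics: int) -> str:
    """Render the permission bits in a compact 'rwx' style."""
    flags = ((IMAGE_SCN_MEM_READ, "r"),
             (IMAGE_SCN_MEM_WRITE, "w"),
             (IMAGE_SCN_MEM_EXECUTE, "x"))
    return "".join(ch if characteristics & mask else "-" for mask, ch in flags)


def read_sections(data: bytes) -> list:
    """Return name, addresses, sizes, and permissions for every section."""
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    (number_of_sections,) = struct.unpack_from("<H", data, pe_offset + 6)
    (optional_header_size,) = struct.unpack_from("<H", data, pe_offset + 20)
    # The section table starts right after the optional header.
    table = pe_offset + 24 + optional_header_size
    sections = []
    for i in range(number_of_sections):
        (name, virtual_size, virtual_address, raw_size, raw_offset,
         _relocs, _lines, _nrelocs, _nlines, characteristics) = struct.unpack_from(
            "<8sIIIIIIHHI", data, table + i * 40)  # each entry is 40 bytes
        sections.append({
            "name": name.rstrip(b"\x00").decode("ascii", errors="replace"),
            "virtual_address": virtual_address,
            "virtual_size": virtual_size,
            "raw_offset": raw_offset,
            "raw_size": raw_size,
            "permissions": permissions(characteristics),
        })
    return sections
```

A section that is both writable and executable (rwx) is unusual in ordinary compiled programs, so it is worth a closer look when it appears in a sample.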

1) .text section

Each PE program contains at least one x86 code section marked as executable in its section header; these sections are almost always named .text.

2) .idata section

The .idata section, also known as the import section, contains the Import Address Table (IAT), which lists the dynamic link libraries the program uses and the functions it calls from them. The IAT is one of the most important PE structures. It is worth inspecting at the start of any PE analysis because it reveals the library calls the program makes, and these calls, in turn, may reveal the high-level functionality of the malware.
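To show how the import data is laid out, here is a simplified sketch that walks the import descriptors by hand with the standard struct module. The function list_imports and its parameters are hypothetical. In a real file, the starting address comes from the import entry (index 1) of the optional header's data directory, and every address in these tables is an RVA that must be translated to a file offset through the section table; the sketch takes that translation as the rva_to_offset argument, and its default (no translation) is only suitable for toy buffers.

```python
import struct


def read_cstring(data: bytes, offset: int) -> str:
    """Read a NUL-terminated ASCII string starting at offset."""
    end = data.index(b"\x00", offset)
    return data[offset:end].decode("ascii", errors="replace")


def list_imports(data, import_rva, rva_to_offset=lambda rva: rva, is_64bit=False):
    """Map each imported DLL name to the list of functions imported from it."""
    if is_64bit:
        thunk_fmt, thunk_size, ordinal_flag = "<Q", 8, 1 << 63
    else:
        thunk_fmt, thunk_size, ordinal_flag = "<I", 4, 1 << 31
    imports = {}
    desc = rva_to_offset(import_rva)
    while True:
        # Each import descriptor is five 32-bit fields (20 bytes).
        lookup_rva, _stamp, _fwd, name_rva, iat_rva = struct.unpack_from(
            "<5I", data, desc)
        if lookup_rva == 0 and name_rva == 0 and iat_rva == 0:
            break  # an all-zero descriptor ends the list
        functions = []
        # Prefer the lookup table; fall back to the IAT when it is absent.
        thunk = rva_to_offset(lookup_rva or iat_rva)
        while True:
            (entry,) = struct.unpack_from(thunk_fmt, data, thunk)
            if entry == 0:
                break  # a zero entry ends this DLL's function list
            if entry & ordinal_flag:
                functions.append(f"ordinal_{entry & 0xFFFF}")
            else:
                # The entry points to a 2-byte hint followed by the name.
                functions.append(read_cstring(data, rva_to_offset(entry) + 2))
            thunk += thunk_size
        imports[read_cstring(data, rva_to_offset(name_rva))] = functions
        desc += 20
    return imports
```

The result is a dictionary keyed by DLL name, which makes it easy to check a sample for imports that hint at networking, process injection, or screen capture.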

3) Data section

The data sections in the PE file structure can include sections such as .rsrc, .data, and .rdata, which store mouse cursor images, button icons, audio, and other media used by the program. For example, the .rsrc section contains printable strings that the program uses to render text.

The information in the .rsrc (resource) section is very important to malware analysts because by examining the printable strings, graphic images, and other assets in the PE file, they can get important clues about the function of the file.

In section 03, you will learn how to use the icoutils toolkit (including icotool and wrestool) to extract graphic images from the resource section of the malware binary file. Then, in section 04, you will learn how to extract printable strings from the malware resource section.

4) .reloc section

The code of a PE binary is not position-independent, which means that if it is moved from its expected memory location to a new one, it will not execute correctly. The .reloc section solves this problem by allowing the code to be moved without breaking it.

If the code of a PE file has been moved, the .reloc section tells the Windows operating system which memory addresses in the code to translate so that the code can still run correctly. These translations usually involve adding an offset to, or subtracting an offset from, a memory address.
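The arithmetic behind this address translation is simple enough to sketch in Python. The example below assumes 32-bit absolute addresses and a list of fixup offsets that has already been decoded; the real .reloc section stores these offsets in a more compact block format that is not parsed here. The function name apply_relocations is illustrative.

```python
import struct


def apply_relocations(image: bytearray, fixup_offsets, preferred_base: int,
                      actual_base: int) -> int:
    """Patch every 32-bit absolute address in image by the load delta.

    fixup_offsets lists the positions inside image that hold absolute
    addresses (assumed already extracted from the .reloc data).
    """
    # How far the image moved from where the linker expected it to be.
    delta = actual_base - preferred_base
    for offset in fixup_offsets:
        (address,) = struct.unpack_from("<I", image, offset)
        # Add the delta and wrap to 32 bits, as the loader would.
        struct.pack_into("<I", image, offset, (address + delta) & 0xFFFFFFFF)
    return delta
```

For instance, if the code was linked for base address 0x400000 but is loaded at 0x500000, a stored address of 0x401000 becomes 0x501000 after the fixup.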

