How to extract the hexadecimal machine code from an executable assembled with NASM?

I have an executable, written in assembly language and assembled with NASM.

Is there a way to get the hexadecimal values of the bytes produced by the assembler, so that I can feed them to a disassembler (i.e. discover the generated opcodes)?

Code:

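#include <stdio.h>
#include <stdlib.h>

int main(void) {
    /* Open the object file in binary mode. */
    FILE *file = fopen("teste.o", "rb");
    if (!file) {
        fprintf(stderr, "error: could not open teste.o\n");
        return 1;
    }

    /* Find the file size by seeking to the end. */
    fseek(file, 0, SEEK_END);
    long fileLen = ftell(file);
    fseek(file, 0, SEEK_SET);
    if (fileLen <= 0) {
        fprintf(stderr, "error: could not determine file size\n");
        fclose(file);
        return 1;
    }

    /* Raw bytes are best held as unsigned char. */
    unsigned char *buffer = malloc((size_t)fileLen);
    if (!buffer) {
        fprintf(stderr, "Memory error!\n");
        fclose(file);
        return 1;
    }
    if (fread(buffer, 1, (size_t)fileLen, file) != (size_t)fileLen) {
        fprintf(stderr, "error: short read\n");
        free(buffer);
        fclose(file);
        return 1;
    }
    fclose(file);

    /* Print every byte in hex: groups of 4, 16 bytes per line. */
    for (long c = 0; c < fileLen; c++) {
        printf("%02x ", buffer[c]);
        if (c % 4 == 3) {
            printf(" ");
        }
        if (c % 16 == 15) {
            printf("\n");
        }
    }
    printf("\n");
    free(buffer);
    return 0;
}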
Author: Guilherme Bernal, 2014-05-08

1 answer

The language or compiler you used has little influence on the format of the final executable. If you're on Linux, it is most likely an ELF (Executable and Linkable Format) file. On Windows, it will be a PE (Portable Executable) file. Once you know the format of your executable, you need to extract its sections. (You can also write code that handles both formats, or others: just check the magic bytes to tell them apart.)
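As a rough illustration of the magic-byte check, here is a minimal sketch in C. The names exe_format and detect_format are made up for this example; the only facts relied on are that ELF files begin with the bytes 0x7F 'E' 'L' 'F', and that PE files begin with "MZ" and store, at offset 0x3C, the little-endian offset of the "PE\0\0" signature.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names for this sketch; not part of any library. */
enum exe_format { FMT_UNKNOWN, FMT_ELF, FMT_PE };

/* Guess the container format from the first bytes of a file image. */
enum exe_format detect_format(const unsigned char *buf, size_t len)
{
    /* ELF files start with 0x7F 'E' 'L' 'F'. */
    if (len >= 4 && memcmp(buf, "\x7f" "ELF", 4) == 0)
        return FMT_ELF;

    /* PE files start with a DOS stub ("MZ"); the 32-bit little-endian
       value at offset 0x3C points to the "PE\0\0" signature. */
    if (len >= 0x40 && buf[0] == 'M' && buf[1] == 'Z') {
        uint32_t pe_off = (uint32_t)buf[0x3c]
                        | (uint32_t)buf[0x3d] << 8
                        | (uint32_t)buf[0x3e] << 16
                        | (uint32_t)buf[0x3f] << 24;
        if (pe_off <= len - 4 && memcmp(buf + pe_off, "PE\0\0", 4) == 0)
            return FMT_PE;
    }
    return FMT_UNKNOWN;
}
```

The function works on a buffer already read into memory, such as the one filled by fread in the question's code, so the same loading code can serve both formats.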

How this is stored in the file differs depending on the format, but there is always a header with general information such as the architecture, from which you can locate the symbol table and the section table. Walk through the section table and check the flags of each entry. Compilers usually produce some sections that are neither code nor data, such as .comment. The flags associated with each section let you identify the ones that contain code (there can be more than one).

So you will have a list of code sections, each with three important pieces of information: the size of the section in bytes, its location in virtual memory (this affects some instructions, such as CALL, when different sections are involved) and its offset in the file. The compiled machine code can be read directly from the executable file by reading size bytes starting at offset.
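The walk over the section table could look roughly like the sketch below. It assumes a 64-bit little-endian ELF file already loaded into memory, and reads fields at the fixed offsets of the ELF64 layout instead of relying on a system header. The names code_section, rd_le, find_code_sections and dump_section are invented for this example; a 32-bit ELF or a PE file would need different offsets.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record for this sketch: the three values named above. */
struct code_section {
    uint64_t addr;    /* location in virtual memory (sh_addr) */
    uint64_t offset;  /* offset in the file (sh_offset) */
    uint64_t size;    /* size in bytes (sh_size) */
};

/* Read an n-byte little-endian unsigned integer. */
uint64_t rd_le(const unsigned char *p, int n)
{
    uint64_t v = 0;
    while (n-- > 0)
        v = (v << 8) | p[n];
    return v;
}

/* Collect sections flagged executable from an ELF64 little-endian image.
   Returns the number of entries stored in out (at most max). */
size_t find_code_sections(const unsigned char *img, size_t len,
                          struct code_section *out, size_t max)
{
    size_t found = 0;
    /* Byte 4 (EI_CLASS) == 2 means ELF64; byte 5 (EI_DATA) == 1 means LSB. */
    if (len < 64 || img[0] != 0x7f || img[1] != 'E' || img[2] != 'L' ||
        img[3] != 'F' || img[4] != 2 || img[5] != 1)
        return 0;

    uint64_t shoff  = rd_le(img + 0x28, 8);  /* e_shoff */
    uint64_t shsize = rd_le(img + 0x3a, 2);  /* e_shentsize */
    uint64_t shnum  = rd_le(img + 0x3c, 2);  /* e_shnum */
    if (shsize < 64 || shoff > len || shnum * shsize > len - shoff)
        return 0;  /* section table does not fit inside the image */

    for (uint64_t i = 0; i < shnum && found < max; i++) {
        const unsigned char *sh = img + shoff + i * shsize;
        uint64_t type  = rd_le(sh + 0x04, 4);  /* sh_type */
        uint64_t flags = rd_le(sh + 0x08, 8);  /* sh_flags */
        /* SHF_EXECINSTR is 0x4; SHT_NOBITS (8) occupies no file bytes. */
        if (!(flags & 0x4) || type == 8)
            continue;
        out[found].addr   = rd_le(sh + 0x10, 8);
        out[found].offset = rd_le(sh + 0x18, 8);
        out[found].size   = rd_le(sh + 0x20, 8);
        found++;
    }
    return found;
}

/* Print the raw bytes of one section as hex, 16 per line. */
void dump_section(const unsigned char *img, size_t len,
                  const struct code_section *s)
{
    if (s->offset > len || s->size > len - s->offset)
        return;  /* section lies outside the file image */
    for (uint64_t i = 0; i < s->size; i++)
        printf("%02x%c", img[s->offset + i], (i % 16 == 15) ? '\n' : ' ');
    printf("\n");
}
```

Calling find_code_sections on the loaded file and then dump_section on each result prints only the machine code, instead of the whole file including headers and tables.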

If you want to know the names of functions or variables, you will also need the symbol table. This should help, because from the code alone you cannot clearly tell where one function ends and the next begins. Each symbol is associated with a section and points to the memory address where the function or variable begins.
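To show how symbols can be used to split the code, here is a small sketch that maps an address to the closest preceding symbol in a given section. It assumes the raw contents of an ELF64 little-endian symbol table (24-byte entries) and of its string table have already been read into memory, and that the string table ends with a NUL byte. The names sym_le and nearest_symbol are invented for this example. Note that in a relocatable object file (.o) the symbol value is an offset within its section, while in a linked executable it is a virtual address.

```c
#include <stddef.h>
#include <stdint.h>

/* Read an n-byte little-endian unsigned integer. */
uint64_t sym_le(const unsigned char *p, int n)
{
    uint64_t v = 0;
    while (n-- > 0)
        v = (v << 8) | p[n];
    return v;
}

/* Return the name of the symbol in section shndx with the highest
   st_value that is still <= addr, or NULL if there is none.
   symtab holds raw 24-byte ELF64 symbol entries; strtab is the
   string table that the symbol table links to. */
const char *nearest_symbol(const unsigned char *symtab, size_t symtab_len,
                           const char *strtab, size_t strtab_len,
                           unsigned shndx, uint64_t addr)
{
    const char *best = NULL;
    uint64_t best_value = 0;

    for (size_t off = 0; off + 24 <= symtab_len; off += 24) {
        const unsigned char *s = symtab + off;
        uint64_t name  = sym_le(s + 0, 4);  /* st_name: offset into strtab */
        unsigned type  = s[4] & 0x0f;       /* low nibble of st_info */
        uint64_t sec   = sym_le(s + 6, 2);  /* st_shndx */
        uint64_t value = sym_le(s + 8, 8);  /* st_value */

        /* Skip unnamed entries and STT_SECTION (3) / STT_FILE (4) symbols. */
        if (name == 0 || name >= strtab_len || type == 3 || type == 4)
            continue;
        if (sec != shndx || value > addr)
            continue;
        if (best == NULL || value >= best_value) {
            best = strtab + name;
            best_value = value;
        }
    }
    return best;
}
```

The sketch deliberately does not require the symbol type to be a function, since plain NASM labels are typically emitted without a type or size; that is a design choice of this example, not a rule of the format.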

Author: Guilherme Bernal, 2014-05-09 10:56:43