How do antiviruses analyze my program?

I had a class in college that left me somewhat intrigued. My teacher was talking about the differences between interpreted and compiled languages and pointed out that interpreted languages can have their code stolen, while compiled ones can't. That raised a series of questions for me, the main one being:

If my code is compiled and you don't know how it was written, how do antiviruses know it can be dangerous?

Author: Raizant, 2017-10-19

2 answers

(...) my teacher was talking about the differences between interpreted and compiled languages and pointed out that interpreted languages can have their code stolen, while compiled ones can't (...)

I'm going to give your teacher the benefit of the doubt and assume this statement arrived here distorted, as in a game of telephone.

If you have a program on your computer, you have the source code. No exceptions.

The build process generates executables or libraries (e.g. .dll files on Windows), which are said to be "in machine language" rather than human-readable text. In fact, if you try to open these files, you will see that they are unreadable and look nothing like the source code files. However, take this with you for life: there is no compiled code that cannot be decompiled.

Want an example? Use C# to generate an executable or a .dll file. Then open the file with ILSpy.

Some people believe you can make code more "protected" with a technique called obfuscation, which "shuffles" the decompiled code generated by tools like the one I cited above. But even obfuscation does not protect anyone from "theft", since a really motivated and dedicated programmer can reassemble the original code all the same.

The only way to ensure that source code will never be read is not to hand it over to anyone. Leave the code on a server and provide access to your system over the internet. Only those who have access to the server's hard drive will have access to its source code. It's not 100% safe, but it's as close to that as you can get.

A relevant point: someone commented on this answer:

But if any program can be "recompiled", why don't we have the Windows source code, for example?

Look, kid, we do. It's even pretty hilarious. My favorite is the Windows 2000 one, which has several gems written in the code comments. Happy reading :) (crossed out because this code was leaked, not obtained via reverse engineering, and comments are not included in compiled code).

For example, and again speaking of ILSpy: many things on Windows use .NET, which now ships with the system. You can open ILSpy and use the File -> Open from GAC option to see the decompiled source of the main platform libraries.

For other system libraries, you can try a C/C++ decompiler like Snowman. But try to open only small DLLs, otherwise the system crashes (to open large DLLs, you need a plugin). Tip: on Windows 8, you can try to decompile this:

C:\windows\system32\AltTab.dll


As for antiviruses, they couldn't care less about your source code: they look at the actions your program performs, regardless of how it was written. Every program interacts with the operating system through requests, i.e.: Windows, tell me what time it is; Linux, send the byte 00101000 to serial port 2; Solaris, write this to this memory address; etc.

Antiviruses look specifically for programs that pull shady tricks such as:

  • trying to read the state of the browser program;
  • impersonating a user to perform operations that require action by a human being (such as pressing the OK buttons on Windows permission requests);
  • forcing actions for which there is no permission;
  • sending data to known malicious web addresses;

Etc., etc.

This involves pattern recognition and, nowadays, some degree of artificial intelligence.
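
To make the pattern-recognition idea concrete, here is a toy sketch in Python. Everything in it is an assumption made up for illustration: the action names, the weights, and the threshold are hypothetical, and real antivirus engines observe actual operating system calls and use far more elaborate models.

```python
# Hypothetical rule table: the action names and weights are invented
# for illustration and do not come from any real antivirus product.
SUSPICIOUS_ACTIONS = {
    "read_browser_state": 3,
    "simulate_user_click": 4,
    "bypass_permission": 5,
    "connect_known_bad_host": 5,
}


def score_trace(trace, threshold=5):
    """Return (score, flagged) for a list of observed action names."""
    # Unknown actions contribute nothing to the score.
    score = sum(SUSPICIOUS_ACTIONS.get(action, 0) for action in trace)
    return score, score >= threshold


benign = ["get_time", "write_file"]
shady = ["get_time", "read_browser_state", "simulate_user_click"]

print(score_trace(benign))  # (0, False)
print(score_trace(shady))   # (7, True)
```

Note that nothing here looks at source code: the only input is what the program was seen doing.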


A sad observation: from time to time I see someone ask here on SOpt how to do something that antiviruses will clearly see as malware behavior. For example: simulating an "OK" via the command line. Often people do not think about the consequences certain actions would have for people's security and privacy if they were possible.

See also this question about how Windows recognizes an application as safe: installer recognized as virus.

 34
Author: Garoto de Programa, 2017-10-26 13:05:29

I decided to answer because it seems there are still doubts about Renan's answer.

Let it be clear that antiviruses do not need and do not care about the source code of the application.

There are mainly two strategies for detecting a virus.

  • One of them is to look for a signature in the executable code itself: the antivirus checks whether the file contains a certain sequence of bytes previously known to belong to a virus.
  • Another is to analyze whether there are calls to certain APIs, or code patterns arranged in a certain way, that can be used to cause problems. This is why some applications trigger false positives.
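
The two strategies above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the signature name and its byte sequence are made up, and the list of "suspicious" API names is a hypothetical choice (the two names are real Windows API functions, but legitimate programs use them too).

```python
# Hypothetical signature database: name -> byte sequence.
# These bytes are invented; they are not a real malware signature.
SIGNATURES = {
    "Demo.Virus.A": bytes.fromhex("deadbeef4d414c"),
}

# API names whose presence MAY indicate risky behavior.
# The selection is an assumption made for this example.
SUSPICIOUS_IMPORTS = [b"WriteProcessMemory", b"CreateRemoteThread"]


def scan(data: bytes):
    """Return (signature_hits, heuristic_hits) for a binary blob."""
    # Strategy 1: exact match of a known byte sequence.
    signature_hits = [name for name, sig in SIGNATURES.items() if sig in data]
    # Strategy 2: presence of API names that can be abused.
    heuristic_hits = [api.decode() for api in SUSPICIOUS_IMPORTS if api in data]
    return signature_hits, heuristic_hits


# A fake "executable" built in memory so the sketch is self-contained.
sample = b"\x00\x01" + bytes.fromhex("deadbeef4d414c") + b"\x00CreateRemoteThread\x00"
print(scan(sample))  # (['Demo.Virus.A'], ['CreateRemoteThread'])
```

The second list is where false positives come from: a debugger, for instance, may legitimately reference the same API names.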

Surely there are other strategies: it is even possible to check things during execution, or to intercept certain API calls.

What Renan said is that everything an application does is available for inspection. All the instructions the processor will execute and everything the application will invoke are encoded in the binary. All those bytes have a meaning that can be understood by anyone who knows how to read them (the processor, for example); it is not something random or encrypted. It is just a bit more complicated for a human to understand.

If my code is compiled and you don't know how it was written, how do antiviruses know it can be dangerous?

It is possible to know how it is written (in binary form); it is not possible to know the exact source code that originated this binary.

Decompile

If you have a program on your computer you don't have the source code, but you can get something close to the source code that generated that binary. It won't be identical: it will lack comments and local symbol names, perhaps even the public symbols will have been changed, and the exact flow won't be the same; it will just produce the same result.

Decompiling is especially feasible in languages that use bytecode and metadata. But when the code is obfuscated it becomes much more difficult to achieve usable results.

But it's not exactly an easy process and is far from generating good results in most cases.
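
As a small illustration of how much a bytecode format can retain, here is a sketch using Python's standard dis module. Python is used only because it ships with a disassembler; the check_license function and its string are hypothetical, and Python bytecode happens to keep more information (local variable names included) than most native binaries do.

```python
import dis


def check_license(key):
    # Hypothetical "secret" that the author might hope is hidden.
    secret = "open-sesame"
    return key == secret


code = check_license.__code__

# Constants embedded in the compiled code, including the string above.
print(code.co_consts)

# Argument and local variable names kept as metadata.
print(code.co_varnames)

# Human-readable listing of the bytecode instructions.
dis.dis(check_license)
```

No source file is needed for any of this: the code object alone exposes the constant, the names, and the sequence of operations.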

The purpose of the antivirus is not to get the source code; it is just to understand what the binary does.

Source Protection

This idea of interpreted versus compiled languages is wrong to begin with.

It is possible to steal code from any application.

This idea of stealing source code comes from naive people and laypeople. Good code is too complex for naive people to understand, and experts have no interest in stealing it. Poor code would only interest very weak programmers. A tip: the vast majority of code written is quite poor and serves as a reference for no one. In general, it is the authors of poor code who want to protect themselves.

As a curiosity, the Windows code was leaked, not obtained by reverse engineering, which is why it even has the comments.

 22
Author: Maniero, 2020-06-11 14:45:34