What is a garbage collector and how does it work?

What is a garbage collector, and how does it work?

When should we care about it?

Author: Caique Romero, 2017-11-16

2 answers

I will answer in general terms, using the CLR's GC as the basis. The second question has already been answered.

Memory management is very difficult.

There is a saying that only three things have revolutionized software development: high-level languages, modularization, and automatic memory management.

When allocating on the stack, everything is easy, and the language can manage it automatically. But in many situations the object's lifetime requires it to live on the heap, and there the application is responsible for releasing the allocation.

It is common for programmers to forget to release this memory, or to get the timing wrong, partly because in some situations it is genuinely hard to determine in code when the release is safe. The result can be a memory leak, or the release of something that is still in use.

Several techniques can automate memory release. Strictly speaking, all of them can be called garbage collection, since they free the memory of something that is no longer needed, that is, something that has become garbage.

Some people do not consider every automatic release to be a collection; for them, collection is only what happens some time after an object stops being needed. That is the kind of collection I will discuss here. I will not cover the release performed by smart pointers, which manage lifetime fully or partially, either through unique ownership or through reference counting. They are useful, widely used, and have advantages, but they are beside the point here. And there are situations for which they are not suitable.
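To make one of those unsuitable situations concrete, here is a minimal Python sketch of reference counting, the technique behind counted smart pointers. The `Obj` class and the `incref`/`add_ref`/`decref` helpers are hypothetical, not any real runtime's API. It shows the classic failure: two objects that reference each other never reach a count of zero, even after the program has dropped them.

```python
class Obj:
    """Hypothetical heap object with a reference count."""
    def __init__(self, name):
        self.name = name
        self.refcount = 0
        self.fields = []      # outgoing references held by this object
        self.freed = False

def incref(obj):
    obj.refcount += 1

def add_ref(src, dst):
    """src stores a reference to dst."""
    src.fields.append(dst)
    incref(dst)

def decref(obj):
    """Drop one reference; free the object (and release its fields) at zero."""
    obj.refcount -= 1
    if obj.refcount == 0:
        obj.freed = True
        for child in obj.fields:
            decref(child)
        obj.fields.clear()

# Acyclic case: dropping the only local reference frees the whole chain.
a = Obj("a")
incref(a)                     # held by a local variable
b = Obj("b")
add_ref(a, b)
decref(a)                     # the local variable goes out of scope
print(a.freed, b.freed)       # True True

# Cyclic case: x and y reference each other, so neither count reaches zero.
x = Obj("x")
incref(x)                     # held by a local variable
y = Obj("y")
add_ref(x, y)
add_ref(y, x)
decref(x)                     # the local variable goes out of scope
print(x.freed, y.freed)       # False False (leaked)
```

A tracing collector does not have this problem, because it never looks at counts: it only asks what is reachable from the roots.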

The garbage collector that everyone talks about is a complex memory management mechanism responsible for allocating and releasing memory. It decides where to place objects on the heap, and it also decides when and how to release the memory.

The mechanism inspects the state of memory to identify what is still in use and what is not. This way, there is no risk of anything being left behind.

This is a huge simplification for programmers, because they can make whatever mess they want without worrying about memory (or almost).

This is what we call managed memory, the basis of the .NET philosophy: you cannot corrupt memory, and you have no leaks (as long as you take certain precautions).

Of course, this has a cost; it has disadvantages.

This model wastes some memory. It only frees memory from time to time, and it usually does not return all of it (or any of it) to the operating system.

It runs non-deterministically: you do not know when it will run. Some mechanisms give you a degree of control, but they are usually misused and never give full control.

Because of this, resources external to the application may be released late unless you have some other mechanism to control that.

It almost always causes pauses, and some can be long. Of course, this depends on the quality of the implementation.

There are other, more specific details that make it undesirable in some situations. I will expand on this over time.

But in multithreaded environments, where there are exceptions, objects that travel through the application without a clearly defined lifetime, abstractions that hide certain allocation effects, and data with circular references, it is very difficult to get by without this kind of management.

Its allocation strategy always places objects next to each other, reducing memory fragmentation, which is a huge problem in manual management and in directly automated management. With objects close together, locality of reference is preserved and the cache is used more efficiently, which improves performance in most cases. Doing something similar by hand takes more work than writing a GC. So these gains are possible without a GC, but it is almost always impractical.

You work with memory as if it were infinite.

Collection starts from what we call roots: the processor registers, the static area of the code, and the application stack (or stacks, if there are other threads). From there it builds a graph of referenced objects. Each object it finds may hold references to other objects, and so it proceeds recursively through the heap. This is called the mark phase.
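The mark phase is a graph traversal, so it can be sketched in a few lines of Python. The heap model (a dict from object id to the ids it references) is hypothetical, and the `roots` list stands in for registers, statics, and thread stacks. An explicit worklist replaces recursion so that deep object graphs do not overflow the collector's own stack.

```python
def mark(heap, roots):
    """Return the set of object ids reachable from the roots (the mark phase)."""
    marked = set()
    worklist = list(roots)          # explicit stack instead of recursion
    while worklist:
        obj = worklist.pop()
        if obj in marked:           # already visited: also stops cycles
            continue
        marked.add(obj)
        worklist.extend(heap[obj])  # follow every reference this object holds
    return marked

# Hypothetical heap: object id -> ids it references.
heap = {
    "A": ["B"],
    "B": ["C"],
    "C": [],
    "D": ["E"],   # D and E reference each other, but no root reaches them
    "E": ["D"],
}
roots = ["A"]     # e.g. a local variable on the stack

print(sorted(mark(heap, roots)))  # ['A', 'B', 'C']
```

D and E form a cycle yet stay unmarked, which is exactly why a tracing collector handles circular references that reference counting cannot.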

Then comes the release phase. There are several techniques for it: sweeping, which releases every object that was not marked as live; copying, which copies whatever is still live to another area and discards everything left in the old one; and compacting, which is a specialized form of copying. Compacting is what .NET uses; more precisely, it uses a generational compacting collector.
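The difference between sweeping and copying shows up in the memory layout. In this hypothetical Python model, the heap is a list of slots whose index plays the role of an address, and `marked` is assumed to come from the mark phase; references between objects are ignored here. Sweeping leaves holes (fragmentation), while copying packs the survivors together.

```python
def sweep(heap, marked):
    """Mark-sweep: free each unmarked slot in place, leaving holes behind."""
    for addr, obj in enumerate(heap):
        if obj is not None and obj not in marked:
            heap[addr] = None       # a real allocator would add this slot to a free list

def copy_live(heap, marked):
    """Copying: place live objects contiguously in a new area; drop the old one."""
    return [obj for obj in heap if obj is not None and obj in marked]

heap = ["A", "B", "C", "D", "E"]    # index = hypothetical address
marked = {"A", "C", "E"}            # assumed result of the mark phase

print(copy_live(heap, marked))      # ['A', 'C', 'E']  (no holes)
sweep(heap, marked)
print(heap)                         # ['A', None, 'C', None, 'E']  (fragmented)
```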

With generations, pauses can be reduced, allocation can become very efficient (much more so than with manual or automated memory management), and each generation can use a different collection strategy, producing the best overall result.

A perfect garbage collector could be more efficient than simple manual or automated management in almost every case. Of course, you can adopt manual strategies that are very efficient, but in practice you would end up building a mechanism even more complex than the so-called tracing garbage collectors.

.NET addresses non-determinism with the dispose pattern (disposing). It lets a resource be released before the GC runs, as soon as the object is no longer needed. Of course, if that does not happen, the release is left to the GC, which makes the pause longer and keeps the resource held for longer than it should be. A file may stay open, for example.
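`IDisposable` and the `using` statement are C# constructs. As an analogy only, Python's closest equivalent is a context manager: `with` calls `__exit__` deterministically when the block ends, much as `using` calls `Dispose()`. The `Resource` class below is hypothetical and stands in for a file or a database connection.

```python
class Resource:
    """Hypothetical stand-in for a file handle or database connection."""
    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self):
        """Deterministic release, the role Dispose() plays in .NET.
        Safe to call more than once."""
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()          # runs when the block ends, even if it raised
        return False          # do not swallow the exception

with Resource("data.txt") as r:
    print(r.closed)           # False: still usable inside the block
print(r.closed)               # True: released at block exit, not whenever a collector runs
```

The point is the timing: the resource is released at a known point in the code, regardless of when memory is reclaimed.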

.NET avoids the overhead of copying large objects by keeping them in a separate area, the Large Object Heap, so only smaller objects are copied.
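The routing decision can be sketched in a few lines of Python. The 85,000-byte threshold is the documented .NET default for the Large Object Heap; the `choose_area` function itself is hypothetical, since a real runtime makes this decision inside its allocator.

```python
# Objects at or above this size (in bytes) go to the Large Object Heap in .NET.
LARGE_OBJECT_THRESHOLD = 85_000

def choose_area(size_in_bytes):
    """Hypothetical routing: small objects go to Gen0, where compaction may move
    them; large ones go to a separate area that is not compacted by default."""
    if size_in_bytes >= LARGE_OBJECT_THRESHOLD:
        return "LOH"
    return "gen0"

print(choose_area(1_000), choose_area(100_000))  # gen0 LOH
```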

But generations cause another problem. If the GC runs too often, it tends to promote short-lived objects to older generations, which is far from ideal, since each older generation tends to have longer pauses and more overhead. That is why you should not call it manually.

Obviously, the specific way the GC works is an implementation detail. The GC gives certain guarantees; beyond those, the programmer cannot rely on implementation details.

One of the nice things about a GC is that allocation performance is equal or close to that of the stack. On the stack, allocating only increments a pointer that usually stays in a register, so it is very fast. Generation 0 can work the same way: it allocates sequentially just by incrementing a pointer.
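A minimal Python sketch of this pointer-bump allocation follows. The `BumpAllocator` class is hypothetical: it returns offsets instead of real addresses and ignores object headers and alignment.

```python
class BumpAllocator:
    """Gen0-style allocation sketch: hand out memory by bumping a pointer."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.next = 0              # the "allocation pointer"

    def allocate(self, size):
        if self.next + size > self.capacity:
            return None            # area is full: a real GC would trigger a collection here
        addr = self.next
        self.next += size          # the whole cost of allocating: one addition
        return addr

gen0 = BumpAllocator(capacity=64)
print(gen0.allocate(16), gen0.allocate(24), gen0.allocate(32))  # 0 16 None
```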

Manual allocation, in contrast (automated allocation is still manual allocation, only more abstracted), has a non-trivial cost. It needs to find where to allocate. There are efficient algorithms, but they pay another price: wasted memory, which can be worse than with a GC, or a release cost that is too high. Optimizations are possible, but they take a lot of work. It gets much worse if the allocator always has to ask the operating system for the memory it hands out. It also gets much worse with multiple threads, because allocation needs to be locked, which adds a huge overhead.
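Here is a minimal first-fit free-list allocator in Python, one classic way a manual allocator finds where to allocate. It is a hypothetical sketch: real allocators use size classes, block headers, and merging of adjacent free blocks. Even so, it shows both costs described above: every allocation searches a list, and freed blocks can add up to enough memory without any single block being large enough.

```python
class FreeListAllocator:
    """First-fit sketch: free memory is a list of (start, size) blocks."""
    def __init__(self, capacity):
        self.free = [(0, capacity)]

    def allocate(self, size):
        for i, (start, block) in enumerate(self.free):  # linear search on every call
            if block >= size:
                if block == size:
                    del self.free[i]
                else:
                    self.free[i] = (start + size, block - size)
                return start
        return None               # no single free block is big enough

    def release(self, start, size):
        # Simplest possible release; a real allocator would merge adjacent blocks.
        self.free.append((start, size))

heap = FreeListAllocator(capacity=48)
a = heap.allocate(16)             # 0
b = heap.allocate(16)             # 16
c = heap.allocate(16)             # 32 (heap is now full)
heap.release(a, 16)
heap.release(c, 16)
print(heap.allocate(24))          # None: 32 bytes are free, but not contiguous
```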

The .NET GC has an allocation area for each processor, so allocations never happen concurrently in the same area and are naturally atomic without locks. It is very efficient.
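The idea of private allocation areas can be sketched in Python. As a simplification, this sketch gives each thread its own area through `threading.local` rather than one per processor; `SharedHeap`, `CHUNK`, and `allocate` are hypothetical names. Only refilling an area from the shared heap takes a lock; bumping the pointer inside an area never does. The unused tail of an area is simply abandoned when it is refilled, and Python's GIL means this shows the structure, not real parallel speed.

```python
import threading

CHUNK = 1024  # hypothetical size, in bytes, of each private allocation area

class SharedHeap:
    """Hands out whole chunks to threads; this is the only step that locks."""
    def __init__(self):
        self.lock = threading.Lock()
        self.top = 0

    def grab_chunk(self):
        with self.lock:
            start = self.top
            self.top += CHUNK
            return start

heap = SharedHeap()
local = threading.local()  # each thread sees its own 'area' attribute

def allocate(size):
    """Bump-allocate from the calling thread's private area (assumes size <= CHUNK)."""
    area = getattr(local, "area", None)
    if area is None or area[0] + size > area[1]:
        start = heap.grab_chunk()          # slow path: refill under the shared lock
        area = [start, start + CHUNK]      # [next free offset, end of area]
        local.area = area
    # Fast path: no lock, because no other thread touches this area.
    addr = area[0]
    area[0] += size
    return addr

def worker(out):
    out.extend(allocate(16) for _ in range(100))

results = [[] for _ in range(4)]
threads = [threading.Thread(target=worker, args=(r,)) for r in results]
for t in threads:
    t.start()
for t in threads:
    t.join()

addresses = [addr for r in results for addr in r]
print(len(addresses), len(set(addresses)))  # 400 400
```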

Generally, each of these Gen0 areas starts at 256 KB, but the size can be adapted during execution when the runtime identifies that another size is more efficient, reducing pause time or the number of pauses according to the observed pattern of garbage generation.

When this area fills, it triggers a collection: the mark phase runs, and everything that survived in this area is copied to Generation 1. Java's strategy is to first copy survivors to an auxiliary area within the young generation, giving them a little more time there before promotion. This matters because Java produces much more garbage.
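The Gen0 cycle, together with the premature promotion caused by calling the GC manually, can be sketched as a toy two-generation heap in Python. Everything here is hypothetical and heavily simplified: objects are plain names, reachability is tracked with an explicit `live` set instead of a real mark phase, and the budget is counted in objects rather than bytes.

```python
class GenerationalHeap:
    """Toy two-generation heap: Gen0 has a fixed budget; survivors move to Gen1."""
    def __init__(self, gen0_budget):
        self.gen0_budget = gen0_budget
        self.gen0 = []
        self.gen1 = []
        self.live = set()            # stands in for what the mark phase would find

    def allocate(self, name):
        if len(self.gen0) >= self.gen0_budget:
            self.collect_gen0()      # a full Gen0 triggers a collection
        self.gen0.append(name)
        self.live.add(name)

    def drop(self, name):
        self.live.discard(name)      # the program stops referencing the object

    def collect_gen0(self):
        survivors = [o for o in self.gen0 if o in self.live]
        self.gen1.extend(survivors)  # copy (promote) survivors to Gen1
        self.gen0 = []               # everything else in Gen0 is simply gone

h = GenerationalHeap(gen0_budget=3)
h.allocate("config")                 # long-lived
h.allocate("tmp1")
h.drop("tmp1")
h.allocate("tmp2")
h.drop("tmp2")
h.allocate("tmp3")                   # Gen0 is full, so a collection runs first
print(h.gen1)                        # ['config']

# Forcing a collection too early promotes an object that dies right after.
h2 = GenerationalHeap(gen0_budget=3)
h2.allocate("tmp")
h2.collect_gen0()                    # like calling the GC manually
h2.drop("tmp")
print(h2.gen1)                       # ['tmp']: now it waits for a costlier Gen1 collection
```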

If all goes well, little is copied; it is very common for objects to have very short lives.

When an object is copied, every reference to it must be updated to the object's new address.

This is considerable overhead, and it hurts the cache, since the collector has to touch data that the application is not actually using.
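Copying with reference updating can be sketched as a sliding compaction in Python: first compute each live object's forwarding (new) address, then rebuild the heap and rewrite every reference through that table. The heap layout (a list of slots holding dicts) is hypothetical, `marked` is assumed to come from a previous mark phase, and roots would have to be rewritten through the same `forward` table.

```python
def compact(heap, marked):
    """Sliding compaction sketch.

    heap: list of slots; each slot is None or {"name": str, "refs": [addr, ...]}.
    marked: set of live addresses (assumed to come from the mark phase).
    Returns the compacted heap and the old -> new address table.
    """
    # 1. Compute forwarding addresses: live objects slide toward address 0.
    forward = {}
    next_free = 0
    for addr, obj in enumerate(heap):
        if obj is not None and addr in marked:
            forward[addr] = next_free
            next_free += 1
    # 2. Build the compacted heap, rewriting every reference to its new address.
    new_heap = [None] * len(heap)
    for old, new in forward.items():
        obj = heap[old]
        new_heap[new] = {"name": obj["name"], "refs": [forward[r] for r in obj["refs"]]}
    return new_heap, forward

heap = [
    {"name": "A", "refs": [2]},   # address 0, references C
    {"name": "B", "refs": []},    # address 1, garbage
    {"name": "C", "refs": [3]},   # address 2, references D
    {"name": "D", "refs": []},    # address 3
]
marked = {0, 2, 3}
new_heap, forward = compact(heap, marked)
print([o["name"] if o else None for o in new_heap])  # ['A', 'C', 'D', None]
print(new_heap[0]["refs"], new_heap[1]["refs"])      # [1] [2]
```

Every live object is touched twice and every reference is rewritten, which is where the overhead and the cache pressure come from.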

In gen1...

For the GC to work well, it needs help from the compiler: there must be structures with additional information, such as where references are stored. Without them, all you can use is what is called a conservative GC, which cannot always tell whether a value is a reference to an object, so it must treat anything that looks like one as a reference and release memory only when it is sure nothing points to it. As a result, a lot of memory leaks, and in practice it is not really usable.
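Without the compiler's information, the collector has to guess which stack words are pointers. This hypothetical Python sketch models addresses as integers and shows why the guess leaks: an ordinary integer that happens to equal an object's address keeps that object alive.

```python
def conservative_roots(stack_words, object_addresses):
    """Treat every stack word that *looks like* an object address as a root.

    Without compiler-provided maps, the collector cannot tell a pointer from an
    integer that merely has the same value, so it must keep both alive.
    """
    return {w for w in stack_words if w in object_addresses}

objects = {0x1000: "live object", 0x2000: "garbage object"}
stack = [
    0x1000,   # a real pointer held in a local variable
    0x2000,   # an ordinary integer (say, a counter) that happens to equal an address
    42,       # an integer that matches nothing
]
kept = conservative_roots(stack, set(objects))
print(sorted(objects[a] for a in kept))  # ['garbage object', 'live object']
```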

Other languages

.NET is often better than current Java because it encourages using the stack more than the heap, and it does so more and more. It seems the intention is for the JITter, or even the compiler, to optimize on its own and put some things on the stack when it identifies that an object is small and its lifetime is guaranteed to be confined to the stack. Java's GC and JITter are smarter, since Java often overuses the heap.

It is often said that C++ does not need a garbage collector because it generates very little garbage. That is not entirely true: it generates less, but not that little. And where it avoids generating much garbage, in many cases there is still a GC; it just is not a tracing one, and it is often not efficient.

Conclusion

Make no mistake: I am not saying a GC is better than other approaches, only that it is not as bad as people say. There are cases where it is better and cases where it is worse; the difference is not that large, and it almost never matters.

To learn more, see our beloved tag, especially for the GC in C#.

Of course, there is room for more specific questions about the various types, techniques, and details of garbage collection.

Much of what is still missing can be read in other answers, such as How to identify and prevent memory leaks in .NET?

This will be one of those long answers, but I am writing it bit by bit. The links will come later. Bear with me; I will still tidy up the text.

Author: Maniero, 2020-11-27 12:40:02

Author's Note
The content below consists mostly of excerpts from an article originally published by Macoratti on his website. Reproduction authorized by the author.

The .NET Framework garbage collector manages the allocation and release of memory for your application. Each time you create a new object, the Common Language Runtime allocates memory for it from the managed heap. As long as address space is available in the managed heap, the runtime keeps allocating space for new objects.

However, memory is not infinite. Eventually, the garbage collector must perform a collection to free memory. The garbage collector's optimizing engine determines the best time to collect, based on the allocations being made.

When the garbage collector performs a collection, it checks for objects in the managed heap that are no longer being used by the application and performs the operations needed to reclaim their memory.

Thus, garbage collection is a process that automatically frees the memory of objects that are no longer in use. The decision to start the destruction process is made by a special program known as the garbage collector. However, when an object goes out of scope at the end of the Main() method, the destruction process is not necessarily invoked.

Thus, you cannot determine when the destructor method will be called. The garbage collector identifies objects that are no longer referenced in the program and frees the memory allocated to them. You cannot destroy an object explicitly in code; that is the prerogative of the garbage collector, which destroys objects on the programmer's behalf. Garbage collection happens automatically. It ensures that:

  • objects are destroyed, although it does not specify when an object will be destroyed;
  • only unused objects are destroyed: an object is never destroyed while it is still referenced by something in use.

C# provides two special methods used to release class instances and the resources they hold: Finalize() and Dispose().

Finalize()

The Finalize() method (written in C# as a destructor, ~ClassName()) is a special protected method that the garbage collector calls; C# code cannot call it directly. It is called after the last reference to an object has been released, before its memory is reclaimed.

Dispose()

The Dispose() method is called to release resources, such as a database connection, as soon as the object using them is no longer needed. Unlike Finalize(), Dispose() is not called automatically; you need to call it explicitly from client code when the object is no longer needed. The IDisposable interface declares Dispose(), so a class must implement this interface to provide it.

Comparison:

[Image: comparison of Finalize() and Dispose()]

Source

Author: Luiz Santos, 2017-11-22 02:26:21