How does Python treat the "yield" command internally?

I was reading about Python's yield command, and it seems to me that it creates a generator that works as a kind of list whose values are returned on demand, as if the last "state" of the iteration were somehow "memorized".

To test this, consider this function that yields three letters:

def letras():
    yield 'A'
    yield 'B'
    yield 'C'

Calling the function letras() in a for loop to get the values:

for letra in letras():
    print(letra)

Notice that the output is:

A
B
C

Now, if I modify the function letras() to increment the value of v which is a global variable:

def letras():
    global v
    v += 1
    print(v)
    yield 'A'
    yield 'B'
    yield 'C'

And the output:

1
A
B
C

See that v has the value 1. This shows that the function letras() did not memorize the state of v, only the values returned by yield - it is as if the function had been called only once. As a result, I still can't see clearly how yield works: the behavior of the function was different from what I expected and confused me even more. Maybe understanding how Python handles it internally can help.

Question

So I would like to know: how does Python treat the yield command internally? What structure or mechanism does yield use?

Author: gato, 2019-03-02

2 answers

Understanding what happens

Internally, yield does quite a lot. To begin with, the mere fact that a function has the keyword yield anywhere in its body causes Python to treat it differently: it ceases to be an ordinary function and becomes a "generator function".

This change is only comparable to functions that are explicitly declared as asynchronous with the syntax async def instead of plain def.

What happens is more or less easy to understand: when you call a function that has a yield in its body - even if that yield would never be executed - no line of that function runs right away. Instead, Python creates a special object of type generator and returns it as if it were the "return value" of that function.

A "generator" in turn is an object that will have the special method __next__, (and also send and throw - I speak more below).

When a "generator" is used on the right side of a command for, the language itself will call the generator method __next__ once for each repetition of the for.

The first time __next__ is called, then yes, the function starts executing at its first line - it happens to it what happens to normal functions: it receives the parameters that were passed in the initial call, and it is executed line by line until it reaches the first yield. At that moment the execution is "suspended" - the value of all local variables of the generator is saved, and the value given to yield is returned as the result of the call to __next__. On the next call to __next__, processing does not start again at the first line of the function, but rather at the point where the yield was - it continues from there, line by line, until it finds a new yield, or a return statement (or the end of the function, which in Python is equivalent to a return None).

When the generator comes to an end, instead of returning the value that is in the return, it raises an exception of type StopIteration. The for command automatically catches this StopIteration and terminates the for block.

It becomes easier to understand if we create a function with yield inside, and use it in "manual" mode, without the for:

In [275]: def exemplo(): 
     ...:     print("primeira parte") 
     ...:     yield 0 
     ...:     print("segunda parte") 
     ...:     yield 1 
     ...:     print("parte final") 
     ...:     return 2 
     ...:                                                                                                                        

In [276]: type(exemplo)                                                                                                          
Out[276]: function

In [278]: gen = exemplo()                                                                                                        

In [279]: gen, type(gen)                                                                                                         
Out[279]: (<generator object exemplo at 0x7f5533e2a6d8>, generator)

In [281]: gen.__next__()                                                                                                         
primeira parte
Out[281]: 0

In [282]: gen.__next__()                                                                                                         
segunda parte
Out[282]: 1

In [283]: gen.__next__()                                                                                                         
parte final
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-283-d5d004b357fe> in <module>
----> 1 gen.__next__()

StopIteration: 2

Notice that nothing was printed after the execution of entry "278" - the call exemplo() returns a "generator", as we can see from its representation and type in entry "279" - and the line that prints "primeira parte" is only executed when we call __next__ for the first time.

Using the same example function in a for, the output is:

In [284]: for v in exemplo(): 
     ...:     print(v) 
     ...:                                                                                                                        
primeira parte
0
segunda parte
1
parte final

Another nice piece of information: special methods with double leading and trailing underscores (__) very rarely have to be called directly - in general these methods are called by the language itself. So, instead of calling .__next__ directly on a generator, the most common is to use the built-in function next and pass the generator as a parameter.
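As a small sketch, reusing the exemplo function defined above: next also accepts an optional second argument, which is returned instead of raising StopIteration when the generator is exhausted:

gen = exemplo()
print(next(gen))        # equivalent to gen.__next__(); runs up to the first yield
print(next(gen))        # resumes and runs up to the second yield
print(next(gen, None))  # prints "parte final" and then None: the default is returned instead of raising StopIteration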

So, the Python for command, when used with a "generator function", is equivalent to this sequence using while:

In [286]: gen = exemplo()                                                                                                        

In [287]: while True: 
     ...:     try: 
     ...:         v = next(gen) 
     ...:     except StopIteration: 
     ...:         break 
     ...:     print(v) 
     ...:                                                                                                                        
primeira parte
0
segunda parte
1
parte final

(Python's for is smarter than that, because it works with other types of objects: in addition to detecting generators, it also works with iterables - objects that have the __iter__ method - and, as a fallback, with objects that have a __getitem__ method accepting integer indexes starting at 0 (and optionally __len__), the old sequence protocol. See the sketch below.)
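A minimal sketch of that fallback (the class name is just illustrative): a class with only __getitem__ can be used in a for, which keeps asking for indexes 0, 1, 2... until an IndexError is raised:

class TresLetras:
    # no __iter__: the for falls back to calling __getitem__ with 0, 1, 2...
    def __getitem__(self, indice):
        letras = 'ABC'
        if indice >= len(letras):
            raise IndexError  # signals the end of the iteration
        return letras[indice]

for letra in TresLetras():
    print(letra)   # A, B, C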

In your question you added a global variable to the example generator function: a global variable is, well, global - its value is preserved between consecutive calls to the same generator and is shared with any other instances of that generator (it is the local variables that are saved separately for each generator, as shown further below).
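A small sketch of this, reusing the modified letras() from the question (assuming v starts at 0): the code before the first yield only runs when the first __next__ happens, and each new generator increments the same global v:

v = 0

def letras():
    global v
    v += 1          # shared global state: incremented once per generator, on its first __next__
    print(v)
    yield 'A'
    yield 'B'
    yield 'C'

g1 = letras()       # nothing runs yet, nothing is printed
g2 = letras()       # still nothing
next(g1)            # prints 1 and yields 'A'
next(g2)            # prints 2 - the global v is shared between the two generators
next(g1)            # prints nothing, just yields 'B' - execution resumed after the first yield

In the for of the question, letras() is called a single time, so the code before the first yield runs exactly once and v is printed as 1.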

How Python distinguishes a generator function from a normal function:

As stated above, it is the Python compiler itself that turns a function into a "generator function". The type of a function object containing a yield remains function - as can be seen in output "276" above. What Python does is mark it as a generator in the flags of the function's __code__ object - and that makes the language behave completely differently when the function is called.

That is, it is not "easy" to see that a function is a "generator function" without looking at its code and seeing the yield there-but with the engines from Python introspection, we can see that the name flag "GENERATOR" is set in the .__code__.co_flags attribute of the function. The value of this flag can be seen in the module dis:

In [288]: def exemplo(): 
     ...:     yield 
     ...:                                                                                                                        

In [289]: def contra_exemplo(): 
     ...:     return None 
     ...:                                                                                                                        

In [290]: import dis                                                                                                             

In [291]: dis.COMPILER_FLAG_NAMES                                                                                                
Out[291]: 
{1: 'OPTIMIZED',
 2: 'NEWLOCALS',
 4: 'VARARGS',
 8: 'VARKEYWORDS',
 16: 'NESTED',
 32: 'GENERATOR',
 64: 'NOFREE',
 128: 'COROUTINE',
 256: 'ITERABLE_COROUTINE',
 512: 'ASYNC_GENERATOR'}

In [292]: bool(exemplo.__code__.co_flags & 32)                                                                                   
Out[292]: True

In [293]: bool(contra_exemplo.__code__.co_flags & 32)                                                                            
Out[293]: False
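For day-to-day code there is a more convenient way to make the same check: the standard library's inspect module exposes it directly. A small sketch, using the same two functions above:

import inspect

print(inspect.isgeneratorfunction(exemplo))         # True
print(inspect.isgeneratorfunction(contra_exemplo))  # False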

Internal details

Notice that if you create more than one generator from the same "generator function" and use them interleaved, each one has its own local variables - they don't mix:

In [294]: def exemplo3(): 
     ...:     counter = 0 
     ...:     yield counter 
     ...:     counter += 1 
     ...:     yield counter 
     ...:                                                                                                                        

In [295]: gen1 = exemplo3()                                                                                                      

In [296]: gen2 = exemplo3()                                                                                                      

In [297]: next(gen1)                                                                                                             
Out[297]: 0

In [298]: next(gen2)                                                                                                             
Out[298]: 0

In [299]: next(gen2)                                                                                                             
Out[299]: 1

In [300]: next(gen1)                                                                                                             
Out[300]: 1

Where are these local variables stored, then? Whenever Python runs a block of code - a normal function, a generator, the body of a module or the body of a class - it creates an object of type frame. The language exposes these frames as fairly ordinary Python objects, and inside them you can find the local and global variables of any block of code that is running. In a program that does not use generators or asynchronous functions, a new frame object is created each time a function is called, and the newest frame always keeps a reference to the previous one. This creates a "stack" - what we call the "call stack" in Python.

These frame objects are not exactly small or cheap to create, which is one reason recursive functions are not used much in Python outside of didactic code or cases where recursion really is the best solution. One of the attributes of a frame is .f_back, a direct reference to the previous frame; another is f_locals, a dictionary that mirrors the local variables of the running code (f_locals, however, only works for reading these variables, not for writing to them).

A recursive function with some prints can show the normal use of frames, without generators:

In [306]: import sys                                                                                                             

In [307]: def exemplo4(count): 
     ...:     if count < 4: 
     ...:         print("entrando") 
     ...:         exemplo4(count + 1) 
     ...:         print("saindo") 
     ...:     else: 
     ...:         print(f"count: {count}") 
     ...:         frame = sys._getframe() 
     ...:         frame_count = count 
     ...:         while frame_count: 
     ...:             print(frame, frame.f_locals["count"]) 
     ...:             frame = frame.f_back 
     ...:             frame_count -= 1 
     ...:              
     ...:                                                                                                                        

In [308]: exemplo4(1)                                                                                                            
entrando
entrando
entrando
count: 4
<frame at 0x564291236028, file '<ipython-input-307-165f77ce3bd1>', line 11, code exemplo4> 4
<frame at 0x564291308638, file '<ipython-input-307-165f77ce3bd1>', line 4, code exemplo4> 3
<frame at 0x56429132ead8, file '<ipython-input-307-165f77ce3bd1>', line 4, code exemplo4> 2
<frame at 0x5642910e4ff8, file '<ipython-input-307-165f77ce3bd1>', line 4, code exemplo4> 1
saindo
saindo
saindo

When a generator is paused by a yield, its execution frame leaves that stack - the frame at the top of the stack becomes again that of the function that called __next__. The generator's frame is then kept in the .gi_frame attribute of the generator itself. Through its f_locals attribute we can inspect the value of its variables at the moment the yield was executed:

In [319]: def exemplo5(): 
     ...:     v = 10 
     ...:     yield v 
     ...:     v += 10 
     ...:     yield v 
     ...:                                                                                                                        

In [320]: gen = exemplo5()                                                                                                       

In [321]: gen.gi_frame.f_locals["v"]                                                                                             
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-321-645eee9080b0> in <module>
----> 1 gen.gi_frame.f_locals["v"]

KeyError: 'v'

In [322]: next(gen)                                                                                                              
Out[322]: 10

In [323]: gen.gi_frame.f_locals["v"]                                                                                             
Out[323]: 10

In [324]: next(gen)                                                                                                              
Out[324]: 20

In [325]: gen.gi_frame.f_locals["v"]                                                                                             
Out[325]: 20

Emulating a generator with a class:

Nothing prevents a class in Python from behaving exactly like a generator. In that case, the internal variables must be saved between iterations as instance attributes - whereas Python saves a generator's local variables by saving its execution frame.

To do this, just write a class that implements the special method __next__ explicitly, plus the method __iter__, which the for executes before the first call to __next__ (the class can even be split into two stages - the object returned by __iter__ can be an instance of another class, implementing only __next__). Note that the values a generator would produce with yield must here be returned with an ordinary return from __next__.

So, Python code to generate the squares of the numbers from 0 to n can be written as a "generator function" like this:

def squares(n):
    for i in range(n):
        yield i ** 2

Or as a class like this:

class Squares:
    def __init__(self, n):
        self.n = n
        self.i = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.i >= self.n:
            raise StopIteration()
        result = self.i ** 2
        self.i += 1
        return result

Using this class in interactive mode:

In [331]: for s in Squares(4): 
 ...:     print(s) 
 ...:                                                                                                                        
0
1
4
9

These classes are not called "generators" - this name is used only for objects created when calling a function containing a yield (such"generator functions"). This type of class is called by the more generic name of iterável - any object that can produce an iterator - an iterator, in turn, is the most generic name for any object that has the __next__ method.

Other generator methods and "advanced information":

In addition to the __next__ method, generators also have the .send and .throw methods - these methods are never called automatically by for. Instead, they can be used when driving a generator "manually": .send sends a value into a generator that is already running (the sent value becomes the result of the yield expression where the generator is paused), and .throw causes an exception of a given type inside it - in this case, the argument to throw is an exception object, and it is raised at the point where the yield is.
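A minimal sketch of both methods (the eco function below is just an illustration, not part of the question):

def eco():
    recebido = None
    while True:
        try:
            recebido = yield recebido   # the value passed to .send() shows up here
        except ValueError as exc:
            recebido = f"erro: {exc}"   # .throw() makes the exception happen at the paused yield

gen = eco()
next(gen)                             # "primes" the generator: runs it up to the first yield
print(gen.send(10))                   # 10 - sent in, echoed back by the next yield
print(gen.send("abc"))                # abc
print(gen.throw(ValueError("boom")))  # erro: boom - the exception was handled inside the generator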

These features are rarely used explicitly in "everyday" code, and were added because, with them, Python generators can be used as "co-routines". This is different from normal functions, which are always "subroutines". Co-routines can run cooperatively, interleaved with each other, driven by a specialized system.

Another associated expression is yield from - it allows a generator to delegate to another generator internally, yielding its values without processing them itself - which allows, for example, recursive generators.
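A short sketch of a recursive generator with yield from (the achatar function and the test data are just illustrative):

def achatar(itens):
    for item in itens:
        if isinstance(item, list):
            # delegate to a recursive generator: its values are passed straight through
            yield from achatar(item)
        else:
            yield item

print(list(achatar([1, [2, [3, 4]], 5])))   # [1, 2, 3, 4, 5]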

I have a " toy "project where I use generator functions as "co-routines", without being asynchronous programming - it simulates a "flat" effect of "Matrix" on the terminal. It only works on a terminal that has ANSI codes enabled, which allows special print sequences to position the cursor and change the color of the letters - which does not yet happen in Windows. The project is here and works well on Linux and Mac: https://github.com/jsbueno/terminal_matrix/blob/master/matrix.py (and to enable Ansi codes in the Windows terminal, see something here: https://stackoverflow.com/questions/16755142/how-to-make-win32-console-recognize-ansi-vt100-escape-sequences)

The combination of the resources provided by the .send and .throw methods and by the yield from expression is what made asynchronous programming in Python possible: that is, many functions executed in a single thread, but interleaved, handing execution of the program over to another "co-routine" every time a call has to access an external operating-system resource that will take time to complete (a network request, reading data from a file, a pause of the time.sleep kind, etc.).

Asynchronous programming, its syntax and its use are a topic that can tie a knot in the head of even advanced programmers - obviously it is not feasible to describe everything in this answer - but it is worth mentioning that up to Python 3.4, when the asyncio module was introduced in the language, the way to do asynchronous programming in Python was with generators and yield from, and the execution, pause and continuation of co-routines was (and still is) controlled by the asyncio event loop using the __next__, send and throw methods. From Python 3.5 a separate syntax for asynchronous functions was introduced - async def, await, and others such as async for, etc. - but the internal mechanisms Python uses are the very same ones used for generators.

Summarizing:

A function containing a yield or a yield from is a generator function. When it is called, it is not executed immediately - it returns an object of type "generator". Objects of type "generator" have a __next__ method that, when called, executes the code of the original function until it finds a yield - at this point the function is "paused", staying right where it is, and execution of the program returns to the point where __next__ was called - directly, or implicitly through the for command. When __next__ is called again, the generator is "resumed" and execution continues at the point of the yield. If, instead of .__next__, the generator's .send method is called, the value passed as a parameter to .send is the value that the yield expression assumes inside the generator's code (otherwise the yield is worth None). There is also the .throw method: its parameter must be an exception - Python makes that exception happen at the point where the yield is.

Other questions with more information about how yield works:

Python asynchronous generators

Python reserved word yield

What's Yield for?

 15
Author: jsbueno, 2019-03-04 04:04:59

In fact, when you have a yield in a function it doesn't return that value: it returns a generator. This generator is an object that keeps the state necessary for its own control, so it knows where it stopped and can continue from there the next time it is called.

for has its own mechanism that manages this. If you call it by hand, you need to take care of accessing the generator yourself. The function next() is used to drive the generator.

Run this code:

def letras():
    yield 'A'
    yield 'B'
    yield 'C'
gerador = letras()
print(next(gerador))
print(next(gerador))
print(next(gerador))
print(next(letras()))
print(next(letras()))

The variable gerador is what holds the generator and carries the state of the execution, so every time you advance it, it looks at where the generator stopped and moves its internal state forward by one step. How it does this is an implementation detail, but it basically keeps a set of data and a counter - which can be thought of as a record of which lines are still to be executed. In standard Python this changes little, because there is already an internal VM mechanism that controls the execution stack, so it is just a matter of encapsulating this in an object for generator control.

Notice that if you call the function without saving the result, it always starts over with a new generator.

You don't see it in the for, but a generator is created inside it that goes from start to finish: the for is an abstraction over the design pattern called iterator. And the function is built to create the generator object - because it's an abstraction, you don't see the state being created.

Try calling next(gerador) 4 times there instead of three. The iterator will throw an exception, because it has no more data to produce.
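A small sketch of that fourth call, reusing letras() and handling the exception explicitly:

gerador = letras()
print(next(gerador))   # A
print(next(gerador))   # B
print(next(gerador))   # C
try:
    next(gerador)      # fourth call: the generator is exhausted
except StopIteration:
    print("acabou")    # the for uses exactly this exception to know when to stop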

Objects don't deliver iterators implicitly, so there is a method that returns the iterator, and since this is very common there is a standard way to obtain it. Objects that can be iterated have the __iter__() method, and the iterator object must have its own implementation of the __next__() method. Python's native objects already have this.
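A quick sketch of that protocol with a built-in list:

numeros = [10, 20, 30]
it = iter(numeros)   # calls numeros.__iter__() and returns a list iterator
print(next(it))      # 10 - calls it.__next__()
print(next(it))      # 20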

In the case of a function with yield, an internal iterable object is created to keep track of the function's progress and where it is - but conceptually it is no different from an object that keeps the state, can deliver the iterator, and knows what the next item is.

 5
Author: Maniero, 2019-03-02 16:11:03