Python Tutorial: Generators – How to use them and the benefits you receive


Hey, how's it going everybody. In this Python video we're going to be going over generators: why you'd want to use them and some of the advantages they have over lists.

In this example I have a function up here called square_numbers. It takes in a list of numbers, and then we have a result variable that's set to an empty list. We loop through all the numbers in the list we passed in and append the square of each number to the result list, and after we're done looping through all the numbers, we return the result. Down here I have a my_nums variable set equal to square_numbers, passing in a list of 1, 2, 3, 4, 5, and then I just print out my_nums. If I run this code, you can see that for the list 1, 2, 3, 4, 5 that was passed in, our result is 1, 4, 9, 16, 25. So currently our square_numbers function returns a list.

Now, how would we convert this to a generator? To do this we won't need the result variable anymore, so we can take that out. We don't need the return statement either, and we can take out this result.append. All we have to do is type in the yield keyword and yield the squared number here. This yield keyword is what makes it a generator. If I save this and run it, you can see that when I print my_nums I'm no longer getting the list. If you look at the comment here, this is what the result used to be: I'm no longer getting 1, 4, 9, 16, 25, the squares of 1 through 5. Instead I'm getting this generator object. The reason is that generators don't hold the entire result in memory; they yield one result at a time. So really, this is waiting for us to ask for the next result, and it hasn't actually computed anything yet. Now if I print out next(my_nums), which asks for the next result, you can see that it's 1, because we passed in our list of 1, 2, 3, 4, 5 and then we're looping through that list.
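A minimal Python 3 sketch of the two versions of square_numbers described above. The video's own code is Python 2 (for example print my_nums), so the print() calls here are an adaptation, and square_numbers_list is a hypothetical name for the list version so both can live in one file. It also shows what happens when next() is called repeatedly and how a for loop takes over from there.

```python
def square_numbers_list(nums):
    """List version: computes every square up front and returns them all."""
    result = []
    for i in nums:
        result.append(i * i)
    return result


def square_numbers(nums):
    """Generator version: 'yield' hands back one square each time it is asked."""
    for i in nums:
        yield i * i


my_nums = square_numbers([1, 2, 3, 4, 5])
print(my_nums)        # a generator object, not a list
print(next(my_nums))  # 1
print(next(my_nums))  # 4

# A for loop picks up where next() left off and stops cleanly at the end.
for num in my_nums:
    print(num)        # prints 9, then 16, then 25

# The generator is now exhausted, so one more next() raises StopIteration.
try:
    next(my_nums)
except StopIteration:
    print("StopIteration: the generator is out of values")
```

Note that the for loop only prints 9, 16 and 25: a generator remembers its position, so values already taken with next() are not produced again.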
So 1 is the first value; it's equal to i here, and we yield out 1 times 1, which gave us that result. Now if we copy this line, print out next(my_nums) a few more times, and run that, you can see that each time we call next, it goes and gets the next value that's yielded: 1 squared, 2 squared which is 4, 3 squared which is 9, then 16 and 25. 25 is the last value from our result, so what if I run next one more time? If I do that, you can see that I get an error here, and the exception that it threw was StopIteration. That means the entire generator has been exhausted; StopIteration just means it's out of values.

Now, instead of getting these values one at a time, we can still use a for loop on these generators, and this is personally how I use generators a lot of the time. Let me comment out this line, uncomment that one, and save it. Now we're saying for num in my_nums, where my_nums is our generator, print out num. If I run that, you can see that we get all of our values and we don't get the StopIteration exception, because the for loop knows when to stop before that happens.

One immediate advantage over a list is that I think this is much more readable. Rather than setting result to an empty list, appending to that result, and then returning it, we're just saying: I'm passing in these numbers, and for each number in that list, yield the result.

For those of you more familiar with Python, you might have noticed that this entire process could have been written much more easily as a list comprehension, so let me comment this out. If you don't know what a list comprehension is, don't worry about it too much; I just want to show the generator version of this as well. This list comprehension is going to do exactly what our square_numbers function did: we're creating a list by taking x times x, so the square of x, for x in this list of 1, 2, 3, 4, 5. If I save this and run the code, you can see that I still get the same results, and I can also print out this list up here at the top.

You can create a generator in the same way, and it's just as easy as taking out the square brackets and putting in parentheses instead. If I run this, you can see that when I tried to print my_nums all at once, I got that generator object, and when I ran my for loop, it looped through all the values and gave me the results.

So what if you want to actually print out all of the values from the generator? Like I said, they're not all held in memory, but you can convert the generator to a list, and it's as easy as wrapping it in list(). If I run that, you can see that it printed out just as if it were a list. However, when you convert a generator to a list, you lose the performance advantages you gained. I haven't talked about performance yet, but I have a better example to show those advantages. A generator performs better because, like I said, it's not holding all the values in memory. That isn't a big deal at all when you have a small list like 1, 2, 3, 4, 5, but if you had tens of thousands or even millions of items to loop through, having that many items in memory would definitely be noticeable, and you don't get that with generators. So when you cast a generator with a lot of values to a list like this, you lose that advantage, because all of those values get put into memory.

Let me show you a better example of this performance difference. I have a file here with some stuff you don't have to worry about: these lines are just printing out the memory, and these names and majors are just going to be used to make some random values. I have two different functions here; one of them makes a list and the other is a generator, and they both return the same values. In the list function, I have my result here, and I'm looping through the number of people that I pass to this function. For each person I make a person dictionary and give it an ID, a name that's randomly chosen from the list of names up at the top, and a major that's randomly chosen from the list of majors, and then I return the result. The generator is exactly the same: I loop through the number of people that I pass in and yield a person dictionary with the same values as the list function. Really quick, just to make these match, I'm going to make that an xrange instead, so they're exactly the same.

Right here, don't worry about these time.clock() lines for t1 and t2; all I'm doing is timing how long it takes to run this function, which returns a list. I'm going to pass in 1 million to this function, so it should return a list of 1 million results, and down here at the bottom I'm printing out the memory usage and the total time that it took. If I run this, you can see up at the top that the "before" value is from before I made anything, so my base memory usage was around 15 megabytes. The "after" value is from after I created that list of 1 million records, and you can see that it jumped up by nearly 300 megabytes and took about 1.2 seconds. If you're dealing with large amounts of data, it's not out of the ordinary to have 1 million records like that.

So let's see what this looks like if I use the generator instead. I'm going to comment out the function that returns the list, uncomment the function that returns a generator, and pass in the same number, 1 million. If I save that and run it, you can see that after I ran this, the memory is almost exactly the same. That's because the generator hasn't actually done anything yet. It's not holding those million values in memory; it's waiting for me to grab the next one or to loop through them, and it would give them to me one at a time. The time it took here is basically nothing, because calling a generator function doesn't run any of its body until you ask for the first value; if I rounded it to a whole number, it would be zero seconds.

Earlier I said that if you convert a generator to a list, you lose that performance, so let me show you what I mean. I'll convert this entire result to a list, and now if I run this, you can see that I get pretty much the same result I got when I ran the function that returned the list. If I take that back off and just use the generator, we get our performance back.

So that's how you use a generator. I think it's a little more readable, and it can save you a lot of memory. Keep in mind that the near-zero time in the generator run is there because the records haven't been built yet: that work still happens when you loop over the results, just one item at a time instead of all up front. You can also still use comprehension-style syntax by writing a generator expression, so you don't lose anything in that area. Those are a few reasons why you would use generators and some of the advantages that come along with them. I hope this video is useful for you guys. If you have any questions about this stuff, just ask in the comment section below. Be sure to subscribe for future Python videos, and thank you guys for watching.
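The near-zero time for the generator in the people example only measures creating it, not building any records. A minimal Python 3 sketch of a fairer timing comparison, which consumes both versions fully, is below. people_list and people_generator are hypothetical stand-ins for the two functions in the video, NAMES and MAJORS are made-up sample data, and time.perf_counter() is swapped in for time.clock(), which was removed in Python 3.8.

```python
import random
import time

# Hypothetical sample data; not the exact lists used in the video.
NAMES = ["Ana", "Ben", "Chloe", "Dev", "Eli"]
MAJORS = ["Math", "Engineering", "CompSci", "Arts", "Business"]


def people_list(num_people):
    """Build every person record up front and return them all in a list."""
    result = []
    for i in range(num_people):
        person = {"id": i, "name": random.choice(NAMES), "major": random.choice(MAJORS)}
        result.append(person)
    return result


def people_generator(num_people):
    """Yield one person record at a time; the body only runs as values are requested."""
    for i in range(num_people):
        yield {"id": i, "name": random.choice(NAMES), "major": random.choice(MAJORS)}


def timed(label, func):
    """Call func(), print the elapsed wall-clock time, and return func's result."""
    t1 = time.perf_counter()
    value = func()
    t2 = time.perf_counter()
    print(f"{label}: {t2 - t1:.4f} s")
    return value


# The video used 1,000,000; a smaller N keeps this quick to run.
N = 300_000

# Creating the generator is nearly free because no records exist yet.
gen = timed("create generator", lambda: people_generator(N))

# The real work happens while consuming it, so time that step too.
count_gen = timed("consume generator", lambda: sum(1 for _ in gen))

# For the list, building and consuming happen in one step.
count_list = timed("build + consume list", lambda: sum(1 for _ in people_list(N)))
```

The two "consume" timings typically land in the same ballpark rather than the generator being dramatically faster; what the generator saves is holding all N records in memory at the same time.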

100 thoughts on “Python Tutorial: Generators – How to use them and the benefits you receive”

  1. How could we create a generator that calls another generator? I'm using glob.iglob to load some files and I'm having this kind of problem… Any ideas?

  2. Thank you very much for your video. Would you please make a video about recursive generators and how they really work?

  3. This helped me even more than the video from the Udacity course I'm doing right now; to put it another way, it was a really worthy addition! Thx 🙂

  4. I always enjoy watching your video. Great work!!!

    Two comments on your outdated code that I ran into:

    1. Instead of time.clock(), which was removed in Python 3.8, time.perf_counter() (or time.time()) is recommended according to my web surfing.

    2. Your memory_usage_psutil() function seems outdated. I did some web searching and found a solution on Stack Overflow. Unfortunately I lost the URL since I had to shut down my Jupyter notebook, but here is essentially what I found:

    import os
    import psutil

    def memory_usage_psutil():
        # Resident set size (RSS) of this process, converted from bytes to GB
        pid = os.getpid()
        py = psutil.Process(pid)
        return py.memory_info().rss / 2.**30  # memory use in GB

    This works.

  5. First of all, thank you, sir. Your videos are like water to the thirsty. Sir, I have a question: how can I access a certain person's data (using the ID) via a generator?

  6. I read the same topic once in my native language and I didn't get it. XD
    But your tutorial explained it clearly, thank you. 🙂

  7. This was very, very helpful. I was going through other tutorials about generators, and none of them actually helped me understand this concept as easily. Thanks a lot, Corey… 🙂

  8. Please make a series on web scraping using the Scrapy module in Python… we are waiting and expecting this from this channel. @CoreySchafer

  9. In Python 3, range now acts like xrange? Are there any other distinctions between the versions in this example other than print()? Great video as always!

  10. In a nutshell, I can see the benefit of saving memory when dealing with large data sets that would not fit into memory. As for processing power, it will be the same when you have to loop through the generator. Very well explained.

  11. Corey, was the module "mem_profile" in the standard library? Well, it seems it no longer exists. Do you know of an equivalent module that is equally simple and easy to use?

    BTW, you are a very good teacher, congratulations! It's hard to find someone in your field so capable of teaching, with the right sensibility. Most people think that if they know something, they can teach it as well, and that's not the case.

  12. @Corey Schafer I tried to do the same things as you, but 'resource' cannot be imported. How can I fix this problem? I'm using Windows 10.

  13. Thank you, great video! Most documentation on the web seems to be about using generators with lists. How about Pandas DataFrames? Would there be any conceivable use cases, or would one rather use other solutions such as apply or itertools?

  14. Hi Corey,
    So basically, are generators used to reduce time complexity and manage memory? If there are any other uses, please make a video.

    Thanks

  15. A generator is not used to boost the performance of your program.
    The difference is that a regular loop first processes all the data and RETURNS ALL THE DATA AT ONCE. Imagine you have a million pieces of data that take 20 minutes to load! That's a long, long time before you can access any of the information and work with it.
    Now imagine having the same set of data, but instead of a regular loop we use a generator. A generator DOES NOT RETURN ALL THE DATA AT ONCE. It returns one piece of the data, which you can work with, and then it returns the next piece, which you can work with too.
    So instead of having to wait 20 minutes to load everything before you can work with the data, you break the data down into smaller pieces that you can work with as soon as you start the program.

  16. @Corey Schafer. Can you make a video on "yield from" syntax. That would be the real deal breaker. "yield" is easy to understand. I cannot bend my mind around "yield from".

  17. Bro, you are the man. Every time I get stuck on a concept, I search for it and your videos pop up with such a great explanation and a deep understanding. Thank you.

  18. @Corey Schafer: Thanks for this video. It was simply great. I just wanted to see the code that you wrote in the mem_profile module, as I need to print that memory usage in my code as well.
    Thanks

  19. @Corey: You present these in a clear, understandable fashion with plenty of proof to demonstrate the huge performance boost that generators make possible. This really helps these concepts to sink in, and helps me to understand how I can implement this concept into my own code. This is awesome, thanks!

  20. Doesn't it store the 1st value in memory and wait for the next process? That's why the memory increased from 15.98 to 15.99MB. Please confirm. Thanks!

  21. The execution time is disingenuous. With the generator, you've just delayed the calculation to another segment of code. A true example of the performance metrics would show the utilization of the results as well.

  22. Thanks for the video. Just one thing: in the 2nd example, generator vs list with 1000000 items, the generator takes considerably less time because you are not executing it; if it executes, it will probably take the same time. So memory utilization is the only benefit.

  23. It's all Python 2.x, though. I was coding in Python 3.6, and the examples in the video did not work and puzzled me very much until I ran Python 2. Does anyone know why 2 is still in use? Legacy code?

  24. It seems that mem-profile doesn't exist anymore. If I'm not mistaken, the alternative today is to use memory-profiler which can be found here: https://pypi.org/project/memory-profiler/
    Please correct me if I'm wrong.

  25. So if the performance gained by using generators is lost as soon as you convert it to a list or iterate over it, why even bother with generators?

  26. As a non-Python guy, it is weird to me that the function you used to implement the generator did not need to return anything. How does Python still know that it returns something?

  27. For all those who are still wondering what they just watched,
    check out this lifesaver:
    http://book.pythontips.com/en/latest/generators.html or

    https://github.com/yasoob/intermediatePython/blob/master/generators.rst

  28. I don't get what you said from 1:30 onwards regarding the line:

    print my_nums

    If, as you said, 'yield' only returns one result at a time, then why didn't the line:

    print my_nums

    show '1' on the console log?

    And regarding the line:

    my_nums = square_numbers([1,2,3,4,5])

    Does the program interpret this as: The variable 'my_nums' is assigned the value '[1,4,9,16,25]'??? I.e the resulting list is stored in the variable 'my_nums'???

  29. Hey, I love your video and have a question. I can see the performance boost in the last example, but what if you loop over them and perform some action? Would it still have performance advantages? My guess is that the moment you loop over a generator, it will take the same amount of memory as a list and therefore perform the same. Does it?

  30. The music at 11:10 reminds me of Ross playing keyboard, lol. Anyway, great piece of work, man. Truly appreciate it.

  31. Shouldn't you test the generator with a for loop with 1000000 loops to get a real estimate of the performance?
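Several comments above ask what to use now that mem_profile isn't available, or note that resource can't be imported on Windows. One cross-platform option, swapped in here and not what the video uses, is the standard-library tracemalloc module. A minimal sketch, assuming Python 3.6+: squares_list and squares_gen are hypothetical helpers built from the list comprehension and generator expression shown in the video. tracemalloc only counts memory allocated through Python, so its numbers won't match process-level figures from psutil or resource.

```python
import tracemalloc


def squares_list(n):
    """Hypothetical helper: build all n squares at once with a list comprehension."""
    return [x * x for x in range(n)]


def squares_gen(n):
    """Hypothetical helper: produce the n squares lazily with a generator expression."""
    return (x * x for x in range(n))


def measure_peak(func, n):
    """Run func(n), consume it fully, and return (sum of values, peak traced MiB)."""
    tracemalloc.start()
    try:
        total = sum(func(n))  # consuming forces every value to be produced
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    finally:
        tracemalloc.stop()
    return total, peak / 2**20


n = 1_000_000
list_total, list_peak = measure_peak(squares_list, n)
gen_total, gen_peak = measure_peak(squares_gen, n)
print(f"list:      peak {list_peak:.1f} MiB")
print(f"generator: peak {gen_peak:.3f} MiB")
```

Because sum() consumes both results completely, this also compares the two fairly: the list's peak includes every square at once, while the generator only holds one value at a time.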
