Data python 3

2.7 Generators

As I mentioned earlier, Python is very powerful, but no matter how powerful a language is, it will always be limited by the power of its host machine. What happens if you need to process hundreds, thousands, or even millions of records in a collection? With a big enough collection, odds are you can't store all of it in memory at the same time. So what do you do? Generators to the rescue!


So far we've been talking about handling data: how we can use collections of data and how we can process them, and that has all worked out pretty well. But what happens when you start to deal with very large collections of data? Let's be honest: in the real world, a lot of applications and a lot of businesses deal with very large amounts of data. And as developers, we are always limited by the platforms, the systems, and the hardware we're running on. So you can often run into problems when you process large amounts of data, because you tend to run out of disk space, run out of memory, or things like that. We always have to be cognizant of how we handle those scenarios.

So let's work through an example, and let me show you how you can solve it in a Pythonic way. Let's use a very common example: the Fibonacci sequence. If you're not familiar with Fibonacci, the idea is that you start with two ones, and every number after that is the sum of the previous two. One plus one gives you two, so the first three numbers in the sequence are 1, 1, and 2. Then you add one and two together to get three, two and three to get five, and so on: 8, 13, and so forth. This is one of those infinite sequences; it never ends, and you can keep generating numbers forever. So let's write a simple function that takes in how many numbers of the Fibonacci sequence you want, computes them, and returns the result. Let's go ahead and define a function. We'll call it fibon, or whatever you would like to call it.
It's going to take in a value, n, which is how many numbers of the Fibonacci sequence we want to find. It could be any integer you'd like. To solve this we're going to deal with two variables, a and b, and they both start out equal to one. That gives us the first two ones. Then I want a loop: for some value i in range(n). You could use n + 1, depending on whether you want to be inclusive or exclusive; range(n) runs exactly n times, which is what we want here, so let's keep it simple. We also need somewhere to collect the numbers, so let's say result is an empty list. Each time through the loop I append a to result, and then I update a and b: a becomes b, and b becomes a plus b. In Python we write that as a single tuple assignment, a, b = b, a + b, so that both new values are computed from the old ones. Once the loop is done, we simply return result. It's not a very complicated function, so let's give it a test. I'll just do a simple print of the result of my Fibonacci function for the first ten numbers. Let's save that, pop over to our console, and run it. There you go: 1, 1, 2, 3, 5, 8, 13, 21, 34, 55. Those are correct, and everything seems to work just fine. Let's up our game a little bit and do the first 50. Let's save that and run it. Okay, that's pretty good. As you can see, the numbers get fairly large, fairly quickly. How about 100? Let's run that. We're getting a little bit bigger.
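The list-based function described above can be sketched roughly as follows. This is a minimal reconstruction from the narration, not the course's exact file; the name `fibon` comes from the transcript, and the exclusive `range(n)` is assumed.

```python
def fibon(n):
    """Return a list holding the first n Fibonacci numbers."""
    a, b = 1, 1          # the sequence starts with two ones
    result = []          # every value is kept here until the function returns
    for i in range(n):   # range(n) runs n times: i = 0 .. n-1
        result.append(a)
        # Tuple assignment: the right-hand side is evaluated first,
        # so b receives the OLD a plus the old b.
        a, b = b, a + b
    return result


print(fibon(10))  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```

Because the whole list is built before `return`, memory use grows with `n`, and the integers themselves keep getting longer, which is what starts to hurt once `n` reaches the hundreds of thousands.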
Well, let's get a little bit crazier: 100, 1,000, 10,000, 100,000. Let's do the first 100,000. Let's clear the screen and run this. You're going to see this starts to take quite a lot of time. Depending on the resources and the amount of memory you have on your system, this could take an awfully long time, and in the worst case you could actually run out of memory and it might never finish. That's a big problem: how can we get around this limit? We're trying to run a function that deals with, stores, or processes a collection that has become too big to hold in memory all at once. So let's stop this now and clear it out. How can we fix this? In Python we have the concept of a generator. So what's a generator? A generator is a function that, instead of using return to hand back a value, uses the yield keyword. And what exactly does yield do? Each yield produces the next value in the sequence of data we're dealing with, and then pauses the function until the next value is requested. Calling a generator function gives you a generator object, which is an iterator: something similar to a collection, except that you process it one value at a time. You ask an iterator for its next value with Python's built-in next() function (in Python 3 that's next(gen), not the old Python 2 method gen.next()). Instead of trying to store everything in memory at the same time, it only produces the next value, then the next, then the next. That means you can't just print the result like we did before, and I'll show you that in this example. You're going to have to use the generator in a for loop to actually get through all the values. So let's create a slightly different function.
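To make `yield` and `next()` concrete before the Fibonacci version, here is a small illustrative generator. `count_up` is a hypothetical name invented for this sketch. In Python 3 you advance a generator with the built-in `next()` function (which calls the object's `__next__()` method); the Python 2 style `.next()` method no longer exists.

```python
def count_up(limit):
    """Hypothetical example: yield 1, 2, ..., limit one value at a time."""
    n = 1
    while n <= limit:
        yield n      # pause here and hand n back to the caller
        n += 1       # execution resumes from this line on the next request


gen = count_up(3)    # no body code runs yet; this just creates a generator object
print(next(gen))     # 1
print(next(gen))     # 2
print(next(gen))     # 3
try:
    next(gen)        # the body has finished, so the generator is exhausted
except StopIteration:
    print("done")
```

Each `next()` call runs the function body only as far as the next `yield`, which is why only the current value needs to exist at any moment.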
We'll call it fibon_g, with _g for generator, but it's going to be pretty much the same. We start out the same way with our two variables: a and b both start off as one. We don't need a result list this time, because we don't need to store anything. We'll say for i in range(n), once again. But instead of appending to a result list, we use the yield keyword and yield a. Once we've yielded a, we do our a, b update just like before, a equal to b and b equal to a plus b, and then we're done. So, like I said, a generator is really just a function that yields individual values, in this case a, instead of doing all the processing, storing everything in a single result list, and returning that. Now let's try the same thing with the generator function instead. Let's save this and run it again. You're going to see it finishes instantaneously: even though I asked for the first 100,000 numbers in the Fibonacci sequence, I got a result immediately. That's because calling fibon_g doesn't run the loop yet; the body only runs as values are requested. But as you can see, what I got back is a generator object. A generator object is iterable in and of itself; in fact, it is an iterator. So if I try to print it, it just tells me that this is a generator object at some location in memory. If I actually want to do something with the values, I need to pull out each individual value. So let's modify this a little bit: we'll say for x in, and then paste in the function call. This works because the function returns an object that is iterable. Each time through the loop, like I said, Python calls next() on it behind the scenes.
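A sketch of the generator version described above, again reconstructed from the narration rather than copied from the course files (the name `fibon_g` follows the transcript). Creating the generator returns immediately even for 100,000, because none of the loop body runs until a value is requested.

```python
def fibon_g(n):
    """Generator version: yield the first n Fibonacci numbers one at a time."""
    a, b = 1, 1
    for i in range(n):
        yield a              # no result list; hand back one value and pause
        a, b = b, a + b      # same simultaneous update as the list version


gen = fibon_g(100_000)       # returns at once; nothing has been computed yet
print(gen)                   # <generator object fibon_g at 0x...>
```

Printing `gen` shows only the object's repr, not the numbers, which matches what the console shows in the lesson.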
The equivalent would be calling next() on the generator object yourself, something like next(gen), and printing whatever value comes back. But when you process it as an iterable like this, in a for loop, just like a collection, you don't have to call next() yourself. I can simply say print(x). So let's save that. Just to show you that it still works the same way, let's start off with ten again. Let's save and process ten of these, and you see we still get the same answer: 1, 1, 2, 3, all the way up to 55. If I change this to 100, save it, and run it again, I get the first 100, and it's done pretty much instantaneously. And if I really want to get crazy again, I can go to 100,000. Remember, last time it just locked up and waited. This time it's not going to lock up; it's going to process. As you can see, it produces each number one at a time, rather than trying to store them all in a list and hold them all in memory at the same time. It's going to sit here returning extremely large numbers for as long as I let it go, until it finishes. But I'm going to stop it right here. The point is that with generators you can process very large, and even infinite, sequences of values in Python, because the generator isn't trying to build everything in one list in memory. It yields individual results and processes them one at a time, through next() being called on the generator object. And, like I said, you don't have to call next() yourself when you iterate over it in a loop like this.
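The `for` loop hides the `next()` calls. The sketch below repeats the generator from the lesson (`fibon_g`, as named in the transcript) and drives it two ways: once with a plain `for` loop, and once by calling `next()` manually until `StopIteration`, which is roughly what a `for` loop does internally.

```python
def fibon_g(n):
    a, b = 1, 1
    for _ in range(n):
        yield a
        a, b = b, a + b


# 1) The usual way: the for loop calls next() for us.
for x in fibon_g(10):
    print(x)

# 2) Roughly what the for loop does behind the scenes.
it = iter(fibon_g(10))       # a generator is already its own iterator
while True:
    try:
        x = next(it)         # Python 3 spelling: next(it), not it.next()
    except StopIteration:    # raised once the generator's body finishes
        break
    print(x)
```

Both loops print the same ten numbers; the `for` form is simply the idiomatic way to write the second.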
So now you can process incredibly large amounts of data without having to hold it all in memory at once.
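One rough way to see the memory difference is `sys.getsizeof`, which reports the size of the object itself: for a list, its internal array of references, not the integers it points to. This is an illustrative measurement only; exact byte counts vary by Python version and platform, so none are claimed here. The sketch also includes an open-ended variant, `fib_forever` (a hypothetical name), consumed with `itertools.islice`, since a generator does not need an end at all.

```python
import sys
from itertools import islice


def fibon(n):
    a, b = 1, 1
    result = []
    for _ in range(n):
        result.append(a)
        a, b = b, a + b
    return result


def fibon_g(n):
    a, b = 1, 1
    for _ in range(n):
        yield a
        a, b = b, a + b


def fib_forever():
    """Hypothetical helper: an endless Fibonacci generator."""
    a, b = 1, 1
    while True:
        yield a
        a, b = b, a + b


n = 10_000
print(sys.getsizeof(fibon(n)))    # grows with n (and excludes the ints themselves)
print(sys.getsizeof(fibon_g(n)))  # small, and the same for any n

# Take only as many values as you need from an infinite generator.
print(list(islice(fib_forever(), 10)))  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```

The generator's footprint does not depend on `n` because it only ever holds the current `a` and `b`; the list has to keep every value it has produced.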
