3.6 Adding a Real Feed

Let’s show off our new functionality by submitting a real news-feed URL, parsing the data, and giving the user a list of articles associated with that feed. This could be a lot of work if we started from scratch, but we are going to use a nice third-party library that will simplify the process quite a bit. Say “hello” to feedparser!

In the previous lesson, we made a little bit of headway: within our views, we can now figure out which method a request is using. In this case that's POST, so we can do different types of operations depending on whether we're getting to this particular view via a POST or a GET or what have you. You learned how to do that. So now it's time to actually start putting some interesting information into our application. And by interesting information, I mean: how do we put real feeds in there and get real articles to present to the user? That's what we're really gonna dive into in this lesson. Now, there are several different ways of doing this. You could go out and build your own little parser and download some data off the internet. But man, that's gonna take some time. Why not just use a third-party library that has already done some of that legwork for you? That's exactly what we're gonna do. So we're gonna head over to Chrome, and I'm gonna introduce you to a little friend of mine called feedparser. This is a Python package that you can download and install, and the nice thing about it is that it's a universal feed parser. So if you don't know anything about syndication feeds and RSS and Atom and all that kind of stuff, don't worry about it. It's not necessary to fully understand it for this course, though it would be nice to have a basic understanding of how feeds are structured and how they're actually used. But once again, the nice thing about feedparser is that it handles a number of different formats, including RSS, CDF, and Atom, and it also supports different versions of those formats. So that's a very nice feature.
What we're really gonna be able to do here, once we install this, is simply run a single function called parse with a URL or a local file, and it's gonna download all of that information, parse through all the data, and then give it back to us in a nice little package. So that's what we wanna do. Well, how do we get a hold of feedparser? You can definitely go down here and read the installation instructions and all that kinda good stuff. Or, since we're using a virtual environment here, we're all kind of in a nice little sandbox, and part of what's installed for us in that sandbox, remember, is pip. That's what we used to install Django before, simply by typing pip install django. Well, we're gonna do the same thing within our virtual environment for feedparser. So I'm simply gonna say pip install feedparser, and it's gonna go out, download feedparser, and install it for me within my virtual environment. Remember, the active virtual environment is denoted here by the parentheses in the prompt. So now feedparser is installed and ready to be used, but the next question is, how do we use it exactly? Well, that's pretty easy as well. So let's head over to our text editor, where we're gonna be working in our views.py file. But before we can actually use feedparser, we have to import it. Right now, our file doesn't know anything about it, so first we have to import feedparser. Now that we've done that, we can start to use it in a meaningful way. We're gonna spend most of our time using it down here in the new_feed view within views.py. So once again, all I need to do is take some information, namely a URL that we got from somewhere. That's gonna be what we entered into our form on the new feed page. Then I'm going to send a request out through feedparser to go get that information and parse it, and then I'm just gonna use what I need.
So the first thing I'm gonna do is come down here, after I have created my feed, because I need to grab that URL. I'm simply gonna say that feedData is equal to feedparser.parse, which is the function we're going to use, and we're gonna pass it feed.url. Remember, that's the piece of information that we typed into the text box in our form. So now we've got that in there, and if this all runs correctly, feedData is going to contain all of the information associated with that feed. So now I can just start to pick things apart. The first thing I wanna do is stop using this bogus title here and use an actual title. So I'm gonna use feedData.feed, which is the property that contains all the parent information about that particular feed, and then I wanna grab its title. Simple as that. Once I've done that, I'll go ahead and save my feed, because I guess that's enough information for now. And then we wanna start to go through all the articles, or entries, that are associated with that feed. Once again, that's pretty simple. We're just gonna do a simple loop here. We'll say for entry in feedData.entries. So we're gonna loop through each one of these entries, and we're gonna create a new article for each one. We'll say article = Article(), like that, and then we're just gonna populate these fields again. We'll say article.title is gonna be equal to entry.title. Now, you can look at the documentation for feedparser and figure out which properties you need to use. I'm just gonna show you the ones that we have specified so far within our application. So article.url is gonna be equal to entry.link. The naming convention is a little bit different there. Then article.description is gonna be equal to entry.description. And then we're also gonna do our publication date here, but this is gonna take a little bit more work, so I'm gonna show you this in just a second.
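As a rough sketch of the mapping described so far (not the exact code from the video), here's how the feed title and the entry properties line up with the article fields. The helper name article_fields and the hand-built feed_data object are assumptions made for illustration. In the lesson, that object comes from feedparser.parse(feed.url) and the values are assigned onto Article model instances rather than collected into dictionaries.

```python
from types import SimpleNamespace


def article_fields(entry):
    """Pick out the entry properties the lesson copies onto an Article.

    Hypothetical helper: the lesson assigns these values directly
    to a model instance inside the loop.
    """
    return {
        "title": entry.title,
        "url": entry.link,  # the feed entry calls the article URL "link"
        "description": entry.description,
    }


# In the lesson this object comes from: feedparser.parse(feed.url)
# A hand-built stand-in keeps the sketch runnable without a network call.
feed_data = SimpleNamespace(
    feed=SimpleNamespace(title="Example Weblog"),
    entries=[
        SimpleNamespace(
            title="Bugfix release",
            link="https://example.com/weblog/bugfix-release/",
            description="A short summary of the release.",
        ),
    ],
)

# The parent information lives on .feed, the articles on .entries.
feed_title = feed_data.feed.title
articles = [article_fields(entry) for entry in feed_data.entries]
print(feed_title, articles)
```

The stand-in only mimics the attribute access used here. A real parsed feed carries many more properties, and some feeds may leave out fields such as the description.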
We'll come back to that. Then we're going to set the feed property, so we have that parent-child relationship, and that's gonna be equal to feed. And then at the end, we're simply going to do article.save(), so that will actually retain that information within our database. So now we need to deal with the publication date, and that's gonna get a little bit messy. I just wanna show you very quickly what this is, and we're not gonna go into terrible detail about it, but the problem we're gonna run into is that there is a property on the entry called published. Now, published is actually a string field, kind of a free-form string field, because the publication date formats differ between syndication types, whether that's RSS or Atom or CDF, and on top of that, the people generating these date-times can use formats that are all over the map. So really, published is just the raw form of the publication date from the actual feed, or in this case, the article. What we wanna do is make a little more sense of that, and there's a bit of formatting we can do so that when we try to save this information into our article, Python and Django are gonna handle some of that formatting for us. But we gotta put it into at least a decent format to get started. So what we're gonna do here is, instead of using published, use another property called published_parsed. And published_parsed is feedparser's way of actually parsing this data into a usable date field, so we can do something with it. It's making an attempt to take all these different formats from all over the place and get them down into something we can actually work with. So what I want to do here is a little bit of datetime conversion. I'm gonna say datetime.datetime.
This is a nice little class in the datetime module that will do a bit of converting for us, but that doesn't come for free. We actually have to import it: we have to import datetime. And then once we've done that, published_parsed is actually a full nine-tuple, or 9-tuple, depending on how you wanna say it, form of the date and time. That means it's the full date and time broken out into nine distinct pieces, ranging from the year, month, and day through the hour, minute, and second, plus a few extra fields such as the day of the week, which we technically don't really need. So we're gonna pick this apart just a little bit, and we're only gonna grab the first six pieces, like this. So really, don't worry too much about it. You can kind of get around this. You probably don't necessarily have to do it, but I think it's a way to clean this up just a little bit. Once we've done that, I'm actually gonna take that datetime and put it back into a string. So I'm gonna create a dateString here, and I'm gonna give this just a little bit of formatting. We're going to call the strftime (string format time) function on my datetime, and I'm gonna give it a format that works well with the data types we've chosen so far. It's gonna be a year, a hyphen, a month, another hyphen, and a day, then a space, and then the hour, the minute, and the second, like that. So that's just a little bit of formatting to get it into something a bit easier to work with. This is gonna be held within our dateString, and then I can go into article and say that the publication_date is gonna be equal to that dateString. Then Django and Python will just kinda take care of the rest of that formatting for me, so I don't have to worry about it too much, and that's pretty much it.
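To make the date handling a bit more concrete, here's a minimal sketch of that conversion, assuming a nine-field time tuple like the one published_parsed provides. The function name to_date_string, the sample value, and the colons between hour, minute, and second are assumptions for illustration. The video's actual code isn't reproduced in this transcript.

```python
import datetime
import time


def to_date_string(published_parsed):
    """Turn a nine-field time tuple into 'YYYY-MM-DD HH:MM:SS'."""
    # Keep only the first six fields: year, month, day, hour, minute, second.
    parsed = datetime.datetime(*published_parsed[:6])
    # Year-month-day, a space, then hour, minute, and second.
    return parsed.strftime("%Y-%m-%d %H:%M:%S")


# Hypothetical value standing in for entry.published_parsed.
sample = time.struct_time((2014, 6, 3, 9, 30, 0, 1, 154, 0))
date_string = to_date_string(sample)
print(date_string)
```

In the view, the resulting string would then be assigned to article.publication_date before calling article.save(). Some feeds may omit a publication date entirely, so a real view would probably want to guard against a missing published_parsed.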
That's how we're gonna handle taking the information from the internet via feedparser, picking out the pieces of information that we want and need, and then putting that into our application and our database. So I'm gonna save that. Now, let's come back over here to Chrome. Actually, we're gonna head back over to our terminal first, because we need to run our server. So we've got that up and running. Now we can go back over to Chrome, and what we need is an actual feed that we can use, something out on the internet that we can pull down and parse. Well, if you head over to the Django project site and take a look at the weblog page, then come down the right-hand side, you're gonna see Latest news entries under RSS Feeds. And if you click on this, you get the actual raw feed of what's going on on the Django project website. Now, this is exactly what you would have to parse through and deal with on your own if you weren't using something like feedparser. I also wanna come back over to our admin side and take a look at the information here. So just to recall, we have one feed object here, and we also have one article here. Remember, those are the things I created manually within the administration site, and we'll come back to that in just a minute. But what I wanna do is grab this URL, so I'm gonna copy it, come to our News Aggregator, and select New Feed. I'm gonna come in here, paste that URL in there, and hit Save. Now, after I hit Save, you're gonna see that we were forwarded back to our News Feeds page, just like our view was telling us to do. And now, instead of just having my blog here, we also have the Django weblog. So it looks like this actually worked.
What that's showing you is that it actually parsed out the information, took the title from feedData, and stuck it into the title of the feed. Now if I come back to home, I'm starting to see more things beyond just my article, and we're using the actual names of those articles. So that's a pretty sweet little trick we've done here. Now, just to further verify that the information is there, we'll come back over here and refresh our feeds, and you'll see that we have two feed objects in here. This is definitely a problem and a little messy, and I'm gonna show you in just a few moments how we're gonna fix that, but we're not too worried about it just yet. If we take a look at the information here, we're gonna see that we have a title in here, just like we saw on our web page, and then we have the URL. Now, I can also come back to news and take a look at Articles. And now I have a bunch more in here, so I can start to pick through these a little bit. We're gonna see here that we have a bugfix release and some information about it and all that sort of good stuff, but we're also gonna see that this is gonna get very confusing very fast, because once again, we're talking about these feeds here, and they're associated with the same-named feed object. So that's gonna become a problem, but at least for the time being, we're able to get that information off the internet and put it into our database. I could even come onto my page here and click on these links, and it's gonna take me to the actual page for more information. We're gonna do a little bit of formatting in an upcoming lesson to make this a bit nicer, but you've actually gotten to a point where you're able to download news and present it to the end user. So in the next lesson, we're going to address this little ugliness here to make it a little bit easier to understand what these objects are.
