
How to Syndicate Content Without Utilizing a News Feed

Many websites offer syndication formats such as RSS, JSON, or XML-based services to allow for easy content delivery. But what happens when a website doesn’t offer one of these services? How do you syndicate content from a website that doesn’t offer a news feed? This is what I set out to solve.

I recently received a project from a client, along with a brief outlining the website and the objectives they wished to accomplish. The notes in the brief explained that they were a real estate company that regularly posted properties to a well-known real estate website, and that they wished to syndicate that content onto their own website without having to update both sites. The catch: this well-known real estate site did not offer a syndication service or an API for developers to access its listings.

Using jQuery’s load()

After scouring the internet I discovered that most solutions to this problem were inelegant, and most of the time they were either browser-specific or ineffective. I decided to code my own solution using the popular JavaScript library jQuery.

To access information from another website I needed to use the AJAX functions of the jQuery library.

  <script src="http://code.jquery.com/jquery-latest.js"></script>
  <script type="text/javascript">
    // Once the DOM is ready, load the remote page into the #content element
    $(document).ready(function() {
      $("#content").load("http://net.tutsplus.com/");
    });
  </script>

If you are familiar with jQuery, the above shouldn’t be too difficult to understand. We are using the AJAX load() function to load a webpage’s content into the element with id #content. The solution seemed too easy, and it was: as you will soon realize, the code only works in Internet Explorer 6 or 7. The reason soon became apparent. All other browsers block scripts from loading pages from other domains, a restriction known as the same-origin policy. This meant we could only load pages from our own domain, not URLs on external sites.
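To make the restriction concrete, here is a minimal sketch of the comparison browsers base it on: two URLs count as the same origin only when scheme, host and port all match. The helper name isSameOrigin and the www.example.com address are assumptions made for this illustration, not part of jQuery or of the original project.

```javascript
// isSameOrigin is a hypothetical helper written for this illustration.
// It returns true when both URLs share scheme, host and port.
function isSameOrigin(pageUrl, targetUrl) {
  var page = new URL(pageUrl);
  // Resolve the target against the page so relative paths work too.
  var target = new URL(targetUrl, pageUrl);
  return page.origin === target.origin;
}

// Assumed address of our own page.
var page = 'http://www.example.com/index.html';

console.log(isSameOrigin(page, 'curl.php'));                 // true
console.log(isSameOrigin(page, 'http://net.tutsplus.com/')); // false
```

A relative URL always resolves to the page's own origin, which is why loading a local file is allowed while loading an absolute URL on another domain is not.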

A Server-Side Solution

I looked around online for a solution to this problem and, to my dismay, most people were either under the impression that it was not possible to bypass the security restrictions of most browsers or felt it was too complicated a task to be worth doing. This is when I discovered the cURL library.

cURL is quite useful in that it allows you to communicate with other servers using URLs and standard protocols such as HTTP, HTTPS or FTP. Using cURL I was able to work around our local security problem by loading the whole website server-side and serving it from a local URL.

  
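  <?php
  // Fetch the external page on the server, where browser restrictions do not apply
  $ch = curl_init("http://net.tutsplus.com");
  // Return the response as a string instead of printing it directly
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
  $html = curl_exec($ch);
  curl_close($ch);
  print $html;
  ?>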

This code initializes a cURL handle for an external URL, the benefit being that the URL is loaded on the server rather than on the client. The PHP environment on the server is not subject to the security restrictions that modern browsers place on scripts. After initializing the handle we fetch the URL and simply print its whole contents. If we now save this document as ‘curl.php’ on our web server, we have a local file that returns the entire contents of our external URL.

Let’s go back to our original code and put in our modifications:

  <script src="http://code.jquery.com/jquery-latest.js"></script>
  <script type="text/javascript">
    // curl.php is on our own domain, so the browser allows the request
    $(document).ready(function() {
      $("#content").load("curl.php");
    });
  </script>

Our script now works in all browsers and does not rely on any unorthodox browser security hacks.

Why use jQuery?

Now you might wonder what the advantages are of working with this document in jQuery compared to just manipulating it using PHP. The main reason for my choice of jQuery is the ability to use its CSS-style selectors to choose what content on the page we actually want to syndicate, like the following:

  <script src="http://code.jquery.com/jquery-latest.js"></script>
  <script type="text/javascript">
    // Everything after the first space is treated as a selector
    $(document).ready(function() {
      $("#content").load("curl.php #content");
    });
  </script>

Rather than loading in the whole document we now just load in the contents of an element with id #content. We will get to the benefits of this later on in the article.
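The string passed to load() follows a simple convention: everything before the first space is the URL, and everything after it is read as a selector. The sketch below mimics that split in plain JavaScript. The function splitLoadArgument is a hypothetical helper written only to illustrate the convention; it is not part of jQuery.

```javascript
// splitLoadArgument is a hypothetical helper, not a jQuery function.
// It separates a load() argument into its URL and optional selector.
function splitLoadArgument(arg) {
  var firstSpace = arg.indexOf(' ');
  if (firstSpace === -1) {
    // No space: the whole string is the URL, no fragment is selected.
    return { url: arg, selector: null };
  }
  return {
    url: arg.slice(0, firstSpace),
    selector: arg.slice(firstSpace + 1).trim()
  };
}

console.log(splitLoadArgument('curl.php #content'));
console.log(splitLoadArgument('curl.php'));
```

Note that the whole document is still requested from curl.php. The selector only decides which part of the response is inserted into the page.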

Images and Anchors

After playing around with this for a bit you may notice the next big problem. Although we have managed to syndicate an external site's content, all relative links and images are no longer working. This is another reason for working in jQuery. Using the jQuery each() function we can create a loop that goes through all <a> and <img> elements, grabbing the current href or src attribute and prepending the external domain onto it.

  <script type="text/javascript">
  // Domain of the external site, prepended to every href and src below
  var domain = "http://net.tutsplus.com";
  $(document).ready(function(){
    $("a").each(function (i) {
      var href = $(this).attr('href');
      var new_href = domain + href;
      $(this).attr('href',new_href);
    });
    $("img").each(function (i) {
      var src = $(this).attr('src');
      var new_src = domain + src;
      $(this).attr('src',new_src);
    });
  });
  </script>

We first select all <a> elements and cycle through them, extracting the href attribute and prepending our chosen domain to it. If we wanted, we could also add an attribute to open all links in new windows. We then select all <img> elements and cycle through them in the same way, this time working with the src attribute.
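Plain concatenation assumes every href and src is a root-relative path such as /about/. If the external page also contains absolute links, or paths without a leading slash, the result is a broken address. A more forgiving variant, sketched below, lets the standard URL constructor do the resolving. The helper name resolveUrl and the sample paths are assumptions made for this example.

```javascript
// Domain of the site being syndicated.
var domain = 'http://net.tutsplus.com';

// resolveUrl is a hypothetical helper for this example.
// It resolves a link or image address against the external domain.
function resolveUrl(value, base) {
  if (!value) {
    return value; // attribute missing or empty: nothing to resolve
  }
  try {
    return new URL(value, base).href;
  } catch (e) {
    return value; // leave anything unparseable untouched
  }
}

console.log(resolveUrl('/about/', domain));                 // http://net.tutsplus.com/about/
console.log(resolveUrl('images/logo.png', domain));         // http://net.tutsplus.com/images/logo.png
console.log(resolveUrl('http://example.org/page', domain)); // http://example.org/page
```

Inside the each() loops this would replace the concatenation, for example $(this).attr('href', resolveUrl($(this).attr('href'), domain)).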

The question at this point is where to integrate our new code into our existing code. The problem I originally came across was that, no matter where I put it, the rewriting code ran before the external markup had finished loading, so there were no links or images for it to change. The solution involves combining the two into quite an elegant jQuery solution.

  $(document).ready(function() {
    // Rewrite the URLs only after load() has inserted the external markup
    $("#content").load("curl.php #content",{},function(){
      $("a").each(function (i) {
        var href = $(this).attr('href');
        var new_href = domain + href;
        $(this).attr('href',new_href);
      });
      $("img").each(function (i) {
        var src = $(this).attr('src');
        var new_src = domain + src;
        $(this).attr('src',new_src);
      });
    });
  });

The load() function can take two more optional arguments. The first is the data you want to submit to the URL; for example, you could be trying to retrieve the results of a POST form. The second is a callback function that runs once load() has finished. In our case this is perfect: we place our code in the callback function, which prevents it from running until the external page has been completely loaded in.
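The ordering problem is easier to see without a browser. In the sketch below, fakeLoad is a hypothetical stand-in for load(): it returns immediately and only delivers the markup later, just as a real request would. The content object plays the role of the #content element; both names are assumptions made for this illustration.

```javascript
// fakeLoad is a hypothetical stand-in for load(): it returns immediately
// and only fills the target and calls `done` on a later turn of the event loop.
function fakeLoad(target, markup, done) {
  setTimeout(function () {
    target.html = markup;
    done();
  }, 0);
}

var content = { html: '' }; // plays the role of the #content element
var log = [];

fakeLoad(content, '<a href="/about/">About</a>', function () {
  // Runs last: the markup is in place, so links could be rewritten here.
  log.push('callback sees: ' + content.html);
  console.log(log.join('\n'));
});

// Runs first: the markup has not arrived yet.
log.push('after the call sees: "' + content.html + '"');
```

The statement written after the call runs before the markup exists, while the callback runs once it is in place. That is why the link and image rewriting has to live inside the callback.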

As you can see, we are now able to pull content from another website into any element on our page. This is very practical not just for syndicating content like news feeds but for any dynamically updated content.

Styling Our Content

Now that we have pulled in our content, the next step shows the advantage of using this code over, say, an <iframe>. While an <iframe> avoids many of the messy issues with links and images we went through above, it cannot be seamlessly integrated into a website with a completely different style; the content will essentially always be just a window into another website. As seen earlier, when I first introduced the idea of using CSS-style selectors, we can select any id, class or other selector by placing it in the load() function:

  $(document).ready(function()  {
     $("#content").load("curl.php #content",{},function(){
  ...

In this case we are only selecting a <div> from the homepage of Net Tuts+, which happens to be the main content <div>. We are now syndicating just an extract of the page, without pulling in any of its styles (as they are contained in the <head>) or any effects (if they exist). We are only pulling in markup.

We are now going to add some styles to our page using CSS.

body,a {
  font-family: 'Tahoma';
  color: #fff;
  background-color: #000;
  font-size: 12px;
}
#content {
  width: 600px;
}
#content small, #content span, #content .more-link {
  display:none;
}
#content img {
  float: left;
  margin-right: 5px;
}
#content h1 {
  font-size: 14px;
}

This CSS is more about demonstrating a few important features than being aesthetically appealing. There are a few important things to note at this point. The first is that we have to scope our styles to exactly the tags we want to style, i.e. don't style all <small> tags, only the ones inside the #content <div>. The second is what I've done to the <small> and <span> tags and the .more-link class. Rather than displaying all the content we have syndicated, it may be useful to hide some of it; we could even reuse that content in a dropdown effect or something similar. Here we are hiding the tags completely using the display property. We use display rather than visibility for a reason: visibility: hidden still reserves the space the content occupied, whereas display: none removes it from the layout completely.

Modify Images using jQuery

Another thing we can do to make our news syndicator take up less space on screen is modify the images. This could be done using CSS, but instead I want to demonstrate using jQuery to modify the source of the image.

We are going to modify our jQuery to use the attr() function to change the source of each image to one on our own server: a nice little link button.

...
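  $("#content img").each(function (i) {
    var src = $(this).attr('src');
    var new_src = domain + src;
    // Keep the original image address in an href attribute
    $(this).attr('href',new_src);
  });
  // Point every syndicated image at the link button on our own server
  $("#content img").attr('src','link.png');
  });
  });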

Now let's modify our CSS slightly to make our image float nicely to the left.

#content img { float:left; margin-right: 5px;}

Now, using only content syndicated from the Net Tuts+ homepage, we have managed to build a news syndicator with completely different styling to the original site.

Preloader

What you may notice when you use this code is that it takes a while for jQuery to load and process the external site. A nice feature to add is a loading bar in the #content <div> while we wait for the content to load.

The easiest way to make our loading bar is to place a loading bar image inside our #content <div> in our markup. The loading image appears when the page first loads, and once jQuery has finished loading our external content it replaces the current content of the <div>, the loading bar, with the new content. A site I use quite often when generating loading bars is http://www.ajaxload.info/. It has a very decent generator for creating a variety of loading images.

...
<h1>My Content Syndication Service</h1>
<div id="content"><img src="ajax-loader.gif" alt="Loading..." /></div>
...

We now have a nice little application which will show a preloading image until our content is ready to show.

While the preloader is a nice feature, it isn't a replacement for optimised code. In this tutorial we are using jQuery to choose which elements to select, when in actual fact the fastest solution would be to do that in our PHP code so that less markup is sent to the browser. That, though, is outside the scope of this tutorial.

Conclusion

There we have it: a simple solution using jQuery’s AJAX functions and PHP’s cURL library that allows us to syndicate content from an external website. As I have already stated, although jQuery's easy syntax and CSS selectors give us the convenience of selecting and styling what we want on the client side, this is not optimized for speed. The faster option would be to remove the tags we don't want in PHP, for example using regular expressions. I would also note that one of the most common mistakes is being too specific when styling. Remember that you have no control over whether the content creator changes the tags and classes they use, so it is always best to style general elements that will be commonly used.
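To give an idea of what removing tags with regular expressions could look like, here is a rough sketch. It is written in JavaScript to match the other examples; in curl.php the same pattern could be applied with PHP's preg_replace() before printing. The helper name stripElements and the sample markup are assumptions made for this illustration, and a regular expression like this cannot cope with nested elements of the same name, so treat it as a starting point only.

```javascript
// stripElements is a hypothetical helper. It removes every <tag>...</tag>
// pair for the given tag names. It does not handle nested elements of the
// same name, so it only suits simple, predictable markup.
function stripElements(html, tagNames) {
  return tagNames.reduce(function (result, tag) {
    var pattern = new RegExp('<' + tag + '\\b[^>]*>[\\s\\S]*?</' + tag + '>', 'gi');
    return result.replace(pattern, '');
  }, html);
}

// Made-up sample markup for the demonstration.
var sample = '<h1>Title</h1><small>May 5th</small><p>Body <span>12 comments</span></p>';

console.log(stripElements(sample, ['small', 'span']));
// <h1>Title</h1><p>Body </p>
```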

Another thing worth taking into account is that this tutorial is meant to generate a content syndicator - it is not intended for use as a site content 'scraper'. If you are going to implement this in a commercial project, make sure you have the permission of the copyright holder to use the content on your page.

