4.1 Parse the Documents

Our documents will be in a proprietary format. In this lesson, we'll parse them so they can be displayed in the browser.

Well, we have a working webhook, at least as far as our tests are concerned. So we need to take the files that we have downloaded from GitHub and process them, because they are going to be in a format that most end users won't be able to read. The content is going to be in Markdown, but there are also going to be some other pieces within the file. I have chosen a file format very similar to what you would find with Jekyll. Now, if you're not familiar with Jekyll, it is a static site generator written in Ruby, and it's what GitHub uses for their GitHub Pages feature. So the files that we are going to have are going to be very similar in format. There are going to be a few differences, and I'm not going to point those out because it's really not important.
As far as the names of our files are concerned, the date comes first: year, month, and then day, and then the title of whatever this document is. So, this is a test. And it is going to be Markdown.
At the top we have what's called front matter. The front matter sits between two lines of three hyphens each, and it is nothing more than metadata about the document. So we would have a title, this is a test, and then we would have the date, which once again would be year, month, and day. Now, at some other time, we might want to put in the actual time here. But for right now, we're just going to stick with the title and the date, and then anything after the front matter is the content of the message, or the post in this case. So we will have one paragraph, and then this is another paragraph, and we'll just leave it as is.
The first thing we need is a class that's going to represent an individual post. Let's go to our Models folder and add a new class simply called Post. We need the Title and the Date. So the Title is a string, and let's go ahead and give it a private setter.
And we will do the same thing for the Date as well. We'll just say DateTime and, for this, it'll be PublishDate. And, once again, a private setter. And then we also need the content, so that is a string. We'll call it Content. And just like the other properties, this will have a private setter as well. Now, the reason why I'm choosing to make these private is that we aren't going to create these Post objects with a constructor. Instead we're going to use a static method that will return a Post, so public static Post, and we'll call this Parse. And then we'll just have some fileData that we will have to parse in order to get our resulting Post. So the idea is that we're going to read a file and send its contents to this Parse method.
So, the first thing we really need to do is go line by line and find out what we have for our front matter. Let's break the file data down into its individual lines. We're going to take fileData and split it on the new line. So let's have a new string array. We'll use Environment.NewLine as our separator, and we also need to specify the options here. We'll just have None. We don't want to remove any empty items, because we want to preserve all of the whitespace that we can. So we have the lines.
And we're going to store all of the metadata in a dictionary, so let's just have metadata equals new Dictionary of string and string. The keys are going to be things like title and date, and the values are going to be the values there. So, as we process our lines, we can grab that metadata, and then we can also grab the content.
So, the first thing we need to do is check to see if we have any lines, because if we don't, then we don't really have a file to process. And we also want to ensure that there is at least some metadata here. So, if the first line in our document is not three hyphens, then we don't have a post, and we will simply return null.
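
The split and the front matter check described above can be sketched roughly as follows. This is a minimal sketch, not the lesson's exact code: the class name `FrontMatterGuard` and its two methods are hypothetical helpers introduced for illustration, whereas the lesson performs these steps inline in `Post.Parse`. It also makes one assumption the lesson does not: it splits on both `"\r\n"` and `"\n"`, because a file with Unix line endings would come back as a single line if it were split only on `Environment.NewLine` on Windows.

```csharp
using System;

// Hypothetical helper for illustration; in the lesson these steps live
// inside the static Post.Parse method.
public static class FrontMatterGuard
{
    // Split the raw file contents into lines, keeping empty entries so
    // blank lines between paragraphs survive.
    public static string[] SplitLines(string fileData)
    {
        // Assumption: accept both Windows ("\r\n") and Unix ("\n") endings.
        // The lesson splits on Environment.NewLine only.
        return (fileData ?? string.Empty).Split(
            new[] { "\r\n", "\n" }, StringSplitOptions.None);
    }

    // A document counts as a post only if its first line opens the
    // front matter with three hyphens.
    public static bool StartsWithFrontMatter(string[] lines)
    {
        return lines != null
            && lines.Length > 0
            && lines[0].Trim() == "---";
    }
}
```

When `StartsWithFrontMatter` returns false, the Parse method in the lesson would return null at this point.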
The next thing we want to do is iterate over all of our lines and grab the metadata as we read each line. And then we will stop whenever we reach the second set of hyphens. So, let's create a counter variable, and we're going to initialize this as 1, because we've already read the first line, so there's no need to start with the first line. We'll start with the second line and we'll use a for loop. Now, since we have already initialized our counter, we don't need the initialization portion of the for loop. So lines.Length is going to be our limit, and then we are going to increment our ii variable. Now, I use ii instead of just i because if I ever need to search for a counter like this, and you just search for i, you're going to have a lot of hits. But if you do a search for ii, then, well, you only find the places where you have used it within a loop.
All right, so let's add an if statement. If the line at this index is equal to three hyphens, then we've reached the end of our front matter, so we will just break. Otherwise we want to take that line, split it by the colon, and then grab the key and the value. So we will have our parts equals, and we will split this line by the colon, and then we will take the first part and make that the key. So, metadata.Add, parts at the zero index. Let's go ahead and Trim this, so that we won't have any whitespace there. And we will do the same thing for the second part, which is the value. And that will give us our metadata.
The next thing we need is the content of this document. And we already know where that starts, because we now have the line where the closing delimiter of the front matter is. So we can say var content equals, and we're going to get an array of all of the remaining lines and join it back into a single string, and the separator is going to be a new line.
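
The front matter loop described above might look something like this. It is a sketch under stated assumptions rather than the lesson's code: `FrontMatterReader.Read` and the `closingIndex` out parameter are hypothetical names, and the loop departs from the transcript in two small ways. It splits on the first colon only (so a value containing a colon, such as a time, is not cut short), and it skips lines that have no colon at all.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; the lesson writes this loop
// inline in Post.Parse.
public static class FrontMatterReader
{
    // Reads "key: value" pairs starting at the second line and stops at the
    // closing "---". closingIndex receives the index of that delimiter, or
    // lines.Length if it was never found.
    public static Dictionary<string, string> Read(string[] lines, out int closingIndex)
    {
        var metadata = new Dictionary<string, string>();

        var ii = 1; // line 0 is the opening "---", already checked
        for (; ii < lines.Length; ii++)
        {
            if (lines[ii].Trim() == "---")
            {
                break; // end of the front matter
            }

            // Assumption: split on the first colon only, so values that
            // contain colons (a time, a URL) stay in one piece.
            var parts = lines[ii].Split(new[] { ':' }, 2);
            if (parts.Length < 2)
            {
                continue; // not a "key: value" line; skip it
            }

            // Add throws if the same key appears twice, as in the lesson.
            metadata.Add(parts[0].Trim(), parts[1].Trim());
        }

        closingIndex = ii;
        return metadata;
    }
}
```

Because the counter is declared outside the for statement, it is still in scope after the loop and marks where the content begins.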
So we will use Environment.NewLine, and we are going to Skip all of the lines that we have already read. We're going to use our counter here, but we need the next line, so it's the counter plus one, and then we will just call ToArray. That will give us our content, so that then we can return a new Post, and we will set the Title equal to our metadata, with the index of title. We will do the same for the PublishDate, except that we want to parse that as a DateTime. So we will call DateTime.Parse, and we will pass in metadata with the index of date. And then finally, we set Content equal to content.
Now, our content right now is still in Markdown, so we need to convert that into HTML. And to do that, we are going to use a third-party component. Now, unfortunately, it does not work with DNX Core. So this is the point where our application will no longer work on Linux or Mac OS. It is only going to work on Windows. Hopefully someday soon we will have a Markdown component that works with DNX Core. But until that time, we just have to use what we have.
So, go to project.json in Solution Explorer and scroll on down. We're going to add a dependency for dnx451, and we are going to get rid of dnxcore50, because otherwise our code will not compile: we will have a component that is not compatible with DNX Core. Okay, so we want to add a property here called dependencies. And inside of dependencies, we are going to add one called Markdown. Now notice that we have IntelliSense right here. It's going to NuGet and pulling down anything that matches whatever we're typing. And right there we have Markdown. So that's what we want, and we want the latest version, which is 1.14.4, so that is what we are going to add. And Visual Studio is going to automatically pull that into our project so that we can then use it. So we can close this file. Let's go back to our Post and add a using statement for MarkdownSharp. This is a library that is used at Stack Overflow.
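
Going back to the content-joining step and the `return new Post` described at the start of this part, here is a rough sketch of the Post class with its private setters. The property names and the `title` and `date` keys come from the lesson; the `FromParts` factory and its parameters are hypothetical, standing in for the tail end of `Post.Parse` so the sample stays short. Passing `CultureInfo.InvariantCulture` to `DateTime.Parse` is also an assumption, added so the result does not depend on the machine's regional settings.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public class Post
{
    public string Title { get; private set; }
    public DateTime PublishDate { get; private set; }
    public string Content { get; private set; }

    // Hypothetical factory for illustration. In the lesson this code is the
    // tail end of Post.Parse, after the front matter loop has finished.
    // closingIndex is the index of the closing "---" line.
    public static Post FromParts(
        Dictionary<string, string> metadata, string[] lines, int closingIndex)
    {
        // Skip everything up to and including the closing delimiter,
        // then stitch the remaining lines back into one string.
        var content = string.Join(
            Environment.NewLine,
            lines.Skip(closingIndex + 1).ToArray());

        return new Post
        {
            Title = metadata["title"],
            // Assumption: invariant culture, so parsing is not affected
            // by the machine's regional settings.
            PublishDate = DateTime.Parse(metadata["date"], CultureInfo.InvariantCulture),
            Content = content // still Markdown at this point
        };
    }
}
```

The object initializer can assign the private setters only because it runs inside the Post class itself. Code elsewhere in the application can read these properties but cannot change them, which is the point of creating posts through a static method.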
So we are going to take our content and convert it with MarkdownSharp. Let's say var md equals new Markdown, and this has a method called Transform. We want to transform our Markdown. So down here, wherever we set the Content, we're going to say md.Transform, and then we will pass in our Markdown content to the Transform method. And that will give us HTML. Now that we have the ability to parse these documents and represent them as a post, we need to be able to retrieve them as posts, so that we can display them in the browser. And we will get started with that in the next lesson.
