Oct 23, 2007
Fetching title from HTML page with C#
Posted by admin under .NET
This article shows you how to retrieve the text inside <title></title> HTML tags from a remote web page.
Here's what the GUI looks like:

You just enter the web address in the Url textbox and press the button, and the web page's title appears in the box below.
While you can download the solution and have a look yourself, I'll explain some of the code here. There aren't many lines at all:
private void button1_Click(object sender, EventArgs e)
{
    // Fetch the page, then pull the title out of its HTML.
    string sFullHtml = WebHelper.FetchHTML(textBox1.Text);
    string sTitle = WebHelper.FetchTitleFromHTML(sFullHtml);
    textBox2.Text = sTitle;
}
So the click handler only wires things together; a helper class holds all the code that actually does the work:
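public class WebHelper
{
    // Downloads the page at sUrl and returns its HTML as a string.
    public static string FetchHTML(string sUrl)
    {
        // WebClient is IDisposable, so release it once the download is done.
        using (System.Net.WebClient oClient = new System.Net.WebClient())
        {
            return oClient.DownloadString(sUrl);
            //return new System.Text.UTF8Encoding().GetString(oClient.DownloadData(sUrl));
        }
    }

    // Returns the trimmed text of the first <title> element, or "" if none is found.
    public static string FetchTitleFromHTML(string sHtml)
    {
        // [^>]* keeps the lookbehind inside the opening tag; the lazy *? stops at the first </title>.
        string regex = @"(?<=<title[^>]*>)([\s\S]*?)(?=</title>)";
        System.Text.RegularExpressions.Regex ex = new System.Text.RegularExpressions.Regex(regex, System.Text.RegularExpressions.RegexOptions.IgnoreCase);
        return ex.Match(sHtml).Value.Trim();
    }
}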
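If you want to try the title extraction without hitting the network, here is a minimal sketch that runs it against a hard-coded HTML string. The TitleDemo class and its ExtractTitle method are hypothetical names, not part of the downloadable solution, and the pattern is a variant that uses a capture group instead of lookarounds. The sample page deliberately contains a second <title> inside an SVG element: a lazy `[\s\S]*?` stops at the first closing tag, whereas a greedy `[\s\S]*` would run on to the last `</title>`. Decoding entities with WebUtility.HtmlDecode is an extra step I am assuming you would want.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original solution.
public static class TitleDemo
{
    // Capture-group variant: group 1 holds the text between the tags.
    // The lazy *? makes the match end at the first </title>.
    private static readonly Regex TitlePattern = new Regex(
        @"<title\b[^>]*>([\s\S]*?)</title>",
        RegexOptions.IgnoreCase);

    public static string ExtractTitle(string html)
    {
        Match m = TitlePattern.Match(html);
        if (!m.Success)
        {
            return string.Empty; // no <title> element found
        }
        // Turn entities such as &amp; back into characters, then trim.
        return WebUtility.HtmlDecode(m.Groups[1].Value).Trim();
    }

    public static void Main()
    {
        string html =
            "<html><head><title> Tom &amp; Jerry </title></head>" +
            "<body><svg><title>Icon</title></svg></body></html>";

        Console.WriteLine(ExtractTitle(html)); // prints "Tom & Jerry"
    }
}
```

A regular expression is fine for a quick tool like this one, but it is not an HTML parser; for pages with unusual markup a real parser would be the safer choice.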
And that's all there is to it! Sweet, isn't it? Thanks to the WebClient class and the Regex class, we can accomplish this task in just a few lines. Sometimes even I like frameworks.
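One thing the click handler does not do is check what was typed into the Url textbox before passing it on. As a rough sketch of how that could look, the helper below adds "http://" when the scheme is missing and accepts only http and https addresses. UrlInput and TryNormalize are hypothetical names I made up for illustration, and defaulting to http is an assumption on my part.

```csharp
using System;

// Hypothetical helper, not part of the original solution.
public static class UrlInput
{
    // Tries to turn user input into an absolute http/https Uri.
    public static bool TryNormalize(string input, out Uri result)
    {
        result = null;
        if (string.IsNullOrWhiteSpace(input))
        {
            return false;
        }

        string candidate = input.Trim();

        // Assumption: when no scheme is given, the user means http.
        if (candidate.IndexOf("://", StringComparison.Ordinal) < 0)
        {
            candidate = "http://" + candidate;
        }

        Uri parsed;
        if (!Uri.TryCreate(candidate, UriKind.Absolute, out parsed))
        {
            return false;
        }

        // Reject ftp:, file: and anything else that is not a web address.
        if (parsed.Scheme != Uri.UriSchemeHttp && parsed.Scheme != Uri.UriSchemeHttps)
        {
            return false;
        }

        result = parsed;
        return true;
    }

    public static void Main()
    {
        Uri uri;
        bool ok = TryNormalize("example.com", out uri);
        Console.WriteLine(ok ? uri.AbsoluteUri : "invalid"); // prints "http://example.com/"
    }
}
```

In the click handler you could call TryNormalize on textBox1.Text first and only fetch the page when it returns true.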
Attachments