
Question: Given a line of text, write a regular expression that strips all the HTML tags from it.


Answer:




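// Requires: using System.Text.RegularExpressions;
public static string CleanHTML(string htmlPage)
{
    // Remove every "<...>" run: '<', any characters except '>', then '>'.
    return Regex.Replace(htmlPage, "<[^>]*>", string.Empty);
}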


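See the pattern? The key is the negated character class: ([^>]*?) matches everything up to the next '>', and ([^">]*?) matches everything up to the next double quote or '>'. Just remember these expressions and use them if you get a question about using RegEx for HTML parsing.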
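One case worth knowing for a follow-up question: the simple pattern stops at the first '>', even when that '>' sits inside a quoted attribute value. The sketch below contrasts the answer's pattern with a quote-aware variant. The class name HtmlTagStripper, the method names, and the quote-aware pattern itself are hypothetical and written only for illustration. They are not taken from the original answer or from any library, and the variant is still not a full HTML parser.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class HtmlTagStripper
{
    // Assumed pattern: '<', optional '/', a letter, then any mix of
    // double-quoted strings, single-quoted strings, or single characters
    // that are neither quotes nor '>', followed by the closing '>'.
    private static readonly Regex QuoteAwareTag = new Regex(
        "</?[a-zA-Z](?:\"[^\"]*\"|'[^']*'|[^'\">])*>",
        RegexOptions.Compiled);

    // The pattern from the answer: '<', anything except '>', then '>'.
    public static string StripSimple(string html)
    {
        return Regex.Replace(html, "<[^>]*>", string.Empty);
    }

    // Skips over quoted attribute values, so a '>' inside quotes
    // does not end the tag early.
    public static string StripQuoteAware(string html)
    {
        return QuoteAwareTag.Replace(html, string.Empty);
    }
}
```

For the input <a title="x > y">link</a>, StripSimple leaves the fragment  y">link behind (with a leading space), while StripQuoteAware returns link. Both return Hello world for <p>Hello <b>world</b></p>. Because the variant requires a letter after '<', it also leaves plain text such as 1 < 2 and 3 > 2 untouched, whereas the simple pattern removes everything between the two comparison signs.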

Phil Haack has a really good post on parsing HTML with regular expressions, which I strongly recommend reading.




