Regular expression pattern for spaces
I am creating a regex library to work with HTML (I will post it in MSDN code when done). One of the methods removes any spaces before the closing tag.
<p>See the dog run </p>
The method should remove the space before the closing paragraph tag. I am using this:
public static string RemoveWhiteSpaceBeforeClosingTag(string text)
{
    string pattern = @"(\s+)(?:</)";
    // The option flags are members of the RegexOptions enum.
    return Regex.Replace(text, pattern, "</", RegexOptions.Singleline | RegexOptions.IgnoreCase);
}
As you can see, I am replacing the whitespace together with the </ by a plain </, because I cannot work out how to match only the whitespace and leave the closing tag out of the match. I know there is a way; I just haven't figured it out.
2 answers
\s+(?=</)
is your expression. It matches one or more whitespace characters that are followed by </.
- (?=...) is a positive lookahead. The text it checks is not included in the match.
- (?:...) is a non-capturing group. The text it matches is included in the match.
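To make the difference concrete, here is a minimal sketch of the question's method rewritten around the lookahead. The class name HtmlWhitespace and the Main method are only for illustration and are not part of the original code. Because the lookahead consumes nothing, the replacement can simply be the empty string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlWhitespace
{
    // Same method name as in the question; the enclosing class is hypothetical.
    public static string RemoveWhiteSpaceBeforeClosingTag(string text)
    {
        // \s+ consumes the whitespace; (?=</) only asserts that "</" follows,
        // so the closing tag itself is left untouched.
        return Regex.Replace(text, @"\s+(?=</)", string.Empty);
    }

    public static void Main()
    {
        string input = "<p>See the dog run </p>";

        // Non-capturing group: "</" is part of the match, so it would be replaced too.
        string withGroup = Regex.Match(input, @"\s+(?:</)").Value;      // " </"

        // Lookahead: only the whitespace is part of the match.
        string withLookahead = Regex.Match(input, @"\s+(?=</)").Value;  // " "

        Console.WriteLine($"[{withGroup}] [{withLookahead}]");
        Console.WriteLine(RemoveWhiteSpaceBeforeClosingTag(input));
        // prints: <p>See the dog run</p>
    }
}
```

The sketch drops the Singleline and IgnoreCase options on purpose: Singleline only changes what the dot matches and IgnoreCase only affects letters, and this pattern contains neither.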
All that said, regular expressions are a fragile and error-prone way of handling HTML, so use them with caution, if at all.