Regular expression pattern for spaces

I am creating a regex library to work with HTML (I will post it on MSDN Code when it's done). One of the methods removes any whitespace before a closing tag, for example:

    <p>See the dog run </p>


Here the method should remove the space before the closing paragraph tag. I am using this:

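    using System.Text.RegularExpressions;

    public static string RemoveWhiteSpaceBeforeClosingTag(string text)
    {
        // The match covers the whitespace AND the "</" after it,
        // so "</" has to be written back in the replacement.
        string pattern = @"(\s+)(?:</)";
        return Regex.Replace(text, pattern, "</", RegexOptions.Singleline | RegexOptions.IgnoreCase);
    }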


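As you can see, I match the whitespace together with the "</" and replace the whole match with "</", because I can't work out how to match only the whitespace and leave the closing tag out of the match. I know there is a way; I just haven't figured it out.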



2 answers


    \s+(?=</)


is your expression. It matches one or more whitespace characters that are followed by </, without including the </ in the match. The key difference from your pattern:



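  • (?=...)

    is a positive lookahead. The text it checks for must be present, but it is not included in the match (the lookahead is zero-width);
  • (?:...)

    is a non-capturing group. The text it matches is included in the match; it just isn't saved as a numbered group.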
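To see the difference concretely, here is a minimal C# sketch that runs both patterns against the example string from the question and prints what each one actually matches. It uses top-level statements, so it assumes C# 9 or later; the variable names are just for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "<p>See the dog run </p>";

// Non-capturing group: "</" is consumed and becomes part of the match.
Match consumed = Regex.Match(input, @"\s+(?:</)");

// Positive lookahead: "</" must follow, but only the whitespace is matched.
Match lookahead = Regex.Match(input, @"\s+(?=</)");

Console.WriteLine($"[{consumed.Value}]");   // prints "[ </]"
Console.WriteLine($"[{lookahead.Value}]");  // prints "[ ]"
```

Both matches start at the same position; only the lookahead version leaves "</" outside the match, which is why its matches can simply be deleted.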

All that said, regular expressions are a fragile and error-prone way of handling HTML, so use them with caution, if at all.
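As one illustration of that fragility, here is a small C# sketch (top-level statements, C# 9 or later assumed) using a made-up input where the whitespace before a closing tag is the only separator between two words. Stripping it changes how the text renders.

```csharp
using System;
using System.Text.RegularExpressions;

// Intended case: the trailing space inside <p> is removed.
string ok = Regex.Replace("<p>See the dog run </p>", @"\s+(?=</)", "");

// Pitfall: the space before </b> separates "dog" from "run",
// so after removal a browser renders "dogrun".
string merged = Regex.Replace("<p>See the <b>dog </b>run</p>", @"\s+(?=</)", "");

Console.WriteLine(ok);      // prints "<p>See the dog run</p>"
Console.WriteLine(merged);  // prints "<p>See the <b>dog</b>run</p>"
```

The regex has no notion of inline versus block elements, or of elements such as <pre> where whitespace is significant, so it cannot tell a harmless trailing space from a meaningful one.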



You need a lookahead, (?=...), in your pattern:

    \s+(?=</)




Because only the whitespace is matched, each match can be replaced with "" (an empty string).
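Applied to the question's helper, that could look like the following sketch. It keeps the question's method name but is written as a static local function in top-level statements (C# 9 or later assumed) so it runs on its own; in a real library it would go back into a class as a public static method.

```csharp
using System;
using System.Text.RegularExpressions;

// Only the whitespace is matched, so it is replaced with "" and
// the closing tag never has to be written back.
static string RemoveWhiteSpaceBeforeClosingTag(string text)
{
    // \s already matches newlines, so RegexOptions.Singleline (which only
    // changes the meaning of ".") is not needed, and IgnoreCase has no
    // effect because the pattern contains no letters.
    return Regex.Replace(text, @"\s+(?=</)", "");
}

Console.WriteLine(RemoveWhiteSpaceBeforeClosingTag("<p>See the dog run </p>"));
// prints "<p>See the dog run</p>"
```

Whitespace that is not directly followed by "</" (for example, between "See" and "the", or before an opening tag) is left untouched.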







