.NET regex: inner text between td, span, and a tags
<table >
<tr>
<td colspan="2" style="height: 14px">
tdtext1
<a>hyperlinktext1</a>
</td>
</tr>
<tr>
<td>
tdtext2
</td>
<td>
<span>spantext1</span>
</td>
</tr>
</table>
This is my sample text. How can I write a regular expression in C# that matches the inner text of the td, span, and hyperlink (a) tags?
+2
3 answers
I cringe every time I hear the words regex and HTML in the same sentence. I would suggest checking out the HtmlAgilityPack on CodePlex, which is a very tolerant HTML parser that lets you run XPath queries against the parsed document. This is much cleaner, and the person who inherits your code will thank you!
EDIT
As per the comments below, here are some examples of how to get the InnerText of these tags. Very simple.
using System;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("...your sample html...");

// Note: SelectNodes returns null (not an empty collection) when nothing matches,
// so check for null if a tag may be absent from the document.

// all <td> tags in the document
foreach (HtmlNode td in doc.DocumentNode.SelectNodes("//td")) {
    Console.WriteLine(td.InnerText);
}

// all <span> tags in the document
foreach (HtmlNode span in doc.DocumentNode.SelectNodes("//span")) {
    Console.WriteLine(span.InnerText);
}

// all <a> tags in the document
foreach (HtmlNode a in doc.DocumentNode.SelectNodes("//a")) {
    Console.WriteLine(a.InnerText);
}
+6
static void Main(string[] args)
{
    //...
    // using (WebClient client = new WebClient()) // WebClient class implements IDisposable
    // {
    // HtmlWeb downloads the page and parses it in one step.
    HtmlDocument doc = new HtmlWeb().Load("http://www.freeclup.com");
    foreach (HtmlNode span in doc.DocumentNode.SelectNodes("//span"))
    {
        Console.WriteLine(span.InnerText);
    }
    Console.ReadKey();
    // }
}
0
You can use something like:
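// Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
// x is the string that holds the HTML.
// (?:a|span|td) matches the tag name, [^>]* skips any attributes, and the
// "text" group captures the trimmed text that runs up to the next tag.
const string pattern = @"<(?:a|span|td)\b[^>]*>\s*(?<text>[^<\s][^<]*?)\s*(?=<)";
Regex regex = new Regex(pattern, RegexOptions.Singleline);
MatchCollection m = regex.Matches(x);
List<string> list = new List<string>();
foreach (Match match in m)
{
    list.Add(match.Groups["text"].Value);
}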
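To see how a pattern of this kind behaves, here is a minimal, self-contained sketch that runs it against a copy of the sample table from the question (with the anchor closed as </a>). The class name InnerTextDemo and the helper ExtractTexts are hypothetical names chosen for illustration, and the pattern is one possible formulation, not the only one. It captures only the text that directly follows an opening a, span, or td tag, so it is still far more brittle than a real HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class InnerTextDemo
{
    // Hypothetical helper (not part of any library): returns the trimmed text
    // that directly follows each opening <a>, <span>, or <td> tag.
    public static List<string> ExtractTexts(string html)
    {
        // <(?:a|span|td)\b   opening tag name, as a whole word
        // [^>]*>             any attributes, then the end of the opening tag
        // \s*                leading whitespace, not captured
        // (?<text>...)       text that starts with a non-space and contains no '<'
        // \s*(?=<)           trailing whitespace, then the next tag (not consumed)
        const string pattern =
            @"<(?:a|span|td)\b[^>]*>\s*(?<text>[^<\s][^<]*?)\s*(?=<)";

        var results = new List<string>();
        foreach (Match match in Regex.Matches(html, pattern, RegexOptions.IgnoreCase))
        {
            results.Add(match.Groups["text"].Value);
        }
        return results;
    }

    public static void Main()
    {
        const string html = @"<table >
<tr>
<td colspan=""2"" style=""height: 14px"">
tdtext1
<a>hyperlinktext1</a>
</td>
</tr>
<tr>
<td>
tdtext2
</td>
<td>
<span>spantext1</span>
</td>
</tr>
</table>";

        foreach (string text in ExtractTexts(html))
        {
            Console.WriteLine(text);
        }
    }
}
```

A td that contains only another tag, such as the cell wrapping the span, produces no entry of its own; the span's text is reported once, by the span match. Text that appears after a nested tag closes is not captured at all, which is one reason the parser-based answers are the safer choice.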
-2