Is there a way to compute XPath from a known value in a web page?
As I understand it, XPath is a way to navigate through elements in an XML document.
Direction is an XPath → element.
How do you go the other way? That is, calculate the XPath from the known value of the element?
For example, how would you find the xpath of the "faq" link in the stackoverflow header?
The language is not that important, I'm more interested in the algorithm and / or libraries / methods that will help me compute XPath.
a source to share
Here's a simple JS function. It only uses previousSibling, nodeType and parentNode, so it should be portable to other languages. However, the result is not readable (to humans) and it won't be particularly persistent across page changes.
In my experience XPath is more useful when written by hand. However, you can make an algorithm that generates prettier (if possible slower) results.
function getXPath(node)
{
if(node == document)
return "/";
var xpath = "";
while (node != null && node.nodeType != Node.DOCUMENT_NODE)
{
print(node.nodeType);
var pos = 1, prev;
while ((prev = node.previousSibling) != null)
{
node = prev
pos++;
}
xpath = "/node()[" + pos + "]" + xpath;
node = node.parentNode;
}
return xpath;
}
a source to share
You have not specified a language that makes it difficult to answer this question. Python lxml module can do this
>>> a = etree.Element("a")
>>> b = etree.SubElement(a, "b")
>>> c = etree.SubElement(a, "c")
>>> d1 = etree.SubElement(c, "d")
>>> d2 = etree.SubElement(c, "d")
>>> tree = etree.ElementTree(c)
>>> print(tree.getpath(d2))
/c/d[2]
>>> tree.xpath(tree.getpath(d2)) == [d2]
True
Even if you are not using python, you can find what you need in the module's source code
a source to share