How to extract information from a block of URLs in PHP?

I have a list of URLs that can be in any format: one per line, separated by commas, with random text in between, and so on. The URLs all come from two different sites and have a similar structure.

For this example, let's say it looks like this:

Random Text - http://www.domain2.com/variable-value
Random Text 2 - http://www.domain1.com/variable-value, http://www.domain1.com/variable-value, http://www.domain1.com/variable-value

http://www.domain1.com/variable-value
http://www.domain2.com/variable-value
http://www.domain1.com/variable-value http://www.domain2.com/variable-value http://www.domain1.com/variable-value

      

I need to extract two pieces of information from each URL: whether it belongs to domain1 or domain2, and the value that follows "variable-".

The result should be a multidimensional array in which each entry has two elements: domain and value.

What's the best way to do this?

+2




3 answers


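This is one way to extract the URLs. The only limitation is that the URLs themselves must not contain a comma. If that is acceptable:

// double quotes are required so that "\n" is read as a newline
$lines = explode("\n", $urls);

foreach ($lines as $line)
{
    // the pattern needs delimiters; a URL ends at a comma or at whitespace
    if(preg_match_all("/http:\\/\\/[^,\\s]*variable-([^,\\s]+)/", $line, $matches))
    {
        // $matches[0] holds the full URLs, $matches[1] the values after "variable-"
    }
}

      

By the way, the matches are stored in the $matches array.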



PS: Edited. I forgot to escape the backslashes, and you have to search line by line to ensure correct behavior. You can check the pattern at regex-tester.de/regex.html; it worked there with my regex.

PPS: After further research, I found this page: http://internet.ls-la.net/folklore/url-regexpr.html . It contains a regular expression for URLs. You can use it to fetch the URLs first and then, in a second step, go through those URLs and extract the variable part, for example with variable-(\w+).
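The two-step idea can be sketched roughly as follows. This is only an illustration: the sample `$text` is made up to resemble the question's input, and the URL pattern is a deliberately simplified stand-in (it stops at whitespace or a comma), not the full expression from the linked page.

```php
<?php
// Hypothetical sample input, modelled on the question.
$text = "Random Text - http://www.domain2.com/variable-abc\n"
      . "http://www.domain1.com/variable-def, http://www.domain1.com/variable-ghi";

// Step 1: collect everything that looks like an http URL.
// Simplified assumption: a URL ends at whitespace or a comma.
preg_match_all('~http://[^\s,]+~', $text, $found);

// Step 2: look at each URL and pull out the part after "variable-".
$values = array();
foreach ($found[0] as $url) {
    if (preg_match('~variable-(\w+)~', $url, $m)) {
        $values[] = $m[1];
    }
}

print_r($values);
```

Splitting the work this way keeps each pattern small, at the cost of a second pass over the URLs that were found.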

+1




Use preg_split, preg_match and parse_url:



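// split urls on a comma followed by whitespace
$urls = preg_split('!,\s+!', 'http://www.domain1.com/variable-value, http://www.domain2.com/variable-value, http://www.domain3.com/variable-value');

$result = array();

// check for domain and path variable
foreach ($urls as $url) {

    $parts = parse_url($url);
    $matches = array();
    // the domain is in $parts['host']; the path must start with "/variable-"
    if (isset($parts['host'], $parts['path'])
        && preg_match('!^/variable-([^/]+)!', $parts['path'], $matches)) {
        $result[] = array('domain' => $parts['host'], 'value' => $matches[1]);
    }
}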

      

0




$text = "http://www.domain1.com/variable-value1, http://www.domain2.com/variable-value2 http://www.domain1.com/variable-value3";
preg_match_all("/http:\\/\\/(.+?)\\/variable-([a-z0-9]+)/si", $text, $matches);
print_r($matches);

      

Result:

Array
(
    [0] => Array
        (
            [0] => http://www.domain1.com/variable-value1
            [1] => http://www.domain2.com/variable-value2
            [2] => http://www.domain1.com/variable-value3
        )

    [1] => Array
        (
            [0] => www.domain1.com
            [1] => www.domain2.com
            [2] => www.domain1.com
        )

    [2] => Array
        (
            [0] => value1
            [1] => value2
            [2] => value3
        )

)
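To get the domain + value pairs the question asks for, the same pattern can be run with the PREG_SET_ORDER flag, which groups the captures per match instead of per group. A minimal sketch, reusing the `$text` from above; the `$result` array and its `domain` and `value` keys are names chosen here for illustration:

```php
<?php
$text = "http://www.domain1.com/variable-value1, http://www.domain2.com/variable-value2 http://www.domain1.com/variable-value3";

// With PREG_SET_ORDER each element of $matches is one complete match:
// [0] => full URL, [1] => host, [2] => value after "variable-".
preg_match_all("/http:\\/\\/(.+?)\\/variable-([a-z0-9]+)/si", $text, $matches, PREG_SET_ORDER);

$result = array();
foreach ($matches as $match) {
    $result[] = array('domain' => $match[1], 'value' => $match[2]);
}

print_r($result);
```

Each entry of `$result` then holds one URL's host and value, so checking for domain1 or domain2 becomes a plain string comparison on `$entry['domain']`.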

      

0

