PHP: preg_match_all

Description

int preg_match_all ( string pattern, string subject, array &matches [, int flags [, int offset]] )

Searches subject for all matches of the pattern and puts them in the array matches, in the order specified by the combination of flags.

After the first match is found, subsequent searches continue from the end of the last match rather than from the start of the string.
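
The following is a minimal sketch of what "continuing from the end of the last match" means in practice; the subject string 'aaaa' and the variable names are illustrative assumptions, not part of the manual. It also shows one common way to get overlapping hits, by putting the pattern inside a lookahead.

```php
<?php
// Each search resumes where the previous match ended, so matches never overlap.
$count = preg_match_all('/aa/', 'aaaa', $matches);

echo $count, "\n";        // "aa" is found at offsets 0 and 2 only
print_r($matches[0]);

// A lookahead consumes no characters, so every starting position is tested
// and the captured group can overlap with the previous one.
$overlapping = preg_match_all('/(?=(aa))/', 'aaaa', $m);

echo $overlapping, "\n";  // offsets 0, 1 and 2 all qualify
print_r($m[1]);
```

The second call trades the full match (which is empty) for capture group 1, so read the results from $m[1] rather than $m[0].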

The optional flags parameter can combine the following values (note that it makes no sense to use PREG_PATTERN_ORDER together with PREG_SET_ORDER):

PREG_PATTERN_ORDER

If this flag is set, the results are ordered so that $matches[0] is an array of full pattern matches, $matches[1] is an array of the strings matched by the first parenthesized subpattern, and so on.


<?php
preg_match_all("|<[^>]+>(.*)</[^>]+>|U",
    "<b>example: </b><div align=left>this is a test</div>",
    $out, PREG_PATTERN_ORDER);
echo $out[0][0] . ", " . $out[0][1] . "\n";
echo $out[1][0] . ", " . $out[1][1] . "\n";
?>

The above example will output:

<b>example: </b>, <div align=left>this is a test</div>
example: , this is a test

As you can see, $out[0] contains an array of the strings that matched the full pattern, and $out[1] contains an array of the strings enclosed by the tags.

PREG_SET_ORDER

If this flag is set, the results are ordered so that $matches[0] is an array holding the first set of matches, $matches[1] is an array holding the second set of matches, and so on.


<?php
preg_match_all("|<[^>]+>(.*)</[^>]+>|U",
    "<b>example: </b><div align=\"left\">this is a test</div>",
    $out, PREG_SET_ORDER);
echo $out[0][0] . ", " . $out[0][1] . "\n";
echo $out[1][0] . ", " . $out[1][1] . "\n";
?>

The above example will output:

<b>example: </b>, example: 
<div align="left">this is a test</div>, this is a test

In this case, $matches[0] is the first set of matches: $matches[0][0] holds the text matched by the full pattern, $matches[0][1] holds the text matched by the first subpattern, and so on. Similarly, $matches[1] is the second set of matches, and so on for every set found.
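
As a rough illustration of the difference between the two orderings, the sketch below walks the same matches once per ordering. The variable names ($byGroup, $byMatch) are assumptions made for this example; the pattern and subject are the ones used above.

```php
<?php
$subject = '<b>example: </b><div align="left">this is a test</div>';
$pattern = '|<[^>]+>(.*)</[^>]+>|U';

// PREG_PATTERN_ORDER: one array per group, indexed by match number.
preg_match_all($pattern, $subject, $byGroup, PREG_PATTERN_ORDER);
foreach ($byGroup[1] as $i => $inner) {
    echo "match $i: full='{$byGroup[0][$i]}' inner='$inner'\n";
}

// PREG_SET_ORDER: one array per match, indexed by group number.
preg_match_all($pattern, $subject, $byMatch, PREG_SET_ORDER);
foreach ($byMatch as $i => $set) {
    echo "match $i: full='{$set[0]}' inner='{$set[1]}'\n";
}
```

Both loops print the same lines. PREG_SET_ORDER tends to be more convenient when each match is processed as a unit, PREG_PATTERN_ORDER when only one group is of interest.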

PREG_OFFSET_CAPTURE

If this flag is set, the string offset of every match is returned along with the match itself. Note that this changes the format of the returned data: every match becomes an array whose element 0 is the matched string and whose element 1 is its offset in the subject string. This flag is available in PHP 4.3.0 and later.
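
A small sketch of the resulting data shape, assuming an illustrative subject string and the default pattern ordering:

```php
<?php
$count = preg_match_all('/\d+/', 'a1 bb22 ccc333', $matches, PREG_OFFSET_CAPTURE);

// With PREG_OFFSET_CAPTURE every entry is array(matched text, offset).
foreach ($matches[0] as $hit) {
    list($text, $offset) = $hit;
    echo "'$text' starts at offset $offset\n";
}
```

The flag can be combined with an ordering flag, for example PREG_SET_ORDER | PREG_OFFSET_CAPTURE.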

If no order flag is given, PREG_PATTERN_ORDER is assumed.

The search normally runs left to right from the beginning of the subject string. The optional offset parameter can be used to specify an alternative position from which to start the search. The offset parameter is available since PHP 4.3.3.

Note: Using offset is not equivalent to passing substr($subject, $offset) to preg_match_all() in place of the subject string, because pattern can contain assertions such as ^, $ or (?<=x). See preg_match() for examples.
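
The difference can be sketched as follows; the subject 'abcdef' and the variable names are assumptions chosen for illustration.

```php
<?php
$subject = 'abcdef';

// ^ still refers to the real start of $subject, so nothing matches from offset 3.
$withOffset = preg_match_all('/^def/', $subject, $m1, PREG_PATTERN_ORDER, 3);

// substr() builds a new string that starts at "d", so ^ matches there.
$withSubstr = preg_match_all('/^def/', substr($subject, 3), $m2);

// A lookbehind can see the text before the offset only in the first form.
$lookbehind       = preg_match_all('/(?<=c)def/', $subject, $m3, PREG_PATTERN_ORDER, 3);
$lookbehindSubstr = preg_match_all('/(?<=c)def/', substr($subject, 3), $m4);

var_dump($withOffset, $withSubstr, $lookbehind, $lookbehindSubstr);
```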

Returns the number of full pattern matches (which might be zero), or FALSE if an error occurred.
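
Because 0 and FALSE are both "falsy", the two cases are best told apart with a strict comparison. The sketch below is one way to do that; using preg_last_error() to report the cause is an assumption about the reader's PHP version (it exists in PHP 5.2 and later).

```php
<?php
$found = preg_match_all('/\d+/', 'no digits here', $matches);

if ($found === false) {
    // Assumption: preg_last_error() is available to report the reason.
    echo 'PCRE error code: ', preg_last_error(), "\n";
} elseif ($found === 0) {
    echo "Pattern is valid but nothing matched\n";
} else {
    echo "$found match(es)\n";
}
```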

Example 1. Getting all phone numbers out of some text.

<?php
preg_match_all("/\(? (\d{3})? \)? (?(1) [\-\s] ) \d{3}-\d{4}/x",
    "Call 555-1212 or 1-800-555-1212", $phones);
?>

Example 2. Find matching HTML tags (greedy)

<?php
// The \\2 is an example of backreferencing. It tells PCRE that it must
// match the text captured by the second subpattern, which in this
// example is ([\w]+).
// The extra backslash is required because the string is in double quotes.
$html = "<b>bold text</b><a href=howdy.html>click me</a>";

preg_match_all("/(<([\w]+)[^>]*>)(.*)(<\/\\2>)/", $html, $matches);

for ($i=0; $i< count($matches[0]); $i++) {
    echo "matched: " . $matches[0][$i] . "\n";
    echo "part 1: " . $matches[1][$i] . "\n";
    echo "part 2: " . $matches[3][$i] . "\n";
    echo "part 3: " . $matches[4][$i] . "\n\n";
}
?>

The above example will output:

matched: <b>bold text</b>
part 1: <b>
part 2: bold text
part 3: </b>

matched: <a href=howdy.html>click me</a>
part 1: <a href=howdy.html>
part 2: click me
part 3: </a>

See also preg_match(), preg_replace(), and preg_split().

preg_match_all

mail at SPAMBUSTER at milianw dot de
17-Jul-2006 07:11


I refurbished the autoCloseTags function by connum at DONOTSPAMME dot googlemail dot com:

<?php

/**

 * close all open xhtml tags at the end of the string

 * 

 * @author Milian Wolff <http://milianw.de>

 * @param string $html

 * @return string

 */

function closetags($html){

  #put all opened tags into an array

  preg_match_all("#<([a-z]+)( .*)?(?!/)>#iU",$html,$result);

  $openedtags=$result[1];



  #put all closed tags into an array

  preg_match_all("#</([a-z]+)>#iU",$html,$result);

  $closedtags=$result[1];

  $len_opened = count($openedtags);

  # all tags are closed

  if(count($closedtags) == $len_opened){

    return $html;

  }

  $openedtags = array_reverse($openedtags);

  # close tags

  for($i=0;$i<$len_opened;$i++) {

    if (!in_array($openedtags[$i],$closedtags)){

      $html .= '</'.$openedtags[$i].'>';

    } else {

      unset($closedtags[array_search($openedtags[$i],$closedtags)]);

    }

  }

  return $html;

}

?>

volkank at developera dot com
07-Jul-2006 07:04


A note to add to my last post.

Leading zeros in IP addresses can cause problems on both Windows and Linux, because it is ambiguous whether an octet is decimal or octal (if the octal is not written properly).

"66.163.161.117" is in decimal syntax, but in "066.163.161.117" the first octet, 066, is in octal syntax.
So "066.163.161.117" is interpreted as decimal "54.163.161.117" by the operating system.
BTW, octal is a fairly rare syntax, so you may not want or need to match it.

***
Unless you specifically want to match IP addresses in both decimal and octal syntax, you can use Chortos-2's pattern, which is suitable for most situations.



<?php 

//DECIMAL syntax IP match



//$num="(\\d|[1-9]\\d|1\\d\\d|2[0-4]\\d|25[0-5])";

$num='(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])';



if (!preg_match("/^$num\\.$num\\.$num\\.$num$/", $ip_addr,$match)) //validate IP

...



preg_match_all("/$num\\.$num\\.$num\\.$num/",$test,$match); //collect IP addresses from a text(notice that ^$ not present in pattern)

...



?> 



***

Also, my previous pattern still has a bug and needs some changes to correctly match both decimal and octal syntax.
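
For readers who want something runnable, here is a hedged sketch of the decimal-only approach described in this note. The helper name is_decimal_ipv4(), the lookarounds used when collecting addresses, and the sample text are assumptions added for illustration; they are not part of the original post.

```php
<?php
// One decimal octet, 0-255, with no leading zeros (so "066" is rejected).
$octet = '(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)';
$ip    = $octet . '(?:\.' . $octet . '){3}';

// Validation: anchor both ends; /D keeps $ from matching before a trailing newline.
function is_decimal_ipv4($candidate, $ip) {
    return preg_match('/^' . $ip . '$/D', $candidate) === 1;
}

// Collection: the lookarounds stop "256.1.1.1" from yielding a partial hit
// such as "56.1.1.1".
$text = 'hosts: 127.0.0.112, 10.0.0.2 and 256.1.1.1';
preg_match_all('/(?<![\d.])' . $ip . '(?!\d|\.\d)/', $text, $found);

var_dump(is_decimal_ipv4('66.163.161.117', $ip));
var_dump(is_decimal_ipv4('066.163.161.117', $ip));
print_r($found[0]);
```

Without the lookarounds, an unanchored pattern happily matches the tail of an invalid address, which is usually not what is wanted when scanning free text.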

connum at DONOTSPAMME dot googlemail dot com
03-Jun-2006 02:41


<?php

function autoCloseTags($string) {

// automatically close HTML-Tags

// (useful e.g. if you want to extract part of a blog entry or news as preview/teaser)

// coded by Constantin Gross <connum at googlemail dot com> / 3rd of June, 2006

// feel free to leave comments or to improve this function!



$donotclose=array('br','img','input'); //Tags that are not to be closed



//prepare vars and arrays

$tagstoclose='';

$tags=array();



//put all opened tags into an array

preg_match_all("/<(([A-Z]|[a-z]).*)(( )|(>))/isU",$string,$result);

$openedtags=$result[1];

$openedtags=array_reverse($openedtags); //this is just done so that the order of the closed tags in the end will be better



//put all closed tags into an array

preg_match_all("/<\/(([A-Z]|[a-z]).*)(( )|(>))/isU",$string,$result2);

$closedtags=$result2[1];



//look up which tags still have to be closed and put them in an array

for ($i=0;$i<count($openedtags);$i++) {

    if (in_array($openedtags[$i],$closedtags)) { unset($closedtags[array_search($openedtags[$i],$closedtags)]); }

        else array_push($tags, $openedtags[$i]);

}  



$tags=array_reverse($tags); //now this reversion is done again for a better order of close-tags



//prepare the close-tags for output

for($x=0;$x<count($tags);$x++) {

$add=strtolower(trim($tags[$x]));

if(!in_array($add,$donotclose)) $tagstoclose.='</'.$add.'>';

}



//and finally 

return $tagstoclose;

}

?>

slavomir dot hustaty at gmail dot com
28-Mar-2006 07:10


//<h1>some text</h1><b>bold</b><h1>some further text</h1>
//if you need what's between the tags :-)

class find_regex
{
    var $search_tag;
    var $result;
    //preg_match_all("/(<h1[^>]*>)([^<]*)(<\/h1>)/", $html, $matches);

    // constructor: remember which tag to look for
    function __construct($tag = "h1")
    {
        $this->search_tag = $tag;
    }

    function parse($text_to_parse = "")
    {
        $regex = "/(<" . $this->search_tag . "[^>]*>)([^<]*)(<\/" . $this->search_tag . ">)/";

        // search the text passed in by the caller
        preg_match_all( $regex , $text_to_parse , $matches );

        $this->result = $matches;

        // group 2 holds the text between the opening and closing tag
        return $matches[2];
    }
}

dave at mixd dot net
22-Mar-2006 08:18


Use this to capture all JavaScript code that is between <script> tags.



Takes into account javascript that generates HTML. This one took a while, so I thought I'd share it.



$delimeter = 

'/<script[^>]*>((?:[^<>"\']+(?:"[^"]*"|\'[^\']*\')*)+)<\/script>/i';



Note: For some reason php.net is filtering out my escape characters... If it doesn't work make sure you escape all single quotes and the forward slash.

phpnet at sinful-music dot com
20-Feb-2006 12:53


Here's some fleecy code to (1) validate that address lists conform to RFC 2822 and (2) extract the address specification (the part commonly known as 'email'). I wouldn't suggest using it for input form email checking, but it might be just what you want for other email applications. I know it can be optimized further, but that part I'll leave up to you nutcrackers. The total length of the resulting regex is about 30000 bytes. That's because it accepts comments. You can remove that by setting $cfws to $fws, and it shrinks to about 6000 bytes. Conformity checking strictly follows RFC 2822. Have fun, and email me if you have any enhancements!



<?php

function mime_extract_rfc2822_address($string)

{

        //rfc2822 token setup

        $crlf           = "(?:\r\n)";

        $wsp            = "[\t ]";

        $text           = "[\\x01-\\x09\\x0B\\x0C\\x0E-\\x7F]";

        $quoted_pair    = "(?:\\\\$text)";

        $fws            = "(?:(?:$wsp*$crlf)?$wsp+)";

        $ctext          = "[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F" .

                          "!-'*-[\\]-\\x7F]";

        $comment        = "(\\((?:$fws?(?:$ctext|$quoted_pair|(?1)))*" .

                          "$fws?\\))";

        $cfws           = "(?:(?:$fws?$comment)*(?:(?:$fws?$comment)|$fws))";

        //$cfws           = $fws; //an alternative to comments

        $atext          = "[!#-'*+\\-\\/0-9=?A-Z\\^-~]";

        $atom           = "(?:$cfws?$atext+$cfws?)";

        $dot_atom_text  = "(?:$atext+(?:\\.$atext+)*)";

        $dot_atom       = "(?:$cfws?$dot_atom_text$cfws?)";

        $qtext          = "[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F!#-[\\]-\\x7F]";

        $qcontent       = "(?:$qtext|$quoted_pair)";

        $quoted_string  = "(?:$cfws?\"(?:$fws?$qcontent)*$fws?\"$cfws?)";

        $dtext          = "[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F!-Z\\^-\\x7F]";

        $dcontent       = "(?:$dtext|$quoted_pair)";

        $domain_literal = "(?:$cfws?\\[(?:$fws?$dcontent)*$fws?]$cfws?)";

        $domain         = "(?:$dot_atom|$domain_literal)";

        $local_part     = "(?:$dot_atom|$quoted_string)";

        $addr_spec      = "($local_part@$domain)";

        $display_name   = "(?:(?:$atom|$quoted_string)+)";

        $angle_addr     = "(?:$cfws?<$addr_spec>$cfws?)";

        $name_addr      = "(?:$display_name?$angle_addr)";

        $mailbox        = "(?:$name_addr|$addr_spec)";

        $mailbox_list   = "(?:(?:(?:(?<=:)|,)$mailbox)+)";

        $group          = "(?:$display_name:(?:$mailbox_list|$cfws)?;$cfws?)";

        $address        = "(?:$mailbox|$group)";

        $address_list   = "(?:(?:^|,)$address)+";



        //output length of string (just so you see how f**king long it is)

        echo(strlen($address_list) . " ");



        //apply expression

        preg_match_all("/^$address_list$/", $string, $array, PREG_SET_ORDER);



        return $array;

};

?>

volkank at developera dot com
16-Feb-2006 11:23


Correct IP matching Pattern:



This is my new IP octet pattern, which seems to be correct:

$num="(25[0-5]|2[0-4]\d|[01]?\d\d|\d)";



/*

25[0-5]    => 250-255

2[0-4]\d   => 200-249

[01]?\d\d  => 00-99,000-199

\d         => 0-9

*/



GRABBING multiple Valid IP addresses from string



<?php

    $num="(25[0-5]|2[0-4]\d|[01]?\d\d|\d)";

    $test="127.0.0.112 10.0.0.2";

    preg_match_all("/$num\\.$num\\.$num\\.$num/",$test,$match);

    print_r($match);

      

?>



Single IP validation

<?php
$num="(25[0-5]|2[0-4]\d|[01]?\d\d|\d)";
$ip_addr='009.111.111.100';
if (!preg_match("/^$num\\.$num\\.$num\\.$num$/", $ip_addr,$match)) {
    echo "Wrong IP Address\n";
} else {
    // $match is only filled in when the address is valid
    echo $match[0];
}
?>

bgamrat at wirehopper dot com
12-Feb-2006 11:19


In the following post, any double backslashes should be read as single backslashes.

bgamrat at wirehopper dot com
06-Feb-2006 07:54


I used these regular expressions to get the references from a page.   The function run_preg lists the references found.



$url = "http://test.com";
$text=@file_get_contents($url);
if ($text)
{
  // src=, href= and url( references
  $src_href_url=run_preg($text,
    "/(?:(?:src|href|url)\s*[=\(]\s*[\"'`])".
    "([\+\w:?=@&\/#._;-]+)(?:[\s\"'`])/i");
  // second argument of window.open(...)
  $windows=run_preg($text,
    "/(?:window.open\s*\(\s*[\w-]*\s*[,]\s*[\"`'])".
    "([\+\w:?=@&\/#._;-]*)(?:[\"'`]\s*)/i");
}

function run_preg($text,$pattern) {

   preg_match_all ($pattern, $text, $matches);

   // list every reference captured by the first group
   if (count($matches)>0)
        if (count($matches[1])>0)
                foreach ($matches[1] as $k => $v)
                        echo "$k: $v\n";

   return (is_array($matches)) ? $matches[1]:FALSE;
}



Thanks to http://us2.php.net/manual/en/function.preg-match.php#58505

for giving me a good starting point.



Hope others find this useful.  :)

mnc at u dot nu
02-Feb-2006 10:05


PREG_OFFSET_CAPTURE always seems to provide byte offsets, rather than character position offsets, even when you are using the unicode /u modifier.
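
If a character position is needed, one possible workaround is to count the UTF-8 characters that precede the byte offset. The sketch below does this with PCRE alone; the sample string and variable names are assumptions for illustration, and the non-ASCII characters are written as byte escapes so the example does not depend on the file's encoding.

```php
<?php
// "naïve café" written with explicit UTF-8 bytes.
$subject = "na\xC3\xAFve caf\xC3\xA9";

preg_match_all('/caf/u', $subject, $matches, PREG_OFFSET_CAPTURE);
$byteOffset = $matches[0][0][1];

// With the /u and /s modifiers "." matches exactly one character, so the
// number of matches in the prefix equals the character offset.
$charOffset = preg_match_all('/./su', substr($subject, 0, $byteOffset), $unused);

echo "byte offset: $byteOffset, character offset: $charOffset\n";
```

Where the mbstring extension is available, mb_strlen() on the same prefix would be an alternative to the second preg_match_all() call.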

egingell at sisna dot com
31-Jan-2006 07:31


Try this wrapper around preg_match_all() that takes an array of regular expressions.



<?php

// Emulates preg_match_all() but takes an array instead of a string.

// Returns an array containing all of the matches.

// The return array is an array containing the arrays normally returned by

//    preg_match_all() with the optional third parameter supplied.

function preg_search($ary, $subj) {

    $matched = array();

    if (is_array($ary)) {

        foreach ($ary as $v) {

            preg_match_all($v, $subj, $matched[]);

        }

    } else {

        preg_match_all($ary, $subj, $matched[]);

    }

    return $matched;

}

?>

18-Dec-2005 06:16


To match all occurrences between and including any two HTML tags, here <tr> and </tr>:



preg_match_all("/(\<[ \\n\\r\\t]{0,}tr[^>]*\>|\<[^>]*[\\n\\r\\t]{1,}tr[^>]*\>){1}([^<]*<([^(\/>)]*(\/[^(t>)]){0,1}(\/t[^(r>)]){0,1})*>)*(\<[ \\n\\r\\t]{0,}\/tr[^>]*\>|\<[^>]*[\\n\\r\\t]{1,}\/tr[^>]*\>){1}/i", $string, $Matches);

php at projectjj dot com
09-Dec-2005 12:43


Re: webmaster at swirldrop dot com



If you want to get a string with all the 'normal' characters, this may be better:



$clean = preg_replace('/\W+/', '', $dirty);



\W is the opposite of \w and will match any character that is not a letter or digit or the underscore character, plus it respects the current locale. Use [^0-9a-zA-Z]+ instead of \W if you need ASCII-only.

htp
07-Dec-2005 01:29


Just a quick note regarding the post by webmaster at swirldrop dot com. The regex doesn't match alpha-numerics, as it doesn't actually match numerics, just alphas. You might want to add 0-9 if that was the intent.

pablo dot seb at gmail dot com
16-Jun-2005 06:48


By assigning a name to a capturing group, you can easily reference it by name. (?P<name>group) captures the match of group into the backreference "name". You can reference the contents of the group with either the numbered backreference or the named backreference:



<?php



preg_match_all('|(a)(?P<x>b)(?P<y>c)(d)|','abcdefgabcdefg',$sub);



echo $sub[2][0]; //b



echo '<br />';



echo $sub['y'][0]; //c



?>



Pablo from Salto, Uruguay

webmaster at m-bread dot com
07-Jun-2005 06:45


Looking at the function from rickyale at ig dot com dot br below getting URLs from an html file, I think this is slightly better:



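function get_urls($string, $strict=true) {

   $types = array("href", "src", "url");
   $ret = array();
   foreach ($types as $type) {
       // strict mode only accepts characters that are common in URLs
       $innerT = $strict?'[a-z0-9:?=&@/._-]+?':'.+?';
       // \\1 requires the closing quote to be the same character as the opening one
       preg_match_all ("|$type\=([\"'`])(".$innerT.")\\1|i", $string, $matches);
       $ret[$type] = $matches[2];
   }

   return $ret;
}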



This only gets urls in quotes "...", `...` or '...', but not mixed quotes like `..." (thanks to w w w's note on the 'pattern syntax' page). If you set the second parameter to false, then the function will give you any contents of attribute (so the function can be used to get other attributes, such as alt). To make it more strict, the '[a-z0-9:?=&@/._-]+?' can be replaced with a regular expression for a url.

webmaster at swirldrop dot com
07-Jun-2005 05:40


If you want to get all the text characters from a string, possibly entered by a user, and filter out all the non alpha-numeric characters (perhaps to make an ID to enter user-submitted details into a database record), then you can use the function below. It returns a string of only the alpha-numeric characters from the input string (all in lower case), with all other characters removed:



<?php

function getText($string){

preg_match_all('/(?:([a-z]+)|.)/i', $string, $matches);

return strtolower(implode('', $matches[1]));

};//EoFn getText

?>



It took me quite a while to come up with this regular expression. I hope it saves someone else that time.

20-Apr-2005 08:35


A little correction to my function below:



<?php

function urlhighlight($str) {

    preg_match_all("/http:\/\/?[^ ][^<]+/i",$str,$lnk);

    $size = sizeof($lnk[0]);

    $i = 0;

    while ($i < $size) {

        $len = strlen($lnk[0][$i]);

        if($len > 30) {

            $lnk_txt = substr($lnk[0][$i],0,30)."...";

        } else {

            $lnk_txt = $lnk[0][$i];    

        }

        $ahref = $lnk[0][$i];

        $str = str_replace($ahref,"<a href='$ahref' target='_blank'>$lnk_txt</a>",$str);

        $i++;

    }

    return $str;

}

?>



The error is in the preg_match_all("/http:\/\/?[^ ][^<]+/i",$str,$lnk); the [^<] was missing.

Dan Madsen
20-Apr-2005 06:25


I wrote a function that takes URLs from a string or database output, turns them into links, and shortens the link text if it's longer than 30 characters.

Note: You'll have to use the nl2br() function on the string before using it, because I didn't know how to check for line feeds or carriage returns in preg style.



<?php

function urlhighlight($str) {

    preg_match_all("/http:\/\/?[^ ]+/i",$str,$lnk);

    $size = sizeof($lnk[0]);

    $i = 0;

    while ($i < $size) {

        $len = strlen($lnk[0][$i]);

        if($len > 30) {

            $lnk_txt = substr($lnk[0][$i],0,30)."...";

        } else {

            $lnk_txt = $lnk[0][$i];    

        }

        $ahref = $lnk[0][$i];

        $str = str_replace($ahref,"<a href='$ahref'>$lnk_txt</a>",$str);

        $i++;

    }

    return $str;

}

?>

Ex:

<?php

$str = "a lot of text with urls in it and alot of linebreaks";

$str = urlhighlight(nl2br($str));

?>

b2sing4u at naver dot com
08-Apr-2005 03:42


This function converts all HTML-style decimal character codes to hexadecimal codes.

ex) Hi &#959; &#9674; Dec  ->  Hi &#x03BF; &#x25CA; Dec



function d2h($word) {

  $n = preg_match_all("/&#(\d+?);/", $word, $match, PREG_PATTERN_ORDER);

  for ($j = 0; $j < $n; $j++) {

    $word = str_replace($match[0][$j], sprintf("&#x%04X;", $match[1][$j]), $word);

  }

  return($word);

}



And this function converts all HTML-style hexadecimal character codes to decimal codes.

ex) Hello &#x03BF; &#x25CA; Hex  ->  Hello &#959; &#9674; Hex



function h2d($word) {

  $n = preg_match_all("/&#x([0-9a-fA-F]+?);/", $word, $match, PREG_PATTERN_ORDER);

  for ($j = 0; $j < $n; $j++) {

    $word = str_replace($match[0][$j], sprintf("&#%u;", hexdec($match[1][$j])), $word);

  }

  return($word);

}
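
As an alternative sketch (not the author's method), the same conversions can be done in a single pass with preg_replace_callback(), which avoids rescanning the whole string once per match. The function names dec_entities_to_hex() and hex_entities_to_dec() are hypothetical, and the anonymous functions assume PHP 5.3 or later.

```php
<?php
// Decimal entity -> hexadecimal entity, padded to at least four hex digits.
function dec_entities_to_hex($text) {
    return preg_replace_callback('/&#(\d+);/', function ($m) {
        return sprintf('&#x%04X;', (int) $m[1]);
    }, $text);
}

// Hexadecimal entity -> decimal entity.
function hex_entities_to_dec($text) {
    return preg_replace_callback('/&#x([0-9a-fA-F]+);/', function ($m) {
        return sprintf('&#%u;', hexdec($m[1]));
    }, $text);
}

echo dec_entities_to_hex('Hi &#959; &#9674; Dec'), "\n";
echo hex_entities_to_dec('Hello &#x03BF; &#x25CA; Hex'), "\n";
```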

b2sing4u
07-Apr-2005 02:24


Character Code Conversion Example.



You can use the following examples to convert character codes in an HTML file.



First example converts Hexadecimal code to Decimal code.

  ex) Hello &#xFF; Hex -> Hello &#255; Hex



Second example converts Decimal code to Hexadecimal code.

  ex) Hi &#16; Dec -> Hi &#x0010; Dec



<?php



$h2d_get = fopen("h2d_get.htm", 'r');

$h2d_out = fopen("h2d_out.htm", 'w');



for ($i = 1; $i <= 1000; $i++)

{

  if (feof($h2d_get)) { break; }



  $line = fgets($h2d_get, 409600);

  $line = trim($line);

  if ($line == "99999999") { break; }



  $n = preg_match_all("/&#x([0-9a-fA-F]+?);/", $line, $match, PREG_PATTERN_ORDER);



  for ($j = 0; $j < $n; $j++)

  {

    $find = $match[0][$j];

    $code = hexdec($match[1][$j]);

    $push = sprintf("&#%u;", $code);

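    // eregi_replace() no longer exists in PHP 7+; a plain case-insensitive replace does the same job here
    $line = str_ireplace($find, $push, $line);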

  }



  fwrite($h2d_out, $line);

  fwrite($h2d_out, "\r\n");

}



fclose($h2d_get);

fclose($h2d_out);



?>



<?php



$d2h_get = fopen("d2h_get.htm", 'r');

$d2h_out = fopen("d2h_out.htm", 'w');



for ($i = 1; $i <= 1000; $i++)

{

  if (feof($d2h_get)) { break; }



  $line = fgets($d2h_get, 409600);

  $line = trim($line);

  if ($line == "99999999") { break; }



  $n = preg_match_all("/&#(\d+?);/", $line, $match, PREG_PATTERN_ORDER);



  for ($j = 0; $j < $n; $j++)

  {

    $find = $match[0][$j];

    $code = $match[1][$j];

    $push = sprintf("&#x%04X;", $code);

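    // eregi_replace() no longer exists in PHP 7+; a plain case-insensitive replace does the same job here
    $line = str_ireplace($find, $push, $line);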

  }



  fwrite($d2h_out, $line);

  fwrite($d2h_out, "\r\n");

}



fclose($d2h_get);

fclose($d2h_out);



?>

arias at elleondeoro dot com
15-Feb-2005 04:27


If you want to find the positions of all matches and their lengths, you can use the following function:



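<?php
function preg_match_all_positions($pattern, $subject, &$count=null, $flags=0, $offset=0) {
  // [0] => start positions, [1] => lengths; stays empty when nothing matches
  $positions = array(array(), array());
  for ($count=0; preg_match($pattern, $subject, $match, $flags, $offset); $count++) {
    $positions[0][] = $pos = strpos($subject, $match[0], $offset);
    $positions[1][] = $len = strlen($match[0]);
    // always move forward, even after an empty match
    $offset = $pos + max($len, 1);
  }
  return $positions;
}
?>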

mpbweb at mbourque dot com
02-Feb-2005 10:41


Here is a handy function I wrote that will check for broken links on the supplied url.



function dead_links($url) {



// mixed dead_links( $url )

// Returns:

//    FALSE if no broken links are found.

//    ARRAY containing broken links if any are found.



   ob_start();

      if( !readfile($url) ) return FALSE;

      $body = ob_get_contents();

   ob_end_clean();



   $pathparts = pathinfo($url);



   $urlpattern = "/<a[^>]+href=\"([^\"]+)/i";

   preg_match_all($urlpattern,$body,$matches);



   foreach( $matches[1] as $link) {



      if( strpos($link,"http://") === FALSE ) { // Deal with relative paths

         $link = $pathparts['dirname'] . "/" . $link;

      }



      $fp = @fopen("$link", "r");

      if (!$fp) {

         // the link could not be opened, so record it as broken
         $linkArray[] = $link;

      } else {

         fclose($fp);

      }



   }



   return (is_array($linkArray) ) ? $linkArray : FALSE;

}



Regards,



Michael Bourque

MCLD
20-Jan-2005 02:35


Here's a nice easy use for preg_match_all. I have data files in comma-separated-values format, with all the data enclosed in quote marks. To convert one line of such a data file into an array:



function quotedCsvLineToArray($l)

{

  preg_match_all('/(?<=,|\A)("(.*?)")?(?=,|\Z)/',$l, $matches, PREG_PATTERN_ORDER);

  return $matches[2];

}



hope it helps

dan

hex6ng at yahoo dot com
02-Jul-2004 03:04


This is a much more efficient version of the same function posted in ereg_replace() discussion by hdn, who is the same person as hex6ng.  I didn't include activating urls without http:// protocol identifier because there are many xxx.xxx patterns that are not urls.



function html_activate_urls($str)

{

    // lift all links, images and image maps

    $url_tags = array (

                     "'<a[^>]*>.*?</a>'si",

                     "'<map[^>]*>.*?</map>'si",

                     "'<script[^>]*>.*?</script>'si",

                     "'<style[^>]*>.*?</style>'si",

                     "'<[^>]+>'si"

                      );

    foreach($url_tags as $url_tag)

    {

        preg_match_all($url_tag, $str, $matches, PREG_SET_ORDER);

        foreach($matches as $match)

        {

            $key = "<" . md5($match[0]) . ">";

            $search[] = $key;

            $replace[] = $match[0];

        }

    }



    $str = str_replace($replace, $search, $str);



    // indicate where urls end if they have these trailing special chars

    $sentinals = array("/&(quot|#34);/i",        // Replace html entities

                       "/&(lt|#60);/i",

                       "/&(gt|#62);/i",

                       "/&(nbsp|#160);/i",

                       "/&(iexcl|#161);/i",

                       "/&(cent|#162);/i",

                       "/&(pound|#163);/i",

                       "/&(copy|#169);/i");



    $str = preg_replace($sentinals, "<marker>\\0", $str);



    // URL into links

    $str = 

preg_replace( "|\w{3,10}://[\w\.\-_]+(:\d+)?[^\s\"\'<>\(\)\{\}]*|",  

                   "<a href=\"\\0\">\\0</a>", $str ); 



    $str = str_replace("<marker>", '', $str);

    return str_replace($search, $replace, $str);

}



-hdn

vb_user at yahoo dot com
21-Apr-2004 09:00


If you want to extract the list of PHP functions in one of your libraries (i.e., include files) for documentation or any other purpose, use the code below:



$filename = 'library.php';

$fp = fopen($filename,'r');

if ($fp !== false) {

    $str = fread($fp, filesize ($filename));

    $count = preg_match_all ("|function[ ]+(.*)[\(](.*)[\)]|U", $str, $out, PREG_PATTERN_ORDER);



    for ($i=0; $i<$count; $i++) {

        // skip names containing "array" (eregi() no longer exists in PHP 7+)
        if (stripos($out[1][$i], 'array') === false) {

            echo '#T='.$out[1][$i]."\n";

            echo $out[1][$i].'('.$out[2][$i].')'."\n\n";

        }

    }

}

fabriceb at gmx dot net
05-Mar-2004 06:55


If you just want to find out how many times a string contains another simple string, don't use preg_match_all like I did before I found the substr_count function.



Use

<?php

$nrMatches = substr_count ('foobarbar', 'bar');

?>

instead. Hope this helps some other people like me who like to think too complicated :-)