PHP: preg_replace

Description

mixed preg_replace ( mixed pattern, mixed replacement, mixed subject [, int limit] )

Searches subject for matches to pattern and replaces them with replacement. If limit is specified, at most limit occurrences of the pattern are replaced. If limit is omitted or equals -1, all occurrences are replaced.

replacement may contain references of the form \\n or (since PHP 4.0.4) $n, and the latter form is preferred. Every such reference is replaced by the text captured by the n-th parenthesized subpattern. n can be from 0 to 99, and \\0 (or $0) refers to the text matched by the whole pattern. Subpatterns are numbered from left to right, starting at one.

When a backreference in the replacement is immediately followed by a digit, the \\n notation becomes ambiguous. A reference to the first subpattern followed by a literal 1 would be written \\11, and that is interpreted as a reference to the eleventh subpattern. The solution is the \${1}1 notation, which creates an isolated reference to the first subpattern followed by the literal digit 1.

Example 1. Using backreferences followed by a numeric literal

<?php
$string = "April 15, 2003";
$pattern = "/(\w+) (\d+), (\d+)/i";
$replacement = "\${1}1,\$3";
echo preg_replace($pattern, $replacement, $string);
?>
The above example will output:
April1,2003

If matches are found, the modified subject is returned. Otherwise subject is returned unchanged.

The first three parameters of preg_replace() can be one-dimensional arrays. If the arrays have keys, their elements are processed in the order in which they appear in the array, not in key order. Keys in the pattern and replacement arrays are optional. If you do use indices to pair patterns with their replacements, call ksort() on each array first.

Example 2. Using indexed arrays with preg_replace()

<?php
$string = "The quick brown fox jumped over the lazy dog.";
$patterns[0] = "/quick/";
$patterns[1] = "/brown/";
$patterns[2] = "/fox/";
$replacements[2] = "bear";
$replacements[1] = "black";
$replacements[0] = "slow";
echo preg_replace($patterns, $replacements, $string);
?>
Output:
The bear black slow jumped over the lazy dog.
Sorting both arrays with ksort() gives the intended result:

<?php
ksort($patterns);
ksort($replacements);
echo preg_replace($patterns, $replacements, $string);
?>
Output:
The slow black bear jumped over the lazy dog.

If subject is an array, the search and replace is performed on every element of subject, and the return value is an array as well.

If both pattern and replacement are arrays, preg_replace() takes one element from each array in turn and uses the pair for a search-and-replace operation. If replacement has fewer elements than pattern, an empty string is used for each missing replacement. If pattern is an array and replacement is a string, each element of pattern is searched for in turn and replaced with that string (the pattern changes at each step while the replacement string stays the same). Passing pattern as a string and replacement as an array makes no sense.
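A minimal sketch of these rules. The strings and keys are arbitrary values chosen for illustration:

```php
<?php
// 1. Fewer replacements than patterns: the missing ones act as ''.
//    'abc' -> 'xbc' (a => x) -> 'xc' (b => '') -> 'x' (c => '')
$r1 = preg_replace(array('/a/', '/b/', '/c/'), array('x'), 'abc');

// 2. Array of patterns with a single replacement string used for each.
$r2 = preg_replace(array('/a/', '/b/'), '-', 'abc');

// 3. Array subject: every element is processed and an array is returned.
$r3 = preg_replace('/o/', '0', array('first' => 'foo', 'second' => 'bar'));

var_dump($r1, $r2, $r3);
```

The keys of an array subject are carried over to the returned array.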

The /e modifier makes preg_replace() treat the replacement parameter as PHP code after the backreference substitutions have been made, and use the result of evaluating that code as the replacement. Be careful with this modifier: replacement must contain valid PHP code, otherwise PHP reports a parse error at the line containing the preg_replace() call. (The /e modifier was deprecated in PHP 5.5.0 and removed in PHP 7.0.0. Use preg_replace_callback() instead.)

Example 3. Replacing several values

<?php
$patterns = array ("/(19|20)(\d{2})-(\d{1,2})-(\d{1,2})/",
                   "/^\s*{(\w+)}\s*=/");
$replace = array ("\\3/\\4/\\1\\2", "$\\1 =");
echo preg_replace($patterns, $replace, "{startDate} = 1999-5-27");
?>
This example will output:
$startDate = 5/27/1999

Example 4. Using the /e modifier

<?php
$html_body = preg_replace("/(<\/?)(\w+)([^>]*>)/e",
                          "'\\1'.strtoupper('\\2').'\\3'",
                          $html_body);
?>
This converts all HTML tag names in the input text to upper case.
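Because /e no longer exists in PHP 7, here is a sketch of Example 4 rewritten with preg_replace_callback(). It assumes PHP 5.3 or later for the anonymous function, and uppercase_tags() is a hypothetical wrapper name:

```php
<?php
// The callback receives the match array and returns the replacement text,
// so no replacement code is eval'd.
function uppercase_tags($html_body)
{
    return preg_replace_callback(
        '/(<\/?)(\w+)([^>]*>)/',
        function ($m) {
            // $m[1] = "<" or "</", $m[2] = tag name, $m[3] = attributes + ">"
            return $m[1] . strtoupper($m[2]) . $m[3];
        },
        $html_body
    );
}

echo uppercase_tags('<p>hi</p><a href="x">y</a>'), "\n";
```

Only the tag names change. Attribute names and values are passed through as they are.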

Example 5. Converting HTML to text

<?php
// $document should contain an HTML document.
// This will remove HTML tags, JavaScript sections
// and white space. It will also convert some
// common HTML entities to their text equivalent.

$search = array ("'<script[^>]*?>.*?</script>'si",  // Strip out JavaScript
                 "'<[\/\!]*?[^<>]*?>'si",          // Strip out HTML tags
                 "'([\r\n])[\s]+'",                // Strip out white space
                 "'&(quot|#34);'i",                // Replace HTML entities
                 "'&(amp|#38);'i",
                 "'&(lt|#60);'i",
                 "'&(gt|#62);'i",
                 "'&(nbsp|#160);'i",
                 "'&(iexcl|#161);'i",
                 "'&(cent|#162);'i",
                 "'&(pound|#163);'i",
                 "'&(copy|#169);'i",
                 "'&#(\d+);'e");                   // evaluate as PHP code

$replace = array ("",
                  "",
                  "\\1",
                  "\"",
                  "&",
                  "<",
                  ">",
                  " ",
                  chr(161),
                  chr(162),
                  chr(163),
                  chr(169),
                  "chr(\\1)");

$text = preg_replace($search, $replace, $document);
?>

Note: The limit parameter is available in PHP 4.0.1pl2 and later.

See also preg_match(), preg_match_all(), and preg_split().

preg_replace

kurt at yachthub dot com
08-Jul-2006 07:20


Fixes the most common bad punctuation around commas and full stops, removes extra white space, makes the first letter of each sentence uppercase, and replaces dubious characters like &, ", ', etc. with HTML entities.



<?php

$string="            harry's house.4. 2m long 3.5m wide.63\" but great .   seating , for 7.pjljk.cost is $12, 00.00.Good buy .0.0. 1.1m\n";



echo "<pre>$string</pre><p>";



$pat[0] = '/\./';

$pat[1] = '/ \./';

$pat[2] = '/\,/';

$pat[3] = '/ \,/';

$pat[4] = '/\n /';

$pat[5] = '/ +/';



$repl[0] = '. ';

$repl[1] = '.';

$repl[2] = ', ';

$repl[3] = ', ';

$repl[4] = "\n"; // double quotes, so this is a real newline rather than a literal \n

$repl[5] = ' ';



// preg_split() replaces split(), which was removed in PHP 7.0
$string=preg_split('/\. /',trim(ucfirst(stripslashes(htmlspecialchars(preg_replace($pat, $repl, $string),ENT_QUOTES)))));

foreach ($string as $key=>$word) { 

$string[$key] = ucfirst($word); 

} 

$string = implode ('. ', $string);



$i=0;

while($i < 10){

$pat[$i] = '/'.$i.'\. /';

$repl[$i] = ''.$i.'.';

$i++;

}

while($i < 36){

$b=$i+55;

$pat[$i] = '/\.'.chr($b).'/';

$repl[$i] = '. '.chr($b).'';

$i++;

}

while($i < 46){

$b=$i-36;

$pat[$i] = '/'.$b.'\, /';

$repl[$i] = ''.$b.',';

$i++;

}



$string = preg_replace($pat, $repl, $string);



echo $string;

?>

www.humer.biz
06-Jul-2006 12:09


@Graham: Your function from March 16th only works without errors if the tag in your source (if there is only one) is closed.

So I added a pseudo-tag around the source, and everything runs well ;)



function strip_styles($source=NULL)

{

  # add a pseudo-tag around the source

  $source = '<parse>'.$source.'</parse>';



  [...] rest of function



  # and return it this way:

  return (str_replace(array('<parse>','</parse>'),"",$source));

}

Sune Rievers
24-May-2006 10:58


Updated version of the link script, since the other version didn't work with links at the beginning of a line, links without http://, or e-mail addresses. Oh, and a bf2:// detection too for all you gamers ;)



function make_links_blank($text)

{

  return  preg_replace(

     array(

       '/(?(?=<a[^>]*>.+<\/a>)

             (?:<a[^>]*>.+<\/a>)

             |

             ([^="\']?)((?:https?|ftp|bf2|):\/\/[^<> \n\r]+)

         )/iex',

       '/<a([^>]*)target="?[^"\']+"?/i',

       '/<a([^>]+)>/i',

       '/(^|\s)(www.[^<> \n\r]+)/iex',

       '/(([_A-Za-z0-9-]+)(\\.[_A-Za-z0-9-]+)*@([A-Za-z0-9-]+)

       (\\.[A-Za-z0-9-]+)*)/iex'

       ),

     array(

       "stripslashes((strlen('\\2')>0?'\\1<a href=\"\\2\">\\2</a>\\3':'\\0'))",

       '<a\\1',

       '<a\\1 target="_blank">',

        "stripslashes((strlen('\\2')>0?'\\1<a href=\"http://\\2\">\\2</a>\\3':'\\0'))",

        "stripslashes((strlen('\\2')>0?'<a href=\"mailto:\\0\">\\0</a>':'\\0'))"

       ),

       $text

   );

}

klemens at ull dot at
16-May-2006 02:24


See as well the excellent tutorial at http://www.tote-taste.de/X-Project/regex/index.php



;-) Klemens

robvdl at gmail dot com
21-Apr-2006 05:15


For those of you who have ever had the problem of clients pasting text from MS Word into a CMS, where Word has placed all those fancy quotes throughout the text and broken the XHTML validator: I have created a regular expression that replaces ALL high UTF-8 characters with HTML entities, such as &#8217;.

Note that most user examples I have read on php.net only replace selected characters, such as single and double quotes. This replaces all high characters, including Greek characters, Arabic characters, smilies, whatever.

It took me ages to get it down to just two regular expressions, but it handles all high-level characters properly.



$text = preg_replace('/([\xc0-\xdf].)/se', "'&#' . ((ord(substr('$1', 0, 1)) - 192) * 64 + (ord(substr('$1', 1, 1)) - 128)) . ';'", $text);

$text = preg_replace('/([\xe0-\xef]..)/se', "'&#' . ((ord(substr('$1', 0, 1)) - 224) * 4096 + (ord(substr('$1', 1, 1)) - 128) * 64 + (ord(substr('$1', 2, 1)) - 128)) . ';'", $text);
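Here is a sketch of the same conversion without /e. It assumes PHP 5.3+ for the closure, and utf8_to_entities() is a hypothetical name. It uses the same arithmetic as the two expressions above and, like them, leaves 4-byte sequences (U+10000 and above) untouched:

```php
<?php
// Converts 2- and 3-byte UTF-8 sequences to &#NNNN; numeric entities.
function utf8_to_entities($text)
{
    return preg_replace_callback(
        '/[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf]{2}/',
        function ($m) {
            $bytes = $m[0];
            if (strlen($bytes) === 2) {
                // 110xxxxx 10xxxxxx
                $code = (ord($bytes[0]) - 0xC0) * 64 + (ord($bytes[1]) - 0x80);
            } else {
                // 1110xxxx 10xxxxxx 10xxxxxx
                $code = (ord($bytes[0]) - 0xE0) * 4096
                      + (ord($bytes[1]) - 0x80) * 64
                      + (ord($bytes[2]) - 0x80);
            }
            return '&#' . $code . ';';
        },
        $text
    );
}

echo utf8_to_entities("caf\xc3\xa9 \xe2\x80\x99"), "\n"; // "café ’" in UTF-8
```

If the mbstring extension is installed, mb_encode_numericentity() may do the same job without a regular expression.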

heppa(at)web(dot)de
20-Apr-2006 08:37


I just wanted to give an example for people whose match takes away too much of the string.

I wanted a function that strips only certain parameters out of an HTTP query string, and it had to be flexible: e.g. 'updateItem=1' should be removed, as well as 'updateCategory=1'. But I sometimes ended up with too much removed from the query.

Example:

my query string: 'updateItem=1&itemID=14'

ended up as a query string like this: '4', which was not really the plan ;)

I was using this regexp:

preg_replace('/&?update.*=1&?/','',$query_string);

I discovered that preg_replace matches the longest possible string (the .* quantifier is greedy), which means that it replaces everything from the first u up to the 1 after itemID=.

I had assumed that it would take the shortest possible match.
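Stopping the match at the end of a single parameter fixes this. Here is a minimal sketch, with strip_update_params() as a hypothetical name. A lazy .*? would also work for this particular string, but it can still run across parameters (e.g. 'updateItem=2&x=1'), and the character class [^=&] cannot:

```php
<?php
// [^=&]* cannot cross "=" or "&", so the match stays inside one parameter.
function strip_update_params($query_string)
{
    return preg_replace('/&?update[^=&]*=1&?/', '', $query_string);
}

echo strip_update_params('updateItem=1&itemID=14'), "\n";  // itemID=14
echo strip_update_params('page=2&updateCategory=1'), "\n"; // page=2
```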

Ritter
18-Apr-2006 02:08


for those of you with multiline woes like I was having, try:



$str = preg_replace('/<tag[^>](.*)>(.*)<\/tag>/ims','<!-- edited -->', $str);

Eric
09-Apr-2006 11:54


Recently I needed a way to replace links (<a href="blah.com/blah.php">Blah</a>) with their anchor text, in this case Blah. It might seem simple enough to some... or most, but for the benefit of others:



<?php



$value = '<a href="http://www.domain.com/123.html">123</a>';



echo preg_replace('/<a href="(.*?)">(.*?)<\\/a>/i', '$2', $value);



//Output

// 123



?>

sesha_srinivas at yahoo dot com
07-Apr-2006 01:13


If you have a form element that displays amounts with "$" and ",", you can use the following to strip those characters before posting the value to the database:



$search = array('/,/','/\$/');



$replace = array('','');



$data['amount_limit'] = preg_replace($search,'',$data['amount_limit']);

ciprian dot amariei Mtaiil gmail * com
05-Apr-2006 10:21


I found some situations in which my function below doesn't perform as expected. Here is the new version.



<?php

function make_links_blank( $text )

{

 return  preg_replace(

      array(

        '/(?(?=<a[^>]*>.+<\/a>)

              (?:<a[^>]*>.+<\/a>)

              |

              ([^="\'])((?:https?|ftp):\/\/[^<> \n\r]+)

          )/iex',

        '/<a([^>]*)target="?[^"\']+"?/i',

        '/<a([^>]+)>/i'

        ),

      array(

        "stripslashes((strlen('\\2')>0?'\\1<a href=\"\\2\">\\2</a>\\3':'\\0'))",

        '<a\\1',

        '<a\\1 target="_blank">'

        ),

        $text

    );

}



?>



This function replaces links (http(s)://, ftp://) with respective html anchor tag, and also makes all anchors open in a new window.

ae at instinctive dot de
28-Mar-2006 07:40


Something innovative for a change ;-) For a news system, I have a special format for links:



"Go to the [Blender3D Homepage|http://www.blender3d.org] for more Details"



To get this into a link, use:



$new = preg_replace('/\[(.*?)\|(.*?)\]/', '<a href="$2" target="_blank">$1</a>', $new);

c_stewart0a at yahoo dot com
17-Mar-2006 02:35


In response to elaineseery at hotmail dot com



[quote]if you're new to this function, and getting an error like  'delimiter must not alphanumeric backslash ...[/quote]



Note that if you use arrays for search and replace, you will want to enclose each search expression in / (or another delimiter), or you will get this error.

The same applies when you use a single string to search and replace: a pattern without delimiters only appears to work when its first character happens to be a valid delimiter, such as the ( in (foo|bar).

Graham Dawson <graham at imdanet dot com>
16-Mar-2006 02:46


I said there was a better way. There is!



The regexp is essentially the same but now I deal with problems that it couldn't handle, such as urls, which tended to screw things up, and the odd placement of a : or ; in the body text, by using functions. This makes it easier to expand to take account of all the things I know I've not taken account of. But here it is in its essential glory. Or mediocrity. Take your pick.



<?php



define('PARSER_ALLOWED_STYLES_',

'text-align,font-family,font-size,text-decoration');



function strip_styles($source=NULL) {

  $exceptions = str_replace(',', '|', @constant('PARSER_ALLOWED_STYLES_'));



  /* First we want to fix anything that might potentially break the style stripper, so we try to replace
   * in-text instances of : with their HTML entity replacement.
   */



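  // Declared only once, so a second call to strip_styles() does not
  // fail with "Cannot redeclare Replacer()".
  if (!function_exists('Replacer')) {
    function Replacer($text) {
      $check = array (
          '@:@s',
      );
      $replace = array(
          '&#58;',
      );
      return preg_replace($check, $replace, $text[0]);
    }
  }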



  $source = preg_replace_callback('@>(.*)<@Us', 'Replacer', $source);



  $regexp = 



'@([^;"]+)?(?<!'. $exceptions. ')(?<!\>\w):(?!\/\/(.+?)\/|<|>)((.*?)[^;"]+)(;)?@is';



  $source = preg_replace($regexp, '', $source);



  $source = preg_replace('@[a-z]*=""@is', '', $source);



  return $source;

}



?>

rybasso
16-Mar-2006 01:33


A "Document contains no data" message in Firefox and "This page could not be found" in IE occur when you pass too long a subject string to preg_replace() with the default limit.

Increase the limit to be sure it's larger than the subject length.

Ciprian Amariei
15-Mar-2006 02:50


Here is a function that replaces the links (http(s)://, ftp://) with respective html anchor, and also makes all anchors open in a new window.



function make_links_blank( $text )

{

 

 return  preg_replace( array(

                "/[^\"'=]((http|ftp|https):\/\/[^\s\"']+)/i",

                "/<a([^>]*)target=\"?[^\"']+\"?/i",

                "/<a([^>]+)>/i"

        ),

          array(

                "<a href=\"\\1\">\\1</a>",

                "<a\\1",

                "<a\\1 target=\"_blank\" >"

            ),

        $text

        );

}

felipensp at gmail dot com
12-Mar-2006 09:02


Sorry, I don't know English.



Replaces each letter of a bad word with a given character.

Example:



<?php



function censured($string, $aBadWords, $sChrReplace) {



    foreach ($aBadWords as $key => $word) {



        // Case-insensitive pattern; the e modifier evaluates the replacement as PHP code

        $aBadWords[$key] = "/({$word})/ie";



    }



    // substitute each bad word with the replacement character

    return preg_replace($aBadWords,

                        "str_repeat('{$sChrReplace}', strlen('\\1'))",

                        $string

                        );



}



// To show modifications

print censured('The nick of my friends are rand, v1d4l0k4, P7rk, ferows.',

               array('RAND', 'V1D4L0K4', 'P7RK', 'FEROWS'),

               '*'

               );

    

?>
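Here is a sketch of the same idea without /e, assuming PHP 5.3+. censor_words() is a hypothetical name. preg_quote() escapes any regex metacharacters in the word list, which the /e version does not do:

```php
<?php
// Masks every case-insensitive occurrence of each bad word with a run of
// $sChrReplace of the same length (strlen() counts bytes, not characters).
function censor_words($string, $aBadWords, $sChrReplace)
{
    $patterns = array();
    foreach ($aBadWords as $word) {
        $patterns[] = '/' . preg_quote($word, '/') . '/i';
    }
    return preg_replace_callback(
        $patterns,
        function ($m) use ($sChrReplace) {
            return str_repeat($sChrReplace, strlen($m[0]));
        },
        $string
    );
}

print censor_words('The nick of my friends are rand, v1d4l0k4, P7rk, ferows.',
                   array('RAND', 'V1D4L0K4', 'P7RK', 'FEROWS'),
                   '*') . "\n";
```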

Graham Dawson graham_at_imdanet_dot_com
07-Mar-2006 01:32


Inspired by the query-string cleaner from greenthumb at 4point-webdesign dot com and istvan dot csiszar at weblab dot hu. This little bit of code cleans up any "style" attributes in your tags, leaving behind only styles that you have specifically allowed. Also conveniently strips out nonsense styles. I've not fully tested it yet so I'm not sure if it'll handle features like url(), but that shouldn't be a difficulty.





<?php





/* The string would normally be a form-submitted html file or text string */





$string = '<span style="font-family:arial; font-size:20pt; text-decoration:underline; sausage:bueberry;" width="200">Hello there</span> This is some <div style="display:inline;">test text</div>';





/* Array of styles to allow. */


$except = array('font-family', 'text-decoration');


$allow = implode('|', $except);





/* The monster beast regexp. I was up all night trying to figure this one out. */





$regexp = '@([^;"]+)?(?<!'.$allow.'):(?!\/\/(.+?)\/)((.*?)[^;"]+)(;)?@is';


print str_replace('<', '&lt;', $regexp).'<br/><br/>';





$out = preg_replace($regexp, '', $string);





/* Now lets get rid of any unwanted empty style attributes */





$out = preg_replace('@[a-z]*=""@is', '', $out);





print $out;





?>





This should produce the following:





<span style="font-family:arial; text-decoration:underline;" width="200">Hello there</span> This is some <div >test text</div>





Now, I'm a relative newbie at this so I'm sure there's a better way to do it. There's *always* a better way.

elaineseery at hotmail dot com
15-Feb-2006 06:44


if you're new to this function, and getting an error like 

'delimiter must not alphanumeric backslash ...



Note that whatever is in $pattern (and only $pattern, not $string or $replacement) must be enclosed in delimiters such as '/   /' (note the forward slashes). Other non-alphanumeric characters, such as #, work as delimiters too.



e.g. 

$pattern = '/and/';

$replacement = 'sandy';

$string = 'me and mine';



generates 'me sandy mine'



seems to be obvious to everyone else, but took me a while to figure out!!
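Here is a short sketch of two related points. Any non-alphanumeric, non-backslash, non-whitespace character can serve as the delimiter. preg_quote() escapes literal text before it goes into a pattern, including the delimiter passed as its second argument. The paths and strings are only illustrative:

```php
<?php
// "#" as delimiter: the slashes in the path need no escaping.
$path  = '/usr/local/lib/php';
$moved = preg_replace('#^/usr/local#', '/opt', $path);
echo $moved, "\n"; // /opt/lib/php

// preg_quote() escapes regex metacharacters in literal text; passing the
// delimiter as the second argument escapes it as well.
$needle  = 'a/b.c';
$pattern = '/' . preg_quote($needle, '/') . '/';
$result  = preg_replace($pattern, 'X', 'a/b.c and abxc');
echo $result, "\n"; // X and abxc
```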

jsirovic at gmale dot com
07-Feb-2006 09:23


If the lack of &$count is aggravating in PHP 4.x, try this:



$replaces = 0;



$return .= preg_replace('/(\b' . $substr . ')/ie', '"<$tag>$1<$end_tag>" . (substr($replaces++,0,0))', $s2, $limit);
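From PHP 5.1.0 on, preg_replace() accepts a fifth argument, passed by reference, that receives the number of replacements made. This workaround is therefore only needed on PHP 4. A minimal sketch:

```php
<?php
// -1 keeps the default "no limit"; $count is filled in by preg_replace().
$subject = 'one two one three one';
$result  = preg_replace('/one/', '1', $subject, -1, $count);
echo $result, "\n"; // 1 two 1 three 1
echo $count, "\n";  // 3
```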

no-spam@idiot^org^ru
05-Feb-2006 12:21


Decodes the result of IE's JavaScript escape() function (%uXXXX sequences).



<?php



function unicode_unescape(&$var, $convert_to_cp1251 = false){

    $var = preg_replace(

        '#%u([\da-fA-F]{4})#mse',

        $convert_to_cp1251 ? '@iconv("utf-16","windows-1251",pack("H*","\1"))' : 'pack("H*","\1")',

        $var

    );

}



//



$str = 'to %u043B%u043E%u043F%u0430%u0442%u0430 or not to %u043B%u043E%u043F%u0430%u0442%u0430';



unicode_unescape($str, true);



echo $str;



?>
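Here is a sketch that decodes to UTF-8 without /e, assuming PHP 5.3+. unicode_unescape_utf8() is a hypothetical name. It relies on html_entity_decode() turning a numeric entity such as &#x043B; into UTF-8 bytes. Surrogate pairs (%uD83D%uDE00) are not combined here. If you need windows-1251, pass the result through iconv('UTF-8', 'windows-1251', ...) afterwards:

```php
<?php
// Each %uXXXX becomes the numeric entity &#xXXXX;, which html_entity_decode()
// then converts to the corresponding UTF-8 character.
function unicode_unescape_utf8($str)
{
    return preg_replace_callback(
        '/%u([0-9a-fA-F]{4})/',
        function ($m) {
            return html_entity_decode('&#x' . $m[1] . ';', ENT_NOQUOTES, 'UTF-8');
        },
        $str
    );
}

echo unicode_unescape_utf8('to %u043B%u043E%u043F%u0430%u0442%u0430 or not to %u043B%u043E%u043F%u0430%u0442%u0430'), "\n";
```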

leandro[--]ico[at]gm[--]ail[dot]com
04-Feb-2006 09:40


I've found out a really odd error.



When I try to use the 'empty' function in the replacement string (when using the 'e' modifier, of course), the regexp interpreter gets stuck at that point.



An example of this failure:



<?php

echo $test = preg_replace( "/(bla)/e", "empty(123)", "bla bla ble" );



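# I expected something like "1 1 ble". In fact empty() is a language
# construct that, before PHP 5.5, accepts only variables, so empty(123)
# is a parse error inside the evaluated replacement code. On PHP 5.5+
# empty(123) is false (an empty string), which would give "  ble".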

?>



Very odd, huh?

03-Feb-2006 08:00


A fairly useful script to replace named HTML entities with numeric (ordinal-value) entities. Useful for writing XML documents, where these entities aren't defined.

<?php

$p='#(\&[\w]+;)#e';

$r="'&#'.ord(html_entity_decode('$1')).';'";

$text=preg_replace($p,$r,$_POST['data']);

?>
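From PHP 5.4 on, html_entity_decode() defaults to UTF-8, so ord() only sees the first byte of a multi-byte character. Here is a sketch that takes the full code point instead. It assumes the mbstring extension (mb_ord() needs PHP 7.2+), and named_to_numeric_entities() is a hypothetical name:

```php
<?php
// Converts named entities (&copy;, &eacute;, ...) to numeric ones (&#169;, ...).
// Unknown entities, and those that decode to more than one character, are
// returned unchanged.
function named_to_numeric_entities($text)
{
    return preg_replace_callback(
        '/&[A-Za-z][A-Za-z0-9]*;/',
        function ($m) {
            $char = html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
            if ($char === $m[0] || mb_strlen($char, 'UTF-8') !== 1) {
                return $m[0];
            }
            return '&#' . mb_ord($char, 'UTF-8') . ';';
        },
        $text
    );
}

echo named_to_numeric_entities('&copy; 2006 &amp; &eacute; &bogus;'), "\n";
```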

Rebort
02-Feb-2006 11:51


Following up on pietjeprik at gmail dot com's great string to parse [url] bbcode: 

<?php

$url = '[url=http://www.foo.org]The link[/url]';

$text = preg_replace("/\[url=(\W?)(.*?)(\W?)\](.*?)\[\/url\]/", '<a href="$2">$4</a>', $url);

?>



This allows for the user to enter variations: 



[url=http://www.foo.org]The link[/url]

[url="http://www.foo.org"]The link[/url]

[url='http://www.foo.org']The link[/url]



or even



[url=#http://www.foo.org#]The link[/url]

[url=!http://www.foo.org!]The link[/url]

31-Jan-2006 10:23


Uh-oh. When I looked at the text in the preview, I had to double the number of backslashes to make it look right. 

I'll try again with my original text:



$full_text = preg_replace('/\[p=(\d+)\]/e',

  "\"<a href=\\\"./test.php?person=$1\\\">\"

    .get_name($1).\"</a>\"",

   $short_text);



I hope that it comes out correctly this time :-)

leif at solumslekt dot org
31-Jan-2006 08:24


I've found a use for preg_replace. If you've got e.g. a database with persons associated with numbers, you may want to enter links in a kind of shorthand, like [p=12345], and have them expanded to a full URL with a name in it.



This is my solution:



$expanded_text = preg_replace('/\\[p=(\d+)\\]/e',

    "\\"<a href=\\\\\\"./test.php?person=$1\\\\\\">\\".get_name($1).\\"</a&>\\"",

        $short_text);



It took me some time to work out the proper number of quotes and backslashes.



regards, Leif.
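With preg_replace_callback() the link can be built in ordinary PHP, without any quote or backslash juggling. Here is a sketch assuming PHP 5.3+. The get_name() below is only a hypothetical stand-in for Leif's real lookup, and the htmlspecialchars() call is an addition:

```php
<?php
// Hypothetical stand-in for the real database lookup.
function get_name($id)
{
    $names = array(12345 => 'Example Person');
    return isset($names[$id]) ? $names[$id] : 'unknown';
}

// Expands [p=NNN] shorthand into a link whose text is the person's name.
function expand_person_links($short_text)
{
    return preg_replace_callback(
        '/\[p=(\d+)\]/',
        function ($m) {
            $id = (int) $m[1];
            return '<a href="./test.php?person=' . $id . '">'
                 . htmlspecialchars(get_name($id)) . '</a>';
        },
        $short_text
    );
}

echo expand_person_links('See [p=12345] for details.'), "\n";
```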

SG_01
19-Jan-2006 04:43


Re: wcc at techmonkeys dot org



You could put this in 1 replace for faster execution as well:



<?php



/*

 * Removes all blank lines from a string.

 */

function removeEmptyLines($string)

{

   return preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $string);

}



?>

05-Jan-2006 02:09


First, I have no idea about regexps; everything I did was through trial and error.

I wrote this function, which tries to clean up crappy MS Word HTML. I use it to clean code that users paste from MS Word into online WYSIWYG editors.

There's a huge space for improvement. I'm posting it here because after searching I could not find any pure PHP solution. The best alternative is Tidy, but for those of us who are still using PHP 4 and do not have access to the server, this could be an alternative. Use it at your own risk... Once again, it was a quickie and I know there are much better ways to do this:



function decraper($htm, $delstyles=false) {

    $commoncrap = array('&quot;'

    ,'font-weight: normal;'

    ,'font-style: normal;'

    ,'line-height: normal;'

    ,'font-size-adjust: none;'

    ,'font-stretch: normal;'); 

    $replace = array("'");

    $htm = str_replace($commoncrap, $replace, $htm);

     $pat = array();

    $rep = array();

    $pat[0] = '/(<table\s.*)(width=)(\d+%)(\D)/i';

    $pat[1] = '/(<td\s.*)(width=)(\d+%)(\D)/i';

    $pat[2] = '/(<th\s.*)(width=)(\d+%)(\D)/i';

    $pat[3] = '/<td( colspan="[0-9]+")?( rowspan="[0-9]+")?( width="[0-9]+")?( height="[0-9]+")?.*?>/i';

    $pat[4] = '/<tr.*?>/i';

    $pat[5] = '/<\/st1:address>(<\/st1:\w*>)?<\/p>[\n\r\s]*<p[\s\w="\']*>/i';

    $pat[6] = '/<o:p.*?>/i';

    $pat[7] = '/<\/o:p>/i';

    $pat[8] = '/<o:SmartTagType[^>]*>/i';

    $pat[9] = '/<st1:[\w\s"=]*>/i';

    $pat[10] = '/<\/st1:\w*>/i';

    $pat[11] = '/<p class="MsoNormal"[^>]*>(.*?)<\/p>/i';

    $pat[12] = '/ style="margin-top: 0cm;"/i';

    $pat[13] = '/<(\w[^>]*) class=([^ |>]*)([^>]*)/i';

    $pat[14] = '/<ul(.*?)>/i';

    $pat[15] = '/<ol(.*?)>/i';

    $pat[17] = '/<br \/>&nbsp;<br \/>/i';

    $pat[18] = '/&nbsp;<br \/>/i';

    $pat[19] = '/<!-.*?>/';

    $pat[20] = '/\s*style=(""|\'\')/';

    $pat[21] = '/ style=[\'"]tab-interval:[^\'"]*[\'"]/i';

    $pat[22] = '/behavior:[^;\'"]*;*(\n|\r)*/i';

    $pat[23] = '/mso-[^:]*:"[^"]*";/i';

    $pat[24] = '/mso-[^;\'"]*;*(\n|\r)*/i';

    $pat[25] = '/\s*font-family:[^;"]*;?/i';

    $pat[26] = '/margin[^"\';]*;?/i';

    $pat[27] = '/text-indent[^"\';]*;?/i';

    $pat[28] = '/tab-stops:[^\'";]*;?/i';

    $pat[29] = '/border-color: *([^;\'"]*)/i';

    $pat[30] = '/border-collapse: *([^;\'"]*)/i';

    $pat[31] = '/page-break-before: *([^;\'"]*)/i';

    $pat[32] = '/font-variant: *([^;\'"]*)/i';

    $pat[33] = '/<span [^>]*><br \/><\/span><br \/>/i';

    $pat[34] = '/" "/';

    $pat[35] = '/[\t\r\n]/';

    $pat[36] = '/\s\s/s';

    $pat[37] = '/ style=""/';

    $pat[38] = '/<span>(.*?)<\/span>/i';

//empty (no attribs) spans

    $pat[39] = '/<span>(.*?)<\/span>/i';

//twice, nested spans

    $pat[40] = '/(;\s|\s;)/';

    $pat[41] = '/;;/';

    $pat[42] = '/";/';

    $pat[43] = '/<li(.*?)>/i';

    $pat[44] = '/(<\/b><b>|<\/i><i>|<\/em><em>|<\/u><u>|<\/strong><strong>)/i';

    $rep[0] = '$1$2"$3"$4';

    $rep[1] = '$1$2"$3"$4';

    $rep[2] = '$1$2"$3"$4';

    $rep[3] = '<td$1$2$3$4>';

    $rep[4] = '<tr>';

    $rep[5] = '<br />';

    $rep[6] = '';

    $rep[7] = '<br />';

    $rep[8] = '';

    $rep[9] = '';

    $rep[10] = '';

    $rep[11] = '$1<br />';

    $rep[12] = '';

    $rep[13] = '<$1$3';

    $rep[14] = '<ul>';

    $rep[15] = '<ol>';

    $rep[17] = '<br />';

    $rep[18] = '<br />';

    $rep[19] = '';

    $rep[20] = '';

    $rep[21] = '';

    $rep[22] = '';

    $rep[23] = '';

    $rep[24] = '';

    $rep[25] = '';

    $rep[26] = '';

    $rep[27] = '';

    $rep[28] = '';

    $rep[29] = '';

    $rep[30] = '';

    $rep[31] = '';

    $rep[32] = '';

    $rep[33] = '<br />';

    $rep[34] = '""';

    $rep[35] = '';

    $rep[36] = '';

    $rep[37] = '';

    $rep[38] = '$1';

    $rep[39] = '$1';

    $rep[40] = ';';

    $rep[41] = ';';

    $rep[42] = '"';

    $rep[43] = '<li>';

    $rep[44] = '';

    if($delstyles===true){

        $pat[50] = '/ style=".*?"/';

        $rep[50] = '';

    }

    ksort($pat);

    ksort($rep);

    $htm = preg_replace($pat, $rep, $htm);

    return $htm;

}



Hope it helps, critics are more than welcome.

kyle at vivahate dot com
22-Dec-2005 12:08


Here is a regular expression to "slashdotify" html links.  This has worked well for me, but if anyone spots errors, feel free to make corrections.



<?php

$url = '<a attr="garbage" href="http://us3.php.net/preg_replace">preg_replace - php.net</a>';

$url = preg_replace( '/<.*href="?(.*:\/\/)?([^ \/]*)([^ >"]*)"?[^>]*>(.*)(<\/a>)/', '<a href="$1$2$3">$4</a> [$2]', $url );

?>



Will output:



<a href="http://us3.php.net/preg_replace">preg_replace - php.net</a> [us3.php.net]

istvan dot csiszar at weblab dot hu
21-Dec-2005 01:53


This is an addition to the previously posted removeEvilTags function. If you don't want to remove the style attribute entirely, just certain style properties within it, you might find this piece of code useful:



<?php



function removeEvilStyles($tagSource)

{

   // this will leave everything else, but:

    $evilStyles = array('font', 'font-family', 'font-face', 'font-size', 'font-size-adjust', 'font-stretch', 'font-variant');



    $find = array();

    $replace = array();

    

    foreach ($evilStyles as $v)

    {

        $find[]    = "/$v:.*?;/";

        $replace[] = '';

    }

    

    return preg_replace($find, $replace, $tagSource);

}



function removeEvilTags($source)

{

    $allowedTags = '<h1><h2><h3><h4><h5><a><img><label>'.

        '<p><br><span><sup><sub><ul><li><ol>'.

        '<table><tr><td><th><tbody><div><hr><em><b><i>';

    $source = strip_tags(stripslashes($source), $allowedTags);

    return trim(preg_replace('/<(.*?)>/ie', "'<'.removeEvilStyles('\\1').'>'", $source));

}



?>

triphere
17-Dec-2005 09:13


to remove Bulletin Board Code (remove bbcode)



$body = preg_replace("[\[(.*?)\]]", "", $body);

jcheger at acytec dot com
09-Dec-2005 12:16


Escaping quotes may be very tricky. Magic quotes and preg_quote are not protected against double escaping. This means that an escaped quote will get a double backslash, or even more. preg_quote ("I\'m using regex") will return "I\\'m using regex".



The following example escapes only unescaped single quotes:



<?php

$a = "I'm using regex";

$b = "I\'m using regex";



$patt = "/(?<!\\\)\'/";

$repl = "\\'";



print "a:  ".preg_replace ($patt, $repl, $a)."\n"; 

print "b:  ".preg_replace ($patt, $repl, $b)."\n"; 

?>



and prints:

a:  I\'m using regex

b:  I\'m using regex



Remark: matching a backslash requires a triple backslash (\\\) here.

urbanheroes {at} gmail {dot} com
15-Aug-2005 01:00


Here are two functions to trim a string down to a certain size. 



"wordLimit" trims a string down to a certain number of words, and adds an ellipsis after the last word, or returns the string if the limit is larger than the number of words in the string.



"stringLimit" trims a string down to a certain number of characters, and adds an ellipsis after the last word, without truncating any words in the middle (it will instead leave it out), or returns the string if the limit is larger than the string size. The length of a string will INCLUDE the length of the ellipsis.



<?php



function wordLimit($string, $length = 50, $ellipsis = '...') {

   return count($words = preg_split('/\s+/', ltrim($string), $length + 1)) > $length ?

       rtrim(substr($string, 0, strlen($string) - strlen(end($words)))) . $ellipsis :

       $string;

}



function stringLimit($string, $length = 50, $ellipsis = '...') {

   return strlen($fragment = substr($string, 0, $length + 1 - strlen($ellipsis))) < strlen($string) + 1 ? 

       preg_replace('/\s*\S*$/', '', $fragment) . $ellipsis : $string;

}



echo wordLimit('   You can limit a string to only so many words.', 6);

// Output: "You can limit a string to..."

echo stringLimit('Or you can limit a string to a certain amount of characters.', 32);

// Output: "Or you can limit a string to..."



?>

avizion at relay dot dk
24-Apr-2005 12:04


Just a note for all FreeBSD users wondering why this function is not present after installing php / mod_php (4 and 5) from ports.



Remember to install:



/usr/ports/devel/php4-pcre (or 5 for -- 5 ;)



That's all... enjoy - and save 30 mins. like I could have used :D

jhm at cotren dot net
18-Feb-2005 02:04


It took me a while to figure this one out, but here is a nice way to use preg_replace to convert a hex encoded string back to clear text



<?php

    $text = "PHP rocks!";

    $encoded = preg_replace(

           "'(.)'e"

          ,"dechex(ord('\\1'))"

          ,$text

    );

    print "ENCODED: $encoded\n";

?>

ENCODED: 50485020726f636b7321

<?php

    print "DECODED: ".preg_replace(

       "'([\S,\d]{2})'e"

      ,"chr(hexdec('\\1'))"

      ,$encoded)."\n";

?>

DECODED: PHP rocks!
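One catch: dechex() does not pad, so any byte below 0x10 (a tab, a newline) encodes to a single hex digit, and the two-digit decoder above then gets out of step. Here is a sketch with zero padding, assuming PHP 5.3+. hex_encode() and hex_decode() are hypothetical names. The built-ins bin2hex() and, since PHP 5.4, hex2bin() do the same thing without a regex:

```php
<?php
// sprintf('%02x') always emits two hex digits per byte.
function hex_encode($text)
{
    return preg_replace_callback('/./s', function ($m) {
        return sprintf('%02x', ord($m[0]));
    }, $text);
}

// Reads the hex string two digits at a time.
function hex_decode($hex)
{
    return preg_replace_callback('/[0-9a-fA-F]{2}/', function ($m) {
        return chr(hexdec($m[0]));
    }, $hex);
}

$encoded = hex_encode("PHP rocks!\t");
echo $encoded, "\n"; // 50485020726f636b732109
echo hex_decode($encoded), "\n";
```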

gbaatard at iinet dot net dot au
14-Feb-2005 09:56


On the topic of implementing forum code ([b][/b] to <b></b> etc.), I found this worked well...



<?php 

$body = preg_replace('/\[([biu])\]/i', '<\\1>', $body);

$body = preg_replace('/\[\/([biu])\]/i', '</\\1>', $body);

?>



First line replaces [b] [B] [i] [I] [u] [U] with the appropriate html tags(<b>, <i>, <u>)



Second one does the same for closing tags...



For urls, I use...



<?php 

$body = preg_replace('/\s(\w+:\/\/)(\S+)/', ' <a href="\\1\\2" target="_blank">\\1\\2</a>', $body);

?>



and for URLs starting with www., I use...



<?php 

$body = preg_replace('/\s(www\.)(\S+)/', ' <a href="http://\\1\\2" target="_blank">\\1\\2</a>', $body);

?>



Pop all these lines into a function that receives and returns the text you want 'forum coded' and away you go:)
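Put together, that might look like the sketch below. forum_code() is a hypothetical name. The htmlspecialchars() call at the top is an addition so that user-supplied HTML is escaped before the tags are generated; it leaves the [ ] codes alone:

```php
<?php
// The four replacements from the note above, applied in order.
function forum_code($body)
{
    $body = htmlspecialchars($body); // added: escape user HTML first
    $body = preg_replace('/\[([biu])\]/i', '<\\1>', $body);
    $body = preg_replace('/\[\/([biu])\]/i', '</\\1>', $body);
    $body = preg_replace('/\s(\w+:\/\/)(\S+)/', ' <a href="\\1\\2" target="_blank">\\1\\2</a>', $body);
    $body = preg_replace('/\s(www\.)(\S+)/', ' <a href="http://\\1\\2" target="_blank">\\1\\2</a>', $body);
    return $body;
}

echo forum_code('[b]bold[/B] and www.example.com'), "\n";
```

The tag letter keeps its original case, so [B] becomes <B>.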

tash at quakersnet dot com
30-Jan-2005 04:25


A better way to convert links and e-mail addresses, I think. :)



<?php

function change_string($str)

    {

     $str = trim($str);

     $str = htmlspecialchars($str);

     $str = preg_replace('#(.*)\@(.*)\.(.*)#','<a href="mailto:\\1@\\2.\\3">Send email</a>',$str);

     $str = preg_replace('=([^\s]*)(www.)([^\s]*)=','<a href="http://\\2\\3" target=\'_new\'>\\2\\3</a>',$str);

     return $str;

    }

?>

jw-php at valleyfree dot com
25-Jan-2005 08:28


Note that if you want to replace all backslashes in a string with double backslashes (like addslashes() does, but only for backslashes and not quotes, etc.), you'll need the following:



$new = preg_replace('/\\\\/','\\\\\\\\',$old);



Note that the pattern uses 4 backslashes and the replacement uses 8! The reason for 4 backslashes in the pattern has already been explained on this page, but nobody has yet mentioned that the same logic applies to the replacement, where backslashes are also parsed twice: once by PHP and once by the PCRE extension. So the eight backslashes become four backslashes sent to PCRE, which then put two backslashes in the final output.

Nick
20-Jan-2005 03:05


Here is a more secure version of the link conversion code, which hopefully makes cross-site scripting attacks more difficult.



<?php
// Relies on the /e modifier, so it only runs on PHP versions before 7.0.
function convert_links($str) {
    // The heredoc holds the PHP expression that /e evaluates for each match.
    $replace = <<<EOPHP
'<a href="'.htmlentities('\\1').htmlentities('\\2').
'">'.htmlentities('\\1').htmlentities('\\2').'</a>'
EOPHP;
    $str = preg_replace('#(http://)([^\s]*)#e', $replace, $str);
    return $str;
}
?>
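Since the /e modifier no longer exists in PHP 7 and later, here is a sketch of the same idea using preg_replace_callback(). The callback receives the raw matched text, so it can be escaped once with htmlentities() without the addslashes() side effect that /e applies to backreferences. The function name is hypothetical, and ENT_QUOTES is an addition so that single quotes are escaped too:

```php
<?php
// Hypothetical name; same pattern as above, without /e.
function convert_links_callback(string $str): string
{
    return preg_replace_callback(
        '#(http://)([^\s]*)#',
        function (array $m): string {
            // $m[0] is the full match ("http://" plus the rest), as raw text
            $url = htmlentities($m[0], ENT_QUOTES);
            return '<a href="' . $url . '">' . $url . '</a>';
        },
        $str
    );
}

echo convert_links_callback('Go to http://example.com/?a=1&b=2 now');
// Go to <a href="http://example.com/?a=1&amp;b=2">http://example.com/?a=1&amp;b=2</a> now
```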

ignacio paz posse
21-Oct-2004 01:22


I needed to abbreviate only long URLs; for shorter ones my client preferred to have the complete address displayed. Here's the function I ended up with:





<?php
function auto_url($txt){

  # (1) catch those with URLs longer than 71 characters
  $pat = '/(http|ftp)+(?:s)?:(\\/\\/)'
       .'((\\w|\\.)+)(\\/)?(\\S){71,}/i';
  $txt = preg_replace($pat, "<a href=\"\\0\" target=\"_blank\">$1$2$3/...</a>", $txt);

  # (2) replace the other short URLs, provided that they are not already inside an HTML tag
  $pat = '/(?<!href=\")(http|ftp)+(s)?:'
       .'(\\/\\/)((\\w|\\.)+)(\\/)?(\\S)/i';
  $txt = preg_replace($pat, "<a href=\"$0\" target=\"_blank\">$0</a> ", $txt);

  return $txt;
}
?>

Note the negative lookbehind added to the second pattern: it exempts URLs preceded by href=" (meaning they were already put inside the appropriate HTML tags by the first expression).
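One caveat with the two-pass approach: the lookbehind only skips URLs directly preceded by href=", so the shortened link text written by pass (1), such as http://example.com/... (preceded by >), can still be picked up by pass (2). A single pass with preg_replace_callback() sidesteps that, because each URL is matched once and the callback chooses the visible text. This is a sketch under assumptions: the input is plain text (no existing anchors), the name auto_url_single_pass() is hypothetical, htmlspecialchars() escaping is an addition, and the 71-character threshold becomes a parameter.

```php
<?php
// Hypothetical name, chosen so it does not clash with auto_url().
function auto_url_single_pass(string $txt, int $maxLen = 71): string
{
    return preg_replace_callback(
        // $m[1] = scheme plus "://", $m[2] = host; the rest runs up to whitespace, <, > or "
        '#\b((?:https?|ftps?)://)([^/\s<>"]+)[^\s<>"]*#i',
        function (array $m) use ($maxLen): string {
            $href = htmlspecialchars($m[0], ENT_QUOTES);
            // Long URLs show only scheme and host followed by "/...", as in pass (1)
            $text = strlen($m[0]) > $maxLen
                ? htmlspecialchars($m[1] . $m[2] . '/...', ENT_QUOTES)
                : $href;
            return '<a href="' . $href . '" target="_blank">' . $text . '</a>';
        },
        $txt
    );
}

echo auto_url_single_pass('short: http://example.com/a');
```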

gabe at mudbuginfo dot com
18-Oct-2004 01:39


It is useful to note that when 'pattern' and 'replacement' are arrays, the 'limit' parameter applies to each individual pattern in the pattern array, not to the array as a whole.

<?php
$pattern = array('/one/', '/two/');
$replace = array('uno', 'dos');
$subject = "test one, one two, one two three";

echo preg_replace($pattern, $replace, $subject, 1);
?>



If limit were applied to the whole array (which it isn't), it would return:

test uno, one two, one two three



In reality, however, it returns:

test uno, one dos, one two three
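One way to confirm this is the optional fifth argument, count (available since PHP 5.1.0), which receives the total number of replacements performed. With a limit of 1 and two patterns it comes out as 2, one per pattern:

```php
<?php
$pattern = array('/one/', '/two/');
$replace = array('uno', 'dos');
$subject = "test one, one two, one two three";

// The fifth argument, count, is filled with the total number of replacements.
$result = preg_replace($pattern, $replace, $subject, 1, $count);

echo $result, "\n"; // test uno, one dos, one two three
echo $count, "\n";  // 2 (one replacement per pattern)
```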

silasjpalmer at optusnet dot com dot au
19-Mar-2004 06:00


Using preg_replace() to return extracts without breaking words in the middle (useful for search results):



<?php
$string = "Don't split words";
echo substr($string, 0, 10); // Returns "Don't spli"

$pattern = "/(^.{0,10})(\W+.*$)/";
$replacement = "\${1}";
echo preg_replace($pattern, $replacement, $string); // Returns "Don't"
?>
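The same idea as a reusable function, sketched under a few assumptions: the name excerpt() is hypothetical, text that already fits within $max is returned untouched (otherwise the pattern would still cut it back to the last non-word character), the /s modifier lets . match newlines, and lengths are counted in bytes, so multibyte text may need the /u modifier and the mb_* functions. As in the original, if no non-word character occurs within the first $max + 1 characters, the pattern does not match and the text comes back whole.

```php
<?php
// Hypothetical helper built on the pattern above, with the length as a parameter.
function excerpt(string $text, int $max): string
{
    // Text that already fits is returned as-is (length in bytes)
    if (strlen($text) <= $max) {
        return $text;
    }
    // Keep up to $max characters that are followed by a non-word character,
    // and drop that character and everything after it
    $pattern = '/^(.{0,' . $max . '})\W.*$/s';
    return preg_replace($pattern, '$1', $text);
}

echo excerpt("Don't split words", 10); // Don't
```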

j-AT-jcornelius-DOT-com
24-Feb-2004 01:02


I noticed that a lot of talk here is about parsing URLs. Try the parse_url() function in PHP to make things easier.



manual/en/function.parse-url.php 



- J.
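For reference, a short sketch of parse_url() on a single URL. It splits a URL you already have into its parts; it does not find URLs inside running text, so it complements rather than replaces the preg_replace() approaches on this page. The example URL is made up:

```php
<?php
$url = 'http://www.example.com/path/page.php?id=5#top';

// Without a second argument, parse_url() returns an associative array of the parts it finds.
$parts = parse_url($url);
print_r($parts);
// scheme => http, host => www.example.com, path => /path/page.php,
// query => id=5, fragment => top

// With a PHP_URL_* constant it returns just that one component.
echo parse_url($url, PHP_URL_HOST), "\n"; // www.example.com
```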

steven -a-t- acko dot net
08-Feb-2004 09:45


People using the /e modifier with preg_replace should be aware of the following weird behaviour. It is not a bug per se, but can cause bugs if you don't know it's there.



In fact, the example in the docs for /e suffers from this mistake.



With /e, the replacement string is a PHP expression. So when you use a backreference in the replacement expression, you need to put the backreference inside quotes; otherwise it would be interpreted as PHP code. The manual's example for preg_replace does exactly that:



preg_replace("/(<\/?)(\w+)([^>]*>)/e",
             "'\\1'.strtoupper('\\2').'\\3'",
             $html_body);



To make this easier, the data in a backreference with /e is run through addslashes() before being inserted in your replacement expression. So if you have the string



 He said: "You're here"



It would become:



 He said: \"You\'re here\"



...and be inserted into the expression.

However, if you put this inside a set of single quotes, PHP will not strip away all the slashes correctly! Try this:



 print ' He said: \"You\'re here\" ';

 Output: He said: \"You're here\"



This is because the sequence \" inside single quotes is not recognized as anything special, and it is output literally.



Using double-quotes to surround the string/backreference will not help either, because inside double-quotes, the sequence \' is not recognized and also output literally. And in fact, if you have any dollar signs in your data, they would be interpreted as PHP variables. So double-quotes are not an option.



The 'solution' is to fix it manually in your expression. It is easiest to use a separate processing function and do the fixing there (e.g. use "my_processing_function('\\1')" or something similar as the replacement expression, and do the fixing in that function).



If you surrounded your backreference with single quotes, it is the double quotes that come out corrupted, so undo their escaping:

$text = str_replace('\"', '"', $text);



People using preg_replace with /e should at least be aware of this.



I'm not sure how it would be best fixed in preg_replace. Because double-quotes are a really bad idea anyway (due to the variable expansion), I would suggest that preg_replace's auto-escaping is modified to suit the placement of backreferences inside single-quotes (which seemed to be the intention from the start, but was incorrectly applied).
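For current PHP the problem goes away on its own: the /e modifier was deprecated in PHP 5.5.0 and removed in PHP 7.0.0, and preg_replace_callback() is its replacement. Because the callback is handed the matches as plain strings, no addslashes() step happens and nothing is evaluated as PHP code. A sketch of the manual's uppercase-tags example rewritten that way (closures need PHP 5.3 or later; the sample $html_body is made up, with quotes in an attribute to show they pass through untouched):

```php
<?php
$html_body = '<a title="You\'re here">link</a>';

$result = preg_replace_callback(
    '/(<\/?)(\w+)([^>]*>)/',
    function (array $m): string {
        // $m[1] = "<" or "</", $m[2] = tag name, $m[3] = rest of the tag, all as raw text
        return $m[1] . strtoupper($m[2]) . $m[3];
    },
    $html_body
);

echo $result; // <A title="You're here">link</A>
```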

Peter
01-Nov-2003 05:00


Suppose you want to match '\n' (that's backslash-n, not newline). The pattern you want is not /\\n/ but /\\\\n/. The reason is that PHP turns \\ into \ before the regex engine ever sees the pattern. So if you write the first, the regex engine sees \n, which it reads as a newline. You therefore have to escape your backslashes twice: once for PHP, and once for the regex engine.
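A quick way to see the difference, using preg_match() on a made-up subject string:

```php
<?php
// In a single-quoted PHP string \n is not an escape sequence, so $subject
// contains a literal backslash followed by "n", not a newline.
$subject = 'line1\nline2';

// PHP reduces '/\\\\n/' to /\\n/, which PCRE reads as "a backslash, then n".
$literal = preg_match('/\\\\n/', $subject);

// PHP reduces '/\\n/' to /\n/, which PCRE reads as a newline: no match here.
$newline = preg_match('/\\n/', $subject);

var_dump($literal, $newline); // int(1), then int(0)
```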

Travis
18-Oct-2003 03:37


I spent some time fighting with this, so hopefully this will help someone else.



Escaping a backslash (\) really involves not two, not three, but four backslashes to work properly.



So to match a single backslash, one should use:



preg_replace('/(\\\\)/', ...);



or to, say, escape single quotes not already escaped, one could write:



preg_replace("/([^\\\\])'/", "\$1\'", ...);



Anything else, such as the seemingly correct



preg_replace("/([^\\])'/", "\$1\'", ...);



is parsed so that the backslash escapes the ], which leaves the character class unterminated and makes PCRE fail to compile the pattern.



I'm not exactly clear on this issue of backslash proliferation, but it seems to involve the combination of PHP string processing and PCRE processing.
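The quote-escaping pattern above needs a character before each quote, so a quote at the very start of the string is skipped, and in a run of two quotes such as a''b only the first one is escaped (the character before the second quote was already consumed by the previous match). A negative lookbehind avoids both problems. A sketch with a hypothetical helper name; like the original, it treats any quote preceded by a backslash as already escaped, even if that backslash is itself escaped:

```php
<?php
// Hypothetical helper: add a backslash before every ' that does not already have one.
function escape_unescaped_quotes(string $s): string
{
    // PHP turns "/(?<!\\\\)'/" into /(?<!\\)'/: a quote not preceded by a literal backslash.
    // PHP turns "\\\\'" into \\' and PCRE outputs that as \'.
    return preg_replace("/(?<!\\\\)'/", "\\\\'", $s);
}

echo escape_unescaped_quotes("'Tis Travis's note"), "\n"; // \'Tis Travis\'s note
```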