PHP: strip_tags


Sébastien
23-May-2006 08:22


Hmm, it seems that your function "theRealStripTags" won't behave correctly in some cases, for example:



<?php

theRealStripTags("<!-- I want to put a <div>tag</div> -->");

theRealStripTags("<!-- Or a carrot > -->");

theRealStripTags("<![CDATA[what about this! It's to protect from HTML characters like <tag>, > and so on in XML, no?]]> -->");

?>

xyexz at yahoo dot com
09-May-2006 08:41


I have found that this function sometimes removes only the opening angle bracket of a tag and leaves the rest of the tag in the string, which obviously isn't what I'm looking for.



EX: 

<?php



//Returns "tag>test/tag>"

echo strip_tags("<tag>test</tag>");



?>



I'm running strip_tags on a string imported from XML, so perhaps that has something to do with it. In case you've run into the same issue, I've written a function to fix it once and for all!



<?php



function theRealStripTags($string)
{
    //while there are tags left to remove
    while(strstr($string, '>'))
    {
        //find position of the opening angle bracket
        $currentBeg = strpos($string, '<');

        //find position of the closing angle bracket
        $currentEnd = strpos($string, '>');

        //save any text before the opening bracket in $tmpStringBeg
        $tmpStringBeg = @substr($string, 0, $currentBeg);

        //save any text after the closing bracket in $tmpStringEnd
        $tmpStringEnd = @substr($string, $currentEnd + 1, strlen($string));

        //cut the tag from the string
        $string = $tmpStringBeg.$tmpStringEnd;
    }

    return $string;
}



//Returns "test"

echo theRealStripTags('<tag>test</tag>');



?>

soapergem at gmail dot com
28-Apr-2006 09:21


In my prior comment I made a mistake that needs correcting. Please change the forward slashes that begin and terminate my regular expression to a different character, like the at-sign (@), for instance. Here's what it should read:



$regex  = '@</?\w+((\s+\w+(\s*=\s*';

$regex .= '(?:".*?"|\'.*?\'|[^\'">\s]+))?)+';

$regex .= '\s*|\s*)/?>@i';



(There were forward-slashes embedded in the regular expression itself, so using them to begin and terminate the expression would have caused a parse error.)

JeremysFilms.com
07-Apr-2006 01:57


A simple little function for blocking tags by replacing the '<' and '>' characters with their HTML entities.  Good for simple posting systems that you don't want to have a chance of stripping non-HTML tags, or just want everything to show literally without any security issues:



<?php



function block_tags($string){
    // Entities need the trailing semicolon to be valid HTML.
    $replaced_string = str_replace('<','&lt;',$string);
    $replaced_string = str_replace('>','&gt;',$replaced_string);
    return $replaced_string;
}

echo block_tags('<b>HEY</b>'); //Returns &lt;b&gt;HEY&lt;/b&gt;



?>

cesar at nixar dot org
07-Mar-2006 11:44


Here is a recursive function for strip_tags like the one shown in the stripslashes manual page.



<?php

function strip_tags_deep($value)

{

  return is_array($value) ?

    array_map('strip_tags_deep', $value) :

    strip_tags($value);

}



// Example

$array = array('<b>Foo</b>', '<i>Bar</i>', array('<b>Foo</b>', '<i>Bar</i>'));

$array = strip_tags_deep($array);



// Output

print_r($array);

?>

debug at jay dot net
24-Feb-2006 02:24


If you wish to steal quotes:

$quote=explode( "\n",

str_replace(array('document.writeln(\'','\')',';'),'',

strip_tags(

file_get_contents('http://www.quotationspage.com/data/1mqotd.js')

)

)

);

use $quote[2] & $quote[3]

It gives you a quote a day

balluche AROBASE free.fr
17-Feb-2006 02:16


//balluche:22/01/04:Remove even bad tags

function strip_bad_tags($html)

{

    $s = preg_replace ("@</?[^>]*>*@", "", $html);

    return $s;

}

salavert at~ akelos
13-Feb-2006 02:21


<?php

       /**

    * Works like PHP function strip_tags, but it only removes selected tags.

    * Example:

    *     strip_selected_tags('<b>Person:</b> <strong>Salavert</strong>', 'strong') => <b>Person:</b> Salavert

    */



    function strip_selected_tags($text, $tags = array())

    {

        $args = func_get_args();

        $text = array_shift($args);

        $tags = func_num_args() > 2 ? array_diff($args,array($text))  : (array)$tags;

        foreach ($tags as $tag){

            if(preg_match_all('/<'.$tag.'[^>]*>(.*)<\/'.$tag.'>/iU', $text, $found)){

                $text = str_replace($found[0],$found[1],$text);

          }

        }



        return $text;

    }



?>



Hope you find it useful,



Jose Salavert

webmaster at tmproductionz dot com
01-Feb-2006 07:28


<?php



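function remove_tag ( $tag , $data ) {
    // eregi() was removed in PHP 7; stripos() performs the same case-insensitive search
    while ( ( $it = stripos ( $data , "<" . $tag ) ) !== false ) {
        $close = stripos ( $data , "</" . $tag . ">" , $it ) ;
        // stop if the closing tag is missing, otherwise the loop would never end
        if ( $close === false ) break ;
        $it2   = $close + strlen ( $tag ) + 3 ;
        $temp  = substr ( $data , 0 , $it ) ;
        $temp2 = substr ( $data , $it2 ) ;
        $data  = $temp . $temp2 ;
    }
    return $data ;
}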



?>



This code removes every occurrence of the specified tag, together with its contents, from the given haystack.

lucahomer at hotmail dot com
30-Jan-2006 05:42


I think the regular expression posted in note #51383 (function.strip-tags.php#51383) is not correct:



<?php

$disalowedtags = array("font");



foreach ($_GET as $varname) 

foreach ($disalowedtags as $tag) 



----------------------------------------------------------

if (eregi("<[^>]*".$tag."*\"?[^>]*>", $varname)) <---

----------------------------------------------------------



die("stop that");



?> 



This check also rejects links like this:

<a href=font.php>test</a>

because the word "font" appears between "<" and ">".



I changed the regular expression to this:

-----------------------------------------------------

if (eregi("(<|</)".$tag."*\"?[^>]*>", $varname))

-----------------------------------------------------



bye 



Luca

Nyks
11-Oct-2005 01:39


Note for BRYN at drumdatabse dot com (manual/fr/function.strip-tags.php#52085) :



I've changed your script to cover more cases.

- The outer WHILE loop repeats the inner WHILE loop to strip HTML tags that may have been cut in two by substr() (and were therefore not recognized by strip_tags()).

- The bug with substr($textstring,0,1024) is gone: when the WHILE loop ran for the second, third, fourth... time and $textstring was shorter than 1024 characters, it returned an error.



<?php

function strip_tags_in_big_string($textstring){
while($textstring != strip_tags($textstring))
    {
    // start each pass with an empty buffer
    $safetext = '';
    while (strlen($textstring) != 0)
         {
         if (strlen($textstring) > 1024) {
              $otherlen = 1024;
         } else {
              $otherlen = strlen($textstring);
         }
         $temptext = strip_tags(substr($textstring,0,$otherlen));
         $safetext .= $temptext;
         $textstring = substr_replace($textstring,'',0,$otherlen);
         }
    $textstring = $safetext;
    }
return $textstring;
}

?>

info at christopher-kunz dot de
29-Aug-2005 06:34


Please note that the function supplied by daneel at neezine dot net is not a good way of avoiding XSS attacks. A string like 

<font size=">>" <script>alert("foo")</script> face="tahoma" color="#DD0000">salut</font> 

will be sanitized to 

<font>>" <script>alert("foo")</script> face="tahoma" color="#DD0000">salut</font>

which is a pretty good XSS.



If you are in need of XSS cleaning, you might want to consider the Pixel-Apes XSS cleaner: http://pixel-apes.com/safehtml

daneel at neezine dot net
22-Aug-2005 05:08


This removes all attributes from a tag except those specified in an array. It is a correction of the (very useful) routine posted by joris878, which did not seem to work as posted, plus a usage example.

When will PHP support this natively?

Sorry for my English. I hope everybody understands.



<?

function stripeentag($msg,$tag,$attr) { 

  $lengthfirst = 0; 

  while (strstr(substr($msg,$lengthfirst),"<$tag ")!="") 

  { 

   $imgstart = $lengthfirst + strpos(substr($msg,$lengthfirst), "<$tag "); 

   $partafterwith = substr($msg,$imgstart); 

   $img = substr($partafterwith,0,strpos($partafterwith,">")+1); 

   $img = str_replace(" =","=",$msg); 

   $out = "<$tag";  



 for($i=0; $i <= (count($attr) - 1 );$i++) 

 { 

    $long_val = strpos($img," ",strpos($img,$attr[$i]."=")) - (strpos($img,$attr[$i]."=") + strlen($attr[$i]) + 1) ;

    $val = substr($img, strpos($img,$attr[$i]."=") + strlen($attr[$i]) + 1,$long_val);

     if(strlen($val)>0) $attr[$i] = " ".$attr[$i]."=".$val; 

     else $attr[$i] = ""; 

     $out .= $attr[$i]; 

 } 



   $out .= ">"; 

   $partafter = substr($partafterwith,strpos($partafterwith,">")+1); 

   $msg = substr($msg,0,$imgstart).$out.$partafter; 

   $lengthfirst = $imgstart+3; 

  } 

  return $msg; 

} 



$message = "<font size=\"10\" face=\"tahoma\" color=\"#DD0000\" >salut</font>" ;



//on ne garde que la couleur

//we want only "color" attribute

$message = stripeentag($message,"font",array("color"));



echo $message ;

?>

10-Aug-2005 12:08


<?php

/**removes specified tags from the text, where each tag requires a
     *closing tag; if the latter
     *is not found then everything after will be removed
     *typical usage:
     *some html text, array('script','body','html') - all lower case*/

    public static function removeTags($text,$tags_array){

        $length = strlen($text);

        $pos =0;

        $tags_array = array_flip($tags_array);

        while ($pos < $length && ($pos = strpos($text,'<',$pos)) !== false){

            $dlm_pos = strpos($text,' ',$pos);

            $dlm2_pos = strpos($text,'>',$pos);

            if ($dlm_pos > $dlm2_pos)$dlm_pos=$dlm2_pos;

            $which_tag = strtolower(substr($text,$pos+1,$dlm_pos-($pos+1)));

            $tag_length = strlen($which_tag);

            if (!isset($tags_array[$which_tag])){

                //if no tag matches found

                ++$pos;

                continue;

            }

            //find the end

            $sec_tag = '</'.$which_tag.'>';

            $sec_pos = stripos($text,$sec_tag,$pos+$tag_length);

            //remove everything after if end of the tag not found

            if ($sec_pos === false) $sec_pos = $length-strlen($sec_tag);

            $rmv_length = $sec_pos-$pos+strlen($sec_tag);

            $text = substr_replace($text,'',$pos,$rmv_length);

            //update length

            $length = $length - $rmv_length;

            $pos++;

        }

        return $text;

    }

?>

erwin at spammij dot nl
08-Jul-2005 08:13


If you just want to disable HTML, you can easily replace all instances of < and > with their entities (&lt; and &gt;), so that none of the HTML code is interpreted.
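
A minimal sketch of that idea using PHP's built-in htmlspecialchars(), which converts <, >, & and quotes to entities. The wrapper name escape_html() is only illustrative, and the UTF-8 charset is an assumption about your input.

```php
<?php
// Hypothetical helper: escape everything so that no markup is interpreted.
function escape_html($text)
{
    // ENT_QUOTES also converts single quotes; 'UTF-8' is an assumed input encoding.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

echo escape_html('<b>HEY</b>'); // prints &lt;b&gt;HEY&lt;/b&gt;
```

Unlike strip_tags(), this keeps everything the user typed and simply shows it literally.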

php at scowen dot com
07-Jun-2005 12:50


I have had a similar problem to kangaroo232002 at yahoo dot co dot uk when stripping tags from html containing javascript. The javascript can obviously contain '>' and '<' as comparison operators which are seen by strip_tags() as html tags - leading to undesired results.



To christianbecke at web dot de - this can be third-party html, so although perhaps not always 'correct', that's how it is!
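
One possible workaround, sketched under the assumption that the script blocks are not needed at all: remove them before calling strip_tags(), so the comparison operators never reach it. The function name is made up for this example, and this is not meant as a security filter.

```php
<?php
// Hypothetical helper: drop <script> blocks first, then strip the remaining tags.
function strip_tags_without_scripts($html)
{
    // Non-greedy match up to the first closing </script>; the "s" flag lets . span newlines.
    $html = preg_replace('~<script\b[^>]*>.*?</script\s*>~is', '', $html);
    return strip_tags($html);
}

$html = '<p>Hi</p><script>if (a < b && c > d) { run(); }</script><p>Bye</p>';
echo strip_tags_without_scripts($html); // prints HiBye
```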

anonymous
27-May-2005 12:45


Users can still put attributes such as inline CSS in the tags you allow.

For example, if you strip all tags except <b>, a user can still write <b style="color: red; font-size: 45pt">Hello</b>, which might be undesired.



Maybe BB Code would be a better option.
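
A rough sketch of one way to deal with this: keep the allowed tags but drop every attribute on them. The function name is hypothetical, and the simple pattern assumes attribute values never contain a ">" character.

```php
<?php
// Hypothetical helper: keep only the allowed tags and drop all of their attributes.
function strip_tags_and_attributes($html, $allowedTags)
{
    $html = strip_tags($html, $allowedTags);
    // Rewrite every opening tag as just its name; closing tags start with "</" and do not match.
    return preg_replace('/<(\w+)[^>]*>/', '<$1>', $html);
}

echo strip_tags_and_attributes('<b style="color: red; font-size: 45pt">Hello</b> <i>you</i>', '<b>');
// prints <b>Hello</b> you
```

Note that this also removes harmless attributes such as href, so it only fits tags that need no attributes at all.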

bazzy
22-Apr-2005 05:09


I think bryn and john780 are missing the point - eric at direnetworks wasn't suggesting there is an overall string limit of 1024 characters but rather that actual tags over 1024 characters long (eg, in his case it sounds like a really long encrypted <a href> tag) will fail to be stripped.



The functions to slowly pass strings through strip_tags 1024 characters at a time aren't necessary and are actually counter productive (since if a tag spans the break point, ie it is opened before the 1024 characters and closed after the 1024 characters then only the opening tag is removed which leaves a mess of text up to the closing tag).



Only mentioning this as I spent ages working out a better way to deal with this character spanning before I actually went back and read eric's post and realised the subsequent posts were misleading - hopefully it'll save others the same headaches :)
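
To make the problem concrete, here is a small sketch that strips a string chunk by chunk. The chunk size of 8 is an assumption chosen only so that the tag spans a boundary in a short example; the function name is made up.

```php
<?php
// Strip tags chunk by chunk; the chunk size is tiny here only to make the effect visible.
function strip_tags_chunked($text, $chunkSize)
{
    $out = '';
    foreach (str_split($text, $chunkSize) as $chunk) {
        $out .= strip_tags($chunk);
    }
    return $out;
}

$html = 'Hello <a href="x">link</a>';

echo strip_tags($html), "\n"; // prints Hello link

// The second chunk ( href="x) contains no "<", so it is passed through untouched.
var_dump(strpos(strip_tags_chunked($html, 8), 'href') !== false); // bool(true)
```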

bryn -at- drumdatabase dot net
20-Apr-2005 02:38


Further to john780's idea for a solution to the 1024 character limit of strip_tags - it's a good one, but I think the ltrim function isn't the one for the job? I wrote this simple function to get around the limit (I'm a newbie, so there may be some problem / better way of doing it!):



<?

function strip_tags_in_big_string($textstring){
    // initialise the buffer so that .= does not act on an undefined variable
    $safetext = '';
    while (strlen($textstring) != 0)
        {
        $temptext = strip_tags(substr($textstring,0,1024));
        $safetext .= $temptext;
        $textstring = substr_replace($textstring,'',0,1024);
        }
    return $safetext;
}

?>



Hope someone finds it useful.

cz188658 at tiscali dot cz
07-Apr-2005 01:21


If you want to keep self-closing XHTML tags like <br />, include the plain tag <br> in the allowable_tags parameter.

Jiri
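
A short illustration, assuming PHP 5.3.4 or later, where the manual documents that self-closing tags are matched by their plain form in the allowed list:

```php
<?php
// Allowing <br> also keeps the self-closing XHTML form <br />.
$input = 'one<br />two<br>three<i>!</i>';

echo strip_tags($input, '<br>'); // prints one<br />two<br>three!
```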

php at arzynik dot com
29-Mar-2005 04:04


Instead of removing tags that you don't want, sometimes you might just want to stop them from doing anything.



<?php

$disalowedtags = array("script",

                        "object",

                        "iframe",

                        "image",

                        "applet",

                        "meta",

                        "form",

                        "onmouseover",

                        "onmouseout");



foreach ($_GET as $varname) 

foreach ($disalowedtags as $tag) 

if (eregi("<[^>]*".$tag."*\"?[^>]*>", $varname)) 

die("stop that");



foreach ($_POST as $varname) 

foreach ($disalowedtags as $tag) 

if (eregi("<[^>]*".$tag."*\"?[^>]*>", $varname)) 

die("stop that");



?>

christianbecke at web dot de
16-Feb-2005 06:34


to kangaroo232002 at yahoo dot co dot uk:



As far as I understand, what you report is not a bug in strip_tags(), but a bug in your HTML.

You should use alt='Go &gt;' instead of alt='Go >'.



I suppose your HTML displays all right in browsers, but that does not mean it's correct. It just shows that browsers are more forgiving than strip_tags() about characters that are not properly escaped as entities.
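
A small sketch of that advice, assuming you generate the markup yourself: escape the attribute value with htmlspecialchars() so that no raw ">" ends up inside the tag. The variable names are illustrative.

```php
<?php
// Escape the attribute value when generating the markup, so no raw ">" appears inside the tag.
$alt  = htmlspecialchars('Go >', ENT_QUOTES, 'UTF-8'); // Go &gt;
$html = '<input type="image" alt="' . $alt . '" src="go.gif" name="search1">Search';

echo $html, "\n";
echo strip_tags($html); // prints Search
```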

kangaroo232002 at yahoo dot co dot uk
03-Feb-2005 05:23


I was wondering why the following was indexed in my trawler despite stripping all text in tags (and punctuation): "valign left align middle border 0 src go gif name search1 onclick search". Please take a quick look at what produced it: <DIV style="position: absolute; TOP:22%; LEFT:68%;"><input type="image" alt="Go >" valign="left" align="middle" border=0 src="go.gif" name="search1" onClick="search()"></div>...



Looking at this closely, you can see that although the 'Go >' text (with the right-facing chevron) is enclosed in quotation marks, strip_tags() still treats the chevron as the end of the input tag and treats everything after it as text. I'm not sure whether this has been fixed in later versions; I'm using v4.3.3...



good hunting.

jon780 -at- gmail.com
02-Feb-2005 09:18


To eric at direnetworks dot com regarding the 1024 character limit:



You could simply ltrim() the first 1024 characters, run them through strip_tags(), add them to a new string, and remove them from the first.



Perform this in a loop which continued until the original string was of 0 length.

dumb at coder dot com
17-Jan-2005 04:22


/*

15Jan05



Within <textarea>, Browsers auto render & display certain "HTML Entities" and "HTML Entity Codes" as characters: 

&lt; shows as <    --    &amp; shows as &    --    etc.



Browsers also automatically change any "HTML Entity Codes" entered in a <textarea> into the resulting display characters BEFORE UPLOADING. There's no way to change this, which makes it difficult to edit HTML in a <textarea>.



"HTML Entity Codes" (i.e., use of &#60; to represent "<", &#38; to represent "&", &#160; to represent "&nbsp;") can be used instead. Therefore, we need to "HTML-Entitize" the data for display, which changes the raw/displayed characters into their HTML Entity Code equivalents before they are shown in a <textarea>.



Q: How would I get a textarea to contain "&lt;" as a literal string of characters and not have it display a "<"?

A: &amp;lt; is indeed the correct way of doing that. And if you wanted to display that, you'd need to use &amp;amp;lt;. That's just how HTML entities work.



htmlspecialchars() converts a subset of what htmlentities() converts.

The reverse (i.e., changing HTML entity codes into displayed characters) is done with html_entity_decode().



google on ns_quotehtml and see http://aolserver.com/docs/tcl/ns_quotehtml.html

see also http://www.htmlhelp.com/reference/html40/entities/

*/
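
As a sketch of the notes above: passing the raw text through htmlspecialchars() before placing it in a <textarea> is enough to make "&lt;" show up literally. The helper name and the UTF-8 charset are assumptions for this example.

```php
<?php
// Hypothetical helper: wrap raw text in a <textarea> so that it is shown literally.
function textarea_for($name, $rawText)
{
    // htmlspecialchars() escapes "&" as well, which turns the literal text "&lt;" into "&amp;lt;".
    $escaped = htmlspecialchars($rawText, ENT_QUOTES, 'UTF-8');
    return '<textarea name="' . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . '">' . $escaped . '</textarea>';
}

echo textarea_for('body', '&lt; means <');
// prints <textarea name="body">&amp;lt; means &lt;</textarea>
```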

eric at direnetworks dot com
20-Dec-2004 06:36


The strip_tags() function in both PHP 4.3.8 and 5.0.2 (probably many more, but these are the only two versions I tested) has a maximum tag length of 1024 characters. If you're trying to process a tag over this limit, strip_tags() will not return that line (as if it were an illegal tag). I noticed this problem while trying to parse a PayPal encrypted link button (<input type="hidden" name="encrypted" value="encryptedtext">, with <input> as an allowed tag), which is 2702 characters long. I can't really think of any workaround other than parsing each tag to figure out its length and only sending it to strip_tags() if it's under 1024 characters, but at that point I might as well be stripping the tags myself.

ashley at norris dot org dot au
31-Oct-2004 07:11


leathargy at hotmail dot com wrote:



"it seems we're all overlooking a few things:

1) if we replace "</ta</tableble>" by removing </table, we're not better off..."



I beat this by using ($input contains the data):



<?php

while($input != strip_tags($input)) {

            $input = strip_tags($input);

        }

?>



This iteratively strips tags until all tags have gone :)

@dada
29-Sep-2004 05:41


If you only want the text within the tags, you can use this function:



function showtextintags($text)



{



$text = preg_replace("/(\<script)(.*?)(script>)/si", "dada", "$text");

$text = strip_tags($text);

$text = str_replace("<!--", "&lt;!--", $text);

$text = preg_replace("/(\<)(.*?)(--\>)/mi", "".nl2br("\\2")."", $text);



return $text;



}



It will show all the text without tags and (!!!) without JavaScript.

Anonymous User
22-Aug-2004 09:24


Be aware that tags constitute visual whitespace, so stripping may leave the resulting text looking misjoined.



For example, 



"<strong>This is a bit of text</strong><p />Followed by this bit"



are visually separate paragraphs, but if simply stripped of tags will result in



"This is a bit of textFollowed by this bit"



which may not be what you want, e.g. if you are creating an excerpt for an RSS description field.



The workaround is to force whitespace prior to stripping, using something like this:



      $text = getTheText();

      $text = preg_replace('/</',' <',$text);

      $text = preg_replace('/>/','> ',$text);

      $desc = html_entity_decode(strip_tags($text));

      $desc = preg_replace('/[\n\r\t]/',' ',$desc);

      $desc = preg_replace('/ {2,}/',' ',$desc); // collapse any run of spaces, not just pairs

Isaac Schlueter php at isaacschlueter dot com
16-Aug-2004 07:32


steven --at-- acko --dot-- net pointed out that you can't make strip_tags allow comments. With this function, you can. Just pass <!--> as one of the allowed tags. Easy as pie: just pull them out, strip, and then put them back.



<?php

function strip_tags_c($string, $allowed_tags = '')

{    

    $allow_comments = ( strpos($allowed_tags, '<!-->') !== false );

    if( $allow_comments ) 

    {

        $string = str_replace(array('<!--', '-->'), array('&lt;!--', '--&gt;'), $string);

        $allowed_tags = str_replace('<!-->', '', $allowed_tags);

    }

    $string = strip_tags( $string, $allowed_tags );

    if( $allow_comments ) $string = str_replace(array('&lt;!--', '--&gt;'), array('<!--', '-->'), $string);

    return $string;

}

?>

Isaac Schlueter php at isaacschlueter dot com
15-Aug-2004 11:16


I am creating a rendering plugin for a CMS system (http://b2evolution.net) that wraps certain bits of text in acronym tags.  The problem is that if you have something like this:

<a href="http://www.php.net" title="PHP is cool!">PHP</a>



then the plugin will mangle it into:



<a href="http://www.<acronym title="PHP: Hypertext Processor">php</acronym>.net" title="<acronym title="PHP: Hypertext Processor">PHP</acronym> is cool!>PHP</a>



This function will strip out tags that occur within other tags. Not super-useful in tons of situations, but it was an interesting puzzle. I had started out using preg_replace, but it got ridiculously complicated when there were line breaks and multiple instances in the same tag.



The CMS does its XHTML validation before the content gets to the plugin, so we can be pretty sure that the content is well-formed, except for the tags inside of other tags.



<?php

if( !function_exists( 'antiTagInTag' ) )

{

    // $content is the string to be anti-tagintagged, and $format sets the format of the internals.

    function antiTagInTag( $content = '', $format = 'htmlhead' )

    {

        if( !function_exists( 'format_to_output' ) ) 

        {    // Use the external function if it exists, or fall back on just strip_tags.

            function format_to_output($content, $format)

            {

                return strip_tags($content);

            }

        }

        $contentwalker = 0;

        $length = strlen( $content );

        $tagend = -1;

        for( $tagstart = strpos( $content, '<', $tagend + 1 ) ; $tagstart !== false && $tagstart < strlen( $content ); $tagstart = strpos( $content, '<', $tagend ) )

        {

            // got the start of a tag.  Now find the proper end!

            $walker = $tagstart + 1;

            $open = 1;

            while( $open != 0 && $walker < strlen( $content ) )

            {

                $nextopen = strpos( $content, '<', $walker );

                $nextclose = strpos( $content, '>', $walker );

                if( $nextclose === false )

                {    // ERROR! Open waka without close waka!

                    // echo '<code>Error in antiTagInTag - malformed tag!</code> ';

                    return $content;

                }

                if( $nextopen === false || $nextopen > $nextclose )

                { // No more opens, but there was a close; or, a close happens before the next open.

                    // walker goes to the close+1, and open decrements

                    $open --;

                    $walker = $nextclose + 1;

                }

                elseif( $nextopen < $nextclose )

                { // an open before the next close

                    $open ++;

                    $walker = $nextopen + 1;

                }

            }

            $tagend = $walker;

            if( $tagend > strlen( $content ) ) 

                $tagend = strlen( $content );

            else

            {

                $tagend --;

                $tagstart ++;

            }

            $tag = substr( $content, $tagstart, $tagend - $tagstart );

            $tags[] = '<' . $tag . '>';

            $newtag = format_to_output( $tag, $format );

            $newtags[] = '<' . $newtag . '>';

            $newtag = format_to_output( $tag, $format );

        }

        

        $content = str_replace($tags, $newtags, $content);

        return $content;

    }

}

Tony Freeman
19-Nov-2003 02:45


This is a slightly altered version of tREXX's code.  The difference is that this one simply removes the unwanted attributes (rather than flagging them as forbidden).



function removeEvilAttributes($tagSource)

{

        $stripAttrib = "' (style|class)=\"(.*?)\"'i";

        $tagSource = stripslashes($tagSource);

        $tagSource = preg_replace($stripAttrib, '', $tagSource);

        return $tagSource;

}



function removeEvilTags($source)

{

    $allowedTags='<a><br><b><h1><h2><h3><h4><i>' .

             '<img><li><ol><p><strong><table>' .

             '<tr><td><th><u><ul>';

    $source = strip_tags($source, $allowedTags);

    return preg_replace('/<(.*?)>/ie', "'<'.removeEvilAttributes('\\1').'>'", $source);

}



$text = '<p style="Normal">Saluton el <a href="#?"

 class="xsarial">Esperanto-lando</a><img src="my.jpg"

 alt="Saluton" width=100 height=100></p>';



$text = removeEvilTags($text);



var_dump($text);
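
The /e modifier used above was removed in PHP 7. The following is a sketch of the same idea with preg_replace_callback(); the function name and the shortened list of allowed tags are illustrative and not part of the original code.

```php
<?php
// Hypothetical rewrite: same approach, but with a callback instead of the /e modifier.
function remove_evil_tags_cb($source, $allowedTags = '<a><b><i><p><img>')
{
    $source = strip_tags($source, $allowedTags);
    return preg_replace_callback(
        '/<(.*?)>/',
        function ($m) {
            // Drop style="..." and class="..." from the inside of each remaining tag.
            return '<' . preg_replace('/ (style|class)="(.*?)"/i', '', $m[1]) . '>';
        },
        $source
    );
}

echo remove_evil_tags_cb('<p style="Normal">Saluton el <a href="#?" class="xsarial">Esperanto-lando</a></p>');
// prints <p>Saluton el <a href="#?">Esperanto-lando</a></p>
```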

leathargy at hotmail dot com
26-Oct-2003 10:15


it seems we're all overlooking a few things:

1) if we replace "</ta</tableble>" by removing </table, we're no better off. Try using a char-by-char comparison and replacing stuff with *s, because then this example would become "</ta******ble>", which is not problematic; also, with a char-by-char approach, you can skip whitespace and kill stuff like "< table>"... just make sure <&bkspTable> doesn't work...

2) no browser treats { as <.[as far as i know]

3) because of statement 2, we can do:

$remove=array("<?","<","?>",">");

$change=array("{[pre]}","{[","{/pre}","]}");

$repairSeek = array("{[pre]}", "</pre>","{[b]}","{[/b]}","{[br]}");

// and so forth...



$repairChange=array("<pre>","</pre>","<b>","</b>","<br>");

// and so forth...



$maltags=array("{[","]}");

$nontags=array("{","}");

$unclean=...;//get variable from somewhere...

$unclean=str_replace($remove,$change,$unclean);

$unclean=str_replace($repairSeek, $repairChange, $unclean);

$clean=str_replace($maltags, $nontags, $unclean);



////end example....

4) we can further improve the above by using explode(for our ease):

function purifyText($unclean, $fixme)

{

$remove=array();

$remove=explode("\n",$fixme['remove']);

//... and so forth for each of the above arrays...

// or you could just pass the arrays..., or a giant string

//put above here...

return $clean;

}//done

tREXX [www.trexx.ch]
15-Oct-2003 06:15


Here's a quite fast solution to remove unwanted tags AND also unwanted attributes within the allowed tags:



<?php

/**

 * Allow these tags

 */

$allowedTags = '<h1><b><i><a><ul><li><pre><hr><blockquote><img>';



/**

 * Disallow these attributes/prefix within a tag

 */

$stripAttrib = 'javascript:|onclick|ondblclick|onmousedown|onmouseup|onmouseover|'.

               'onmousemove|onmouseout|onkeypress|onkeydown|onkeyup';



/**

 * @return string

 * @param string

 * @desc Strip forbidden tags and delegate tag-source check to removeEvilAttributes()

 */

function removeEvilTags($source)

{

    global $allowedTags;

    $source = strip_tags($source, $allowedTags);

    return preg_replace('/<(.*?)>/ie', "'<'.removeEvilAttributes('\\1').'>'", $source);

}



/**

 * @return string

 * @param string

 * @desc Strip forbidden attributes from a tag

 */

function removeEvilAttributes($tagSource)

{

    global $stripAttrib;

    return stripslashes(preg_replace("/$stripAttrib/i", 'forbidden', $tagSource));

}



// Will output: <a href="forbiddenalert(1);" target="_blank" forbidden =" alert(1)">test</a>

echo removeEvilTags('<a href="javascript:alert(1);" target="_blank" onMouseOver = "alert(1)">test</a>');

?>

dougal at gunters dot org
10-Sep-2003 01:03


strip_tags() appears to become nauseated at the sight of a <!DOCTYPE> declaration (at least in PHP 4.3.1). You might want to do something like:



$html = str_replace('<!DOCTYPE','<DOCTYPE',$html);



before processing with strip_tags().

joris878 at hotmail dot com
04-Jun-2003 05:58


[   Editor's Note: This functionality will be natively supported in a future release of PHP.  Most likely 5.0   ]





This routine removes all attributes from a given tag except


the attributes specified in the array $attr.





function stripeentag($msg,$tag,$attr) {


  $lengthfirst = 0;


  while (strstr(substr($msg,$lengthfirst),"<$tag ")!="")


  {


    $imgstart = $lengthfirst + strpos(substr($msg,$lengthfirst), "<$tag ");


    $partafterwith = substr($msg,$imgstart);


    $img = substr($partafterwith,0,strpos($partafterwith,">")+1);


    $img = str_replace(" =","=",$msg);


    $out = "<$tag";  


    for($i=1;$i<=count($atr);$i++)


    {


      $val = filter($img,$attr[$i]."="," ");


      if(strlen($val)>0) $attr[$i] = " ".$attr[$i]."=".$val;


      else $attr[$i] = "";


      $out .= $attr[$i];


    }


    $out .= ">";


    $partafter = substr($partafterwith,strpos($partafterwith,">")+1);


    $msg = substr($msg,0,$imgstart).$out.$partafter;


    $lengthfirst = $imgstart+3;


  }


  return $msg;


}

Chuck
20-Mar-2003 04:01


Caution, HTML created by Word may contain the sequence 

'<?xml...' 



Apparently strip_tags treats this like <?php and removes the remainder of the input string: not just the XML tag but all input that follows.

dontknowwhat at thehellIamdoing dot com
19-Nov-2002 06:23


Here's a quickie that will strip out only specific tags. I'm using it to clean up FrontPage and Word code from included third-party code (which shouldn't have all the extra header information in it).



$contents = "Your HTML string";



// Part 1

// This array is for single tags and their closing counterparts



$tags_to_strip = Array("html","body","meta","link","head");



foreach ($tags_to_strip as $tag) {

       $contents = preg_replace("/<\/?" . $tag . "(.|\s)*?>/","",$contents);

}



// Part 2

// This array is for stripping opening and closing tags AND what's in between



$tags_and_content_to_strip = Array("title");



foreach ($tags_and_content_to_strip as $tag) {

       $contents = preg_replace("/<" . $tag . ">(.|\s)*?<\/" . $tag . ">/","",$contents);

}

mrmaxxx333 at triad dot rr dot com
07-May-2002 11:29


to rid everything in between script tags, including the script tags, i use this.





<?php


$description = preg_replace("~<script[^>]*>.+</script[^>]*>~isU", "", $description);


?>





it hasn't been extensively tested, but it works.





Also, I ran into trouble with a href tags: I wanted to pull the URL out of them. I did this to turn <a href="blah.com">welcome to blah</a> into welcome to blah (blah.com):





<?php


$string = preg_replace('/<a\s+.*?href="([^"]+)"[^>]*>([^<]+)<\/a>/is', '\2 (\1)', $string);


?>

guy at datalink dot SPAMMENOT dot net dot au
14-Mar-2002 10:19


Strip tags will NOT remove HTML entities such as &nbsp;
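
If the entities should go as well, one option is to decode them after stripping. This is only a sketch: the function name is made up, UTF-8 is an assumed encoding, and the decoded text must be escaped again before it is printed into an HTML page.

```php
<?php
// Hypothetical helper: strip tags, then turn entities such as &nbsp; back into characters.
function strip_tags_and_entities($html)
{
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
    // &nbsp; decodes to U+00A0 (bytes C2 A0 in UTF-8); replace it with a plain space.
    return str_replace("\xC2\xA0", ' ', $text);
}

echo strip_tags_and_entities('<p>Fish&nbsp;&amp;&nbsp;chips</p>'); // prints Fish & chips
```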

chrisj at thecyberpunk dot com
18-Dec-2001 12:57


strip_tags doesn't recognize that CSS within style tags is not document text. To fix this, do something similar to the following:





$htmlstring = preg_replace("'<style[^>]*>.*</style>'siU",'',$htmlstring);
