preg_match
(PHP 3 >= 3.0.9, PHP 4, PHP 5)
preg_match -- Perform a regular expression match
Description
mixed preg_match ( string pattern, string subject [, array &matches [, int flags [, int offset]]] )
Searches subject for a match to the regular expression given in pattern.
If the optional matches parameter is provided, it is filled with the results of the search.
$matches[0] will contain the text that matched the full pattern, $matches[1] the text
that matched the first parenthesized subpattern, and so on.
flags can take the following value:
- PREG_OFFSET_CAPTURE
If this flag is passed, the position of every matched substring in the
subject string is returned as well. Note that this flag changes the format of
the returned data: every match is returned as an array whose element 0 is the
matched substring and whose element 1 is its offset.
Both this flag and the optional flags parameter are available since PHP 4.3.0.
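As a rough illustration of the format change described above, the following sketch reads the matched text and its offset back out of $matches. The subject string and pattern are made up for the example.

```php
<?php
// Find "def" and report where it starts in the subject string.
$subject = "abcdef";
$found = preg_match('/def/', $subject, $matches, PREG_OFFSET_CAPTURE);

if ($found === 1) {
    // With PREG_OFFSET_CAPTURE every entry is array(matched text, offset).
    list($text, $offset) = $matches[0];
    echo "Matched '$text' at offset $offset\n"; // Matched 'def' at offset 3
}
```

Without the flag, $matches[0] would simply be the string "def".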
The search runs from left to right, starting at the beginning of the string. The optional
offset parameter can be used to specify an alternative
position from which to start the search. The optional
offset parameter is available since PHP 4.3.3.
Note:
Using the offset parameter is not equivalent to
passing substr($subject, $offset) in place of the subject string
when calling preg_match(), because
pattern can contain assertions such as
^, $ or (?<=x).
Compare:
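A minimal sketch of the difference, using an illustrative subject string and pattern (both are assumptions chosen for the example):

```php
<?php
$subject = "abcdef";
$pattern = '/^def/';

// With an offset, ^ still refers to the real start of $subject,
// so nothing can match when the search begins at position 3.
$withOffset = preg_match($pattern, $subject, $m1, PREG_OFFSET_CAPTURE, 3);

// With substr(), the shortened string really does begin with "def".
$withSubstr = preg_match($pattern, substr($subject, 3), $m2, PREG_OFFSET_CAPTURE);

var_dump($withOffset); // int(0)
var_dump($withSubstr); // int(1)
```

In the second call the reported offset is 0, because it is relative to the shortened string rather than to the original subject.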
preg_match() returns the number of times pattern matches.
That will be either 0 (no match) or 1, because preg_match() stops
searching after the first match. If you need to find or count all matches,
use preg_match_all() instead.
preg_match() returns FALSE if an error occurred during execution.
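Because 0 and FALSE are both "falsy", the three outcomes are easiest to tell apart with strict comparison. A small sketch, in which describe_match() is a hypothetical helper written for this example:

```php
<?php
// Hypothetical helper: turn the return value of preg_match() into a label.
function describe_match($pattern, $subject)
{
    // @ silences the warning that an invalid pattern would raise here.
    $result = @preg_match($pattern, $subject);

    if ($result === false) {
        return "error";
    }
    return $result === 1 ? "match" : "no match";
}

echo describe_match('/php/i', 'PHP is great'), "\n"; // match
echo describe_match('/perl/', 'PHP is great'), "\n"; // no match
echo describe_match('/(unclosed/', 'PHP'), "\n";     // error
```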
Tip:
Do not use preg_match() if you only need to check whether a string contains a given substring.
Use strpos() or strstr() instead, as they
will do the job much faster.
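A short sketch of the plain string functions mentioned in the tip. The haystack text is borrowed from the examples on this page, and stripos() is added here as the case-insensitive counterpart (available in PHP 5 and later):

```php
<?php
$haystack = "PHP is the web scripting language of choice.";

// strpos() returns the offset of the first occurrence, or false.
// Compare with !== false, because a hit at offset 0 is also "falsy".
$found = strpos($haystack, "PHP") !== false;

// stripos() ignores case.
$foundAnyCase = stripos($haystack, "php") !== false;

var_dump($found);        // bool(true)
var_dump($foundAnyCase); // bool(true)
```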
Example 1. Searching for the substring "php" in text
<?php
// The "i" after the closing delimiter makes the search case-insensitive
if (preg_match("/php/i", "PHP is the web scripting language of choice.")) {
    echo "A match was found.";
} else {
    echo "A match was not found.";
}
?>
Example 2. Searching for the word "web" in text
<?php
// \b marks a word boundary, so only the separate word "web" matches
if (preg_match("/\bweb\b/i", "PHP is the web scripting language of choice.")) {
    echo "A match was found.";
} else {
    echo "A match was not found.";
}
// "website" does not match, because "web" is not followed by a word boundary
if (preg_match("/\bweb\b/i", "PHP is the website scripting language of choice.")) {
    echo "A match was found.";
} else {
    echo "A match was not found.";
}
?>
Example 3. Getting the domain name out of a URL
<?php
// get the host name from the URL
preg_match("/^(http:\/\/)?([^\/]+)/i",
    "http://www.php.net/index.html", $matches);
$host = $matches[2];
// get the last two segments of the host name
preg_match("/[^\.\/]+\.[^\.\/]+$/", $host, $matches);
echo "domain name is: {$matches[0]}\n";
?>
This example will produce:
domain name is: php.net
See also preg_match_all(),
preg_replace(), and
preg_split().
preg_match
pnomolos --- gmail --- com
13-Jul-2006 01:36
A note to lcvalentine and najeli ... to make things much easier, you can include 'x' in the regular expression modifiers which makes whitespace outside of character classes mean nothing (meaning you don't have to remove breaks), as well as allowing you to comment the regular expression... like so!
preg_match( "/^
[\d\w\/+!=#|$?%{^&}*`'~-] # Wow that's ugly looking
[\d\w\/\.+!=#|$?%{^&}*`'~-]*@ # So's that one
[A-Z0-9]
[A-Z0-9.-]{0,61}
[A-Z0-9]\. # Letters or numbers, then a dot
[A-Z]{2,6}$/ix", 'user@subdom.dom.tld'
);
najeli at gmail dot com
09-Jul-2006 11:03
A small comment on lcvalentine's mail validation expression: it treats addresses like "user@fo.com" as invalid, but such two-letter domains are valid (at least in Poland, e.g. wp.pl, o2.pl etc.).
After changing {1,61} to {0,61} everything works fine, I hope.
<?php
preg_match( "/^
[\d\w\/+!=#|$?%{^&}*`'~-]
[\d\w\/\.+!=#|$?%{^&}*`'~-]*@
[A-Z0-9]
[A-Z0-9.-]{0,61}
[A-Z0-9]\.
[A-Z]{2,6}$/i", 'user@subdom.dom.tld'
);
?>
(remove breaks)
volkank at developera dot com
07-Jul-2006 07:01
I will add a note about my last post.
Leading zeros in IP addresses can cause problems on both Windows and Linux, because it can be unclear whether an octet is decimal or octal (when the octal form is not written properly).
"66.163.161.117" is in decimal syntax, but in "066.163.161.117" the first octet, 066, is in octal syntax.
So "066.163.161.117" is recognized as decimal "54.163.161.117" by the operating system.
BTW, octal is a rather rare syntax, so you may not want or need to match it.
***
Unless you specifically want to match IP addresses in both decimal and octal syntax, you can use Chortos-2's pattern, which is suitable for most situations.
<?php
$num='(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])';
if (!preg_match("/^$num\\.$num\\.$num\\.$num$/", $ip_addr,$match)) ...
preg_match_all("/$num\\.$num\\.$num\\.$num/",$test,$match); ...
?>
***
Also, my previous pattern still has a bug and needs some changes to correctly match both decimal and octal syntax.
steve at webcommons dot biz
05-Jul-2006 11:48
This function (for PHP 4.3.0+) uses preg_match to return the regex position (like strpos, but using a regex pattern instead):
function preg_pos($sPattern, $sSubject, &$FoundString, $iOffset = 0) {
$FoundString = NULL;
if (preg_match($sPattern, $sSubject, $aMatches, PREG_OFFSET_CAPTURE, $iOffset) > 0) {
$FoundString = $aMatches[0][0];
return $aMatches[0][1];
}
else {
return FALSE;
}
}
It also returns the actual string found using the pattern, via $FoundString.
lcvalentine at gmail dot com
23-May-2006 02:53
After doing some testing for my company and reading the RFCs mentioned on wikipedia, I have found that the following RegEx appears to match any standards-based e-mail address.
Please test it in your own configs and respond if it works well, so other users don't waste too much time looking:
(remove breaks)
<?php
preg_match( "/^
[\d\w\/+!=#|$?%{^&}*`'~-]
[\d\w\/\.+!=#|$?%{^&}*`'~-]*@
[A-Z0-9]
[A-Z0-9.-]{1,61}
[A-Z0-9]\.
[A-Z]{2,6}$/i", 'user@subdom.dom.tld'
);
?>
agilo3 at gmail dot com
21-May-2006 02:54
I seem to have made a few critical mistakes in my previous entry, to correct the problem I'm re-pasting my entry (I hope an admin can delete the other entry?):
I want to make an important notice to everyone using preg_match to validate a file used for inclusion.
If you use preg_match like so (like I have in the past):
<?php
if (preg_match("/(.*)\.txt$/", $_GET['file'])) {
include($_GET['file']);
}
?>
Be sure to know that you can get around that security by using a null string terminator, like so:
page.php?file=/etc/passwd%00.txt
Quick explanation: strings end in a null string terminator, which is what separates strings (%00 is hex for the null string terminator character).
What this does is effectively cut off everything after %00 while the whole string still passes validation (if I understand correctly how preg_match handles this), leading to the inclusion of the server's /etc/passwd file.
One way to go around it is by doing something like this:
<?php
if (preg_match("/^[a-zA-Z0-9\/\-\_]+\.txt$/", $_GET['file'])) {
include($_GET['file']);
}
?>
This checks that the string (from the start) consists of alphanumeric characters, optionally with slashes (subdirectories), underscores and dashes (used by some in filenames), and that it ends at the end of the string in ".txt".
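A different approach, not taken from the post above, is to avoid building the include path from user input at all and to look the requested name up in a fixed list. A minimal sketch, assuming a hypothetical $allowed map and resolve_page() helper:

```php
<?php
// Hypothetical whitelist: request name => file that may be included.
$allowed = array(
    'about'   => 'pages/about.txt',
    'contact' => 'pages/contact.txt',
);

// Returns the mapped file, or null for anything that is not on the list.
function resolve_page($name, $allowed)
{
    return isset($allowed[$name]) ? $allowed[$name] : null;
}

var_dump(resolve_page('about', $allowed));             // string(15) "pages/about.txt"
var_dump(resolve_page("/etc/passwd\0.txt", $allowed)); // NULL
```

Only when resolve_page() returns a string would the caller pass it to include().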
azuretek at gmail dot com
18-May-2006 11:34
If you want to use a prefix for a folder in your project you can do as follows. In my case I wanted to make our dev and prod environments include based on whichever folder we were using as the document root. Though it would be also useful if you want a system to act differently based on which folder it resides in, eg. different results for email, security, and urls. This will return the proper info no matter where the file is as long as it's contained within the document root.
You can do it like this:
$envPrefix = $_ENV['PWD'];
preg_match('/\/.*\/(.*)_project/', $envPrefix, $matches);
$envPrefix = $matches[1];
will return:
array(2) {
[0]=>
string(25) "/home/dev_project"
[1]=>
string(3) "dev"
}
You can then use that prefix to include the proper files. This method is useful for developers with separate copies of their project, live and dev. It helps with merging updates, since little or nothing has to change between the copies.
ickata at ickata dot net
16-May-2006 12:23
If you want to perform a case-insensitive match with
cyrillic characters, there is a problem if your server is
Linux-based. So here is a useful function to perform
a case-insensitive match:
<?php
function cstrtolower($str) {
return strtr($str,
"АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЬЮЯ",
"абвгдежзийклмнопрстуфхцчшщъьюя");
}
function cstrtoupper($str) {
return strtr($str,
"абвгдежзийклмнопрстуфхцчшщъьюя",
"АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЬЮЯ");
}
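// Note: strlen()/substr() work on bytes, so this assumes a single-byte
// encoding such as windows-1251 for both the script and the text.
function createFilter ($string) {
    $string = cstrtolower($string);
    $newstr = '';
    for ($i=0;$i<strlen($string);$i++) {
        $letter_small = substr($string,$i,1);
        $letter_big = cstrtoupper($letter_small);
        // build a two-letter class such as [тТ] for every character
        $newstr .= '['.$letter_small.$letter_big.']';
    }
    return $newstr;
}
$string = "Това е ТесТ - проверка";
$keyword = "тест";
if (preg_match ("/".createFilter($keyword)."/", $string))
    echo "A match was found.";
else echo "A match was NOT found!";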
?>
And one more thing - if you want to perform a match
with a whole word only, do not use "/\bkeyword\b/",
use this:
<?php
// the keyword must not be preceded or followed by a letter
preg_match ("/(^|[^А-Яа-яA-Za-z])keyword([^А-Яа-яA-Za-z]|$)/", $string);
?>
axima at prameasolutions dot com
11-May-2006 04:34
Function to validate and extract date parts from a string in the following formats:
dd-mm-yyyy or dd/mm/yyyy or dd mm yyyy
d-m-yyyy and so on
Actually, the delimiter can be any character except a digit.
function get_correct_date($input = ''){
    //clean up the input
    $input = trim($input);
    //matching pattern: day, delimiter, month, delimiter, year (1900-2099)
    $pattern = '/^([1-9]|0[1-9]|[12][0-9]|3[01])\D([1-9]|0[1-9]|1[012])\D(19[0-9][0-9]|20[0-9][0-9])$/';
    //check the input
    $parts = array();
    preg_match($pattern, $input, $parts);
    return $parts;
}
The function returns an empty array on failure, or the full match followed by
the parts of the date: array(match, dd, mm, yyyy).
martin at kouba dot at
04-Apr-2006 04:02
in reply to "brodseba at brodseba dot com"
hmm, wouldn't it be much easier to do it this way?
function preg_match_between($a_sStart, $a_sEnd, $a_sSubject)
{
$pattern = '/'. $a_sStart .'(.*?)'. $a_sEnd .'/';
preg_match($pattern, $a_sSubject, $result);
return $result[1];
}
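One caveat with building a pattern from arguments is that any regex metacharacters in $a_sStart or $a_sEnd are interpreted as part of the pattern. If the delimiters are meant as literal text, preg_quote() can escape them first. A sketch under that assumption, where the function name is made up for this example:

```php
<?php
// Variant that treats both delimiters as literal text.
// The second argument of preg_quote() also escapes the "/" delimiter.
function preg_match_between_literal($start, $end, $subject)
{
    $pattern = '/' . preg_quote($start, '/') . '(.*?)' . preg_quote($end, '/') . '/s';
    if (preg_match($pattern, $subject, $result)) {
        return $result[1];
    }
    return false; // nothing found between the delimiters
}

echo preg_match_between_literal('[b]', '[/b]', 'some [b]bold[/b] text'); // bold
```

Unlike the versions above, this one returns false instead of raising a notice when there is no match.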
brodseba at brodseba dot com
20-Mar-2006 10:56
This little function is self-explaining.
function preg_match_between($a_sStart, $a_sEnd, $a_sSubject)
{
$pattern = '/'. $a_sStart .'(.*?)'. $a_sEnd .'/';
preg_match($pattern, $a_sSubject, $result);
$pattern = '/'. $a_sStart .'/';
$result = preg_replace($pattern, '', $result[0]);
$pattern = '/'. $a_sEnd .'/';
$result = preg_replace($pattern, '', $result);
return $result;
}
Chris Shucksmith <chris at shucksmith dot com>
04-Mar-2006 10:05
I created the following snippet to parse fixed-width tabular data into an array of arrays. I use it to create an HTML table showing the output of a Linux shell command. It requires an array of column widths, which is used to build a regular expression. That expression is passed to preg_match_all in multiline mode over the entire command output. $matches is then examined and the table built.
This example formats a table of SIP Peers connected to an Asterisk VOIP Server. The command output looks like:
| # asterisk -rqx 'sip show peers'
| Name/username Host Dyn Nat ACL Port Status
| 308/308 (Unspecified) D 0 UNKNOWN
| 303/303 45.230.86.123 D N 5060 OK (84 ms)
| 302/302 192.168.14.71 D 5060 OK (80 ms)
| 6 sip peers [3 online , 3 offline]
Code:
<table>
<tr><th></th> <th>Extension</th> <th>Host</th> <th>Dynamic</th> <th>NAT</th><th>ACL</th> <th>Port</th> <th>Status</th> </tr>
<?php
$dfout = exec("sudo asterisk -rqx 'sip show peers'", $pss);
$psline = implode("\n", $pss);
$table = array(27,16,4,4,4,9,-1);
$pattern = '';
// one capturing group per column; -1 means "the rest of the line"
foreach($table as $t) {
$pattern = ($t == -1) ? $pattern.'(.{0,})' : $pattern.'(.{'.$t.'})';
}
$pattern = '/^'.$pattern.'$/m';
if (preg_match_all($pattern, $psline, $matches,PREG_SET_ORDER)) {
unset($matches[0]); // skip the first matched row (the column headings)
foreach ($matches as $m) {
echo '<tr><td>';
if (strpos($m[7],'OK') !== false) echo '<img src="/img/dg.png">';
if (strpos($m[7],'LAGGED') !== false) echo '<img src="/img/dy.png">';
if (strpos($m[7],'UNKNOWN') !== false) echo '<img src="/img/dr.png">';
echo '</td><td>'.$m[1].'</td><td>'.$m[2].'</td><td>'.$m[3].'</td><td>';
echo $m[4].'</td><td>'.$m[5].'</td><td>'.$m[6].'</td><td>'.$m[7];
echo '</td></tr>';
}
} else {
echo '<img src="/img/dr.png"> Connection to server returned no data.';
}
?>
</table>
jpittman2 at gmail dot com
22-Feb-2006 09:08
Here's an extremely complicated regular expression to match the various parts of an Oracle 8i SELECT statement. Obviously the SELECT statement is contrived, but hopefully this RegExp will work on just about any valid SELECT statement. If there are any problems, feel free to comment.
<?
$sql = "SELECT /*+ ORDERED */ UNIQUE foo, bar, baz FROM fly, flam"
. " WHERE bee=1 AND fly=3"
. " START WITH foo=1 CONNECT BY some_condition"
. " CONNECT BY some_other_condition"
. " GROUP BY a_bunch_of_fields HAVING having_clause"
. " CONNECT BY some_other_connect_by_condition"
. " INTERSECT (SELECT * FROM friday WHERE beep=1)"
. " ORDER BY one, two, three"
. " FOR UPDATE bee bop boo";
$re = "/^ # Match beginning of string
SELECT\\s+ # SELECT
(?:(?P<hints>\\/\\*\\+\\s+.+\\s+\\*\\/|--\\+\\s+.+\\s+--)\\s+)? # Hints
(?:(?P<DUA>DISTINCT|UNIQUE|ALL)\\s+)? # (self-explanatory)
(?P<fields>.+?) # fields
\\s+FROM\\s+ # FROM
(?:(?P<tables>.+?) # tables
(?:\\s+WHERE\\s+(?P<where>.+?))? # WHERE Clauses
(?:\\s+
(?:(?:START\\s+WITH\\s(?P<startWith>.+?)\\s+)? # START WITH
CONNECT\\s+BY\\s+(?P<connectBy>.+?) # CONNECT BY
) # Hierarchical Query
|
(?:GROUP\\s+BY\\s+(?P<groupBy>.+?) # Group By
(?:\\s+HAVING\\s+(?P<having>.+?))?) # Having
)* # Hier,Group
(?:\\s+(?P<UIM>UNION(?:\\s+ALL)?|INTERSECT|MINUS) # UNION,INTSECT,MINUS
\\s+\\((?P<subquery>.+)\\))? # UIM subquery
(?:\\s+ORDER\\s+BY\\s+(?P<orderBy>.+?))? # Order by
(?:\\s+FOR\\s+UPDATE\\s*(?P<forUpdate>.+?)?)? # For Update
) # tables
$ # Match end of string
/xi";
$matches = array();
preg_match($re, $sql, $matches);
var_dump($matches);
?>
volkank at developera dot com
16-Feb-2006 11:12
Correct IP matching Pattern:
Max's IP match pattern fails on the IP '009.111.111.1',
Chortos-2's pattern fails on both '09.111.111.1' and '009.111.111.1'.
Most of the other patterns posted here also fail if you use them in preg_match_all: they return an incorrect IP,
e.g.
$num="([0-9]|[0-9]{2}|1\d\d|2[0-4]\d|25[0-5])";
$test="127.0.0.112 10.0.0.2";
preg_match_all("/$num\\.$num\\.$num\\.$num/",$test,$match);
print_r($match);
will print "127.0.0.1", not "127.0.0.112", so it is wrong.
To make my pattern work with preg_match_all IP matching (parsing multiple IPs),
I also wrote the alternatives in reverse order, longest first.
This is my new IP octet pattern, probably perfect :)
$num="(25[0-5]|2[0-4]\d|[01]?\d\d|\d)";
/*
25[0-5] => 250-255
2[0-4]\d => 200-249
[01]?\d\d => 00-99,000-199
\d => 0-9
*/
<?
$num="(25[0-5]|2[0-4]\d|[01]?\d\d|\d)";
$ip_addr='009.111.111.100';
if (!preg_match("/^$num\\.$num\\.$num\\.$num$/", $ip_addr,$match)) {
    echo "Wrong IP Address\n";
} else {
    echo $match[0];
}
?>
<?
$num="(25[0-5]|2[0-4]\d|[01]?\d\d|\d)";
$test="127.0.0.112 10.0.0.2";
preg_match_all("/$num\\.$num\\.$num\\.$num/",$test,$match);
print_r($match);
?>
roberta at lexi dot net
13-Feb-2006 10:25
How to verify a Canadian postal code!
if (!preg_match("/^[a-z]\d[a-z] ?\d[a-z]\d$/i" , $postalcode))
{
echo "Your postal code has an incorrect format.";
}
brunosermeus at gmail dot com
08-Feb-2006 12:21
I've created this function to show how easy it is to use regular expressions instead of some of the classes that are available online, which are very slow.
This function is an RSS reader that only needs the URL as a parameter.
<?php
function RSSreader($url)
{
$rssstring = file_get_contents($url);
preg_match_all("#<title>(.*?)</title>#s",$rssstring,$titel);
preg_match_all("#<item>(.*?)</item>#s",$rssstring,$items);
$n=count($items[0]);
for($i=0;$i<$n;$i++)
{
$rsstemp= $items[0][$i];
preg_match_all("#<title>(.*?)</title>#s",$rsstemp,$titles);
$title[$i]= $titles[1][0];
preg_match_all("#<pubDate>(.*?)</pubDate>#s",$rsstemp,$dates);
$date[$i]= $dates[1][0];
preg_match_all("#<link>(.*?)</link>#s",$rsstemp,$links);
$link[$i]= $links[1][0];
}
echo "<h2>".$titel[1][0]."</h2>";
for($i=0;$i<$n;$i++)
{
$timestamp=strtotime($date[$i]);
$datum=date('d-m-Y H\hi', $timestamp);
if(!empty($title[$i])) echo $datum."\t\t\t <a href=".$link[$i]." target=\"_blank\">".$title[$i]."</a><br>";
}
}
?>
patrick at procurios dot nl
29-Jan-2006 10:17
This is the only function in which the assertion \G can be used in a regular expression. \G matches only if the current position in 'subject' is the same as specified by the index 'offset'. It is comparable to the ^ assertion, but whereas ^ matches at position 0, \G matches at position 'offset'.
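The behaviour of \G with the offset parameter can be sketched as follows; the subject string is an arbitrary example:

```php
<?php
$subject = "foo bar";

// \G anchors the match at the offset given as the fifth argument.
$atFour = preg_match('/\Gbar/', $subject, $m, 0, 4); // "bar" starts at offset 4
$atZero = preg_match('/\Gbar/', $subject, $m, 0, 0); // offset 0 holds "foo"

// ^ keeps referring to the true start of the subject, even with an offset.
$caret = preg_match('/^bar/', $subject, $m, 0, 4);

var_dump($atFour); // int(1)
var_dump($atZero); // int(0)
var_dump($caret);  // int(0)
```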
Zientar
02-Jan-2006 07:54
With this function you can check your date and time in this format: "YYYY-MM-DD HH:MM:SS"
<?php
function Check_Date_Time($date_time)
{
if (preg_match("/^([123456789][[:digit:]]{3})-(0[1-9]|1[012])-(0[1-9]|[12][[:digit:]]|3[01]) (0[0-9]|1[0-9]|2[0-3]):([0-5][0-9]):([0-5][0-9])$/",
$date_time, $part) && checkdate($part[2], $part[3], $part[1]))
{
return true;
}
else
{
return false;
}
}
$my_date_time = "2006-01-02 16:50:15";
if (Check_Date_Time($my_date_time))
{
echo "My date '".$my_date_time."' is correct";
}
else
{
echo "My date '".$my_date_time."' is incorrect";
}
?>
john at recaffeinated d0t c0m
27-Dec-2005 08:27
Here's a format for matching US phone numbers in the following formats:
###-###-####
(###) ###-####
##########
It restricts the area codes to >= 200 and exchanges to >= 100, since values below these are invalid.
<?php
$pattern = "/(\([2-9]\d{2}\)\s?|[2-9]\d{2}-|[2-9]\d{2})"
. "[1-9]\d{2}"
. "-?\d{4}/";
?>
max99x [at] gmail [dot] com
06-Nov-2005 09:11
Here's an improvement on the URL detecting function written by [rickyale at ig dot com dot br]. It detects SRC, HREF and URL links, in addition to URLs in CSS code, and Javascript imports. It also understands html entities (such as &amp;) inside URLs.
<?php
function get_links($url) {
if( !($body = @file_get_contents($url)) ) return FALSE;
$pattern = "/((@import\s+[\"'`]([\w:?=@&\/#._;-]+)[\"'`];)|";
$pattern .= "(:\s*url\s*\([\s\"'`]*([\w:?=@&\/#._;-]+)";
$pattern .= "([\s\"'`]*\))|<[^>]*\s+(src|href|url)\=[\s\"'`]*";
$pattern .= "([\w:?=@&\/#._;-]+)[\s\"'`]*[^>]*>))/i";
preg_match_all ($pattern, $body, $matches);
return (is_array($matches)) ? $matches:FALSE;
}
?>
$matches[3] will contain Javascript import links, $matches[5] will contain the CSS links, and $matches[8] will contain the regular URL/SRC/HREF HTML links. To get them all in one neat array, you might use something like this:
<?php
function x_array_merge($arr1,$arr2) {
for($i=0;$i<count($arr1);$i++) {
$arr[$i]=($arr1[$i] == '')?$arr2[$i]:$arr1[$i];
}
return $arr;
}
$url = 'http://www.google.com';
$m = get_links($url);
$links = x_array_merge($m[3],x_array_merge($m[5],$m[8]));
?>
rebootconcepts.com
05-Nov-2005 07:33
Guarantee (one) trailing slash in $dir:
<?php
preg_match( '|(.*)/*$|U', $dir, $matches );
$dir = $matches[1] . '/';
?>
For whatever reason,
<?php $dir = preg_replace( '|^([^/]*)/*$|', '$1/', $dir ); ?>
and
<?php $dir = preg_replace( '|/*$|U', '/', $dir ); ?>
don't work (perfectly). The match-and-concatenate combo is the only thing I could get to work if there was a '/' within $dir (like $dir = "foo/bar";).
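For this particular task a regular expression may not be needed at all. A minimal sketch using rtrim(), with a helper name invented for the example:

```php
<?php
// Strip any run of trailing slashes, then append exactly one.
function with_trailing_slash($dir)
{
    return rtrim($dir, '/') . '/';
}

echo with_trailing_slash('foo/bar'), "\n";    // foo/bar/
echo with_trailing_slash('foo/bar///'), "\n"; // foo/bar/
```

Slashes inside the path are left alone, because rtrim() only touches the end of the string.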
phpnet_spam at erif dot org
26-Oct-2005 10:37
Test for valid US phone number, and get it back formatted at the same time:
function getUSPhone($var) {
$US_PHONE_PREG ="/^(?:\+?1[\-\s]?)?(\(\d{3}\)|\d{3})[\-\s\.]?"; //area code
$US_PHONE_PREG.="(\d{3})[\-\.]?(\d{4})"; // seven digits
$US_PHONE_PREG.="(?:\s?x|\s|\s?ext(?:\.|\s)?)?(\d*)?$/"; // any extension
if (!preg_match($US_PHONE_PREG,$var,$match)) {
return false;
} else {
$tmp = "+1 ";
if (substr($match[1],0,1) == "(") {
$tmp.=$match[1];
} else {
$tmp.="(".$match[1].")";
}
$tmp.=" ".$match[2]."-".$match[3];
if ($match[4] <> '') $tmp.=" x".$match[4];
return $tmp;
}
}
usage:
$phone = $_REQUEST["phone"];
if (!($phone = getUSPhone($phone))) {
//error gracefully :)
}
1413 at blargh dot com
06-Oct-2005 01:41
For a system I'm writing, I get MAC addresses in a huge number of formats. I needed something to handle all of the following:
0-1-2-3-4-5
00:a0:e0:15:55:2f
89 78 77 87 88 9a
0098:8832:aa33
bc de f3-00 e0 90
00e090-ee33cc
::5c:12::3c
0123456789ab
and more. The function I came up with is:
<?php
function ValidateMAC($str)
{
preg_match("/^([0-9a-fA-F]{0,2})[-: ]?([0-9a-fA-F]{0,2})[-: ]?([0-9a-fA-F]{0,2})[-: ]?([0-9a-fA-F]{0,2})[-: ]?([0-9a-fA-F]{0,2})[-: ]?([0-9a-fA-F]{0,2})[-: ]?([0-9a-fA-F]{0,2})$/", $str, $arr);
// no match at all, or a match that does not cover the whole input
if(!isset($arr[0]) || strlen($arr[0]) != strlen($str))
return FALSE;
return sprintf("%02X:%02X:%02X:%02X:%02X:%02X", hexdec($arr[1]), hexdec($arr[2]), hexdec($arr[3]), hexdec($arr[4]), hexdec($arr[5]), hexdec($arr[6]));
}
$testStrings = array("0-1-2-3-4-5","00:a0:e0:15:55:2f","89 78 77 87 88 9a","0098:8832:aa33","bc de f3-00 e0 90","00e090-ee33cc","bf:55:6e:7t:55:44", "::5c:12::3c","0123456789ab");
foreach($testStrings as $str)
{
$res = ValidateMAC($str);
print("$str => $res<br>");
}
?>
This returns:
0-1-2-3-4-5 => 00:01:02:03:04:05
00:a0:e0:15:55:2f => 00:A0:E0:15:55:2F
89 78 77 87 88 9a => 89:78:77:87:88:9A
0098:8832:aa33 => 00:98:88:32:AA:33
bc de f3-00 e0 90 => BC:DE:F3:00:E0:90
00e090-ee33cc => 00:E0:90:EE:33:CC
bf:55:6e:7t:55:44 =>
::5c:12::3c => 00:00:5C:12:00:3C
0123456789ab => 01:23:45:67:89:AB
tlex at NOSPAM dot psyko dot ro
22-Sep-2005 11:34
To check a Romanian landline phone number, and to return "Bucharest", "Proper" or "Unknown", I've used this function:
<?
function verify_destination($destination) {
$dst_length=strlen($destination);
$destination_match = "Unknown"; // default, also used when the length is not 10
if ($dst_length=="10"){
if(preg_match("/^021[2-7]{1}[0-9]{6}$/",$destination)) {
$destination_match="Bucharest";
} elseif (preg_match("/^02[3-6]{1}[0-9]{1}[1-7]{1}[0-9]{5}$/",$destination)) {
$destination_match = "Proper";
} else {
$destination_match = "Unknown";
}
}
return ($destination_match);
}
?>
paullomax at gmail dot com
06-Sep-2005 10:01
If you want some email validation that doesn't reject valid emails (which the ones above do), try this code (from http://iamcal.com/publish/articles/php/parsing_email)
function is_valid_email_address($email){
$qtext = '[^\\x0d\\x22\\x5c\\x80-\\xff]';
$dtext = '[^\\x0d\\x5b-\\x5d\\x80-\\xff]';
$atom = '[^\\x00-\\x20\\x22\\x28\\x29\\x2c\\x2e\\x3a-\\x3c'.
'\\x3e\\x40\\x5b-\\x5d\\x7f-\\xff]+';
$quoted_pair = '\\x5c[\\x00-\\x7f]';
$domain_literal = "\\x5b($dtext|$quoted_pair)*\\x5d";
$quoted_string = "\\x22($qtext|$quoted_pair)*\\x22";
$domain_ref = $atom;
$sub_domain = "($domain_ref|$domain_literal)";
$word = "($atom|$quoted_string)";
$domain = "$sub_domain(\\x2e$sub_domain)*";
$local_part = "$word(\\x2e$word)*";
$addr_spec = "$local_part\\x40$domain";
return preg_match("!^$addr_spec$!", $email) ? 1 : 0;
}
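Where the filter extension is available (PHP 5.2.0 and later), a built-in check can serve as a simpler alternative to a hand-written pattern. This is a swapped-in technique, not part of the function above, and the wrapper name is made up for the example:

```php
<?php
// Thin wrapper around the built-in e-mail validation filter.
function is_valid_email_simple($email)
{
    // filter_var() returns the filtered string on success, false on failure.
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

var_dump(is_valid_email_simple('user@example.com')); // bool(true)
var_dump(is_valid_email_simple('not an email'));     // bool(false)
```

The built-in filter is stricter than the RFC in some respects, so it may not accept exactly the same set of addresses as the function above.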
webmaster at swirldrop dot com
26-Jul-2005 11:46
To replace any characters in a string that could be 'dangerous' to put in an HTML/XML file with their numeric entities (e.g. &#233; for [e acute]), you can use the following function:
function htmlnumericentities($str){
return preg_replace('/[^!-%\x27-;=?-~ ]/e', '"&#".ord("$0").chr(59)', $str);
};//EoFn htmlnumericentities
To change any normal entities (e.g. &euro;) to numerical entities call:
$str = htmlnumericentities(html_entity_decode($str));
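The /e modifier used above was deprecated in later PHP versions and eventually removed, so the same idea may be easier to maintain with preg_replace_callback(). A sketch with the same character class; the two function names are invented for this example:

```php
<?php
// Callback: turn the single matched byte into a numeric entity.
function htmlnumericentities_cb($match)
{
    return '&#' . ord($match[0]) . ';';
}

// Same character class as above, but without the /e modifier.
function htmlnumericentities2($str)
{
    return preg_replace_callback('/[^!-%\x27-;=?-~ ]/', 'htmlnumericentities_cb', $str);
}

echo htmlnumericentities2('a<b & "c"'); // a&#60;b &#38; "c"
```

Like the original, this works byte by byte, so it assumes a single-byte encoding such as ISO-8859-1.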
hippiejohn1020 --- attT --- yahoo.com
26-Jul-2005 09:38
Watch out when using c-style comments around a preg_match or preg_* for that matter. In certain situations (like example below) the result will not be as expected. This one is of course easy to catch but worth noting.
/*
we will comment out this section
if (preg_match ("/anything.*/", $var)) {
code here;
}
*/
This is (I believe) because comments are interpreted first when parsing the code (and they should be). So in the preg_match call, the asterisk (*) and the ending delimiter (/) of the pattern are interpreted as the end of the comment, and the rest of your (supposedly commented-out) code is interpreted as PHP.
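Two possible ways around this, sketched with an arbitrary test string: comment the code out line by line, or choose a pattern delimiter other than "/" so that the pattern never ends in "*/".

```php
<?php
$var = "anything goes";

// Line comments are safe, because "*/" has no special meaning in them:
// if (preg_match("/anything.*/", $var)) { echo "old code"; }

// With "#" as the delimiter the pattern no longer ends in "*/",
// so this statement could also sit inside a block comment.
if (preg_match('#anything.*#', $var)) {
    echo "matched\n";
}
```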
ian at remove-this dot mecnet dot net
12-Jul-2005 07:26
Linux kernel 2.6.11 changed the format of /proc/net/ip_conntrack. I have updated the regular expression mark@portinc.net created in a comment below so that his function works again.
// Updated this regular expression for kernel 2.6.11 changes to /proc/net/ip_conntrack
$GREP = '!([a-z]+) ' .// [1] protocol
'\\s*([^ ]+) ' .// [2] protocol in decimal
'([^ ]+) ' .// [3] time-to-live
'?([A-Z_]|[^ ]+)?' .// [4] state
' src=(.*?) ' .// [5] source address
'dst=(.*?) ' .// [6] destination address
'sport=(\\d{1,5}) ' .// [7] source port
'dport=(\\d{1,5}) ' .// [8] destination port
'packets=(.*?) ' .// [9] num of packets so far
'bytes=(.*?) ' .// [10] num of bytes so far
'src=(.*?) ' .// [11] reversed source
'dst=(.*?) ' .// [12] reversed destination
'sport=(\\d{1,5}) ' .// [13] reversed source port
'dport=(\\d{1,5}) ' .// [14] reversed destination port
'packets=(.*?) ' .// [15] reversed num of packets so far
'bytes=(.*?) ' .// [16] reversed num of bytes so far
'\\[([^]]+)\\] ' .// [17] status
'mark=(.*?) ' .// [18] marked?
'use=([0-9]+)!'; // [19] use
masterkumon at yahoo dot com
07-Jul-2005 05:49
A SHORT NOTE ON PATTERNS FOR BEGINNERS:
int preg_match ( string pattern, string subject)
"/^[a-z0-9 ]*$/"
/ = pattern delimiter (begin/end)
^ = the match must start exactly at the beginning of the subject
$ = the match must end exactly at the end of the subject
[] = matches any one character listed inside the brackets
[a-z0-9 ] = matches any one character from a to z, 0 to 9, or a space (there is a space between 9 and ])
* = the preceding character may occur 0 or more times. A "[]" class matches only one character position, so in place of the "*" in the example:
to match 1 or more characters, use "+"
to match 0 or more characters, use "*"
to match exactly 5 characters, use {5}
to match 5 to 6 characters, use {5,6}
<?
preg_match ($pattern, $subject);
?>
$pattern="/^[a-z0-9 ]*$/";
$subject="abcdefgadfafda65" ->TRUE
$subject="abcdefg ad f afda 65" ->TRUE
$pattern="/[a-z0-9 ]*/";
$subject="$$abcdefgadfafda65><" ->TRUE
Why? Because there is no "^" at the beginning and no "$" at the end of $pattern, the regex may match any part of $subject (even an empty string), so the disallowed characters around it do not matter.
If you use only "^" or only "$", the regex is tied to just the head or just the tail of $subject.
A LITTLE MORE ADVANCED
Checking a file name that must have a 3-character extension (".xxx") and contain only alphanumeric characters:
here is the pattern "/^[a-zA-Z0-9]*\.[a-zA-Z0-9]{3}$/"
\. = a literal "." separating the extension. A special character that should be matched literally must be preceded by a "\".
{3} = 3-character extension
OTHER SPECIAL CHARACTERS. I haven't examined them.
. = wildcard for any character
| = OR
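The effect of the anchors described in this note can be checked with a few calls. The subject strings below are made up for the illustration:

```php
<?php
$anchored   = '/^[a-z0-9 ]*$/';
$unanchored = '/[a-z0-9 ]*/';
$fileName   = '/^[a-zA-Z0-9]*\.[a-zA-Z0-9]{3}$/';

var_dump(preg_match($anchored, 'abc 123')); // int(1): every character is allowed
var_dump(preg_match($anchored, '$$abc><')); // int(0): "$", ">" and "<" are not allowed

// Without anchors the pattern is satisfied by an empty match at position 0.
var_dump(preg_match($unanchored, '$$abc><', $m)); // int(1)
var_dump($m[0]);                                  // string(0) ""

var_dump(preg_match($fileName, 'report.txt'));  // int(1)
var_dump(preg_match($fileName, 'report.html')); // int(0): extension has 4 characters
```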
Rasqual
04-Jul-2005 09:03
Do not forget PCRE has many compatible features with Perl.
One that is often neglected is the ability to return the matches as an associative array (Perl's hash).
For example, here's a code snippet that will parse a subset of the XML Schema 'duration' datatype:
<?php
$duration_tag = 'PT2M37.5S';
preg_match(
'#^PT(?:(?P<minutes>\d+)M)?(?P<seconds>\d+)(?:\.\d+)?S$#',
$duration_tag,
$matches);
print_r($matches);
?>
Here is the corresponding output:
Array
(
[0] => PT2M37.5S
[minutes] => 2
[1] => 2
[seconds] => 37
[2] => 37
)
i at camerongreen dot org
25-Jun-2005 08:01
The isvalidemail function has any number of things wrong with it, for a start there is a missing ) bracket so it won't compile.
Once you fix that, the delimiters used give me an error, so you need to enclose it in forward slashes. Using word boundaries as delimiters is a bad idea, as any string that contained a valid email anywhere in it (along with who knows what else, maybe a cross-site scripting attack or SQL injection) would be returned as a valid email.
Moving on, it then only accepts emails in uppercase; the email address of the expression's own author, for instance, won't pass his regular expression.
I don't have time at the moment to look up the appropriate rfc, but until someone puts up a better one here is my email checking function which at least compiles :)
function isValidEmail($email_address) {
$regex = '/^[A-Za-z0-9][\w.-]*@[A-Za-z0-9][\w\-\.]+\.[A-Za-z0-9]{2,6}$/';
return (preg_match($regex, $email_address));
}
Note : It doesn't accept emails with percentage signs (easy to change) and it requires the user id, first subdomain and last subdomain to start with a letter or number.
Cameron Green
MKP dev a.t g! mail d0t com (parseme)
24-Jun-2005 03:30
Domain name parsing is tricky with an RE. The simplest, most efficient method, is probably just to split the domain by . and individually verify each part, like so:
<?php
$email_address = 'tom@rocks.my.socks.tld';
function verify_addr ($address) {
    if (!preg_match ('/^[\w.]+@([\w.]+)\.[a-z]{2,6}$/i', $address, $domain)) {
        return false;
    }
    // $domain[1] holds the domain without the TLD; check every label
    foreach (explode ('.', $domain[1]) as $part) {
        if (substr ($part, 0, 1) == '_' || substr ($part, -1) == '_')
            return false;
    }
    return true;
}
if (verify_addr ($email_address)) {
    // address accepted
} else {
    // address rejected
}
?>
An alternative would be to look for _. and ._ in the domain section, or to just ignore that restriction entirely and use this RE:
/^[\w.]+@(?:[\w.]{2,63}\.)+[a-z]{2,6}$/i
tom at depman dot com
23-Jun-2005 06:14
There have been several examples of abbreviating strings, or an ellipsis function as some may call it. Of course I couldn't find any until after I wrote this, so I thought I'd share it with you. Basically, it takes a long string (like a TEXT from MySQL) and shortens it to give a quick display of the text. But instead of chopping it in the middle of a word, it looks for the last space using preg_match and chops there. Hope this helps someone.
<?php
$MAX_LEN = 50;
$text_to_display = "Connect to a MySQL database or get
some other source for a long string you'd like
to display but don't want to chop the words in half";
function abreviated_text( $text_to_display, $MAX_LEN=30 ){
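$text_to_display_abr = '';
if ( strlen($text_to_display) > $MAX_LEN ){
// greedy ".* " keeps everything up to the last space within the first $MAX_LEN characters
if ( preg_match ( "/.* /", substr($text_to_display,0,$MAX_LEN), $found ) )
$text_to_display_abr = substr($found[0],0,-1);
}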
if ( $text_to_display_abr )
return $text_to_display_abr."...";
else
return $text_to_display;
}
echo abreviated_text($text_to_display,$MAX_LEN);
?>
webmaster at swirldrop dot com
07-Jun-2005 06:05
An improvement of the regular expression from hackajar <matt> yahoo <trot> com for e-mail addresses is this:
<?php
if(preg_match( '/^[A-Z0-9._-]+@[A-Z0-9][A-Z0-9.-]{0,61}[A-Z0-9]\.[A-Z.]{2,6}$/i' , $data)
) return true;
?>
This stops the domain name starting or ending with a hyphen (or a full stop), and limits the domain to a minimum 2 and a maximum 63 characters. I've also added a full stop in the last character class to allow for 63-character domain names with a country code like .org.uk.
The 63 character limit is just for the bit before the TLD (i.e. only 'php', not '.net'). I think this is right, but I'm not totally sure.
webmaster at m-bread dot com
07-Jun-2005 05:47
If you want to get all the text characters from a string, possibly entered by a user, and filter out all the non-alphanumeric characters (perhaps to make an ID to enter user-submitted details into a database record), then you can use the function below. It returns a string of only the alphanumeric characters from the input string (all in lower case), with all other characters removed.
<?php
function removeNonAN($string){
preg_match_all('/(?:([a-z0-9]+)|.)/i', $string, $matches);
return strtolower(implode('', $matches[1]));
};?>
It took me quite a while to come up with this regular expression. I hope it saves someone else that time.
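The same result can probably be reached more directly by deleting everything that is not a letter or digit with preg_replace(). A sketch, with a function name chosen for this example:

```php
<?php
// Remove every character that is not a-z, A-Z or 0-9, then lower-case the rest.
function remove_non_alnum($string)
{
    return strtolower(preg_replace('/[^a-z0-9]/i', '', $string));
}

echo remove_non_alnum('Hello, World! 42'); // helloworld42
```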
hackajar <matt> yahoo <trot> com
06-Jun-2005 02:12
In regards to Stony666 email validator:
Per RFC 1035, domain names must be a combination of letters, numbers and hyphens. _ and % are not allowed. They should be no less than 2 and no greater than 63 characters. Hyphens may not appear at the beginning or end.
Per RFC 822, regarding the username part of an email address: % is generally not accepted in the "new" (ha, 1982 'new') format. _ and - are OK (as is ".", though only in the "real" username, not in a subdomain).
here's something a little better:
if(preg_match('/^[A-Z0-9._-]+@[A-Z0-9.-]+\.[A-Z]{2,6}$/i', $data)) return true;
Small problem with this thought, I can't wrap my mind around the limit domain name to 2-63 characters, nor how to check for hypens at begging and end. Maybe someone else can toss in a better revision?
stoney666 at gmail dot com
24-May-2005 02:31
Update to my last entry: I noticed that the email validation function didn't actually work like it was supposed to. Here's the working version.
<?php
function validate_email($email_address) {
if (preg_match("/^[A-Z0-9._%-]+@[A-Z0-9._%-]+\.[A-Z]{2,6}$/i", $email_address)) {
return true; }
else { return false; }
}
?>
Chortos-2
14-May-2005 01:30
max wrote a fix for satch666's function, but it too has a little bug... If you write IP 09.111.111.1, it will return TRUE.
<?php
$num="(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])";
if (!preg_match("/^$num\.$num\.$num\.$num$/", $ip_addr)) echo "Wrong IP Address\n";
?>
P.S. Why did you write [0-9] and not \d?
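Pulling the points from this thread together (no leading zeros, nothing above 255, anchors around the whole address), one possible version is sketched below. The helper name is_dotted_quad() is hypothetical, and the D modifier is added on the assumption that a trailing newline should not be accepted:

```php
<?php
// Hypothetical helper: true only for four dot-separated octets in the range 0-255.
function is_dotted_quad($ip_addr) {
    // Longest alternatives first; [1-9]?\d covers 0-99 without leading zeros.
    $octet = '(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)';
    // ^ and $ anchor the whole address, not a single octet.
    return preg_match("/^$octet\.$octet\.$octet\.$octet\$/D", $ip_addr) === 1;
}

var_dump(is_dotted_quad('127.0.0.127'));  // bool(true)
var_dump(is_dotted_quad('09.111.111.1')); // bool(false)
var_dump(is_dotted_quad('257.255.34.6')); // bool(false)
```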
Gaspard
10-May-2005 02:47
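In case someone needs it: this validates a birth date in the format DDMMYYYY (JJMMAAAA in French).
<?php
// The x modifier lets the pattern span several lines and carry comments
if (preg_match('/
    ^(0[1-9]|[12][0-9]|3[01])   # day
    (0[1-9]|1[0-2])             # month
    (19\d{2}|200[0-5])$         # year, 1900-2005
    /x', $date))
    echo "Ok";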
?>
21-Apr-2005 04:37
If you are using an older version of PHP, you will find that preg_match(",", "foo,bar") works as one might like. However, for newer versions, this needs to be preg_match("/,/", "foo,bar"). You'll get an odd message about a delimiter if this is the problem.
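To illustrate the delimiter rule, here is a short sketch. The sample strings are made up; preg_quote() is the standard function for escaping a literal, including the delimiter passed as its second argument:

```php
<?php
// The first character of the pattern is the delimiter and must be repeated at the end.
$comma = preg_match('/,/', 'foo,bar');

// Another delimiter is handy when the pattern itself contains slashes.
$path = preg_match('#^/usr/#', '/usr/local');

// preg_quote() escapes regex metacharacters and the given delimiter in a literal string.
$needle = 'a/b';
$quoted = preg_match('/' . preg_quote($needle, '/') . '/', 'x a/b y');

var_dump($comma, $path, $quoted); // int(1) int(1) int(1)
```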
MikeS
08-Apr-2005 01:34
For anyone looking for info about preg_match crashes on long strings, I may have a solution for you. After wasting 2 hours I finally found out it is a bug in PCRE and not a problem with my input data or regex. In my case I was able to turn on the ungreedy (U) modifier and it worked fine! Before that, my regex would crash on strings of around 1800 chars. With no modification to the regex aside from the ungreedy modifier, I ran it on strings up to 500,000 chars long! (Not that it crashed at 500K; I just stopped trying to find a limit after that.)
Of course this "fix" depends on the nature of the regex and what you're trying to do.
Hope this helps someone!
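On later PHP versions (5.2 and up), running into PCRE's backtracking or recursion limits often makes preg_match() return FALSE instead of crashing, and preg_last_error() reports the reason. A minimal sketch, assuming those functions are available; match_or_report() is a hypothetical helper:

```php
<?php
// Hypothetical helper: distinguishes "no match" (0) from "the engine gave up" (false).
function match_or_report($pattern, $subject) {
    $result = preg_match($pattern, $subject);
    if ($result === false) {
        // preg_last_error() returns one of the PREG_*_ERROR constants.
        if (preg_last_error() === PREG_BACKTRACK_LIMIT_ERROR) {
            return 'backtrack limit exhausted';
        }
        return 'PCRE error code ' . preg_last_error();
    }
    return $result === 1 ? 'match' : 'no match';
}

echo match_or_report('/b+/', str_repeat('a', 5000) . 'bbb'), "\n"; // match
```

Checking for FALSE with === matters here, because a plain if (!preg_match(...)) cannot tell an engine failure from an ordinary non-match.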
max at clnet dot cz
07-Apr-2005 07:40
satch666 wrote a fix for the function valid_ipv4(), but it does not work well. I think this code really does work.
<?php
$num="([0-9]|[0-9]{2}|1\d\d|2[0-4]\d|25[0-5])";
if (!preg_match("/^$num\.$num\.$num\.$num$/", $ip_addr)) echo "Wrong IP Address\n";
?>
carsten at senseofview dot de
14-Mar-2005 03:57
The ExtractString function does not have a real error, but it does misbehave in some cases. What if it is called like this:
ExtractString($row, 'action="', '"');
It would find 'action="' correctly, but perhaps not the first " after the $start-string. If $row consists of
<form method="post" action="script.php">
strpos($str_lower, $end) would return the first " in the method-attribute. So I made some modifications and it seems to work fine.
function ExtractString($str, $start, $end)
{
$str_low = strtolower($str);
$pos_start = strpos($str_low, $start);
$pos_end = strpos($str_low, $end, ($pos_start + strlen($start)));
if ( ($pos_start !== false) && ($pos_end !== false) )
{
$pos1 = $pos_start + strlen($start);
$pos2 = $pos_end - $pos1;
return substr($str, $pos1, $pos2);
}
}
erm(at)the[dash]erm/dot/com
11-Mar-2005 01:15
This is a modified version of the valid_ipv4 function that will test for a valid ip address with wild cards.
ie 192.168.0.*
or even 192.168.*.1
function valid_ipv4($ip_addr)
{
$num='(\*|25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)';
if(preg_match("/^$num\.$num\.$num\.$num$/",$ip_addr,$matches))
{
return $matches[0];
} else {
return false;
}
}
info at reiner-keller dot de
11-Feb-2005 10:03
Regarding the post by "internet at sourcelibre dot com": for text with e.g. German umlauts, instead of a Perl-style character range like
<?php
$bolMatch = preg_match("/^[a-zA-Z]+$/", $strData);
?>
use the setlocale() function and a POSIX character class, like
<?php
setlocale (LC_ALL, 'de_DE');
$bolMatch = preg_match("/^[[:alpha:]]+$/", $strData);
?>
This works for the special characters of any country's locale.
Remember that since domains with umlauts have been released, it is almost mandatory to change your regular expressions so that your forms accept them (in e-mail and internet addresses).
Life can be so easy when you read the manual ;-)
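When the input is UTF-8 rather than a single-byte locale encoding, an alternative is the /u modifier together with the Unicode property \p{L} ("any letter"). A small sketch, assuming a PCRE build with Unicode property support (the one bundled with PHP has it, as far as I know); the sample name is made up:

```php
<?php
// "M\xc3\xbcller" is "Müller" written as UTF-8 bytes.
$name = "M\xc3\xbcller";

$anyLetters = preg_match('/^\p{L}+$/u', $name);   // letters in any script
$asciiOnly  = preg_match('/^[a-zA-Z]+$/', $name); // ASCII letters only

var_dump($anyLetters, $asciiOnly); // int(1) int(0)
```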
mikeblake a.t. akunno d.o.t net
24-Jan-2005 07:38
The author of ExtractString below has made an error (email at albert-martin dot com).
if (strpos($str_low, $start) !== false && strpos($str_lower, $end) !== false)
Should have been
if (strpos($str_low, $start) !== false && strpos($str_low, $end) !== false)
Note the slight variable name mistake at the second strpos
kalaxy at nospam dot gmail dot com
18-Jan-2005 07:20
This is another way of implementing array_preg_match. It also shows use of array_walk() with a callback function.
<?php
function array_preg_match($pattern, $subject, $retainkey = false){
    $matches = array();
    // The callback collects every matching element; $matches is captured by reference
    array_walk($subject, function ($val, $key) use ($pattern, $retainkey, &$matches) {
        if (preg_match($pattern, $val)) {
            if ($retainkey) $matches[$key] = $val;
            else $matches[] = $val;
        }
    });
    return $matches;
}
?>
kalon mills
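For plain filtering of an array by a pattern, the built-in preg_grep() may already be enough. A minimal sketch with made-up sample data:

```php
<?php
$files = array('a' => 'index.php', 'b' => 'logo.png', 'c' => 'about.php');

// preg_grep() returns the matching elements with their original keys.
$withKeys = preg_grep('/\.php$/', $files);

// array_values() renumbers them from 0 when the keys do not matter.
$renumbered = array_values($withKeys);

print_r($withKeys);   // keys 'a' and 'c'
print_r($renumbered); // keys 0 and 1
```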
hfuecks at phppatterns dot com
13-Jan-2005 05:11
Note that the PREG_OFFSET_CAPTURE flag, as far as I've tested, returns the offset in bytes not characters, which may not be what you're expecting if you're using the /u pattern modifier to make the regex UTF-8 aware (i.e. multibyte characters will result in a greater offset than you expect)
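If a character offset is needed, one way is to count the characters in the bytes that precede the match. A minimal sketch, assuming the mbstring extension is loaded and the subject is valid UTF-8; the sample string is made up:

```php
<?php
// "h\xc3\xa9llo world" is "héllo world" in UTF-8; "é" takes two bytes.
$subject = "h\xc3\xa9llo world";

preg_match('/world/u', $subject, $m, PREG_OFFSET_CAPTURE);
$byteOffset = $m[0][1]; // offset in bytes, as reported by PCRE

// Count the characters in the part of the subject before the match.
$charOffset = mb_strlen(substr($subject, 0, $byteOffset), 'UTF-8');

echo "$byteOffset bytes, $charOffset characters\n"; // 7 bytes, 6 characters
```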
29-Dec-2004 12:44
This constant helps validate a phone number that does not need to be in one particular format.
It matches the following US phone formats and many variations of them:
(Xxx) Xxx-Xxxx
(Xxx) Xxx Xxxx
Xxx Xxx Xxxx
Xxx-Xxx-Xxxx
XxxXxxXxxx
Xxx.Xxx.Xxxx
define( "REGEXP_PHONE", "/^\(?[2-9][0-9]{2}\)?[-. ]?[2-9][0-9]{2}[-. ]?[0-9]{4}$/" );
carboffin at msn dot com
23-Dec-2004 07:54
Here's some quick code intended for validating URL variables or input strings.
<?php
if(preg_match("/^[a-z0-9]+$/i", $file)){
// $file consists only of ASCII letters and digits
}
?>
satch666 at dot nospam dot hotmail dot com
17-Dec-2004 08:53
What a slip! Where I said 'subpattern' in my post below, read 'type of number' or 'case' instead.
satch666 at dot nospam dot hotmail dot com
17-Dec-2004 08:44
A fix for the function valid_ipv4() proposed by selt:
If you try, for example, the invalid IP 257.255.34.6, it is accepted as valid and the result is 57.255.34.6.
The first number subpattern matches inside '257' because '57' is a valid string for the '1?\d\d' pattern; this happens because the pattern has no logic for the string limits.
I have tried using '^1?\d\d$', and it works. In plain English it says: if the string has 3 characters, it starts with the digit '1' followed by two more digits, and the string ends there; if it has 2 characters, both are any digit; any other case does not match the pattern. In other words, it defines the subrange of numbers from '10' to '199'.
So the function becomes this (after modifying the pattern and removing a variable called $range, which the function did not use):
<?
function valid_ipv4($ip_addr)
{
$num="([0-9]|^1?\d\d$|2[0-4]\d|25[0-5])";
if(preg_match("/$num\.$num\.$num\.$num/",$ip_addr,$matches))
{
return $matches[0];
} else {
return false;
}
}
?>
internet at sourcelibre dot com
03-Dec-2004 08:34
This helped me to make a mask for all French characters. Just modify $str in order to find your mask.
<pre>
<?php
$str = "";
$strlen = strlen($str);
$array = array();
$mask = "/^[a-zA-Z";
for ($i = 0; $i < $strlen; $i++) {
$char = $str[$i];
$hexa = dechex(ord($char));
echo htmlentities($char)." = ". $hexa . "\n";
$array[$i] = $hexa;
$mask .= '\\x' . $hexa;
}
$mask .= " ]+$/";
echo $mask;
?>
</pre>
zubfatal, root at it dot dk
25-Nov-2004 05:56
<?php
function array_preg_match($strRegEx = "", $arrHaystack = NULL, $boolNewArray = 0, $boolMatchesOnly = 0) {
if (strlen($strRegEx) < 1) {
return "ERR: \$strRegEx argument is missing.";
}
elseif ((!is_array($arrHaystack)) || (count($arrHaystack) < 1)) {
return "ERR: \$arrHaystack is empty, or not an array.";
}
else {
$arrTmp = array();
foreach($arrHaystack as $key => $value) {
if ($boolMatchesOnly) {
if (preg_match_all($strRegEx, $value, $tmpRes)) {
$arrTmp[] = $tmpRes;
}
}
else {
if (preg_match($strRegEx, $value, $tmpRes)) {
if ($boolNewArray) { $arrTmp[] = $value; }
else { $arrTmp[$key] = $value; }
}
}
}
return $arrTmp;
}
}
?>
// zubfatal
email at albert-martin dot com
23-Oct-2004 02:39
Here is a faster way of extracting a special phrase from a HTML page:
Instead of using preg_match, e.g. like this:
preg_match("/<title>(.*)<\/title>/i", $html_content, $match);
use the following:
<?php
function ExtractString($str, $start, $end) {
$str_low = strtolower($str);
if (strpos($str_low, $start) !== false && strpos($str_lower, $end) !== false) {
$pos1 = strpos($str_low, $start) + strlen($start);
$pos2 = strpos($str_low, $end) - $pos1;
return substr($str, $pos1, $pos2);
}
}
$match = ExtractString($html_content, "<title>", "</title>");
?>
j dot gizmo at aon dot at
09-Oct-2004 06:00
in reply to rchoudhury --} pinkgreetings {-- com....
the code pasted below (with the switch statement) CANNOT work.
the construct works like this
<?php
switch ($key)
{
case <expr>:
echo "1";
break;
}
switch (true)
{
case preg_match("/pattern/",$key):
blablablabla();
break;
}
?>
however, it makes no sense to compare $key to the return value of preg_match(), and calling preg_match without a second parameter is utterly senseless as well (PHP can't smell what you want to compare pattern to)
the syntax error in your regular expression is the double slash in the beginning.
(RTFM)
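For what it is worth, a runnable version of the switch (true) form might look like the sketch below. The function name classify_key() and the return values are made up; the key lists are taken from the note being answered, with a single leading slash and no spaces inside the alternation:

```php
<?php
// Hypothetical helper that sorts a key into one of the two groups.
function classify_key($key) {
    switch (true) {
        // preg_match() returns 1 on a match, which compares equal to true.
        case preg_match('/^(meta_keywords|meta_desc|doctype|xmlns|lang|dir|charset)$/', $key):
            return 'page';
        case preg_match('/^(site_title|site_desc|site_css)$/', $key):
            return 'site';
        default:
            return 'other';
    }
}

echo classify_key('lang'), ' ', classify_key('site_css'), ' ', classify_key('foo'), "\n";
// page site other
```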
rchoudhury --} pinkgreetings {-- com
17-Aug-2004 09:57
I was looking for an easy way to match multiple conditions inside a switch, and preg_match() seemed like a straightforward solution:
<?php
foreach (func_get_arg(0) as $key => $value) {
switch ($key) {
case preg_match("//^(meta_keywords | meta_desc | doctype | xmlns | lang | dir | charset)$/"):
$this->g_page_vars[$key] = $value;
break 1;
case preg_match("//^(site_title|site_desc|site_css)$/"):
$this->g_page_vars[$key] = $g_site_vars[$key];
break 1;
}
}
?>
However, while it seemed to work on one server using php 4.3.8, where it accepted only one argument (pattern) and assumed the second one (subject) to be $key, another server running 4.3.8 breaks and returns an obvious warning of "Warning: preg_match() expects at least 2 parameters, 1 given".
You probably think "why not just give preg_match a second argument then?" -- well, if we were to do that it'd be $key in this context, but that returns this error: "Warning: Unknown modifier '^'". So now the regex is bad?
One possible solution may lie in php.ini settings, though since I don't have access to that file on either server I can't check and find out.
http://www.phpbuilder.com/lists/php-developer-list/2003101/0201.php has some comments and other suggestions for the same concept, namely in using:
<?php
switch(true) {
case preg_match("/regex/",$data):
}
?>
...but this doesn't address the current single argument problem.
Either way, it's a useful way of working a switch, but it might not work.
ebiven
06-Jul-2004 01:53
To regex a North American phone number you can assume NxxNxxXXXX, where N = 2 through 9 and x = 0 through 9. North American numbers cannot start with a 0 or a 1 in either the Area Code or the Office Code. So, adapted from the other phone number regex here, you would get:
/^[2-9][0-9]{2}[-][2-9][0-9]{2}[-][0-9]{4}$/
05-May-2004 07:23
A very simple Phone number validation function.
Returns the Phone number if the number is in the xxx-xxx-xxxx format. x being 0-9.
Returns false if missing digits or improper characters are included.
<?
function VALIDATE_USPHONE($phonenumber)
{
if ( preg_match("/^[0-9]{3}-[0-9]{3}-[0-9]{4}$/", $phonenumber) ) {
return $phonenumber;
} else {
return false;
}
}
?>
selt
10-Feb-2004 03:11
Concerning a list of notes started on November 11; ie
<?
$num="([0-9]|1?\d\d|2[0-4]\d|25[0-5])";
?>
It is interesting to note that the alternatives are tried from left to right; therefore, an address such as 127.0.0.127 sent to preg_match with an array for the matched patterns would return 127.0.0.1.
So, to obtain a proper mechanism for stripping valid IPs from a string (any string, that is), one would have to use:
<?
function valid_ipv4($ip_addr)
{
$num="(1?\d\d|2[0-4]\d|25[0-5]|[0-9])";
$range="([1-9]|1\d|2\d|3[0-2])";
if(preg_match("/$num\.$num\.$num\.$num/",$ip_addr,$matches))
{
return $matches[0];
} else {
return false;
}
}
?>
thanks for all the postings ! They're the best way to learn.
mark at portinc dot net
02-Feb-2004 06:30
<?php
$iptables = file ('/proc/net/ip_conntrack');
$services = file ('/etc/services');
$GREP = '!([a-z]+) ' .'\\s*([^ ]+) ' .'([^ ]+) ' .'?([A-Z_]|[^ ]+)?'
      . ' src=(.*?) ' .'dst=(.*?) ' .'sport=(\\d{1,5}) '.'dport=(\\d{1,5}) '
      . 'src=(.*?) ' .'dst=(.*?) ' .'sport=(\\d{1,5}) '.'dport=(\\d{1,5}) '
      . '\\[([^]]+)\\] ' .'use=([0-9]+)!';
// Map port numbers to service names from /etc/services
$ports = array();
foreach($services as $s) {
    if (preg_match ("/^([a-zA-Z-]+)\\s*([0-9]{1,5})\\//",$s,$x)) {
        $ports[ $x[2] ] = $x[1];
    }
}
// Use < rather than <= so the loop does not read past the last line
for($i=0;$i < count($iptables);$i++) {
    if ( preg_match ($GREP, $iptables[$i], $x) ) {
        $x[7] =(array_key_exists($x[7],$ports))?$ports[$x[7]]:$x[7];
        $x[8] =(array_key_exists($x[8],$ports))?$ports[$x[8]]:$x[8];
        print_r($x);
    }
}
?>
nico at kamensek dot de
17-Jan-2004 11:31
As I did not find any working IPv6 regexp, I just created one. Here it is:
$pattern1 = '([A-Fa-f0-9]{1,4}:){7}[A-Fa-f0-9]{1,4}';
$pattern2 = '[A-Fa-f0-9]{1,4}::([A-Fa-f0-9]{1,4}:){0,5}[A-Fa-f0-9]{1,4}';
$pattern3 = '([A-Fa-f0-9]{1,4}:){2}:([A-Fa-f0-9]{1,4}:){0,4}[A-Fa-f0-9]{1,4}';
$pattern4 = '([A-Fa-f0-9]{1,4}:){3}:([A-Fa-f0-9]{1,4}:){0,3}[A-Fa-f0-9]{1,4}';
$pattern5 = '([A-Fa-f0-9]{1,4}:){4}:([A-Fa-f0-9]{1,4}:){0,2}[A-Fa-f0-9]{1,4}';
$pattern6 = '([A-Fa-f0-9]{1,4}:){5}:([A-Fa-f0-9]{1,4}:){0,1}[A-Fa-f0-9]{1,4}';
$pattern7 = '([A-Fa-f0-9]{1,4}:){6}:[A-Fa-f0-9]{1,4}';
Patterns 1 to 7 represent different cases. $full is the complete pattern, which is intended to work for IPv6 addresses; note that it does not cover forms that begin or end with "::", such as ::1.
$full = "/^($pattern1)$|^($pattern2)$|^($pattern3)$"
      . "|^($pattern4)$|^($pattern5)$|^($pattern6)$|^($pattern7)$/";
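For comparison, PHP 5.2 and later can validate the textual form with filter_var(), which also accepts compressed forms such as ::1. A minimal sketch, assuming the filter extension is available; is_ipv6() is a hypothetical helper name:

```php
<?php
// Hypothetical helper around the built-in validator.
function is_ipv6($address) {
    // FILTER_FLAG_IPV6 restricts the check to IPv6, so dotted IPv4 addresses fail.
    return filter_var($address, FILTER_VALIDATE_IP, FILTER_FLAG_IPV6) !== false;
}

var_dump(is_ipv6('2001:db8::1')); // bool(true)
var_dump(is_ipv6('::1'));         // bool(true)
var_dump(is_ipv6('192.168.0.1')); // bool(false)
var_dump(is_ipv6('2001:db8::g')); // bool(false)
```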
brion at pobox dot com
30-Nov-2003 07:35
Some patterns may cause the PCRE functions to crash PHP, particularly when dealing with relatively large amounts of input data.
See the 'LIMITATIONS' section of http://www.pcre.org/pcre.txt about this and other limitations.
thivierr at telus dot net
23-Nov-2003 01:23
A web server log record can be parsed as follows:
$line_in = '209.6.145.47 - - [22/Nov/2003:19:02:30 -0500] "GET /dir/doc.htm HTTP/1.0" 200 6776 "http://search.yahoo.com/search?p=key+words=UTF-8" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"';
if (preg_match('!^([^ ]+) ([^ ]+) ([^ ]+) \[([^\]]+)\] "([^ ]+) ([^ ]+) ([^/]+)/([^"]+)" ([^ ]+) ([^ ]+) ([^ ]+) (.+)!',
$line_in,
$elements))
{
print_r($elements);
}
Array
(
[0] => 209.6.145.47 - - [22/Nov/2003:19:02:30 -0500] "GET /dir/doc.htm HTTP/1.0" 200 6776 "http://search.yahoo.com/search?p=key+words=UTF-8" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"
[1] => 209.6.145.47
[2] => -
[3] => -
[4] => 22/Nov/2003:19:02:30 -0500
[5] => GET
[6] => /dir/doc.htm
[7] => HTTP
[8] => 1.0
[9] => 200
[10] => 6776
[11] => "http://search.yahoo.com/search?p=key+words=UTF-8"
[12] => "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"
)
Notes:
1) For the referer field ($elements[11]), I intentionally capture the double quotes (") and don't use them as delimiters, because sometimes double quotes do appear in a referer URL. Double quotes can appear as %22 or \". Both have to be handled correctly. So, I strip off the double quotes in a second step.
2) The URLs should be further parsed using parse_url, which is quicker and more reliable than preg_match.
3) I assume the requested protocol (HTTP/1.1) always has a slash character in the middle, which might not always be the case, but I'll take the risk.
4) The agent field ($elements[12]) is the most unstructured field, so I make no assumptions about its format. If the record is truncated, the agent field will not be delimited properly with a quote at the end. So, both cases must be handled.
5) A hyphen (- or "-") means a field has no value. It is necessary to convert these to an appropriate value (such as an empty string, null, or 0).
6) Finally, there should be appropriate code to handle malformed web log entries, which are common due to junk data. I never assume I've seen all cases.
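The second step from notes 1, 4 and 5 could look roughly like the sketch below. clean_log_field() is a hypothetical helper, and mapping a lone hyphen to null is just one of the choices that note 5 mentions:

```php
<?php
// Hypothetical helper: cleans one captured field from the log pattern above.
function clean_log_field($value) {
    // Strip one leading and one trailing double quote, if present
    // (a truncated record may lack the closing one).
    $value = preg_replace('/^"|"$/', '', $value);
    // A lone hyphen means the field has no value.
    return $value === '-' ? null : $value;
}

var_dump(clean_log_field('"-"'));                      // NULL
var_dump(clean_log_field('"Mozilla/4.0 (compatible')); // string(23) "Mozilla/4.0 (compatible"
var_dump(clean_log_field('6776'));                     // string(4) "6776"
```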
nospam at 1111-internet dot com
11-Nov-2003 12:29
Backreferences (ala preg_replace) work within the search string if you use the backslash syntax. Consider:
<?php
if (preg_match("/([0-9])(.*?)(\\1)/", "01231234", $match))
{
print_r($match);
}
?>
Result: Array ( [0] => 1231 [1] => 1 [2] => 23 [3] => 1 )
This is alluded to in the description of preg_match_all, but worth reiterating here.
bjorn at kulturkonsult dot no
31-Mar-2003 05:56
If you want to match all Scandinavian characters (æ, ø, å, ö, ä, given here as ISO-8859-1 byte values) in addition to those matched by \w, you might want to use this regexp:
/^[\w\xe6\xc6\xf8\xd8\xe5\xc5\xf6\xd6\xe4\xc4]+$/
Remember that \w respects the current locale used in PCRE's character tables.