Wil posted this at 08:36 — 17th April 2002.
They have: 601 posts
Joined: Nov 2001
Sorry, not sure if I understand your question.
Am I right in thinking that you've got a chunk of text in a string, and you want to run that chunk through a regex to find URLs and automatically wrap an <a href> tag around them?
What programming language are you using?
- wil
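For what it's worth, here is a rough sketch of the idea described above, written in Python only because the language hasn't been settled yet. The names `URL_RE` and `autolink` are made up for illustration, and the pattern is a deliberately loose assumption (a scheme followed by a run of non-space characters), not a full URL grammar.

```python
import html
import re

# Hypothetical, deliberately loose pattern: scheme, then non-space characters.
URL_RE = re.compile(r"\b(?:https?|ftp)://[^\s<>\"]+", re.IGNORECASE)

def autolink(text: str) -> str:
    """Escape plain text and wrap each bare URL in an <a href> tag."""
    out = []
    pos = 0
    for m in URL_RE.finditer(text):
        # Drop trailing sentence punctuation that is unlikely to be part of the URL.
        raw = m.group(0).rstrip(".,;:!?)")
        end = m.start() + len(raw)
        # Escape the text between links so stray HTML is shown, not rendered.
        out.append(html.escape(text[pos:m.start()]))
        url = html.escape(raw, quote=True)
        out.append(f'<a href="{url}">{url}</a>')
        pos = end
    out.append(html.escape(text[pos:]))
    return "".join(out)
```

Matching against the raw text and escaping each piece separately keeps the escaped entities from being swallowed into the link.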
Mark Hensler posted this at 18:17 — 17th April 2002.
He has: 4,048 posts
Joined: Aug 2000
If you want PHP, here is a snippet from a class I wrote about a year ago... I don't remember how the class works anymore.
function BBenCode($string) {
# This function escapes HTML, and encodes BBcode, and puts back allowed HTML
if ($string == "") {
Journal::Print_Error(__LINE__,"Missing message. Cannot parse missing message for BBcode!");
}
else {
/** HTML **/
if ($this->entry_ID != "new") {
//this IF statement excludes entries
// escape HTML... naughty visitors!
$string = htmlentities($string);
// strip slashes
$string = stripslashes($string);
// BOLD
$string = preg_replace("#&lt;b&gt;(.*)&lt;/b&gt;#U","<b>\\1</b>",$string);
// ITALIC
$string = preg_replace("#&lt;i&gt;(.*)&lt;/i&gt;#U","<i>\\1</i>",$string);
// UNDERLINE
$string = preg_replace("#&lt;u&gt;(.*)&lt;/u&gt;#U","<u>\\1</u>",$string);
// HYPERLINKS (EMAIL) <a href="http://xxx">yyy</a>
$string = preg_replace("#&lt;a href=&quot;(ftp://|https://|http://|mailto:)(.*)&quot;&gt;(.*)&lt;/a&gt;#U","<a href=\"\\1\\2\" target=_blank>\\3</a>",$string);
// HYPERLINKS (EMAIL) <a href="xxx">yyy</a>
$string = preg_replace("#&lt;a href=&quot;(.*)&quot;&gt;(.*)&lt;/a&gt;#U","<a href=\"http://\\1\" target=_blank>\\2</a>",$string);
// HYPERLINKS (EMAIL) <a href=http://xxx>yyy</a>
$string = preg_replace("#&lt;a href=(ftp://|https://|http://|mailto:)(.*)&gt;(.*)&lt;/a&gt;#U","<a href=\"\\1\\2\" target=_blank>\\3</a>",$string);
// HYPERLINKS (EMAIL) <a href=xxx>yyy</a>
$string = preg_replace("#&lt;a href=(.*)&gt;(.*)&lt;/a&gt;#U","<a href=\"http://\\1\" target=_blank>\\2</a>",$string);
}
/** JCODE **/
// BOLD
$string = preg_replace("/\[b\](.*)\[\/b\]/U","<b>\\1</b>",$string);
// ITALIC
$string = preg_replace("/\[i\](.*)\[\/i\]/U","<i>\\1</i>",$string);
// UNDERLINE
$string = preg_replace("/\[u\](.*)\[\/u\]/U","<u>\\1</u>",$string);
// HYPERLINKS [url="http://xxx"]yyy[/url]
$string = preg_replace("#\[url=&quot;(ftp://|https://|http://|mailto:)(.*)&quot;\](.*)\[/url\]#U","<a href=\"\\1\\2\" target=_blank>\\3</a>",$string);
// HYPERLINKS [url="xxx"]yyy[/url]
$string = preg_replace("#\[url=&quot;(.*)&quot;\](.*)\[/url\]#U","<a href=\"http://\\1\" target=_blank>\\2</a>",$string);
// HYPERLINKS [url=http://xxx]yyy[/url]
$string = preg_replace("#\[url=(ftp://|https://|http://|mailto:)(.*)\](.*)\[/url\]#U","<a href=\"\\1\\2\" target=_blank>\\3</a>",$string);
// HYPERLINKS [url=xxx]yyy[/url]
$string = preg_replace("#\[url=(.*)\](.*)\[/url\]#U","<a href=\"http://\\1\" target=_blank>\\2</a>",$string);
// HYPERLINKS [url]http://xxx[/url]
$string = preg_replace("#\[url\](ftp://|https://|http://|mailto:)(.*)\[/url\]#U","<a href=\"\\1\\2\" target=_blank>\\2</a>",$string);
// HYPERLINKS [url]xxx[/url]
$string = preg_replace("#\[url\](.*)\[/url\]#U","<a href=\"http://\\1\" target=_blank>\\1</a>",$string);
// EMAIL [email="mailto:xxx"]yyy[/email]
$string = preg_replace("#\[email=&quot;mailto:(.*)?&quot;\](.*)?\[/email\]#U","<a href=\"mailto:\\1\" target=_blank>\\2</a>",$string);
// EMAIL [email="xxx"]yyy[/email]
$string = preg_replace("#\[email=&quot;(.*)?&quot;\](.*)?\[/email\]#U","<a href=\"mailto:\\1\" target=_blank>\\2</a>",$string);
// EMAIL [email=mailto:xxx]yyy[/email]
$string = preg_replace("#\[email=mailto:(.*)?\](.*)?\[/email\]#U","<a href=\"mailto:\\1\" target=_blank>\\2</a>",$string);
// EMAIL [email=xxx]yyy[/email]
$string = preg_replace("#\[email=(.*)?\](.*)?\[/email\]#U","<a href=\"mailto:\\1\" target=_blank>\\2</a>",$string);
// EMAIL [email]xxx[/email]
$string = preg_replace("#\[email\](.*)?\[/email\]#U","<a href=\"mailto:\\1\" target=_blank>\\1</a>",$string);
/** LINE BREAKS **/
// \n to <BR>
$string = nl2br($string);
// re-insert slashes
$string = addslashes($string);
return $string;
} #END var check
} #END BBenCode()
function BBdeCode($string) {
# This function reverts to BBcode
if ($string == "") {
Journal::Print_Error(__LINE__,"Missing message. Cannot parse missing message for BBcode!");
}
else {
// <BR> to \n
$string = eregi_replace("<BR>\n","\n",$string);
// escape HTML... naughty visitors!
$trans = get_html_translation_table(HTML_ENTITIES);
$trans[" "] = "&nbsp;";
$trans = array_flip ($trans);
$string = strtr($string, $trans);
// put back BOLD
$string = preg_replace("/<b>(.*)<\/b>/U","[b]\\1[/b]",$string);
// put back ITALIC
$string = preg_replace("/<i>(.*)<\/i>/U","[i]\\1[/i]",$string);
// put back UNDERLINE
$string = preg_replace("/<u>(.*)<\/u>/U","[u]\\1[/u]",$string);
// HYPERLINKS (EMAIL)
$string = preg_replace("#<a href=\"(ftp://|https://|http://|mailto:)(.*)\" target=_blank>(.*)</a>#U","<a href=\"\\1\\2\">\\3</a>",$string);
return $string;
} #END var check
} #END BBdeCode()
Mark Hensler
If there is no answer on Google, then there is no question.
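A much smaller sketch of the same escape-first, then-convert idea, in Python rather than PHP so it can be run standalone. The names `RULES` and `bb_encode` are hypothetical, only the unquoted `[url=...]` form is handled, and the `[email]` tags are left out, so treat it as an illustration of the ordering rather than a port of the class above.

```python
import html
import re

# Scheme allow-list mirroring the one used in the PHP patterns.
SCHEMES = r"(?:ftp://|https://|http://|mailto:)"

# Ordered (pattern, replacement) pairs; non-greedy groups stand in for PHP's /U flag.
RULES = [
    (re.compile(r"\[b\](.*?)\[/b\]", re.S), r"<b>\1</b>"),
    (re.compile(r"\[i\](.*?)\[/i\]", re.S), r"<i>\1</i>"),
    (re.compile(r"\[u\](.*?)\[/u\]", re.S), r"<u>\1</u>"),
    # [url=http://xxx]yyy[/url]: the scheme is already present.
    (re.compile(r"\[url=(" + SCHEMES + r"[^\]\s]+)\](.*?)\[/url\]", re.S),
     r'<a href="\1" target="_blank">\2</a>'),
    # [url=xxx]yyy[/url]: no scheme given, so assume http://.
    (re.compile(r"\[url=([^\]\s]+)\](.*?)\[/url\]", re.S),
     r'<a href="http://\1" target="_blank">\2</a>'),
]

def bb_encode(text: str) -> str:
    """Escape all HTML first, then turn a few BBcode tags into markup."""
    out = html.escape(text)
    for pattern, replacement in RULES:
        out = pattern.sub(replacement, out)
    # Rough equivalent of nl2br(): keep the newline, add a <br> before it.
    return out.replace("\n", "<br>\n")
```

Escaping before converting is the important part: by the time the tag rules run, anything the visitor typed as raw HTML is already inert.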
openmind posted this at 21:01 — 17th April 2002.
He has: 945 posts
Joined: Aug 2001
Why does PHP complicate things so!
Here's how I would do it in ColdFusion:
<cfset MSGBody = "#Replace(MSGBody, '<', '&lt;','ALL')#">
<cfset MSGBody = "#Replace(MSGBody, '>', '&gt;','ALL')#">
<!---Auto link creation--->
<cfset MSGBody = reReplaceNoCase(MSGBody, "(^[[:punct:]]*|[[:space:]])(ht|f)(tps?://[A-Za-z0-9])([^[[:space:]]*)", '\1<a href="\2\3\4" target="_blank" class="body">\2\3\4</a>', "all")>
<cfset MSGBody = reReplaceNoCase(MSGBody, "(^[[:punct:]]*|[[:space:]])(www\.)([A-Za-z:]*)([^[[:space:]]*)", '\1<a href="http://\2\3\4" target="_blank" class="body">\2\3\4</a>', "all")>
<cfset MSGBody = reReplaceNoCase(MSGBody, "(^[[:punct:]]*|[[:space:]])([A-Za-z0-9_\.]+@+[A-Za-z0-9]+\.+[A-Za-z0-9]+?\.?[A-Za-z0-9]*)", '\1<a href="mailto:\2" class="body">\2</a>', "all")>
<!---manual link creation--->
<!--- Links --->
<cfloop condition="#findnocase('[url=',MSGBody)# GT 0 AND #findnocase('[/url]',MSGBody)#">
<CFSET URLString = "#MSGBody#">
<CFSET StartPos = FindNoCase("[url=",urlString)>
<CFSET EndOfURL = FindNoCase("]",urlString,StartPos)>
<CFSET EndPos = FindNoCase("[/url",urlString)>
<CFSET WebsiteURL = Mid(urlString,StartPos + 5, EndofURL - (StartPos + 5))>
<CFSET WebsiteName = Mid(urlString, EndOfURL + 1, (EndPos - EndOfURL) -1)>
<cfif isdefined("url.keywords") or findnocase("[", websitename) or findnocase("]", websiteurl)>
<CFSET MSGBody = "#replace(MSGBody, '[url=#WebsiteURL#]#WebsiteName#[/url]', '#WebsiteURL#')#">
<cfelse>
<CFSET MSGBody = "#replace(MSGBody, '[url=#WebsiteURL#]#WebsiteName#[/url]', '<a href=''#WebsiteURL#'' target=blank class=body>#WebsiteName#</a>')#">
</cfif>
</cfloop>
<!--- Email Address --->
<cfloop condition="#findnocase('[email=',MSGBody)# GT 0 AND #findnocase('[/email]',MSGBody)# GT 0">
<CFSET URLString = "#MSGBody#">
<CFSET StartPos = FindNoCase("[email=",urlString)>
<CFSET EndOfAdd = FindNoCase("]",urlString,StartPos)>
<CFSET EndPos = FindNoCase("[/email",urlString)>
<CFSET EmailAdd = Mid(urlString,StartPos + 7, EndofAdd - (StartPos + 7))>
<CFSET Name = Mid(urlString, EndOfAdd + 1, (EndPos - EndofAdd) -1)>
<cfif isdefined("url.keywords")>
<CFSET MSGBody = "#replace(MSGBody, '[email=#EmailAdd#]#Name#[/email]', '#EmailAdd#')#">
<cfelse>
<CFSET MSGBody = "#replace(MSGBody, '[email=#EmailAdd#]#Name#[/email]', '<a href=mailto:#EmailAdd# class="body">#Name#</a>')#">
</cfif>
</cfloop>
Much simpler methinks!
korndragon posted this at 21:21 — 17th April 2002.
They have: 87 posts
Joined: Dec 2001
thanx 4 your help ppl... i found out a simpler way shortly after i posted this message, and i forgot to say nevermind.
but thanx for your time
Wil posted this at 08:32 — 18th April 2002.
They have: 601 posts
Joined: Nov 2001
And even easier in Perl
use HTML::Parser ();
# Create parser object
$p = HTML::Parser->new( api_version => 3,
start_h => [\&start, "tagname, attr"],
end_h => [\&end, "tagname"],
marked_sections => 1,
);
# Parse document text chunk by chunk
$p->parse($chunk1);
$p->parse($chunk2);
#...
$p->eof; # signal end of document
# Parse directly from file
$p->parse_file("foo.html");
# or
open(F, "foo.html") || die;
$p->parse_file(*F);
# USAGE
my $p = MyParser->new;
$p->parse_file("foo.html");
- wil
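For comparison, the same event-driven style (a handler for start tags, a handler for end tags, input fed in chunks) can be sketched with Python's standard `html.parser` module. This is only a rough analogue of the Perl synopsis above, not a translation of it, and `TagLogger` is a made-up class name.

```python
from html.parser import HTMLParser

class TagLogger(HTMLParser):
    """Hypothetical subclass that records every start and end tag it sees."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        self.events.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

# Feed the document chunk by chunk; a tag split across chunks is buffered.
p = TagLogger()
p.feed('<p>See <a href="http://exa')
p.feed('mple.com">this</a></p>')
p.close()  # signal end of document
```

Like the Perl snippet, this only reports what it finds; rewriting links would still have to happen inside the handlers.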
Abhishek Reddy posted this at 11:43 — 18th April 2002.
He has: 3,348 posts
Joined: Jul 2001
Could you post the solution you found?
Thanks.
openmind posted this at 19:21 — 18th April 2002.
He has: 945 posts
Joined: Aug 2001
K, Perl wins this battle but the war is far from won!!!!
Mark Hensler posted this at 21:44 — 18th April 2002.
He has: 4,048 posts
Joined: Aug 2000
Yeah, but that snippet doesn't do everything the other two do.
openmind posted this at 21:50 — 18th April 2002.
He has: 945 posts
Joined: Aug 2001
Good point, well made, even though it's from a PHP master!
Wil posted this at 08:29 — 19th April 2002.
They have: 601 posts
Joined: Nov 2001
No, that's very true. I picked a rather sketchy module there, actually.
The module you would need to do all the things in the above posts would be HTML::TokeParser, which I'm not familiar with.
Cheers
- wil