Parsing a URL

They have: 87 posts

Joined: Dec 2001

I need to find out how to parse a URL into a link for my forum, like vBulletin does when you type in a URL.

They have: 601 posts

Joined: Nov 2001

Sorry, not sure if I understand your question.

Am I right in thinking that you've got a chunk of text in a string, and you want to send that chunk through a regex to highlight URLs and automatically wrap an <a> tag around them?
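Assuming PHP (the language of the first code reply in this thread), that regex approach can be sketched as follows. autolink_urls() is a hypothetical helper, not a built-in: it splits the raw text on URL matches, escapes every piece with htmlspecialchars() so visitors can't inject markup, and wraps each URL in an <a> tag. The URL pattern is a simplification (http, https or ftp scheme plus non-space characters, with trailing punctuation such as a sentence-ending period left outside the link), not a full URL grammar.

```php
<?php
// autolink_urls() is a hypothetical helper for illustration.
function autolink_urls(string $text): string
{
    // Split the RAW text into [plain, url, plain, url, ...]. The last URL
    // character may not be . , ! ? ; : or ) so trailing punctuation stays out.
    $parts = preg_split(
        '~((?:https?|ftp)://[^\s<>"\']*[^\s<>"\'.,!?;:)])~i',
        $text, -1, PREG_SPLIT_DELIM_CAPTURE
    );

    $out = '';
    foreach ($parts as $i => $part) {
        // Escape every piece exactly once, URL or not.
        $safe = htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
        if ($i % 2 === 1) {
            // Odd indexes hold the captured URLs.
            $out .= '<a href="' . $safe . '" target="_blank">' . $safe . '</a>';
        } else {
            $out .= $safe;
        }
    }
    return $out;
}

echo autolink_urls('Visit http://example.com/a?b=1&c=2. Thanks'), "\n";
```

Splitting first and escaping each piece avoids running the URL regex over already-escaped text, where a match could otherwise swallow an entity such as &quot; into the link.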

What programming language are you using?

- wil

Mark Hensler

He has: 4,048 posts

Joined: Aug 2000

If you want PHP, here is a snippet from a class I wrote about a year ago... I don't remember how the class works anymore.

function BBenCode($string) {
    # This function escapes HTML, and encodes BBcode, and puts back allowed HTML

    if ($string == "") {
        Journal::Print_Error(__LINE__,"Missing message. Cannot parse missing message for BBcode!");
    }
    else {

        /** HTML **/

        if ($this->entry_ID != "new") {
            //this IF statement excludes entries

            // escape HTML... naughty visitors!
            $string = htmlentities($string);

            // strip slashes
            $string = stripslashes($string);

            // BOLD
            $string = preg_replace("#&lt;b&gt;(.*)&lt;/b&gt;#U","<b>\\1</b>",$string);

            // ITALIC
            $string = preg_replace("#&lt;i&gt;(.*)&lt;/i&gt;#U","<i>\\1</i>",$string);

            // UNDERLINE
            $string = preg_replace("#&lt;u&gt;(.*)&lt;/u&gt;#U","<u>\\1</u>",$string);

            // HYPERLINKS (EMAIL)  <a href="http://xxx">yyy</a>
            $string = preg_replace("#&lt;a href=&quot;(ftp://|https://|http://|mailto:)(.*)&quot;&gt;(.*)&lt;/a&gt;#U","<a href=\"\\1\\2\" target=_blank>\\3</a>",$string);
            // HYPERLINKS (EMAIL)  <a href="xxx">yyy</a>
            $string = preg_replace("#&lt;a href=&quot;(.*)&quot;&gt;(.*)&lt;/a&gt;#U","<a href=\"http://\\1\" target=_blank>\\2</a>",$string);
            // HYPERLINKS (EMAIL)  <a href=http://xxx>yyy</a>
            $string = preg_replace("#&lt;a href=(ftp://|https://|http://|mailto:)(.*)&gt;(.*)&lt;/a&gt;#U","<a href=\"\\1\\2\" target=_blank>\\3</a>",$string);
            // HYPERLINKS (EMAIL)  <a href=xxx>yyy</a>
            $string = preg_replace("#&lt;a href=(.*)&gt;(.*)&lt;/a&gt;#U","<a href=\"http://\\1\" target=_blank>\\2</a>",$string);
        }

        /** JCODE **/

        // BOLD
        $string = preg_replace("/\[b\](.*)\[\/b\]/U","<b>\\1</b>",$string);

        // ITALIC
        $string = preg_replace("/\[i\](.*)\[\/i\]/U","<i>\\1</i>",$string);

        // UNDERLINE
        $string = preg_replace("/\[u\](.*)\[\/u\]/U","<u>\\1</u>",$string);

        // HYPERLINKS  [url="http://xxx"]yyy[/url]
        $string = preg_replace("#\[url=&quot;(ftp://|https://|http://|mailto:)(.*)&quot;\](.*)\[/url\]#U","<a href=\"\\1\\2\" target=_blank>\\3</a>",$string);
        // HYPERLINKS  [url="xxx"]yyy[/url]
        $string = preg_replace("#\[url=&quot;(.*)&quot;\](.*)\[/url\]#U","<a href=\"http://\\1\" target=_blank>\\2</a>",$string);
        // HYPERLINKS  [url=http://xxx]yyy[/url]
        $string = preg_replace("#\[url=(ftp://|https://|http://|mailto:)(.*)\](.*)\[/url\]#U","<a href=\"\\1\\2\" target=_blank>\\3</a>",$string);
        // HYPERLINKS  [url=xxx]yyy[/url]
        $string = preg_replace("#\[url=(.*)\](.*)\[/url\]#U","<a href=\"http://\\1\" target=_blank>\\2</a>",$string);
        // HYPERLINKS  [url]http://xxx[/url]  (link text keeps the protocol)
        $string = preg_replace("#\[url\](ftp://|https://|http://|mailto:)(.*)\[/url\]#U","<a href=\"\\1\\2\" target=_blank>\\1\\2</a>",$string);
        // HYPERLINKS  [url]xxx[/url]
        $string = preg_replace("#\[url\](.*)\[/url\]#U","<a href=\"http://\\1\" target=_blank>\\1</a>",$string);

        // EMAIL  [email="mailto:xxx"]yyy[/email]
        $string = preg_replace("#\[email=&quot;mailto:(.*)&quot;\](.*)\[/email\]#U","<a href=\"mailto:\\1\" target=_blank>\\2</a>",$string);
        // EMAIL  [email="xxx"]yyy[/email]
        $string = preg_replace("#\[email=&quot;(.*)&quot;\](.*)\[/email\]#U","<a href=\"mailto:\\1\" target=_blank>\\2</a>",$string);
        // EMAIL  [email=mailto:xxx]yyy[/email]
        $string = preg_replace("#\[email=mailto:(.*)\](.*)\[/email\]#U","<a href=\"mailto:\\1\" target=_blank>\\2</a>",$string);
        // EMAIL  [email=xxx]yyy[/email]
        $string = preg_replace("#\[email=(.*)\](.*)\[/email\]#U","<a href=\"mailto:\\1\" target=_blank>\\2</a>",$string);
        // EMAIL  [email]xxx[/email]
        $string = preg_replace("#\[email\](.*)\[/email\]#U","<a href=\"mailto:\\1\" target=_blank>\\1</a>",$string);


        /** LINE BREAKS **/

        // \n to <BR>
        $string = nl2br($string);

        // re-insert slashes
        $string = addslashes($string);

        return $string;
    } #END var check
} #END BBenCode()


function BBdeCode($string) {
    # This function reverts to BBcode

    if ($string == "") {
        Journal::Print_Error(__LINE__,"Missing message. Cannot parse missing message for BBcode!");
    }
    else {
        // <br /> to \n (nl2br() writes "<br />" before each line break, and
        // eregi_replace() was removed in PHP 7, so use preg_replace())
        $string = preg_replace('#<br\s*/?>(\r?\n)#i', '$1', $string);

        // un-escape HTML entities (the reverse of htmlentities())
        $trans = get_html_translation_table(HTML_ENTITIES);
        $trans[" "] = "&nbsp";
        $trans = array_flip ($trans);
        $string = strtr($string, $trans);

        // put back BOLD
        $string = preg_replace("/<b>(.*)<\/b>/U","&lt;b&gt;\\1&lt;/b&gt;",$string);

        // put back ITALIC
        $string = preg_replace("/<i>(.*)<\/i>/U","&lt;i&gt;\\1&lt;/i&gt;",$string);

        // put back UNDERLINE
        $string = preg_replace("/<u>(.*)<\/u>/U","&lt;u&gt;\\1&lt;/u&gt;",$string);

        // HYPERLINKS (EMAIL)
        $string = preg_replace("#<a href=\"(ftp://|https://|http://|mailto:)(.*)\" target=_blank>(.*)</a>#U","<a href=\"\\1\\2\">\\3</a>",$string);


        return $string;
    } #END var check
} #END BBdeCode()
vB is mangling the code, so click Quote and copy & paste it from the textarea.
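The class above handles each [url] variant with its own pattern. The same BBcode can be sketched with two preg_replace_callback() calls and a small scheme whitelist. bbcode_urls() is a hypothetical name, and the whitelist (http, https, ftp, mailto) simply mirrors the protocols listed in the patterns above. It assumes the input was already escaped with htmlspecialchars(..., ENT_QUOTES), much like the htmlentities() call in BBenCode(), so a quoted target arrives as &quot;...&quot;.

```php
<?php
// bbcode_urls() is a hypothetical helper, not part of the class above.
// Input must ALREADY be HTML-escaped (htmlspecialchars with ENT_QUOTES).
function bbcode_urls(string $escaped): string
{
    // Build one link; $target and $label are already HTML-escaped.
    $link = function (string $target, string $label): string {
        if (!preg_match('~^(?:https?://|ftp://|mailto:)~i', $target)) {
            // Some other scheme (e.g. "javascript:"): keep only the label.
            if (preg_match('~^[a-z][a-z0-9+.-]*:~i', $target)) {
                return $label;
            }
            // No scheme at all (e.g. "www.example.com"): assume http://,
            // like the fallback patterns in BBenCode().
            $target = 'http://' . $target;
        }
        return '<a href="' . $target . '" target="_blank">' . $label . '</a>';
    };

    // [url=target]label[/url] and [url="target"]label[/url]
    $escaped = preg_replace_callback(
        '~\[url=(?:&quot;)?(.+?)(?:&quot;)?\](.+?)\[/url\]~i',
        function (array $m) use ($link): string {
            return $link($m[1], $m[2]);
        },
        $escaped
    );

    // [url]target[/url]: the target doubles as the label
    return preg_replace_callback(
        '~\[url\](.+?)\[/url\]~i',
        function (array $m) use ($link): string {
            return $link($m[1], $m[1]);
        },
        $escaped
    );
}
```

Typical use would be `echo bbcode_urls(htmlspecialchars($post, ENT_QUOTES, 'UTF-8'));`. The lazy `.+?` plays the role of the `U` modifier in the original patterns, and a target with an unlisted scheme is shown as its plain label instead of becoming a link.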

Mark Hensler
If there is no answer on Google, then there is no question.

openmind

He has: 945 posts

Joined: Aug 2001

Why does PHP complicate things so? :)

Here's how I would do it in ColdFusion:

<cfset MSGBody = "#Replace(MSGBody, '<', '&lt;','ALL')#">
<cfset MSGBody = "#Replace(MSGBody, '>', '&gt;','ALL')#">

<!---Auto link creation--->
<cfset MSGBody = reReplaceNoCase(MSGBody, "(^[[:punct:]]*|[[:space:]])(ht|f)(tps?://[A-Za-z0-9])([^[[:space:]]*)", '\1<a href="\2\3\4" target="_blank" class="body">\2\3\4</a>', "all")>
<cfset MSGBody = reReplaceNoCase(MSGBody, "(^[[:punct:]]*|[[:space:]])(www\.)([A-Za-z:]*)([^[[:space:]]*)", '\1<a href="http://\2\3\4" target="_blank" class="body">\2\3\4</a>', "all")>
<cfset MSGBody = reReplaceNoCase(MSGBody, "(^[[:punct:]]*|[[:space:]])([A-Za-z0-9_\.]+@+[A-Za-z0-9]+\.+[A-Za-z0-9]+?\.?[A-Za-z0-9]*)", '\1<a href="mailto:\2" class="body">\2</a>', "all")>

<!---manual link creation--->
<!--- Links --->
<cfloop condition="#findnocase('[url=',MSGBody)# GT 0 AND #findnocase('[/url]',MSGBody)#">
<CFSET URLString = "#MSGBody#">
<CFSET StartPos = FindNoCase("[url=",urlString)>
<CFSET EndOfURL = FindNoCase("]",urlString,StartPos)>
<CFSET EndPos = FindNoCase("[/url",urlString)>
<CFSET WebsiteURL = Mid(urlString,StartPos + 5, EndofURL - (StartPos + 5))>
<CFSET WebsiteName = Mid(urlString, EndOfURL + 1, (EndPos - EndOfURL) -1)>
<cfif  isdefined("url.keywords") or findnocase("[", websitename) or findnocase("]", websiteurl)>
<CFSET MSGBody = "#replace(MSGBody, '[url=#WebsiteURL#]#WebsiteName#[/url]', '#WebsiteURL#')#">
<cfelse>
<CFSET MSGBody = "#replace(MSGBody, '[url=#WebsiteURL#]#WebsiteName#[/url]', '<a href=''#WebsiteURL#'' target=_blank class=body>#WebsiteName#</a>')#">
</cfif>
</cfloop>

<!--- Email Address --->
<cfloop condition="#findnocase('[email=',MSGBody)# GT 0 AND #findnocase('[/email]',MSGBody)# GT 0">
<CFSET URLString = "#MSGBody#">
<CFSET StartPos = FindNoCase("[email=",urlString)>
<CFSET EndOfAdd = FindNoCase("]",urlString,StartPos)>
<CFSET EndPos = FindNoCase("[/email",urlString)>
<CFSET EmailAdd = Mid(urlString,StartPos + 7, EndofAdd - (StartPos + 7))>
<CFSET Name = Mid(urlString, EndOfAdd + 1, (EndPos - EndofAdd) -1)>
<cfif isdefined("url.keywords")>
<CFSET MSGBody = "#replace(MSGBody, '[email=#EmailAdd#]#Name#[/email]', '#EmailAdd#')#">
<cfelse>
<CFSET MSGBody = "#replace(MSGBody, '[email=#EmailAdd#]#Name#[/email]', '<a href=mailto:#EmailAdd# class="body">#Name#</a>')#">
</cfif>

</cfloop>

Much simpler methinks!
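For comparison, the www. and e-mail autolink rules from the ColdFusion version translate fairly directly to preg_replace(). A PHP sketch, under these assumptions: autolink_www_and_email() is a hypothetical name, the text has already been escaped with htmlspecialchars() (as the ColdFusion code replaces < and > first), and the host, path and e-mail character classes are simplified choices, not full URL or e-mail grammars. Paths stop at &, so a query string gets cut short, but an escaped entity such as &quot; is never swallowed into a link.

```php
<?php
// autolink_www_and_email() is a hypothetical helper for illustration.
// Input must already be HTML-escaped.
function autolink_www_and_email(string $escaped): string
{
    // "www.host.tld/path" at the start of the text or after whitespace.
    $escaped = preg_replace(
        '~(^|\s)(www\.[a-z0-9-]+(?:\.[a-z0-9-]+)+(?:/[a-z0-9/._%?=+-]*)?)~i',
        '$1<a href="http://$2" target="_blank">$2</a>',
        $escaped
    );

    // "user@host.tld" at the start of the text or after whitespace.
    return preg_replace(
        '~(^|\s)([a-z0-9._%+-]+@[a-z0-9-]+(?:\.[a-z0-9-]+)+)~i',
        '$1<a href="mailto:$2">$2</a>',
        $escaped
    );
}
```

Requiring the start of the text or whitespace before a match plays the same role as the `(^[[:punct:]]*|[[:space:]])` prefix in the ColdFusion patterns: it stops a match from starting in the middle of a word or right after an entity.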

They have: 87 posts

Joined: Dec 2001

Thanks for your help, people... I found a simpler way shortly after I posted this message, and I forgot to say never mind.

But thanks for your time :)

They have: 601 posts

Joined: Nov 2001

And even easier in Perl :)

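use strict;
use warnings;
use HTML::Parser ();

# Handlers the parser calls for each start and end tag
# (these just print the tag names; put your own logic here)
sub start {
    my ($tagname, $attr) = @_;    # $attr is a hash ref of the tag's attributes
    print "start: $tagname\n";
}

sub end {
    my ($tagname) = @_;
    print "end: $tagname\n";
}

# Create parser object
my $p = HTML::Parser->new( api_version => 3,
                           start_h => [\&start, "tagname, attr"],
                           end_h   => [\&end,   "tagname"],
                           marked_sections => 1,
                         );


# Parse document text chunk by chunk
my ($chunk1, $chunk2) = ('<p>Hello, ', '<b>world</b></p>');
$p->parse($chunk1);
$p->parse($chunk2);
#...
$p->eof;                 # signal end of document


# Parse directly from file
$p->parse_file("foo.html");
# or
open(my $fh, '<', "foo.html") or die "Cannot open foo.html: $!";
$p->parse_file($fh);

# USAGE, if you subclass HTML::Parser as MyParser instead:
# my $p = MyParser->new;
# $p->parse_file("foo.html");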

- wil

Abhishek Reddy

He has: 3,348 posts

Joined: Jul 2001

Quote: Originally posted by korndragon
Thanks for your help, people... I found a simpler way shortly after I posted this message, and I forgot to say never mind.

But thanks for your time :)

Could you post the solution you found? :)

Thanks.

openmind

He has: 945 posts

Joined: Aug 2001

Quote: Originally posted by Wil
And even easier in Perl :)

OK, Perl wins this battle, but the war is far from won!!!! :)

Mark Hensler

He has: 4,048 posts

Joined: Aug 2000

Yeah, but that snippet doesn't do everything the other two do. :P

openmind

He has: 945 posts

Joined: Aug 2001

Good point, well made, even though it's from a PHP master! ;)

They have: 601 posts

Joined: Nov 2001

No, that's very true. I picked a rather sketchy module there, actually.

The module you would need to do all the things in the above post would be HTML::TokeParser, which I'm not familiar with.
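For a PHP forum, a rough counterpart to the parser-based route is PHP's built-in DOM extension (DOMDocument plus DOMXPath). This is a swapped-in technique, not HTML::TokeParser or HTML::Parser. autolink_dom() is a hypothetical helper that parses an HTML fragment, finds text nodes that are not inside <a>, <script> or <style>, and wraps bare http, https or ftp URLs in links, so existing links are never wrapped twice. It assumes the DOM extension is enabled and the input is UTF-8; the URL regex is a simplification that leaves trailing punctuation outside the link.

```php
<?php
// autolink_dom() is a hypothetical helper. It needs PHP's DOM extension.
function autolink_dom(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate sloppy post HTML
    // Wrap the fragment in a full page so we know it ends up inside <body>.
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>'
        . $html . '</body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    $xpath = new DOMXPath($doc);
    // Text nodes that are not already part of a link, script or stylesheet.
    $textNodes = $xpath->query(
        './/text()[not(ancestor::a) and not(ancestor::script) and not(ancestor::style)]',
        $body
    );

    // Copy the node list first, because nodes are replaced inside the loop.
    foreach (iterator_to_array($textNodes) as $text) {
        // Odd indexes of $parts are URLs; even indexes are the text between them.
        $parts = preg_split(
            '~((?:https?|ftp)://[^\s<>"\']*[^\s<>"\'.,!?;:)])~i',
            $text->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE
        );
        if (count($parts) < 2) {
            continue; // no URL in this text node
        }
        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                $a = $doc->createElement('a');
                $a->setAttribute('href', $part); // DOM escapes on output
                $a->appendChild($doc->createTextNode($part));
                $fragment->appendChild($a);
            } elseif ($part !== '') {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        $text->parentNode->replaceChild($fragment, $text);
    }

    // Serialize only the children of <body>, i.e. the original fragment.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Because the parser decodes entities into plain text nodes and the DOM re-escapes them on output, the regex never has to reason about &amp; or &quot;, which is the main advantage over running regexes across raw HTML.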

Cheers

- wil
