Parsing a URL

korndragon posted this at 22:11 — 16th April 2002.

Joined: Dec 2001

I need to find out how to turn a URL into a link for my forum,
like vBulletin does when you type in a URL.

Wil posted this at 08:36 — 17th April 2002.

Joined: Nov 2001

Sorry, not sure if I understand your question.

Am I right in thinking that you've got a chunk of text in a string, and you want to send that chunk through a regex to find URLs and automatically place an <a> tag around them?

What programming language are you using?
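Whichever language it turns out to be, the idea is the same. Here is a minimal Python sketch of it, not taken from any forum package: the pattern `URL_RE` and the function name `autolink` are made up for illustration, and the pattern is deliberately loose (a scheme followed by anything that isn't whitespace, a quote, or an angle bracket).

```python
import html
import re

# Hypothetical pattern: http, https or ftp scheme, then everything up to
# the next whitespace, double quote, or angle bracket.
URL_RE = re.compile(r'\b(?:https?|ftp)://[^\s<>"]+', re.IGNORECASE)

def autolink(text):
    """Escape the text for HTML and wrap each bare URL in an <a> tag."""
    out = []
    pos = 0
    for m in URL_RE.finditer(text):
        # Keep trailing sentence punctuation out of the link.
        url = m.group(0).rstrip('.,;:!?)')
        end = m.start() + len(url)
        # Escape the plain text before the URL.
        out.append(html.escape(text[pos:m.start()]))
        # Escape the URL itself so it is safe inside the href attribute.
        safe = html.escape(url)
        out.append('<a href="{0}">{0}</a>'.format(safe))
        pos = end
    # Escape whatever follows the last URL.
    out.append(html.escape(text[pos:]))
    return ''.join(out)

print(autolink("See http://example.com/a?b=1&c=2."))
```

Matching on the raw text and escaping each piece separately avoids trimming the `;` off an `&amp;` that the escaping step itself produced.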

- wil

Mark Hensler posted this at 18:17 — 17th April 2002.

He has: 4,048 posts

Joined: Aug 2000

If you want PHP, here is a snippet from a class I wrote about a year ago... I don't remember how the class works anymore.

function BBenCode($string) {
	# This function escapes HTML, and encodes BBcode, and puts back allowed HTML
	
	if ($string == "") {
		Journal::Print_Error(__LINE__,"Missing message. Cannot parse missing message for BBcode!");
	}
	else {	
	
		/** HTML **/
		
		if ($this->entry_ID != "new") {
			//this IF statement excludes entries
			
			// escape HTML... naughty visitors!
				$string = htmlentities($string);
			
			// strip slashes
				$string = stripslashes($string);
	
			// BOLD
				$string = preg_replace("#&lt;b&gt;(.*)&lt;/b&gt;#U","<b>\\1</b>",$string);
	
			// ITALIC
				$string = preg_replace("#&lt;i&gt;(.*)&lt;/i&gt;#U","<i>\\1</i>",$string);
		
			// UNDERLINE
				$string = preg_replace("#&lt;u&gt;(.*)&lt;/u&gt;#U","<u>\\1</u>",$string);
		
			// HYPERLINKS (EMAIL)  <a href="http://xxx">yyy</a>
				$string = preg_replace("#&lt;a href=&quot;(ftp://|https://|http://|mailto:)(.*)&quot;&gt;(.*)&lt;/a&gt;#U","<a href=\"\\1\\2\" target=_blank>\\3</a>",$string);
			// HYPERLINKS (EMAIL)  <a href="xxx">yyy</a>
				$string = preg_replace("#&lt;a href=&quot;(.*)&quot;&gt;(.*)&lt;/a&gt;#U","<a href=\"http://\\1\" target=_blank>\\2</a>",$string);
			// HYPERLINKS (EMAIL)  <a href=http://xxx>yyy</a>
				$string = preg_replace("#&lt;a href=(ftp://|https://|http://|mailto:)(.*)&gt;(.*)&lt;/a&gt;#U","<a href=\"\\1\\2\" target=_blank>\\3</a>",$string);
			// HYPERLINKS (EMAIL)  <a href=xxx>yyy</a>
				$string = preg_replace("#&lt;a href=(.*)&gt;(.*)&lt;/a&gt;#U","<a href=\"http://\\1\" target=_blank>\\2</a>",$string);
		}
		
		/** JCODE **/
		
		// BOLD
			$string = preg_replace("/\[b\](.*)\[\/b\]/U","<b>\\1</b>",$string);
		
		// ITALIC
			$string = preg_replace("/\[i\](.*)\[\/i\]/U","<i>\\1</i>",$string);
		
		// UNDERLINE
			$string = preg_replace("/\[u\](.*)\[\/u\]/U","<u>\\1</u>",$string);
		
		// HYPERLINKS  [url="http://xxx"]yyy[/url]
			$string = preg_replace("#\[url=&quot;(ftp://|https://|http://|mailto:)(.*)&quot;\](.*)\[/url\]#U","<a href=\"\\1\\2\" target=_blank>\\3</a>",$string);
		// HYPERLINKS  [url="xxx"]yyy[/url]
			$string = preg_replace("#\[url=&quot;(.*)&quot;\](.*)\[/url\]#U","<a href=\"http://\\1\" target=_blank>\\2</a>",$string);
		// HYPERLINKS  [url=http://xxx]yyy[/url]
			$string = preg_replace("#\[url=(ftp://|https://|http://|mailto:)(.*)\](.*)\[/url\]#U","<a href=\"\\1\\2\" target=_blank>\\3</a>",$string);
		// HYPERLINKS  [url=xxx]yyy[/url]
			$string = preg_replace("#\[url=(.*)\](.*)\[/url\]#U","<a href=\"http://\\1\" target=_blank>\\2</a>",$string);
		// HYPERLINKS  [url]http://xxx[/url]
			$string = preg_replace("#\[url\](ftp://|https://|http://|mailto:)(.*)\[/url\]#U","<a href=\"\\1\\2\" target=_blank>\\2</a>",$string);
		// HYPERLINKS  [url]xxx[/url]
			$string = preg_replace("#\[url\](.*)\[/url\]#U","<a href=\"http://\\1\" target=_blank>\\1</a>",$string);
		
		// EMAIL  [email="mailto:xxx"]yyy[/email]
			$string = preg_replace("#\[email=&quot;mailto:(.*)?&quot;\](.*)?\[/email\]#U","<a href=\"mailto:\\1\" target=_blank>\\2</a>",$string);
		// EMAIL  [email="xxx"]yyy[/email]
			$string = preg_replace("#\[email=&quot;(.*)?&quot;\](.*)?\[/email\]#U","<a href=\"mailto:\\1\" target=_blank>\\2</a>",$string);
		// EMAIL  [email=mailto:xxx]yyy[/email]
			$string = preg_replace("#\[email=mailto:(.*)?\](.*)?\[/email\]#U","<a href=\"mailto:\\1\" target=_blank>\\2</a>",$string);
		// EMAIL  [email=xxx]yyy[/email]
			$string = preg_replace("#\[email=(.*)?\](.*)?\[/email\]#U","<a href=\"mailto:\\1\" target=_blank>\\2</a>",$string);
		// EMAIL  [email]xxx[/email]
			$string = preg_replace("#\[email\](.*)?\[/email\]#U","<a href=\"mailto:\\1\" target=_blank>\\1</a>",$string);
		
		
		/** LINE BREAKS **/
		
		// \n to <BR>
		$string = nl2br($string);
		
		// re-insert slashes
		$string = addslashes($string);
		
		return $string;
	} #END var check
} #END BBenCode()


function BBdeCode($string) {
	# This function reverts to BBcode
	
	if ($string == "") {
		Journal::Print_Error(__LINE__,"Missing message. Cannot parse missing message for BBcode!");
	}
	else {	
		// <br> or <br /> to \n
		$string = preg_replace("#<br\\s*/?>\\r?\\n#i","\n",$string);
	
		// escape HTML... naughty visitors!
		$trans = get_html_translation_table(HTML_ENTITIES); 
		$trans[" "] = "&nbsp"; 
		$trans = array_flip ($trans);
		$string = strtr($string, $trans);
	
		// put back BOLD
			$string = preg_replace("/<b>(.*)<\/b>/U","&lt;b&gt;\\1&lt;/b&gt;",$string);
	
		// put back ITALIC
			$string = preg_replace("/<i>(.*)<\/i>/U","&lt;i&gt;\\1&lt;/i&gt;",$string);
		
		// put back UNDERLINE
			$string = preg_replace("/<u>(.*)<\/u>/U","&lt;u&gt;\\1&lt;/u&gt;",$string);
		
		// HYPERLINKS (EMAIL)
			$string = preg_replace("#<a href=\"(ftp://|https://|http://|mailto:)(.*)\" target=_blank>(.*)</a>#U","<a href=\"\\1\\2\">\\3</a>",$string);
		
		
		return $string;
	} #END var check
} #END BBdeCode()

vB is jacking the code, so click quote and copy & paste it from the textarea.
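For anyone not on PHP, the core of BBenCode() is: escape everything first, then turn a short whitelist of tags back into HTML. Below is a rough Python sketch of that idea. It is not a port of the class: the names `bb_encode`, `_href`, `_named_link` and `_bare_link` are invented for the example, only [b], [i], [u] and [url] are handled, and the slash handling and [email] tags are left out.

```python
import html
import re

# Schemes that are passed through untouched; anything else gets http://
# prepended, as the PHP patterns above do.
SCHEMES = ("http://", "https://", "ftp://", "mailto:")

def _href(target):
    """Return the link target, defaulting to http:// when no scheme is given."""
    return target if target.lower().startswith(SCHEMES) else "http://" + target

def _named_link(m):
    # [url=target]label[/url]
    return '<a href="%s">%s</a>' % (_href(m.group(1)), m.group(2))

def _bare_link(m):
    # [url]target[/url]
    return '<a href="%s">%s</a>' % (_href(m.group(1)), m.group(1))

def bb_encode(text):
    # Step 1: escape all HTML, so nothing the visitor typed is live markup.
    s = html.escape(text)
    # Step 2: convert the whitelisted tags (non-greedy, like the /U flag).
    for tag in ("b", "i", "u"):
        s = re.sub(r"\[%s\](.*?)\[/%s\]" % (tag, tag),
                   r"<%s>\1</%s>" % (tag, tag), s, flags=re.S)
    # Quotes around the target were escaped to &quot; in step 1.
    s = re.sub(r"\[url=(?:&quot;)?(.*?)(?:&quot;)?\](.*?)\[/url\]",
               _named_link, s, flags=re.S)
    s = re.sub(r"\[url\](.*?)\[/url\]", _bare_link, s, flags=re.S)
    # Step 3: line breaks.
    return s.replace("\n", "<br>\n")

print(bb_encode('[b]hi[/b] [url=example.com]site[/url] <script>'))
```

One difference from the PHP: a bare [url]http://xxx[/url] keeps the scheme in the link text here, where the original pattern shows only the part after it.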

Mark Hensler
If there is no answer on Google, then there is no question.

openmind posted this at 21:01 — 17th April 2002.

He has: 945 posts

Joined: Aug 2001

Why does PHP complicate things so!

Here's how I would do it in ColdFusion:

<cfset MSGBody = "#Replace(MSGBody, '<', '&lt;','ALL')#">
<cfset MSGBody = "#Replace(MSGBody, '>', '&gt;','ALL')#">

<!---Auto link creation--->
<cfset MSGBody = reReplaceNoCase(MSGBody, "(^[[:punct:]]*|[[:space:]])(ht|f)(tps?://[A-Za-z0-9])([^[[:space:]]*)", '\1<a href="\2\3\4" target="_blank" class="body">\2\3\4</a>', "all")>
<cfset MSGBody = reReplaceNoCase(MSGBody, "(^[[:punct:]]*|[[:space:]])(www\.)([A-Za-z:]*)([^[[:space:]]*)", '\1<a href="http://\2\3\4" target="_blank" class="body">\2\3\4</a>', "all")>
<cfset MSGBody = reReplaceNoCase(MSGBody, "(^[[:punct:]]*|[[:space:]])([A-Za-z0-9_\.]+@+[A-Za-z0-9]+\.+[A-Za-z0-9]+?\.?[A-Za-z0-9]*)", '\1<a href="mailto:\2" class="body">\2</a>', "all")>

<!---manual link creation--->
<!--- Links --->
<cfloop condition="#findnocase('[url=',MSGBody)# GT 0 AND #findnocase('[/url]',MSGBody)# GT 0">
	<CFSET URLString 	= "#MSGBody#">
	<CFSET StartPos 	= FindNoCase("[url=",urlString)>
	<CFSET EndOfURL 	= FindNoCase("]",urlString,StartPos)>
	<CFSET EndPos 		= FindNoCase("[/url",urlString)>
	<CFSET WebsiteURL 	= Mid(urlString,StartPos + 5, EndofURL - (StartPos + 5))>
	<CFSET WebsiteName 	= Mid(urlString, EndOfURL + 1, (EndPos - EndOfURL) -1)>
	<cfif  isdefined("url.keywords") or findnocase("[", websitename) or findnocase("]", websiteurl)>
	<CFSET MSGBody = "#replace(MSGBody, '[url=#WebsiteURL#]#WebsiteName#[/url]', '#WebsiteURL#')#">
	<cfelse>
	<CFSET MSGBody = "#replace(MSGBody, '[url=#WebsiteURL#]#WebsiteName#[/url]', '<a href=''#WebsiteURL#'' target=_blank class=body>#WebsiteName#</a>')#">
	</cfif>
</cfloop>

<!--- Email Address --->
<cfloop condition="#findnocase('[email=',MSGBody)# GT 0 AND #findnocase('[/email]',MSGBody)# GT 0">
	<CFSET URLString 	= "#MSGBody#">
	<CFSET StartPos 	= FindNoCase("[email=",urlString)>
	<CFSET EndOfAdd 	= FindNoCase("]",urlString,StartPos)>
	<CFSET EndPos 		= FindNoCase("[/email",urlString)>
	<CFSET EmailAdd 	= Mid(urlString,StartPos + 7, EndofAdd - (StartPos + 7))>
	<CFSET Name 		= Mid(urlString, EndOfAdd + 1, (EndPos - EndofAdd) -1)>
	<cfif isdefined("url.keywords")>
	<CFSET MSGBody = "#replace(MSGBody, '[email=#EmailAdd#]#Name#[/email]', '#EmailAdd#')#">
	<cfelse>
	<CFSET MSGBody = "#replace(MSGBody, '[email=#EmailAdd#]#Name#[/email]', '<a href=mailto:#EmailAdd# class="body">#Name#</a>')#">
	</cfif>
	
</cfloop>

Much simpler methinks!

korndragon posted this at 21:21 — 17th April 2002.

They have: 87 posts

Joined: Dec 2001

Thanks for your help, people... I found a simpler way shortly after I posted this message, and I forgot to say never mind.

But thanks for your time.

Wil posted this at 08:32 — 18th April 2002.

They have: 601 posts

Joined: Nov 2001

And even easier in Perl:

use HTML::Parser ();


 # Create parser object
 $p = HTML::Parser->new( api_version => 3,
                         start_h => [\&start, "tagname, attr"],
                         end_h   => [\&end,   "tagname"],
                         marked_sections => 1,
                       );


 # Parse document text chunk by chunk
 $p->parse($chunk1);
 $p->parse($chunk2);
 #...
 $p->eof;                 # signal end of document


 # Parse directly from file
 $p->parse_file("foo.html");
 # or
 open(F, "foo.html") || die;
 $p->parse_file(*F);

 # USAGE

 my $p = MyParser->new;
 $p->parse_file("foo.html");
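
The snippet above only sets the parser up; the start and end handlers are not shown. As a guess at how a parser-based approach could be applied to the original question, here is a small sketch using Python's standard html.parser rather than HTML::Parser. The class name `Linkifier`, the function `linkify_html` and the pattern `URL_RE` are all invented for the example. It links URLs found in text nodes and leaves anything already inside an <a> element alone.

```python
import re
from html.parser import HTMLParser

# Hypothetical pattern for bare URLs in text nodes.
URL_RE = re.compile(r'(?:https?|ftp)://[^\s<>"]+')

class Linkifier(HTMLParser):
    def __init__(self):
        # Leave entities unconverted so they can be echoed back verbatim.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.in_anchor = 0  # depth of <a> elements we are currently inside

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_anchor += 1
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are echoed unchanged.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "a" and self.in_anchor:
            self.in_anchor -= 1
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if self.in_anchor:
            # Already a link: do not wrap it again.
            self.out.append(data)
        else:
            self.out.append(URL_RE.sub(
                lambda m: '<a href="%s">%s</a>' % (m.group(0), m.group(0)),
                data))

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

def linkify_html(markup):
    parser = Linkifier()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)

print(linkify_html('<p>see http://a.example/x</p>'))
```

Known gaps in this sketch: comments and doctype declarations are dropped, and a URL containing an entity such as &amp; is cut at the entity, so it would need more work before going near a real forum.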

- wil

Abhishek Reddy posted this at 11:43 — 18th April 2002.

He has: 3,348 posts

Joined: Jul 2001

Quote: Originally posted by korndragon
Thanks for your help, people... I found a simpler way shortly after I posted this message, and I forgot to say never mind.

But thanks for your time.

Could you post the solution you found?

Thanks.

openmind posted this at 19:21 — 18th April 2002.

He has: 945 posts

Joined: Aug 2001