Removing Duplicate Lines?
I have a file that I use for my mailing list. Each e-mail address is on its own line. I need to know how to delete all the duplicates. How do I do that?
Thanks!
--Edge
richjb posted this at 18:58 — 16th August 2000.
They have: 193 posts
Joined: Feb 2000
As in duplicate lines? Or duplicate entries (someone subscribing twice)? Or something like blank lines?
Richard
richjb::397
minton posted this at 19:11 — 16th August 2000.
They have: 314 posts
Joined: Nov 1999
I also need to know how to handle e-mail addresses that have been submitted twice. I don't want to delete them from my list, just ignore the second one.
Thomas
The JavaScript Place
The JavaScript Place Forums
richjb posted this at 20:26 — 16th August 2000.
They have: 193 posts
Joined: Feb 2000
The following script goes through a file and writes each unique entry to $new_file. If it finds an entry that it has already seen, it writes that entry to $extra_file instead.
So, if all you want to do is filter duplicate subscriptions, this script will do it:
#!/usr/bin/perl
BEGIN {
use CGI::Carp qw(fatalsToBrowser);
}
use CGI qw(param header);
print header;
$old_file = "/home/brevig/www/remove_extra/old.dat";
$new_file = "/home/brevig/www/remove_extra/new.dat";
$extra_file = "/home/brevig/www/remove_extra/extra.dat";
open(OLD, "<$old_file") || die "Can't open $old_file: $!";
@OLD = <OLD>;
close(OLD);
@list = ();  # entries seen so far
open(NEW, ">>$new_file") || die "Can't open $new_file: $!";
open(EXTRA, ">>$extra_file") || die "Can't open $extra_file: $!";
foreach $old_check (@OLD) {
    $now = "no";
    chomp($old_check);
    # Compare this line against every entry kept so far
    foreach (@list) {
        if ($old_check eq $_) {
            $now = "yes";
        }
    }
    if ($now eq "no") {
        # First occurrence: remember it and write it to the new file
        push(@list, $old_check);
        print "new: $old_check<br>";
        print NEW "$old_check\n";
    } else {
        # Repeat: write it to the extras file instead
        print "<b>extra</b>: $old_check<br>";
        print EXTRA "$old_check\n";
    }
}
close(NEW);
close(EXTRA);
print "<BR><BR><BR>done";
Tell me if you have any problems.
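For anyone with shell access, the same split (first occurrences to one file, repeats to another, original order kept) can probably be done with awk as well. This is only a rough sketch, not a replacement for the script above: split_dupes is a made-up helper name, and it assumes plain Unix line endings and that the file names are passed as arguments.

```shell
#!/bin/sh
# Sketch only: split_dupes is a hypothetical helper name, not part of any tool.
# Usage: split_dupes old.dat new.dat extra.dat
split_dupes() {
    src=$1
    new=$2
    extra=$3
    # Create or empty the extras file so it exists even when nothing repeats
    : > "$extra"
    awk -v extra="$extra" '
        seen[$0]++ { print > extra; next }   # already seen: send to the extras file
        { print }                            # first occurrence: keep it
    ' "$src" > "$new"
}
```

Unlike the Perl script, which opens its output files in append mode, this overwrites new.dat and extra.dat on every run.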
Richard
richjb::399
Everyone here has a website. It's just that not all are worth posting (Mine! ).
Mark Hensler posted this at 06:07 — 17th August 2000.
He has: 4,048 posts
Joined: Aug 2000
Haven't tested this, just a quickie thing.
#!/usr/bin/perl
$file = "your.dat";
open(FILE, "<$file") || die "Can't open $file: $!";
@File = <FILE>;
close(FILE);
foreach $emial (@File) {
$Revised = join(" ", @Revised);
chomp($emial);
unless ($Revised =~ $email) {
push(@Revised, $email);
}
}
open(FILE, ">$file") || die "Can't open $file: $!";
foreach $email (@Revised) {
print FILE "$email\n";
close(FILE);
print "'Twas a success!";
Mark Hensler
If there is no answer on Google, then there is no question.
richjb posted this at 07:29 — 17th August 2000.
They have: 193 posts
Joined: Feb 2000
Albert, welcome to TWF.
Also, your code won't work as written: you typed "emial" instead of "email".
Richard
richjb::409
Mark Hensler posted this at 07:48 — 17th August 2000.
He has: 4,048 posts
Joined: Aug 2000
Sorry... should be
chomp($email);
Should it work now?
richjb posted this at 08:20 — 17th August 2000.
They have: 193 posts
Joined: Feb 2000
Here's your code with the errors marked in comments:
#!/usr/bin/perl
$file = "your.dat";
open(FILE, "<$file") || die "Can't open $file: $!";
@File = <FILE>;
close(FILE);
foreach $emial (@File) { # Typo: should be $email
$Revised = join(" ", @Revised);
chomp($emial); # Typo: should be $email
unless ($Revised =~ $email) {
push(@Revised, $email);
}
}
open(FILE, ">$file") || die "Can't open $file: $!";
foreach $email (@Revised) {
print FILE "$email\n";
close(FILE);
# No closing } for the foreach loop above, and no Content-type header is printed!
print "'Twas a success!";
Those are the only mistakes I see.
Richard
richjb::410
anti posted this at 09:31 — 17th August 2000.
They have: 453 posts
Joined: Jan 1999
Sorry guys,
but I'll have to do this.
I assume you all run on unix hosts....
sort your.dat > sort.dat
uniq sort.dat > clean.dat
diff --minimal clean.dat sort.dat | grep "^>" | sed "s/^> //" > extra.dat
this isn't optimal, but it took me only 1 minute to write it down
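A shorter variant of the same idea, as a sketch: sort -u sorts and drops repeats in one step, and uniq -d lists the lines that occur more than once. dedupe_sorted is just a made-up helper name, and the file names are whatever you pass in.

```shell
#!/bin/sh
# Sketch only: dedupe_sorted is a hypothetical helper name.
# Usage: dedupe_sorted your.dat clean.dat extra.dat
dedupe_sorted() {
    src=$1
    clean=$2
    extra=$3
    # Sorted list with every address appearing exactly once
    sort -u "$src" > "$clean"
    # One line for each address that appeared more than once
    sort "$src" | uniq -d > "$extra"
}
```

One difference from the diff approach: extra.dat gets each repeated address once, not one line per extra copy. Both versions lose the original order of the list.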
btw:
Do you lowercase all e-mail addresses when you add them to the file?
Otherwise two addresses that differ only in capitalization will both stay in the list.
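If the file already contains mixed-case entries, one way to handle it is to lowercase everything before removing repeats. A sketch, assuming it is acceptable to treat addresses as case-insensitive (strictly speaking the part before the @ may be case-sensitive, though mail servers rarely care); dedupe_nocase is a made-up helper name.

```shell
#!/bin/sh
# Sketch only: dedupe_nocase is a hypothetical helper name.
# Assumes addresses may safely be compared without regard to case.
# Usage: dedupe_nocase your.dat clean.dat
dedupe_nocase() {
    src=$1
    clean=$2
    # Lowercase every line, then sort and drop repeats
    tr '[:upper:]' '[:lower:]' < "$src" | sort -u > "$clean"
}
```

Note that the cleaned file stores every address in lowercase, so the original capitalization is lost.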
anti
[Edited by anti on 08-17-2000 at 05:35 AM]
Edge posted this at 16:12 — 17th August 2000.
They have: 117 posts
Joined: Mar 2000
Thanks, richjb! Your script worked great!!
--Edge