Removing Duplicate Lines?

They have: 117 posts

Joined: Mar 2000

I have a file that I use for my mailing list. Each e-mail address is on its own line. I need to know how to delete all the duplicates. How do I do that?

Thanks!

--Edge

They have: 193 posts

Joined: Feb 2000

As in duplicate lines? Or duplicate entries (someone subscribing twice)? Or blank lines?

Richard
richjb::397

They have: 314 posts

Joined: Nov 1999

I also need to know how to handle emails that have been submitted twice. I don't want to delete them from my list, just ignore the second one.

They have: 193 posts

Joined: Feb 2000

The following script will go through a file and put any unique entries into $new_file; if it finds an entry that already exists, it will add it to $extra_file instead.

So, if all you want to do is filter out duplicate subscriptions, this script will do it:

#!/usr/bin/perl

BEGIN {
    use CGI::Carp qw(fatalsToBrowser);   # send fatal errors to the browser
}

use CGI qw(param header);
print header;

$old_file   = "/home/brevig/www/remove_extra/old.dat";
$new_file   = "/home/brevig/www/remove_extra/new.dat";
$extra_file = "/home/brevig/www/remove_extra/extra.dat";

open(OLD, "<$old_file") || die "Can't open $old_file: $!";
@OLD = <OLD>;
close(OLD);

@list = ();   # addresses we've already seen

open(NEW, ">>$new_file") || die "Can't open $new_file: $!";
open(EXTRA, ">>$extra_file") || die "Can't open $extra_file: $!";

foreach $old_check (@OLD) {
    $now = "no";   # flag: have we seen this address before?
    chomp($old_check);
    foreach (@list) {
        if ($old_check eq $_) {
            $now = "yes";
        }
    }
    if ($now eq "no") {
        push(@list, $old_check);
        print "new: $old_check<br>";
        print NEW "$old_check\n";
    } else {
        print "<b>extra</b>: $old_check<br>";
        print EXTRA "$old_check\n";
    }
}

close(NEW);
close(EXTRA);

print "<BR><BR><BR>done";

Tell me if you have any problems.

Richard
richjb::399

[email protected]

Everyone here has a website. It's just that not all are worth posting (Mine! :)).


He has: 4,048 posts

Joined: Aug 2000

Haven't tested this, just a quickie.

#!/usr/bin/perl

$file = "your.dat";

open(FILE, "<$file") || die "Can't open $file: $!";
@File = <FILE>;
close(FILE);

foreach $emial (@File) {
$Revised = join(" ", @Revised);
chomp($emial);
unless ($Revised =~ $email) {
push(@Revised, $email);
}
}

open(FILE, ">$file") || die "Can't open $file: $!";
foreach $email (@Revised) {
print FILE "$email\n";
close(FILE);

print "'Twas a success!";

Mark Hensler
If there is no answer on Google, then there is no question.

They have: 193 posts

Joined: Feb 2000

Albert, welcome to TWF. :)

Also, your code won't work, as you typoed "emial". ;)

Richard
richjb::409


He has: 4,048 posts

Joined: Aug 2000

Sorry... should be
chomp($email);

Should it work now?

They have: 193 posts

Joined: Feb 2000

Here's your code with the errors marked in comments:

#!/usr/bin/perl

$file = "your.dat";

open(FILE, "<$file") || die "Can't open $file: $!";
@File = <FILE>;
close(FILE);

foreach $emial (@File) { # Typo
$Revised = join(" ", @Revised);
chomp($emial); # Typo
unless ($Revised =~ $email) {
push(@Revised, $email);
}
}

open(FILE, ">$file") || die "Can't open $file: $!";
foreach $email (@Revised) {
print FILE "$email\n";
close(FILE);
# No closing } for the foreach, and no content type printed!
print "'Twas a success!";

Those are the only mistakes I see. :)
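
Untested, but here's roughly what it might look like with those fixed. I've also swapped the join/regex check for a hash, so each address is compared exactly instead of as a pattern:

#!/usr/bin/perl
# Untested sketch of the fixed version: a hash (%seen) replaces the
# join/regex check, so each address is compared exactly.

print "Content-type: text/html\n\n";   # the missing content type

$file = "your.dat";

open(FILE, "<$file") || die "Can't open $file: $!";
@File = <FILE>;
close(FILE);

foreach $email (@File) {
    chomp($email);
    next if $seen{$email}++;   # skip anything we've already kept
    push(@Revised, $email);
}

open(FILE, ">$file") || die "Can't open $file: $!";
foreach $email (@Revised) {
    print FILE "$email\n";
}                              # the closing } that was missing
close(FILE);

print "'Twas a success!";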

Richard
richjb::410

[email protected]

Everyone here has a website. It's just that not all are worth posting (Mine! :)).

They have: 453 posts

Joined: Jan 1999

Sorry guys,

but I'll have to do this.
I assume you all run on unix hosts....

cat your.dat|sort >sort.dat;
cat sort.dat|uniq > clean.dat;
diff --minimal clean.dat sort.dat |grep "^>" |sed "s/^>//" > extra.dat
This isn't optimal, but it took me only a minute to write down. ;)

BTW: do you lowercase all emails when you enter them into the file?
Otherwise the same address typed with two different capitalizations will stay in the list twice.
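
If not, a quick (untested) Perl filter would clean up the case and the dups in one pass:

#!/usr/bin/perl
# Untested: lowercase every address before the dup check, so
# JOHN@example.com and john@example.com count as one entry.
# Reads your.dat (Mark's file name) and prints the clean list.

open(IN, "<your.dat") || die "Can't open your.dat: $!";
while (<IN>) {
    chomp;
    $_ = lc($_);                     # normalize the case
    print "$_\n" unless $seen{$_}++; # keep the first occurrence only
}
close(IN);

Redirect the output into a new file and you've got a case-insensitive, dup-free list.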

anti

[Edited by anti on 08-17-2000 at 05:35 AM]

They have: 117 posts

Joined: Mar 2000

Thanks, richjb! Your script worked great!!

--Edge
