Comparing fields in 1 database with fields in another
Hello,
I have two flatfile databases (let's call them file1 and file2), each with a number of lines. Say, file1 has 2 lines, and file 2 has 4 lines. Each line in each database has exactly the same delimited fields. I need to keep the 2 files separate, though.
I want to take the first line in file1, and then compare it with each of the 4 lines in file2, to see which fields are the same. So out of 10 possible fields, line 1 in file1 might have 6 matches with line 1 file2, 3 matches with line 2 file2, 7 matches with line 3 file2, and 1 match with line4 file2. I then need to be able to list the number of matches for each line of file 2 with line 1 file 1, and state what the matches are.
Once that's done, I want to repeat the above paragraph for the second line in file1. So, I need to treat each line in file 1 separately, but I need to compare it with every line in file 2.
The output might be:
Line 1 of File 1 has 6 matches with file2line1. These are...
Line 1 of File 1 has 3 matches with file2line2. These are...
etc etc etc
Line 2 of File 1 has 8 matches with file2line1. These are...
etc etc etc
Is this possible? Is it easy? I have been trying to do it using foreach loops, but my brain can't handle the logic at this stage.
Any ideas?
Thanks
Peter J. Boettcher posted this at 13:32 — 16th October 2001.
They have: 812 posts
Joined: Feb 2000
Comparing data in 2 flat file databases? I hope you're not expecting great performance! You didn't say what server you're on, so I'll use what I'm most comfortable with (ASP). The best way I can think of doing it would be to open the flat files using FSO, then read each line into an array. For example, let's say your flat file is like this:
fieldOne, fieldTwo, fieldThree, fieldFour
fieldOne, fieldTwo, fieldThree, fieldFour
As you're looping through the file you need to pass these values to a 2 dimensional array like:
arrFile1(x,0) = 1 'line number
arrFile1(x,1) = FieldValue
Then do the same thing with file 2:
arrFile2(x,0) = 1 'line number
arrFile2(x,1) = FieldValue
Once you've got the data in arrays, you can do whatever you want with it. For example:
For x = LBound(arrFile1) To UBound(arrFile1)
For y = LBound(arrFile2) To UBound(arrFile2)
If arrFile1(x,1) = arrFile2(y,1) Then
arrResult(x,0) = arrFile1(x,0)
arrResult(x,1) = arrFile1(x,1)
arrResult(x,2) = arrFile2(y,0)
End If
intMatches = intMatches + 1
Next
arrMatches(x,0) = arrFile2(y,0)
arrMatches(x,1) = intMatches
intMatches = 0
Next
So, arrResult would hold all matches in the following way:
arrResult(x,0) holds the line number from File1
arrResult(x,1) holds the value that was matched
arrResult(x,2) holds the line number from File2
Then by looping through that array you get the search results. If it's empty or doesn't exist then there were no results.
The number of matches per line is held in the arrMatches array. Hopefully this can point you in the right direction?
PJ | Are we there yet?
pjboettcher.com
Mark Hensler posted this at 16:25 — 16th October 2001.
He has: 4,048 posts
Joined: Aug 2000
I'll bet he's using Perl.. since that's where I've seen the most flat file databases.
I'll assume the flat file is like this:
fieldOne|fieldTwo|fieldThree|fieldFour
fieldOne|fieldTwo|fieldThree|fieldFour
Been a while since I've done much perl, but here's my interpretation of Peter's code into perl.
# load file 1 into an array
open(FILE1,"<$file");
@file = <FILE1>;
$num_lines = @file1;
# loop through lines in file 1
$i=0;
for ($x=0;$x<$num_lines;$x++) {
@pieces = split("|",$line1);
foreach $thing (@pieces) {
@arrFile1[$i] = array($x, $thing);
$i++;
}
}
close(FILE1);
# load file 2 into an array
open(FILE2,"<$file");
@file = <FILE2>;
$num_lines = @file1;
# loop through lines in file 1
$i=0;
for ($x=0;$x<$num_lines;$x++) {
@pieces = split("|",$line);
foreach $thing (@pieces) {
@arrFile2[$i] = array($x, $thing);
$i++;
}
}
close(FILE2);
$num_fileds1 = @arrFile1;
$num_fileds2 = @arrFile2;
for ($x=0;$x<$num_fileds1;$x++) {
for ($y=0;$y<$num_fileds2;$y++) {
if ($arrFile1[$x][1] == $arrFile1[$x][1]) {
$arrResult[$x][0] = $arrFile1[$x][0];
$arrResult[$x][1] = $arrFile1[$x][1];
$arrResult[$x][2] = $arrFile2[$y][0];
} #END if
$intMatches = $intMatches + 1;
} #END for 2
$arrMatches[$x][0] = $arrFile2[$y][0];
$arrMatches[$x][1] = $intMatches;
$intMatches = 0;
} #END for 1
Not tested, and probably full of errors...
Mark Hensler
If there is no answer on Google, then there is no question.
rline posted this at 03:05 — 17th October 2001.
They have: 40 posts
Joined: Oct 2001
G'day Peter and Mark,
Mark, you win the bet. I am using perl, and my two databases are delimited by |. Thanks for both your replies.
Below I've added your code, which I modified slightly where I thought changes needed to be made. I also commented certain lines, to reflect what I understand is going on, and so you can set me straight if I don't.
Good news is there are no error messages.
You'll notice I asked it to print to the screen, the number of lines in each of the 2 files (which it did successfully, so I know that part has worked), and the number of elements in the two arrays @arrFile1 and @arrFile2. It printed "0" and "0" for each of these values, so I think this may be part of the problem. There seems to be no elements in either of these arrays.
As to matching, I have included the exact same line in each of the files, so I know that if the code works, I should get at least one line in file 1 which completely matches with a line in file 2.
Anyway, any comnents and help will be appreciated!
Thanks...
####### Code starts here #######
#!/usr/bin/perl
$file1 = "available.txt"; #my first database
$file2 = "needed.txt"; # my 2nd database
# load file 1 into an array
open(FILE1,"<$file1");
@file1 = ;
$num_lines1 = @file1;
# loop through lines in file 1
$i=0;
for ($x=0;$x<$num_lines;$x++) {
@pieces1 = split("|",$line1);
foreach $thing (@pieces1) {
@arrFile1[$i] = array($x, $thing);
$i++;
}
}
close(FILE1); # array containing file 1 is called @arrFile1
# load file 2 into an array
open(FILE2,"<$file2");
@file2 = ;
$num_lines2 = @file2;
# loop through lines in file 2
$i=0;
for ($y=0;$y<$num_lines2;$y++) {
@pieces2 = split("|",$line);
foreach $thing (@pieces2) {
@arrFile2[$i] = array($y, $thing);
$i++;
}
}
close(FILE2); # array containing file2 is called @arrFile2
$num_fields1 = @arrFile1; #counts number of elements in array 1
$num_fields2 = @arrFile2; # counts number of elements in array 2 (at present, both these are "0")
for ($x=0;$x<$num_fields1;$x++) {
for ($y=0;$y<$num_fields2;$y++) { # conditions of the for loop
if ($arrFile1[$x][1] == $arrFile1[$x][1]) {
$arrResult[$x][0] = $arrFile1[$x][0];
$arrResult[$x][1] = $arrFile1[$x][1];
$arrResult[$x][2] = $arrFile2[$y][0];
} #END if # I don't fully understand what's happened in the "if" statement
$intMatches = $intMatches + 1;
} #END for 2
$arrMatches[$x][0] = $arrFile2[$y][0];
$arrMatches[$x][1] = $intMatches;
$intMatches = 0;
} #END for 1
print "Content-type: text/html", "\n\n";
print "$num_lines1$num_lines2$num_fields1$num_fields2\n";
foreach $line (@arrMatches) {
print "$line\n\n";
}
print "$intMatches\n";
Mark Hensler posted this at 04:14 — 17th October 2001.
He has: 4,048 posts
Joined: Aug 2000
Made some changes... (also added more comments to help you visualize)
#!/usr/bin/perl
$file1 = "available.txt"; #my first database
$file2 = "needed.txt"; # my 2nd database
# open file 1
open(FILE,"<$file1");
# suck it into an array
@file = <FILE>;
# close the file
close(FILE);
# count the number of lines
$num_lines = @file;
$i=0;
# loop through lines in file 1
for ($x=1; $x<=$num_lines; $x++) {
# split the line into pieces so each field is an element in an array
# $peices[0] = "field_one_value";
# $peices[1] = "field_two_value";
# ...
@pieces = split("|", $file[$i]);
# loop through the array we just created
foreach $thing (@pieces) {
# now, smash the pieces together into yet another array
# $arrFile1 = array(line number, value)
@arrFile1[$i] = array($x, $thing);
$i++;
}
} #END for
# @arrFile1 now contains all the fields from file 1 in the following format:
# $arrFile1[x] = array(line number, value);
#
# example:
# $arrFile1[0] = array(1, "field_one_value");
# $arrFile1[1] = array(1, field_two_value");
# ...
# $arrFile1[9] = array(1, "field_ten_value");
# $arrFile1[10] = array(2, "field_one_value");
# do it all again for file 2
open(FILE2, "<$file2");
@file = <FILE>;
close(FILE2);
$num_lines = @file;
# loop through lines in file 2
$i=0;
for ($y=1; $y<=$num_lines; $y++) {
@pieces = split("|", $file[$i]);
foreach $thing (@pieces) {
# $arrFile2 = array(line number, value)
@arrFile2[$i] = array($y, $thing);
$i++;
}
}
# @arrFile2 now contains all the fields from file 2
# count the number of elements in each array
$num_fields1 = @arrFile1;
$num_fields2 = @arrFile2;
# loop through the fields from file 1
for ($x=1; $x<=$num_fields1; $x++) {
# loop through the fields from file 2
for ($y=1; $y<=$num_fields2; $y++) {
# if the field from file 1 matches the field from file 2
if ($arrFile1[$x][1] == $arrFile1[$x][1]) {
# store the line from file 1
$arrResult[$x][0] = $arrFile1[$x][0];
# store the value of the field
$arrResult[$x][1] = $arrFile1[$x][1];
# store the line from file 2
$arrResult[$x][2] = $arrFile2[$y][0];
} #END if
# increment the number of matches for this field in file 1
$intMatches = $intMatches + 1;
} #END for (file2)
# I'm not sure what this does...
# looks like its saving the last field in file 2 that this field in file 1 matched
# I'm probably wrong, print it and find out
$arrMatches[$x][0] = $arrFile2[$y][0];
# save the number of times this field from file 1 matched a field from file 2
$arrMatches[$x][1] = $intMatches;
# reset the counter
$intMatches = 0;
} #END for (file1)
print "Content-type: text/html", "\n\n";
print "$num_lines1<br>$num_lines2<br>$num_fields1<br>$num_fields2<br><br>\n";
foreach $line (@arrMatches) {
print "$line\n\n";
}
print "$intMatches\n";
I corrected some things, and re-used vars to save memory.
Mark Hensler
If there is no answer on Google, then there is no question.
rline posted this at 05:12 — 17th October 2001.
They have: 40 posts
Joined: Oct 2001
Hi Mark,
Thanks again. Short answer is it doesn't work (yet!) But there are no error messages, and I may have found where a problem is. Below is the bit I'm interested in. You'll notice I threw in a couple of print statements, to see what was printing.
So, $num_lines1 is correct, so it's working up to there. Then I tell it to print the array @pieces, which it does, but only the first of the 2 lines. (but it's only supposed tp print the first line at that stage, isn't it?)
When I tell it to print both @arrFile1 and $num_fields1, it prints neither. Which makes me think perl's happy to do the code up to the creation of @arrFile1.
Would it help for me to include the 2 flatfiles?
Regards,
##### Code here #####
# count the number of lines
$num_lines1 = @file1;
print "$num_lines1";
$i=0;
# loop through lines in file 1
for ($x=1; $x<=$num_lines1; $x++) {
# split the line into pieces so each field is an element in an array
# $pieces[0] = "field_one_value";
# $pieces[1] = "field_two_value";
# ...
@pieces1 = split(/\|/, $file1[$i]);
print "@pieces1\n";
# loop through the array we just created
foreach $thing1 (@pieces1) {
# now, smash the pieces together into yet another array
# $arrFile1 = array(line number, value)
$arrFile1[$i] = array($x, $thing1);
$i++;
}
$num_fields1 = @arrFile1;
} #END for
print "$num_fields1";
print "@arrFile1";
# @arrFile1 now contains all the fields from file 1 in the following format:
# $arrFile1[x] = array(line number, value);
#
# example:
# $arrFile1[0] = array(1, "field_one_value");
# $arrFile1[1] = array(1, field_two_value");
# ...
# $arrFile1[9] = array(1, "field_ten_value");
# $arrFile1[10] = array(2, "field_one_value");
Mark Hensler posted this at 07:13 — 17th October 2001.
He has: 4,048 posts
Joined: Aug 2000
I see you fixed my split() function. I missed that. I'm not really the perl expert here, it's really been too long.
I'm not sure how perl handles pringing @arrays. You may have to loop through the elements and print them.
You don't need to post the 2 flat files. All we need is the format, which you provided.
Mark Hensler
If there is no answer on Google, then there is no question.
rline posted this at 14:12 — 17th October 2001.
They have: 40 posts
Joined: Oct 2001
Hi Mark & Peter (and anyone else),
I've come up with something that does the job. I don't know how "clean" the code is, but it does exactly what I needed.
So, in case anyone can use it, here it is. Of course, if anyone wants to offer improvements on it, please feel free...
Thanks for the help I received...
#!/usr/bin/perl
print "Content-type: text/html\n\n";
$file1 = "file1.txt";
$file2 = "file2.txt";
open(FILE1,"<$file1");
@file1 = ;
close(FILE1);
open(FILE2,"<$file2");
@file2 = ;
close(FILE2);
$matches = 0; #set the matches counter
foreach $line ( @file1 ) { #look at a line in file 1
foreach $line2 ( @file2 ) { #then, look at a line in file 2
@array1 = split (/\|/, $line); #turn the first line into an array
@array2 = split (/\|/, $line2); # turn the 2nd line into an array
$numavail = @array1;
$limit = $numavail - 3; # limit is the last of the fields I want to consider for matches
for ( $i=12; $i < $limit; $i++ ) {
if ( $array1[$i] eq $array2[$i] ) {# compare the corresponding fields in each line
$matches++; # keeps track of how many matches line to line
} # end the if
} # end the for
if ( $matches >= 1 && $array1[14] ge $array2[14] && $array1[12] le $array2[12]) { # if the matches are "right"
print "Here is where I print the data I want to retreive, using things like $array1[12] etc";
} # end the 2nd if
$matches = 0; # reset the counter so the number of matches don't continually increase
} # end the 2nd foreach
} #end the 1st foreach
Peter J. Boettcher posted this at 14:34 — 17th October 2001.
They have: 812 posts
Joined: Feb 2000
That's great, pretty compact code too!
Want to join the discussion? Create an account or log in if you already have one. Joining is fast, free and painless! We’ll even whisk you back here when you’ve finished.