diff -y ?
On Wed, Dec 2, 2009 at 7:42 PM, Simon Banton centos@web.org.uk wrote:
At 08:54 +0000 2/12/09, hadi motamedi wrote:
Dear All, Can you please do me a favor and let me know how I can compare two files, but not on a line-by-line basis, on my CentOS server? I mean, say row #1 in file1 has the same data as row #5 in file2; comm compares them line by line, which is not what I intend. It seems that diff cannot do the job either.
This'll show you which lines are common to both files and, for the ones that aren't, which file they're in.
perl -MData::Dumper -le 'while(<>) {chomp; push @{$s->{"$_"}}, $ARGV}; END{ print Dumper($s) }' file1 file2
... someone will be along shortly with a more elegant method.
HTH
S.
But "#diff -y" compares the two files in line-by-line basis . But my two files do not have one-to-one correspondence , say row#1 in file1 maybe the same as say row#5 in file2 . So I seek a way that does not consider this as a difference (but diff will consider).
(( First, please do not top-post. ))
"diff" would match the line2 in file1 with the line5 in file2, and it would mark that some lines were inserted there.
I think you'll have to be more specific about what you mean by "compare", and what you consider different or the same.
On Wed, Dec 2, 2009 at 10:01 AM, Paul Bijnens Paul.Bijnens@xplanation.com wrote:
Sorry, I tried "#diff -y", but its output seems to compare the two files on a line-by-line basis. As you mentioned, if row #1 in file1 matches, say, row #5 in file2, I want that not to be considered a difference, but the output shows it as if it is. Please correct me.
Could you be more precise about what you mean by "compare"...? For example, to get matching lines, you could:
cat $FILE1 $FILE2 | sort | uniq -c | ...
You'd get each line preceded by its number of occurrences; then grep for what you want...
JD
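To pull just the matching lines out of that count, one way is below - a sketch, assuming neither file repeats a line internally (an internal duplicate would also push the count above 1):

sort file1 file2 | uniq -c | awk '$1 > 1'    # lines appearing in both files
sort file1 file2 | uniq -c | awk '$1 == 1'   # lines found in only one file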
On Wed, Dec 2, 2009 at 11:23 AM, John Doe jdmls@yahoo.com wrote:
Thank you very much for your reply. Please be informed that I tried to compare the files with your proposed code, as follows:
#cat Edit3 Edit4 | sort | uniq -c
It returns the same count of matches as I got from the following code:
#perl -MData::Dumper -le 'while(<>) {chomp; push @{$s->{"$_"}},$ARGV}; END{ print Dumper($s) }' Edit3 Edit4
But it is easier to use. Can you please do me a favor and let me know if I can go further and try an advanced search, like finding how many rows inside a file have data that does not start with a zero after the third comma? Sincerely Yours
Something like:
awk -F, '{ print $4 }' file | grep -v "^0" | wc -l
Use one command at a time to see how they work with each other (you might have to modify the grep a bit)...
JD
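A quick demo of that pipeline on made-up data (sample.csv is hypothetical):

printf 'a,b,c,123\nd,e,f,0456\ng,h,i,789\n' > sample.csv
awk -F, '{ print $4 }' sample.csv | grep -v '^0' | wc -l

This prints 2: the rows whose fourth field is 123 or 789 survive the grep -v '^0'; the 0456 row is dropped.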
*sigh*
Drive me crazy, why use multiple commands?
awk -F 'BEGIN { FS = ","; }{if ( $3 !~ /^0 ) { count++; }} END { print count }' filename
mark "why, yes, since you ask, I *have* written 100 and 200 line awk scripts"
From: mark m.roth@5-cent.us
Oh no!!! Don't get mad!!! ^_^ Teaching some UNIX pipes to a "beginner" can be helpful, you know... And it is $4...
JD
You're right, it is $4, but what do you want, I was still half asleep, and getting ready to head to work....
And yeah, pipes are Good. I try to explain to folks why I call *Nix "fun", and one reason is the huge toolset that's *intended* to work together.
mark
On Thu, Dec 3, 2009 at 12:42 PM, mark m.roth@5-cent.us wrote:
Sorry, I tried your proposed procedure, as follows:
#awk -F 'BEGIN { FS = ","; }{if ( $3 !~ /^0 ) { count++; }} END { print count }' HLRSubscriber-20091111173349.csv
But my CentOS server didn't return to the prompt. Can you please let me know why it is in an endless loop? Thank you in advance
Syntax error. You wrote "if ( $3 !~ /^0", not "if ( $3 !~ /^0/".
PLEASE: if you ask for help, and someone gives you examples, READ THE MAN PAGES SO THAT YOU KNOW WHAT YOU'RE DOING. I could just as well have given you something that would have wiped your system (like system("rm -rf /")).
mark
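For the record, the one-liner needs three repairs before it will run: the closing / that mark points out, $4 in place of $3 (the field after the third comma), and the stray -F dropped. The -F is also the likely reason for the apparent hang: -F takes the next argument as the field separator, so it swallowed the program text, awk then took the filename as its program, and with no input files left it sat reading stdin. A fixed version:

awk 'BEGIN { FS = "," } { if ($4 !~ /^0/) { count++ } } END { print count+0 }' HLRSubscriber-20091111173349.csv

The +0 in the END block makes awk print 0 rather than an empty line when no row qualifies.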
Awk is just too weird for normal people. I wouldn't even suggest reading that manual. If you can't do what you want with regexps and a pipeline of simpler programs, you might as well use perl.
But: grep -v '^.*,.*,.*,0' filename | wc -l seems simple enough and says what you mean.
Or: cut -d, -f4 filename | grep -v '^0' | wc -l
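One nit on the grep version: .* can itself match commas, so '^.*,.*,.*,0' actually matches a 0 at the start of any field from the fourth onward, not only the fourth. Spelling each field as [^,]* pins it down, and grep -c replaces the wc -l - a sketch:

grep -vc '^[^,]*,[^,]*,[^,]*,0' filename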
<Looks around, yeah, this *is* a list for sysadmins of Linux....>
ROTFLMAO!
So, is there an obfuscated shell contest?
mark
mark wrote:
<Looks around, yeah, this *is* a list for sysadmins of Linux....>
Who have probably almost all started something in awk and ended up either needing a pipeline of other programs or switching to perl. If your machine is powerful enough to run perl (and I can't imagine one that isn't in this century) you might as well use it because it does anything awk can do and more. awk is almost as complicated to learn but can't do as much and is harder to debug. Maybe it made sense on computers of the 1970's, or before perl was available.
So, is there an obfuscated shell contest?
Shell commands are just what you'd type anyway, so you have to know them regardless; there is nothing special about making a program out of them. Other than grep using regexps, the man pages for those programs are probably literally a page. No one is going to understand awk or perl after reading a page. Personally, I'd probably have loaded the file in vi, done ':v/^.*,.*,.*,0/d', then hit Ctrl-G to see how many lines were left, then u to put them back.
Reading the response, I realize you were serious, not being funny, as I thought.
Who have probably almost all started something in awk and ended up either needing a pipeline of other programs or switching to perl. If your machine is powerful enough to run perl (and I can't imagine one that isn't in this century) you might as well use it because it does anything awk can do and more.
I started seeing references to perl in the early nineties, so it ran on those machines. Also, I remember running into Larry Wall, and responding to him very irritatedly, around '93 or '94, when he showed up on comp.lang.awk and told someone the answer to his question was to go to perl. Now, I really like perl, but for some things - like where I want to do nothing but process one or maybe two text files at a time, and want to loop through the whole thing - it's simpler.
awk is almost as complicated to learn but can't do as much and is harder
"Almost as complicated to learn"? I had no trouble learning it around, oh, '91. But then, at that point I'd been programming professionally for more than 10 years. If you know perl, and you can program shell, and if you know any other language (unless *all* you know is Objectionably Oriented languages), there's minimal ramp-up time. <snip>
Maybe it made sense on computers of the 1970's, or before perl was available.
awk standardized pretty much, according to what I've read - possibly man pages on Sun 3's or Irix - around '83. perl was *NOT* part of std. distros until the end of the nineties. And they do a lot of the same thing. To some degree, it's a matter of preferences, and to put down awk as "almost as complicated as perl to learn" does not impress me. <snip>
Shell commands are just what you'd type anyway, so you have to know them regardless; there is nothing special about making a program out of them. Other than grep using regexps, the man pages for those programs are probably
And regexes have always been considered a black art - there's always the "how many escapes do you need for this", esp. if it's in a script.
literally a page. No one is going to understand awk or perl after reading a page. Personally I'd probably
So, you don't actually know any programming, and it sounds like you want to learn as little as possible, even though doing so will make your life easier upstream. <snip> Try it - you might find that to be the case.
Oh, and if you're on this list, then the mundane world doesn't consider you "normal", anyway; you're a geek, or a wonk, or a fill-in-the-stereotype-put-down-name, not a "k3wl dud3".
mark
m.roth@5-cent.us wrote:
Yes, I'm serious that if you don't already know awk, there is little to be gained from looking at it now. Perl can do everything awk can do and more, while shell scripts can do the simpler things.
I started seeing references to perl in the early nineties, so it ran on those machines. Also, I remember running into Larry Wall, and responding to him very irritatedly, around '93 or '94, when he showed up on comp.lang.awk and told someone the answer to his question was to go to perl. Now, I really like perl, but for some things - like where I want to do nothing but process one or maybe two text files at a time, and want to loop through the whole thing - it's simpler.
No, it is just different. If you want perl to loop, it can. Try the a2p translator.
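a2p shipped with perl in that era and mechanically translates an awk script into perl. A sketch using the counting script from this thread (the file names here are made up):

echo 'BEGIN { FS = "," } $4 !~ /^0/ { count++ } END { print count }' > count.awk
a2p count.awk > count.pl
perl count.pl HLRSubscriber-20091111173349.csv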
"Almost as complicated to learn"? I had no trouble learning it around, oh, '91. But then, at that point I'd been programming professionally for more than 10 years. If you know perl, and you can program shell, and if you know any other language (unless *all* you know is Objectionably Oriented languages), there's minimal ramp-up time.
If you know perl, there's no point in downgrading to awk. If you don't know either, you will find awk to be weird and unlike anything else. Back when it was the only way to do math in a shell script it might have been worth the trouble.
awk standardized pretty much, according to what I've read - possibly man pages on Sun 3's or Irix - around '83. perl was *NOT* part of std. distros until the end of the nineties. And they do a lot of the same thing. To some degree, it's a matter of preferences, and to put down awk as "almost as complicated as perl to learn" does not impress me.
OK, I'll revise that and say it is much, much harder to use awk to accomplish tasks in general than it is with perl. First there is the problem of the things that awk just can't do at all - like inputting data from places other than stdin or files, so you'll end up embedding awk in a shell script with other tools doing the heavy lifting, and probably having to arrange shell variable expansion into the awk script. Then there is the real advantage of perl over almost every other language, which is that anything you are likely to want to do will already have been done and is available as a module on CPAN - so you will probably only have to write half a page or so yourself even for large jobs and things that get data from sockets, databases or URLs.
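As a concrete point of comparison, the fourth-field count from earlier in the thread is a single perl command - a sketch, with filename as a placeholder:

perl -F, -lane '$c++ if $F[3] !~ /^0/; END { print $c+0 }' filename

-a with -F, autosplits each line on commas into @F, so $F[3] is the field after the third comma.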
And regexes have always been considered a black art - there's always the "how many escapes do you need for this", esp. if it's in a script.
You can't get too far without understanding shell parsing even if you just type stuff on the command line. But, regexps within a perl script don't have to deal with this at all.
So, you don't actually know any programming, and it sounds like you want to learn as little as possible, even though doing so will make your life easier upstream.
Wrong conclusion. I've started a lot of things in shell and awk and hit dead ends when I needed functionality that they couldn't handle - and ended up starting over in perl. Now I would only start in shell if I know the simpler utilities can handle the whole job (which, as data is increasingly handled in databases and xml over networks, is increasingly rare).
Oh, and if you're on this list, then the mundane world doesn't consider you "normal", anyway; you're a geek, or a wonk, or a fill-in-the-stereotype-put-down-name, not a "k3wl dud3".
Agreed, but for this group, understanding regexps and the shell is fairly essential and needing perl's full functionality is probably common, where awk is just a historical oddity. It still works for its old tasks, but it's not up to the ways data is currently handled and is likely to be a waste of time to consider if you don't already understand its internal parser.
Ok, there's no point to continuing this - I use whatever tool I feel like, and which is simplest to me to do the job: the *Nix way. I also know a perl bigot when I see one.
You *also* missed the reason I was pushing the original poster to read the man page, rather than just do what I said without trying to understand what I was suggesting they do: they made a syntactical mistake that would have had the *same* result in perl - he missed the closing / on the expression. <snip> mark
You seem to have missed all the places that I mentioned already knowing awk as an exception. I'm just not recommending taking the time to learn it if you don't already - and if you want to call logical conclusions for the reasons I posted bigotry, fine - be that way.
Sure, but perl would have told you 'search pattern not terminated at line xx', or you would have gotten this from running 'perl -c' to check syntax ahead of time. Would you care to disclose how many of your own hours you've wasted on the mysteries of awk before you learned to read that carefully?
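To illustrate: an unterminated pattern like the one that started this subthread draws an immediate, specific complaint from perl's compile-only check - a sketch:

perl -ce '$c++ if $_ !~ /^0'
# prints: Search pattern not terminated at -e line 1.

awk's reaction to the equivalent mistake depends on how it's invoked, and as this thread showed, the -F variant simply appeared to hang.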