Hi Friends,
I am trying to write a shell script that merges two columns into a third on CentOS 5. The file is long: around 31200 rows, with around 1370 unique groups and around 12000 unique user-names. The 1st column is the group name and the 2nd column is the user-name.
1st Column (Groupname)    2nd Column (username)
admin                     ankush
admin                     amit
powerusers                dinesh
powerusers                jitendra
The desired output should be like this
admin: ankush, amit
powerusers: dinesh, jitendra
There are commands available, but I am not able to use them properly to get the desired output. Please help me.
Thanks & Regards
Ankush
I knocked up the enclosed under Cygwin:
#!/bin/sh
(
cat <<EOTx
admin ankush
admin amit
powerusers dinesh
powerusers jitendra
EOTx
) | awk '
{ grpnm[$1] = grpnm[$1] ", " $2 }
END {
    for (i in grpnm) {
        print i ": " substr(grpnm[i], 3)
    }
}
' | sort
The meat is the AWK programme. It collects all instances of the second column in an array indexed on the entries in the first column. At the end of the input file it handles each element of the array in turn, dropping the grammatically incorrect leading comma and space. The sort just sorts the lines alphabetically, as you implied. The ( cat ... ) | construct is just to push in your test data.
Are the headings part of the file? In which case you may need to add a line:
NR == 1 { next }
immediately after the awk line.
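As a minimal sketch of that header-skipping variant, assuming the first line of the file is the heading row and the fields are separated by whitespace. The function name merge_groups and the wording of the heading line are illustrative only, not taken from the original post:

```shell
#!/bin/sh
# merge_groups: hypothetical helper; reads "group user" lines on stdin.
merge_groups() {
    awk '
    NR == 1 { next }                      # assumption: line 1 is the heading row
    { grpnm[$1] = grpnm[$1] ", " $2 }     # append each user to its group
    END {
        for (i in grpnm)
            print i ": " substr(grpnm[i], 3)   # drop the leading ", "
    }
    ' | sort
}

# Sample data from the question, with a heading line added for the demo.
merge_groups <<'EOF'
Groupname username
admin ankush
admin amit
powerusers dinesh
powerusers jitendra
EOF
```

If the file has no heading row, the NR == 1 rule should be left out, otherwise the first real record is silently dropped.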
HTH,
Martin Rushton HPC System Manager, Weapons Technologies Tel: 01959 514777, Mobile: 07939 219057 email: jmrushton@QinetiQ.com www.QinetiQ.com QinetiQ - Delivering customer-focused solutions
Rushton Martin wrote:
I knocked up the enclosed under Cygwin:
<snip>
Why use cat? Why not just stick the filename on the command line, right after the closing ', and before the pipe?
mark
Demonstration purposes only. I wanted to show that the data going in was the user's data as described. The awk script is the key; the cat and sort are merely decoration.
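For real use, the filename can go straight on the awk command line as suggested. A sketch under the assumption that the data lives in a file with no heading row; the wrapper name merge_file is hypothetical, and the temporary file exists only so the example can run on its own:

```shell
#!/bin/sh
# merge_file: hypothetical wrapper; takes the input filename as its argument.
merge_file() {
    awk '
    { grpnm[$1] = grpnm[$1] ", " $2 }
    END { for (i in grpnm) print i ": " substr(grpnm[i], 3) }
    ' "$1" | sort
}

# Demo only: write the sample data to a temporary file, then merge it.
tmp=$(mktemp)
printf '%s\n' 'admin ankush' 'admin amit' 'powerusers dinesh' 'powerusers jitendra' > "$tmp"
merge_file "$tmp"
rm -f "$tmp"
```

In practice the call would simply be merge_file followed by the path of the real input file, with the result redirected wherever it is needed.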
Hi,
On Friday, December 30, 2011 at 9:00 PM, ankush grover wrote:
> Hi Friends,
> I am trying to write a shell script which can merge the 2 columns into 3rd one on Centos 5. The file is very long around 31200 rows having around 1370 unique groups and around 12000 unique user-names. The 1st column is the groupname and then 2nd column is the user-name.
I’m not sure I understood that “2 columns into 3rd one” there but...
> 1st Column (Groupname) 2nd Column (username)
> admin ankush
> admin amit
> powerusers dinesh
> powerusers jitendra
If that’s the format of your input and …
> The desired output should be like this
> admin: ankush, amit
> powerusers: dinesh, jitendra
If that’s your desired output, and assuming the input file is already sorted, try the following:
# -- code starts here -->
#!/bin/bash

GROUPNAMENOW=''

while read LINE
do
    GROUPNAME=$(echo $LINE | cut -d ' ' -f 1)
    USERNAME=$(echo $LINE | cut -d ' ' -f 2)
    if [ "$GROUPNAME" == "$GROUPNAMENOW" ]; then
        echo ", $USERNAME"
    else
        GROUPNAMENOW=$GROUPNAME
        echo -n "$GROUPNAMENOW: $USERNAME"
    fi
done < input.txt
# <-- code ends here --
Note: Tested and worked as expected in OS X. It should work in CentOS too.
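With the sample data every group has exactly two users, so the loop above produces the expected lines. With three or more users in a group, each echo ", $USERNAME" ends the line, and a group with a single user never gets a line ending before the next group starts. A sketch of one way to handle both cases, still assuming the input is sorted by group; the function name merge_sorted and the third admin user in the demo are made up for illustration:

```shell
#!/bin/sh
# merge_sorted: hypothetical helper; expects "group user" lines,
# already sorted by group, on stdin.
merge_sorted() {
    current=''
    while read -r group user; do
        [ -n "$group" ] || continue              # skip blank lines
        if [ "$group" = "$current" ]; then
            printf ', %s' "$user"                # same group: stay on the line
        else
            [ -z "$current" ] || printf '\n'     # close the previous group
            current=$group
            printf '%s: %s' "$group" "$user"
        fi
    done
    [ -z "$current" ] || printf '\n'             # close the last group
}

# Demo: three users in one group, one user in the other.
merge_sorted <<'EOF'
admin ankush
admin amit
admin ravi
powerusers dinesh
EOF
```

Letting read split the line into two variables also avoids running echo and cut twice per row, which adds up over some 31200 rows.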
HTH,
-- - Edo - mailto:ml2edwin@gmail.com “Happy are those conscious of their spiritual need …” —Matthew 5:3
Hi Ankush,
This will do what you want. But please read the comments in the code. As a side note, this sort of thing is way more natural in Postgres. That will become more apparent as the file contents grow. In particular, the concept of appending tens of thousands of names to a single line in a file is a little crazy, as most text editors will start choking on display without a \n in there somewhere to relieve the way most of them read and display text.
#######BEGIN collator.sh
#! /bin/bash
#
# collator.sh
#
# Invocation:
# If executable and in $PATH (~/bin is a good idea):
#     collator.sh input-filename output-filename
# If not executable, not in $PATH, but in present working directory:
#     sh ./collator.sh input-filename output-filename
#
# WARNING: There is NO serious attempt at error checking implemented.
#          This means you should check the contents of OUTFILE before
#          using it for anything important.

INFILE=${1:?"Input filename missing, please read script comments."}
OUTFILE=${2:?"Output filename missing, please read script comments."}

awk '{print $1 ": "}' $INFILE | uniq > $OUTFILE
for GROUP in `cat $OUTFILE | cut -d ':' -f 1`
do
    for NAME in `cat $INFILE | grep $GROUP | awk '{print $2}'`
    do
        sed -i "s/^$GROUP: /&$NAME,\ /" $OUTFILE
    done
done
#######END collator.sh
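One thing to watch in the inner loop of collator.sh: grep $GROUP matches the group name anywhere on the line, so a group called admin would also pick up the lines of a group such as sysadmin, or of any user-name containing admin. A sketch of a stricter lookup that compares the whole first field, assuming whitespace-separated fields; the function name users_in_group and the sysadmin/bob line are made up for illustration:

```shell
#!/bin/sh
# users_in_group: hypothetical helper; prints the users of exactly one group.
# Usage: users_in_group GROUP < input
users_in_group() {
    # Appending "" forces a string comparison even for numeric-looking names.
    awk -v grp="$1" '($1 "") == (grp "") { print $2 }'
}

# Demo: only the two admin users are printed, not the sysadmin one.
users_in_group admin <<'EOF'
admin ankush
sysadmin bob
admin amit
EOF
```

The demo prints ankush and amit, one per line; with a plain grep admin the sysadmin line would have been included as well.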
Hey, supergiantpotato (and btw, this list is plain text, not unicode, and most of us don't read Japanese...),
å¤ç¥ãå²©ç· wrote:
<snip>
> This will do what you want. But please read the comments in the code.
<snip>
> #######BEGIN collator.sh
<snip>
This is really complicated and fiddly. Look at the one awk script that was posted, which is *far* simpler, and uses awk the way it's intended to be used, not as a replacement for cut....
mark
looked like English to me...
Craig White wrote:
looked like English to me...
On Dec 30, 2011, at 9:41 AM, m.roth@5-cent.us wrote:
Hey, supergiantpotato (and btw, this list is plain text, not unicode, and most of us don't read Japanese...),
å¤ç¥ãå²©ç· wrote:
^^^^^^^^^^^^^^ doesn't look like English, or ASCII, to me.
<MVNCH>
mark
let me see if I get this straight... you are objecting to him using his real name?
Craig
It's ok, I'm totally about to change my private email address header for one guy on one mailing list. And anyway, shame on me for trying to help someone on a list with a quick script. What was I thinking!
*shrug* Fine, so if I want to address you, in response to a post, I'll just use SGP for supergiantpotato. Whatever floats your boat.
mark
I speak Japanese, so didn't even notice. At any rate, it's the poster's name--interestingly, when I hit reply here, it does come out with various odd symbols rather than the name.
I understand Mark's point but in this case, I don't really think it's fair to ask someone to change their name specifically for this list. (That is, tell them, You can't write your name in your own language).
And no, my wife isn't watching over my shoulder as I write this, [1] she just has me well trained about us English-centric Americans. :)
[1] I can think of at least one list member here who will automatically assume that's why I wrote this. :)
On Dec 30, 2011, at 10:19 AM, Scott Robbins wrote:
> I speak Japanese, so didn't even notice. At any rate, it's the poster's name--interestingly, when I hit reply here, it does come out with various odd symbols rather than the name.
not for me it doesn't...
On Dec 30, 2011, at 10:10 AM, 夜神 岩男 wrote:
but it obviously depends upon the mail client and languages that are available to be used by the mail client you are using at the moment.
> I understand Mark's point but in this case, I don't really think it's fair to ask someone to change their name specifically for this list. (That is, tell them, You can't write your name in your own language).
absolutely absurd but you only need to consider the source.
Craig
On 12/31/2011 01:41 AM, m.roth@5-cent.us wrote:
> Hey, supergiantpotato (and btw, this list is plain text, not unicode, and most of us don't read Japanese...),
Thanks for the info
> This is really complicated and fiddly. Look at the one awk script that was posted, which is *far* simpler, and uses awk the way it's intended to be used, not as a replacement for cut....
I tried it before writing that. It starts printing names on newlines after the second name in a group. Not so good. It also has variable output when the group names are not sorted prior to input. Etc. Given that, I'd say it is more fragile than what I wrote. But whatev. Let the OP decide which one is more useful.
Easy to fix, yes.
And perhaps you don't like awk being used that way. Fine. It can be substituted -- but awk is an old habit of mine.
The whole script could have been written in just one or two blazingly complex sed commands... but that sucks even more for the OP if he has to debug it later...
On Friday 30 December 2011 11:41:47 m.roth@5-cent.us wrote:
> Hey, supergiantpotato (and btw, this list is plain text, not unicode, and most of us don't read Japanese...),
You are not using "plain text" and "unicode" correctly here.
I've been reading his emails quite happily in *plain text* encoded in *ASCII*. Only his name is in UTF-8 encoding (which still *is* plain text).
From the email headers: Content-Type: text/plain; charset="us-ascii"
My email client can do *plain text* in ASCII encoding as well as in UTF-8.
It seems yours cannot
Regards
Here's a perl approach:
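#!/usr/bin/perl

my ($group,$name);
my %groups=();
while (<>) {
    chomp();
    # split on any run of whitespace, so tabs or repeated spaces also work
    ($group,$name) = split(' ');
    push @{ $groups{$group} }, $name;
}
foreach $group (sort keys(%groups)) {
    # join with ", " to match the requested output format
    print "$group: " . join(", " , @{$groups{$group}}) ."\n";
}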
Cat or redirect the list to the program input, output is on stdout.
On 12/30/11 9:58 AM, Les Mikesell wrote:
> Here's a perl approach:
which, unlike all the other versions, doesn't require the data be pre-sorted, by virtue of adding all the tuples to a hash. I don't even think that sort in the output loop is required, unless you want the groups output in alphabetic order.
John R Pierce wrote:
> which, unlike all the other versions, doesn't require the data be pre-sorted, by virtue of adding all the tuples to a hash. I don't even think that sort in the output loop is required, unless you want the groups output in alphabetic order.
IIRC, the awk will come out in order, given the hash.
mark
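On the ordering question: POSIX leaves the traversal order of awk's for (i in array) unspecified, so the order in which groups come out of the END loop cannot be relied on, which is why the awk script earlier in the thread pipes its output through sort. A sketch that instead keeps groups in the order they first appear, and does not need sorted input; the function name merge_in_order and the order and n variables are additions for illustration:

```shell
#!/bin/sh
# merge_in_order: hypothetical helper; reads "group user" lines on stdin
# and prints one line per group, in order of first appearance.
merge_in_order() {
    awk '
    NF < 2 { next }                                   # ignore blank or short lines
    !($1 in users) { order[++n] = $1; users[$1] = $2; next }   # first sighting
    { users[$1] = users[$1] ", " $2 }                 # later sightings
    END { for (k = 1; k <= n; k++) print order[k] ": " users[order[k]] }
    '
}

# Demo with deliberately unsorted input.
merge_in_order <<'EOF'
powerusers dinesh
admin ankush
powerusers jitendra
admin amit
EOF
```

Here powerusers is printed before admin because it appears first in the input; piping the result through sort gives alphabetical order if that is preferred.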
Thanks supergiantpotato and Edo. Scripts worked for me.
Thanks a lot :)
CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos