Need help in writing a shell/bash script

ankush grover

30 Dec 2011

5:30 p.m.

Hi Friends,

I am trying to write a shell script which can merge the 2 columns into a 3rd one on CentOS 5. The file is very long: around 31200 rows, with around 1370 unique groups and around 12000 unique user-names. The 1st column is the groupname and the 2nd column is the user-name.

1st Column (Groupname)    2nd Column (username)
admin                     ankush
admin                     amit
powerusers                dinesh
powerusers                jitendra

The desired output should be like this

admin: ankush, amit
powerusers: dinesh, jitendra

There are commands available, but I am not able to use them properly to get the desired output. Please help me.

Thanks & Regards

Ankush


Rushton Martin

30 Dec

6:41 p.m.

New subject: UC Need help in writing a shell/bash script

I knocked up the enclosed under Cygwin:

#!/bin/sh
( cat <<EOTx
admin ankush
admin amit
powerusers dinesh
powerusers jitendra
EOTx
) | awk '
{ grpnm[$1] = grpnm[$1] ", " $2 }
END { for (i in grpnm) { print i ": " substr(grpnm[i], 3) } }
' | sort

The meat is the AWK programme. It collects all instances of the second column in an array indexed on the entries in the first column. At the end of the input file it handles each element of the array in turn, dropping the grammatically incorrect leading comma and space. The sort just sorts lines alphabetically, as you implied. The ( cat ... ) | construct is just to push in your test data.

Are the headings part of the file? In which case you may need to add a line:

NR == 1 { next }

immediately after the awk line.
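
A minimal sketch of where that extra rule would sit, assuming the first line of the file really is a heading row. The function name `merge_groups` and the one-line heading in the demo are illustrative only, not from the original post.

```shell
#!/bin/sh
# Sketch: merge "group user" rows into "group: user1, user2" lines.
# Assumes line 1 of the input is a heading row (skipped by NR == 1).
# merge_groups is a hypothetical helper name.
merge_groups() {
    awk '
    NR == 1 { next }                          # drop the heading line
    { grpnm[$1] = grpnm[$1] ", " $2 }         # append the user to its group
    END {
        for (i in grpnm) {
            print i ": " substr(grpnm[i], 3)  # strip the leading ", "
        }
    }
    ' "$@" | sort
}

# Demo with the sample data from the original post plus a heading row.
merge_groups <<'EOF'
Groupname username
admin ankush
admin amit
powerusers dinesh
powerusers jitendra
EOF
```

Because the function passes "$@" on to awk, it reads standard input when called without arguments and a named file otherwise, e.g. `merge_groups groups.txt` (a placeholder file name).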

HTH,

Martin Rushton HPC System Manager, Weapons Technologies Tel: 01959 514777, Mobile: 07939 219057 email: jmrushton@QinetiQ.com www.QinetiQ.com QinetiQ - Delivering customer-focused solutions

_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos

m.roth＠5-cent.us

8:35 p.m.

New subject: UC Need help in writing a shell/bash script

Rushton Martin wrote:

...

I knocked up the enclosed under Cygwin:

#!/bin/sh
( cat <<EOTx
admin ankush
admin amit
powerusers dinesh
powerusers jitendra
EOTx
) | awk '
{ grpnm[$1] = grpnm[$1] ", " $2 }
END { for (i in grpnm) { print i ": " substr(grpnm[i], 3) } }
' | sort

<snip> Why use cat? Why not just stick the filename in the command line, right after the closing ', and before the pipe?

mark

Rushton Martin

8:44 p.m.

New subject: UC UC Need help in writing a shell/bash script

Demonstration purposes only. I wanted to show that the data going in was the user's data as described. The awk script is the key; the cat and sort are merely decoration.


Edo

7:45 p.m.

Hi,

On Friday, December 30, 2011 at 9:00 PM, ankush grover wrote:

...

Hi Friends,

I am trying to write a shell script which can merge the 2 columns into 3rd one on Centos 5. The file is very long around 31200 rows having around 1370 unique groups and around 12000 unique user-names. The 1st column is the groupname and then 2nd column is the user-name.

I’m not sure I understood that “2 columns into 3rd one” there but...

...

1st Column (Groupname)    2nd Column (username)
admin                     ankush
admin                     amit
powerusers                dinesh
powerusers                jitendra

If that’s the format of your input and …

...

The desired output should be like this

admin: ankush, amit
powerusers: dinesh, jitendra

If that’s your desired output, and assuming the input file is already sorted, try the following:

# -- code starts here -->
#!/bin/bash

GROUPNAMENOW=''

while read LINE
do
    GROUPNAME=$(echo $LINE | cut -d ' ' -f 1)
    USERNAME=$(echo $LINE | cut -d ' ' -f 2)
    if [ "$GROUPNAME" == "$GROUPNAMENOW" ]; then
        echo ", $USERNAME"
    else
        GROUPNAMENOW=$GROUPNAME
        echo -n "$GROUPNAMENOW: $USERNAME"
    fi
done < input.txt

# <-- code ends here --

Note: Tested and worked as expected in OS X. It should work in CentOS too.
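
The loop above ends the line after the second name in a group, which suits the two-user sample. With around 31200 rows and 1370 groups, most groups will have more members, so here is a hedged sketch of the same streaming idea that only breaks the line when the group changes. It still assumes the input is sorted by group; `merge_sorted` is a made-up name.

```shell
#!/bin/bash
# Sketch: streaming merge for input already sorted by group name.
# Handles any number of users per group. merge_sorted is a hypothetical name.
merge_sorted() {
    local current='' group user
    while read -r group user; do
        if [ "$group" = "$current" ]; then
            printf ', %s' "$user"          # same group: continue the line
        else
            # New group: end the previous line (if any) and start a new one.
            [ -n "$current" ] && printf '\n'
            current=$group
            printf '%s: %s' "$group" "$user"
        fi
    done
    [ -n "$current" ] && printf '\n'       # terminate the final line
    return 0
}

# Usage (sort first if the order of input.txt is uncertain; -s keeps the
# users of each group in their original order):
#   sort -k1,1 -s input.txt | merge_sorted
```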

HTH,

-- - Edo - mailto:ml2edwin@gmail.com “Happy are those conscious of their spiritual need …” —Matthew 5:3

夜神　岩男

9:34 p.m.

Hi Ankush,

This will do what you want. But please read the comments in the code. As a side note, this sort of thing is way more natural in Postgres. That will become more apparent as the file contents grow. In particular, the concept of appending tens of thousands of names to a single line in a file is a little crazy, as most text editors will start choking on display without a \n in there somewhere to relieve the way most of them read and display text.
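
To gauge whether the merged lines actually get long enough to trouble an editor, one could measure the longest line in the output file. A rough sketch; the function name `longest_line` is made up for illustration.

```shell
#!/bin/sh
# Sketch: print the length (in characters) of the longest line in a file.
# longest_line is a hypothetical helper, not part of any script in this thread.
longest_line() {
    awk 'length($0) > max { max = length($0) } END { print max + 0 }' "$1"
}

# Usage: longest_line output-filename
```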

#######BEGIN collator.sh
#! /bin/bash
#
# collator.sh
#
# Invocation:
#   If executable and in $PATH (~/bin is a good idea):
#     collator.sh input-filename output-filename
#   If not executable, not in $PATH, but in present working directory:
#     sh ./collator.sh input-filename output-filename
#
# WARNING: There is NO serious attempt at error checking implemented.
#          This means you should check the contents of OUTFILE before
#          using it for anything important.

INFILE=${1:?"Input filename missing, please read script comments."}
OUTFILE=${2:?"Output filename missing, please read script comments."}

awk '{print $1 ": "}' $INFILE | uniq > $OUTFILE
for GROUP in `cat $OUTFILE | cut -d ':' -f 1`
  do for NAME in `cat $INFILE | grep $GROUP | awk '{print $2}'`
    do sed -i "s/^$GROUP: /&$NAME,\ /" $OUTFILE
  done
done
#######END collator.sh
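
The warning in the script header asks you to check OUTFILE before relying on it. One hedged way to automate a first pass is to compare the number of input rows with the number of names that ended up in the output. A mismatch hints at a problem, for example `grep $GROUP` also matching rows where the group name is only a substring of another group or of a user-name. `check_counts` is a hypothetical helper, and the parsing assumes the `group: name, name, ` layout that the sed command produces.

```shell
#!/bin/sh
# Sketch: compare the row count of the input with the name count of the output.
# check_counts is a hypothetical helper; it assumes output lines of the form
# "group: name, name, " (a trailing comma is tolerated).
check_counts() {
    infile=$1
    outfile=$2
    # Rows that carry both a group and a user.
    rows=$(awk 'NF >= 2 { n++ } END { print n + 0 }' "$infile")
    # Non-empty comma-separated entries after the "group: " prefix.
    names=$(awk -F': ' '{
        k = split($2, parts, ", *")
        for (j = 1; j <= k; j++) if (parts[j] != "") n++
    } END { print n + 0 }' "$outfile")
    if [ "$rows" -eq "$names" ]; then
        echo "OK: $rows rows, $names names"
    else
        echo "MISMATCH: $rows rows, $names names" >&2
        return 1
    fi
}

# Usage: check_counts input-filename output-filename
```

Matching counts do not prove the output is right, but a mismatch is a cheap signal that something went wrong.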

m.roth＠5-cent.us

10:11 p.m.

Hey, supergiantpotato (and btw, this list is plain text, not unicode, and most of us don't read Japanese...),

å¤ç¥ãå²©ç· wrote:

...

On 12/30/2011 09:00 PM, ankush grover wrote:

...
I am trying to write a shell script which can merge the 2 columns into 3rd one on Centos 5. The file is very long around 31200 rows having around 1370 unique groups and around 12000 unique user-names. The 1st column is the groupname and then 2nd column is the user-name.

1st Column (Groupname)    2nd Column (username)
admin                     ankush
admin                     amit
powerusers                dinesh
powerusers                jitendra

<snip>

...

This will do what you want. But please read the comments in the code.

<snip>

...

#######BEGIN collator.sh

<snip>

...

INFILE=${1:?"Input filename missing, please read script comments."}
OUTFILE=${2:?"Output filename missing, please read script comments."}

awk '{print $1 ": "}' $INFILE | uniq > $OUTFILE
for GROUP in `cat $OUTFILE | cut -d ':' -f 1`
  do for NAME in `cat $INFILE | grep $GROUP | awk '{print $2}'`
    do sed -i "s/^$GROUP: /&$NAME,\ /" $OUTFILE
  done
done
#######END collator.sh

This is really complicated and fiddly. Look at the one awk script that was posted, which is *far* simpler, and uses awk the way it's intended to be used, not as a replacement for cut....

mark

Craig White

10:20 p.m.

looked like English to me...

-- Craig White ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ craig.white@ttiltd.com 1.800.869.6908 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ www.ttiassessments.com Need help communicating between generations at work to achieve your desired success? Let us help!

m.roth＠5-cent.us

10:22 p.m.

Craig White wrote:

...

looked like English to me...

On Dec 30, 2011, at 9:41 AM, m.roth@5-cent.us wrote:

...
Hey, supergiantpotato (and btw, this list is plain text, not unicode, and most of us don't read Japanese...),

å¤ç¥ãå²©ç· wrote:

^^^^^^^^^^^^^^ doesn't look like English, or ASCII, to me.

<MVNCH>

mark

Craig White

10:26 p.m.

On Dec 30, 2011, at 9:52 AM, m.roth@5-cent.us wrote:

...

Craig White wrote:

...
looked like English to me...

On Dec 30, 2011, at 9:41 AM, m.roth@5-cent.us wrote:

...
Hey, supergiantpotato (and btw, this list is plain text, not unicode, and most of us don't read Japanese...),

å¤œç¥žã€€å²©ç”· wrote:

^^^^^^^^^^^^^^ doesn't look like English, or ASCII, to me.

<MVNCH>
   mark

---- let me see if I get this straight... you are objecting to him using his real name?

Craig

夜神　岩男

10:40 p.m.

On 12/31/2011 01:56 AM, Craig White wrote:

...

On Dec 30, 2011, at 9:52 AM, m.roth@5-cent.us wrote:

...
Craig White wrote:

...
looked like English to me...

On Dec 30, 2011, at 9:41 AM, m.roth@5-cent.us wrote:

...
Hey, supergiantpotato (and btw, this list is plain text, not unicode, and most of us don't read Japanese...),

å¤œç¥žã€€å²©ç”· wrote:

^^^^^^^^^^^^^^ doesn't look like English, or ASCII, to me.

<MVNCH>
    mark
let me see if I get this straight... you are objecting to him using his real name?

Craig

It's ok, I'm totally about to change my private email address header for one guy on one mailing list. And anyway, shame on me for trying to help someone on a list with a quick script. What was I thinking!

m.roth＠5-cent.us

11:26 p.m.

å¤ç¥ãå²©ç· wrote:

...

On 12/31/2011 01:56 AM, Craig White wrote:

...
On Dec 30, 2011, at 9:52 AM, m.roth@5-cent.us wrote:

...
Craig White wrote:

...
looked like English to me... On Dec 30, 2011, at 9:41 AM, m.roth@5-cent.us wrote:

...
Hey, supergiantpotato (and btw, this list is plain text, not unicode, and most of us don't read Japanese...),

Ã¥Â¤ÅÃ§Â¥Å¾Ã£â¬â¬Ã¥Â²Â©Ã§âÂ· wrote:

^^^^^^^^^^^^^^ doesn't look like English, or ASCII, to me.

<MVNCH>

...

Its ok, I'm totally about to change my private email address header for one guy on one mailing list. And anyway, shame on me for trying to help someone on a list with a quick script. What was I thinking!

*shrug* Fine, so if I want to address you, in response to a post, I'll just use SGP for supergiantpotato. Whatever floats your boat.

mark

Scott Robbins

10:49 p.m.

On Fri, Dec 30, 2011 at 11:52:21AM -0500, m.roth@5-cent.us wrote:

...

Craig White wrote:

...

...
looked like English to me...

On Dec 30, 2011, at 9:41 AM, m.roth@5-cent.us wrote:

...
Hey, supergiantpotato (and btw, this list is plain text, not unicode, and most of us don't read Japanese...),

å¤ç¥ãå²©ç· wrote:

^^^^^^^^^^^^^^ doesn't look like English, or ASCII, to me.

I speak Japanese, so didn't even notice. At any rate, it's the poster's name--interestingly, when I hit reply here, it does come out with various odd symbols rather than the name.

I understand Mark's point but in this case, I don't really think it's fair to ask someone to change their name specifically for this list. (That is, tell them, You can't write your name in your own language).

And no, my wife isn't watching over my shoulder as I write this, [1] she just has me well trained about us English-centric Americans. :)

[1] I can think of at least one list member here who will automatically assume that's why I wrote this. :)

-- Scott Robbins PGP keyID EB3467D6 ( 1B48 077D 66F6 9DB0 FDC2 A409 FA54 EB34 67D6 ) gpg --keyserver pgp.mit.edu --recv-keys EB3467D6 Angel: You've never done this before. Look, it takes tremendous strength -- mental strength. Wesley: Resistence to suggestion. Yes, I understand that. I like to think of myself as possessing a certain... Angel: Wesley, you don't even have sales resistance. How many thigh masters do you own? Wesley: The second one was a free gift with my Buns of Steel.

Craig White

11:05 p.m.

On Dec 30, 2011, at 10:19 AM, Scott Robbins wrote:

...

On Fri, Dec 30, 2011 at 11:52:21AM -0500, m.roth@5-cent.us wrote:

...
Craig White wrote:

...
...
looked like English to me...

On Dec 30, 2011, at 9:41 AM, m.roth@5-cent.us wrote:

...
Hey, supergiantpotato (and btw, this list is plain text, not unicode, and most of us don't read Japanese...),

å¤ç¥ãå²©ç· wrote:

^^^^^^^^^^^^^^ doesn't look like English, or ASCII, to me.

I speak Japanese, so didn't even notice. At any rate, it's the poster's name--interestingly, when I hit reply here, it does come out with various odd symbols rather than the name.

---- not for me it doesn't... On Dec 30, 2011, at 10:10 AM, 夜神　岩男 wrote:

but it obviously depends upon the mail client and languages that are available to be used by the mail client you are using at the moment. ----

...

I understand Mark's point but in this case, I don't really think it's fair to ask someone to change their name specifically for this list. (That is, tell them, You can't write your name in your own language).

---- absolutely absurd but you only need to consider the source.

Craig

夜神　岩男

10:28 p.m.

On 12/31/2011 01:41 AM, m.roth@5-cent.us wrote:

...

Hey, supergiantpotato (and btw, this list is plain text, not unicode, and most of us don't read Japanese...),

Thanks for the info

...

This is really complicated and fiddly. Look at the one awk script that was posted, which is *far* simpler, and uses awk the way it's intended to be used, not as a replacement for cut....

I tried it before writing that. It starts printing names on newlines after the second name in a group. Not so good. It also has variable output when the group names are not sorted prior to input. Etc. Given that, I'd say it is more fragile than what I wrote. But whatev. Let the OP decide which one is more useful.

Easy to fix, yes.

And perhaps you don't like awk being used that way. Fine. It can be substituted -- but awk is an old habit of mine.

The whole script could have been written in just one or two blazingly complex sed commands... but that sucks even more for the OP if he has to debug it later...

Marc Deop

11:18 p.m.

On Friday 30 December 2011 11:41:47 m.roth@5-cent.us wrote:

...

Hey, supergiantpotato (and btw, this list is plain text, not unicode, and most of us don't read Japanese...),

You are not using "plain text" and "unicode" correctly here.

I've happily read his emails in *plain text* encoded in *ASCII*. Only his name is in UTF-8 encoding (which still *is* plain text).

...

From the email headers: Content-Type: text/plain; charset="us-ascii"

My email client can do *plain text* in ASCII encoding as well as in UTF-8.

It seems yours cannot

Regards
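
For what it's worth, the odd symbols quoted in this thread are consistent with the UTF-8 bytes of the name being decoded as Windows-1252. That is an inference from the garbled text itself, not something the mail headers confirm. A small sketch, assuming an `iconv` that accepts the charset name `CP1252`; the function name `garble` is made up.

```shell
#!/bin/sh
# Sketch: reproduce the garbled name by decoding UTF-8 bytes as Windows-1252.
# Assumes iconv knows the charset name CP1252 (an assumption about the host).
# The octal escapes are the UTF-8 bytes of the five characters in the
# poster's name, including the ideographic space in the middle.
garble() {
    printf '\345\244\234\347\245\236\343\200\200\345\262\251\347\224\267' |
        iconv -f CP1252 -t UTF-8
}

garble; echo
```

Running the same bytes through a UTF-8 aware client shows the name as written, which fits the point that the result depends on the mail client.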

Les Mikesell

11:28 p.m.

Here's a perl approach:

#!/usr/bin/perl

my ($group,$name);
my %groups=();
while (<>) {
    chomp();
    ($group,$name) = split(/ /);
    push @{ $groups{$group} }, $name;
}
foreach $group (sort keys(%groups)) {
    print "$group: " . join("," , @{$groups{$group}}) ."\n";
}

Cat or redirect the list to the program input, output is on stdout.

-- Les Mikesell lesmikesell@gmail.com

John R Pierce

31 Dec

12:31 a.m.

On 12/30/11 9:58 AM, Les Mikesell wrote:

...

Here's a perl approach:

which, unlike all the other versions, doesn't require the data be pre-sorted, by virtue of adding all the tuples to a hash. I don't even think that sort in the output loop is required, unless you want the groups output in alphabetic order.

-- john r pierce N 37, W 122 santa cruz ca mid-left coast

m.roth＠5-cent.us

12:45 a.m.

John R Pierce wrote:

...

On 12/30/11 9:58 AM, Les Mikesell wrote:

...
Here's a perl approach:

which, unlike all the other versions, doesn't require the data be pre-sorted, by virtue of adding all the tuples to a hash. I don't even think that sort in the output loop is required, unless you want the groups output in alphabetic order.

IIRC, the awk will come out in order, given the hash.

mark
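
On the ordering question: POSIX leaves the order in which awk's `for (i in array)` visits the indices unspecified, so it seems safer not to count on it. A sketch of an awk variant that records the order in which groups first appear and prints them that way, with no pre-sorted input needed. `merge_first_seen` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: group users per group name and print groups in first-seen order.
# The traversal order of "for (i in array)" is unspecified, so this variant
# keeps its own list of group names instead of relying on it.
# merge_first_seen is a hypothetical name.
merge_first_seen() {
    awk '
    {
        if (!($1 in users)) {        # first time this group appears
            order[++n] = $1
            users[$1] = $2
        } else {
            users[$1] = users[$1] ", " $2
        }
    }
    END {
        for (k = 1; k <= n; k++) print order[k] ": " users[order[k]]
    }
    ' "$@"
}

# Unsorted sample input; pipe the result through sort for alphabetical order.
merge_first_seen <<'EOF'
powerusers dinesh
admin ankush
powerusers jitendra
admin amit
EOF
```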

ankush grover

7 Jan

5:24 p.m.

Thanks supergiantpotato and Edo. Scripts worked for me.

Thanks a lot :)
