To be more specific: how do I find how many distinct records there are in, say, column #1?
awk '{print $1}' filename | sort -u | wc -l
This will show how many unique entries are present in column one (use awk -F to change the delimiter, e.g. awk -F ":" for a colon delimiter).
How can I filter out the distinct records with a number of occurrences less than a predetermined threshold?
I don't quite understand this part.
awk '{print $1}' filename | sort | uniq -c | sort -rn
will give you the number of occurrences (reverse numerically sorted) of the unique values from column one.
Now I think you want to put that through a loop and only show those that are below the threshold?
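A minimal sketch of that filtering step, assuming a whitespace-delimited file (`filename' is a placeholder) and a threshold of 3:

# count occurrences of each distinct value in column one,
# then keep only those seen fewer than 3 times
awk '{print $1}' filename | sort | uniq -c | awk '$1 < 3 {print $2, $1}'

Here the second awk reads the `count value' pairs produced by uniq -c, so $1 is the count and $2 is the original column-one value.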
Thanks Sheraz
On May 11, 2010 1:14 AM, sheraznaz@yahoo.com wrote:
Can you provide sample input and the expected result?
I don't quite understand this part.
Thank you very much for your reply. Please find below a segment of the file:
CallId 9 State TK Bts 7 Bt 2 Tr (13 0x09) E1 (4 1 5) Tru (0 3 0)
CallId 9 State TK Bts 7 Bt 2 Tr (13 0x09) E1 (4 1 5) Tru (0 3 0)
CallId 9 State TK Bts 7 Bt 2 Tr (13 0x09) E1 (4 1 5) Tru (0 3 0)
CallId 9 State TK Bts 7 Bt 2 Tr (13 0x09) E1 (4 1 5) Tru (0 3 0)
CallId 9 State TK Bts 7 Bt 2 Tr (13 0x09) E1 (4 1 5) Tru (0 3 0)
CallId 9 State TK Bts 7 Bt 2 Tr (13 0x09) E1 (4 1 5) Tru (0 3 0)
CallId 9 State TK Bts 7 Bt 2 Tr (13 0x09) E1 (4 1 5) Tru (0 3 0)
CallId 94 State TK Bts 7 Bt 1 Tr (8 0x0c) E1 (7 0 15) Tru (0 0 2)
CallId 94 State TK Bts 7 Bt 1 Tr (8 0x0c) E1 (7 0 15) Tru (0 0 2)
CallId 94 State TK Bts 7 Bt 1 Tr (8 0x0c) E1 (7 0 15) Tru (0 0 2)
CallId 94 State TK Bts 7 Bt 1 Tr (8 0x0c) E1 (7 0 15) Tru (0 0 2)
CallId 94 State TK Bts 7 Bt 1 Tr (8 0x0c) E1 (7 0 15) Tru (0 0 2)
CallId 94 State TK Bts 7 Bt 1 Tr (6 0x0f) E1 (7 0 15) Tru (0 0 2)
CallId 94 State TK Bts 7 Bt 1 Tr (6 0x0f) E1 (7 0 15) Tru (0 0 2)
CallId 92 State TK Bts 7 Bt 1 Tr (7 0x08) E1 (3 1 22) Tru (0 0 0)
CallId 92 State TK Bts 7 Bt 1 Tr (7 0x08) E1 (3 1 22) Tru (0 0 0)
CallId 92 State TK Bts 7 Bt 1 Tr (7 0x08) E1 (3 1 22) Tru (0 0 0)
CallId 92 State TK Bts 7 Bt 1 Tr (7 0x08) E1 (3 1 22) Tru (0 0 0)
CallId 92 State IH Bts 7 Bt 1 Tr (6 0x0a) E1 (3 1 22) Tru (0 0 0)
CallId 92 State IH Bts 7 Bt 1 Tr (7 0x08) E1 (3 1 22) Tru (0 0 0)
CallId 92 State CL Bts 7 Bt 1 Tr (6 0x0a) E1 (3 1 22) Tru (0 0 0)
CallId 91 State TK Bts 5 Bt 1 Tr (4 0x0f) E1 (4 0 18) Tru (0 1 1)
CallId 91 State TK Bts 5 Bt 1 Tr (4 0x0f) E1 (4 0 18) Tru (0 1 1)
CallId 91 State TK Bts 5 Bt 1 Tr (4 0x0f) E1 (4 0 18) Tru (0 1 1)
CallId 91 State TK Bts 5 Bt 1 Tr (4 0x0f) E1 (4 0 18) Tru (0 1 1)
Your first comment on using 'awk' enabled me to find how many distinct 'CallId' values exist in my log. For the second part, please imagine that I need to filter out the CallId values that have occurred, say, fewer than three times. Please help me accomplish this second part. Thank you
On Tue, May 11, 2010 at 5:51 AM, hadi motamedi motamedi24@gmail.com wrote:
I don't quite understand this part.
Thank you very much for your reply. Please find below a segment of the file:
If you give the following command:
sort YOUR_FILE | uniq -c | sort -n | perl -ne 'print unless /(\d+)/ and $1 < 3'
where YOUR_FILE's contents are exactly the lines you pasted earlier, you will get:
3 CallId 91 State TK Bts 5 Bt 1 Tr (4 0x0f) E1 (4 0 18) Tru (0 1 1)
4 CallId 92 State TK Bts 7 Bt 1 Tr (7 0x08) E1 (3 1 22) Tru (0 0 0)
5 CallId 94 State TK Bts 7 Bt 1 Tr (8 0x0c) E1 (7 0 15) Tru (0 0 2)
7 CallId 9 State TK Bts 7 Bt 2 Tr (13 0x09) E1 (4 1 5) Tru (0 3 0)
The first number is the number of occurrences of each CallId. Does this help?
The first number is the number of occurrences of each CallId. Does this help?
Thank you for your help. It is very important for me to know the number of occurrences of each CallId#. But can you please let me know why the number obtained from your code does not match a manual count for, say, one of the CallId#s? Can you please correct me?
On Tue, May 11, 2010 at 8:12 AM, hadi motamedi motamedi24@gmail.com wrote:
Does this help? The first number is the number of occurrences of each CallId.
Thank you for your help. It is very important for me to know the number of occurrences of each CallId#. But can you please let me know why the number obtained from your code does not match a manual count for, say, one of the CallId#s? Can you please correct me?
Oh, that's because uniq treats two lines as different if the characters after the id (TK, CL, ... and the rest of the line) differ. If you want to count lines only by the number following CallId, you should tell uniq to compare only the first characters of the line:
$ cat hadi | sort | uniq -c -w 9 | sort -n | perl -ne 'print unless /(\d+)/ and $1 < 3'
4 CallId 91 State TK Bts 5 Bt 1 Tr (4 0x0f) E1 (4 0 18) Tru (0 1 1)
7 CallId 92 State CL Bts 7 Bt 1 Tr (6 0x0a) E1 (3 1 22) Tru (0 0 0)
7 CallId 94 State TK Bts 7 Bt 1 Tr (6 0x0f) E1 (7 0 15) Tru (0 0 2)
7 CallId 9 State TK Bts 7 Bt 2 Tr (13 0x09) E1 (4 1 5) Tru (0 3 0)
(note -w 9).
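Because -w 9 compares a fixed number of leading characters, it depends on the id width staying within that prefix. A field-based sketch that keys on the CallId number itself, assuming the id is always the second whitespace-separated field (`hadi' is the same file as above):

# count occurrences of each distinct CallId (field 2), regardless of line width
awk '{print $2}' hadi | sort | uniq -c | sort -n

This drops the rest of the line from the output, which may or may not matter for your purposes.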
$ cat hadi | sort | uniq -c -w 9 | sort -n | perl -ne 'print unless /(\d+)/ and $1 < 3'
4 CallId 91 State TK Bts 5 Bt 1 Tr (4 0x0f) E1 (4 0 18) Tru (0 1 1)
7 CallId 92 State CL Bts 7 Bt 1 Tr (6 0x0a) E1 (3 1
Thank you for your reply. To have just one 'State' per CallId, I created a new logfile as follows:
#more logfile1 | grep "State TK" >> logfile2
Then, in logfile2, I tried to count the number of occurrences of each distinct CallId with the aid of your proposed command. But in the output, I see differences between the number obtained by counting them manually and the one generated by your command. Can you please correct me?
On Wed, May 12, 2010 at 1:12 AM, hadi motamedi motamedi24@gmail.com wrote:
Thank you for your reply. To have just one 'State' per CallId, I created a new logfile as follows:
#more logfile1 | grep "State TK" >> logfile2
Then, in logfile2, I tried to count the number of occurrences of each distinct CallId with the aid of your proposed command. But in the output, I see differences between the number obtained by counting them manually and the one generated by your command. Can you please correct me?
Please enclose a copy of your commands and the output.
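One hedged guess at the discrepancy: grep "State TK" removes the IH and CL lines (CallId 92 has three of them in the sample), so counts taken over logfile2 will be lower than a manual count over the full log. A sketch for counting within the filtered lines only, assuming logfile1 is the original log:

# count each distinct CallId among the "State TK" lines only
grep "State TK" logfile1 | awk '{print $2}' | sort | uniq -c | sort -n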
On Wed, May 12, 2010 at 05:12:48AM +0100, hadi motamedi wrote:
Thank you for your reply. To have just one 'State' per CallId, I created a new logfile as follows:
#more logfile1 | grep "State TK" >> logfile2
Then, in logfile2, I tried to count the number of occurrences of each distinct CallId with the aid of your proposed command. But in the output, I see differences between the number obtained by counting them manually and the one generated by your command. Can you please correct me?
I'm likely to get in trouble for this, but frankly I don't really care.
This list doesn't exist to do *your* job for you. We are not here to do *your* work. In the past few months you've done nothing but use the members of this list as your personal "please come do my job for me" group because you choose not to do any research or learning on your own. While members of the list are quite happy to help people, you're taking advantage of their kindness and patience. Why should you be paid or earn class credit based on our expertise? It is not clear whether you are a paid IT person, consultant or just a student learning about the IT field. But it is also not relevant, as you are just depending on us to do your work.
There are a bazillion resources on the web, starting with google, that will help you learn *basic* shell scripting as is needed to solve your most current issue. There exist *many* excellent books on shell scripting; there is also "man bash"; "man awk"; "man cut"; "man sed"; etc. READ THEM.
Have you taken the time to make use of any of these resources? Or have you decided to resort to this list every time something basic is needed, refusing to take the time to learn so that you could put together solutions yourself?
DO YOUR OWN RESEARCH ONCE IN A WHILE.
You might be amazed at what you can learn when you do so.
At some point you *will* be in a position where you have a task that needs to be done and you will not have this list to fall back on. What are you going to do then? Cry to your boss or your professor that you can't do it because all the people that have been doing your work for you up to that point aren't available?
If you were working for me I'd terminate you for not making any effort on your own. If you were a student of mine I would fail you for not expending any effort at learning the material. If you were a consultant I'd make sure you never worked for any company I was a part of and also blacklist you on top of it.
Really, enough is just enough.
Enough is also enough with the spoon-feeding. I realize that most of you are kindhearted souls that delight in helping people and that is, in most cases, to be commended. But this never-ending spoon-feeding doing Hadi's job for him is not helping him in the least. Make him stand on his own two feet for a change. It is *obvious* he has made no attempt at resolving the problems he has brought to this list on his own and it has been obvious since his first post. And yes, I know, we all started somewhere, blah, blah, blah. While this is indeed true it is *also* true that we spent the time to learn what we needed in order to do our jobs, including the most important of all: how to find the information we need which in this day and age is google or some other search engine.
(http://stuff.gerdesas.com/images/spoon.png)
To any on the list *other* than Hadi that I've offended by this post you have my most sincere apologies. Sorry for wasting your time but this has been building up for a long time.
I'm likely to get in trouble for this, but frankly I don't really care.
Sorry. I just provided the data that the gentlemen were asking me for. I thought that they were interested in my case and wanted to check my mistakes. Sorry for bothering you.
John R. Dennison wrote:
Really, enough is just enough.
+1
ChrisG
On 5/11/2010 11:48 PM, John R. Dennison wrote:
To any on the list *other* than Hadi that I've offended by this post you have my most sincere apologies. Sorry for wasting your time but this has been building up for a long time.
You sound like someone who learned the unix toolset back when man(1) had a hundred entries or so and you could flip through the whole printed version in an afternoon memorizing the short mnemonic names and their uses. I suspect it is a lot harder to figure out now that there are many thousands of programs to pick from and a search for anything will have millions of irrelevant hits. Of course you could just learn perl and be done with it since it can do pretty much anything you'd use the shell and tools for.
On Wed, May 12, 2010 at 13:14, Les Mikesell lesmikesell@gmail.com wrote:
You sound like someone who learned the unix toolset back when man(1) had a hundred entries or so and you could flip through the whole printed version in an afternoon memorizing the short mnemonic names and their uses. I suspect it is a lot harder to figure out now that there are many thousands of programs to pick from and a search for anything will have millions of irrelevant hits. Of course you could just learn perl and be done with it since it can do pretty much anything you'd use the shell and tools for.
Almost any introductory book on Linux/UNIX that covers the standard command line utilities (sed, awk, grep, egrep, tr, cut, etc) could have answered the questions he had. Of course perl, python, ruby (even tcl!), and bash can do all of these things. There are free ebooks, tutorials, etc that cover this too. It is daunting, sure, but it could be an adventure too. Depends on one's frame of mind. Perl's slogan: there's more than one way to do it. Oh, I forgot, emacs macros can do this too. Don't know of any gui tools that are worth having. Too bad we have so many millions of people crippled by Microslop :-( I was there once. I learned by lots of experimentation, lots of reading, and lots of questions. Many times the questions were answered by something to the effect of: here's a fishing pole, go catch your own fish (or go to the library and get a book on how to fish, then make a fishing pole, then go fishing).
IMNSHO, Ken Wolcott
On 5/12/2010 3:53 PM, Kenneth Wolcott wrote:
Almost any introductory book on Linux/UNIX that covers the standard command line utilities (sed, awk, grep, egrep, tr, cut, etc) could have answered the questions he had.
I don't think you've actually looked at current introductory books. Everyone tries to combine the tutorial with the reference these days and ends up with something that doesn't quite work for either purpose. And none deal with the fact that you have to understand what the shell is going to do with your command line before you will be very good at understanding a man page for any other tool.
Perl's slogan: There's more than one way to do it.
Yes, but if you start wrong you'll probably end up wrong.
oh, I forgot, emacs macros can do this too. Don't know of any gui tools that are worth having.
Try eclipse sometime.
On Wed, May 12, 2010 at 14:17, Les Mikesell lesmikesell@gmail.com wrote:
Yes, but if you start wrong you'll probably end up wrong.
Possible. But hand-holding can only go so far.
oh, I forgot, emacs macros can do this too. Don't know of any gui tools that are worth having.
Try eclipse sometime.
For mangling text??!! I think your example is way off topic for this thread. Or for programming in a large environment? Possible, but not pertinent to this email thread.
Ken
For mangling text??!! I think your example is way off topic for this
Thank you for your reply. I thought to write C code to accomplish this, but then I found very powerful CentOS tools for this application with the help of you gentlemen.
On Thu, May 13, 2010 at 04:59:06AM +0100, hadi motamedi wrote:
Thank you for your reply. I thought to write C code to accomplish this, but then I found very powerful CentOS tools for this application with the help of you gentlemen.
Those tools are not CentOS-only.
On Tue, May 11, 2010 at 08:25:43AM +0000, sheraznaz@yahoo.com wrote:
How can I filter out the distinct records with a number of occurrences less than a predetermined threshold?
awk '{print $1}' filename | sort | uniq -c | sort -rn
will give you the number of occurrences (reverse numerically sorted) of the unique values from column one.
If I understand correctly, you can pipe your output to `awk '{a=$1} {if (a > 3) print a}''. `a' is an awk variable. `$1' is the first column of awk's input, so you probably need to change it.
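A single-awk variant of the same idea, sketched under the assumption that the key is column one of `filename' and the threshold is 3:

# count column-one values in an array; at the end,
# print the count and value for those seen fewer than 3 times
awk '{count[$1]++} END {for (v in count) if (count[v] < 3) print count[v], v}' filename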
If I understand correctly, you can pipe your output to `awk '{a=$1} {if (a > 3) print a}''. `a' is an awk variable. `$1' is the first column of awk's input, so you probably need to change it.
Thank you for your message. Yes, you are right. I really need to filter out the CallIds with a number of occurrences less than, say, three. But your command is not getting through on my CentOS. Please correct me.
On Wed, May 12, 2010 at 05:20:58AM +0100, hadi motamedi wrote:
Thank you for your message. Yes, you are right. I really need to filter out the CallIds with a number of occurrences less than, say, three. But your command is not getting through on my CentOS. Please correct me.
So, read `man awk', `man sed', etc, as John R. Dennison wrote. Also, perl would be excellent for this kind of stuff.
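For instance, a hedged sketch of the whole task as a perl one-liner (assuming the CallId number is the second whitespace field and `logfile' is a placeholder; the threshold of 3 is hard-coded):

# -a autosplits each line into @F; count field 2, then report ids seen fewer than 3 times
perl -lane '$c{$F[1]}++; END { printf "%d %s\n", $c{$_}, $_ for grep { $c{$_} < 3 } sort keys %c }' logfile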