Hey Listees,

I am trying to write a shell script to sort and compare my blacklist for squidGuard with the nightly updates that come down in a tarball. It should be rather simple but I'm not too great at this. The script is to run nightly; it will download the latest blacklist tarball, untar it, and then add any new entries to the existing blacklist. The blacklists work by having a folder for each filtered category, so the folder "db" contains the subfolders "adult", "gambling", "drugs", etc., and each subfolder has two files, "domains" and "urls" (pretty self-explanatory). I haven't tested this script yet as I haven't had a chance; I have only gotten as far as writing it. This is what I have so far:
#!/bin/bash
#This will be running from home directory

wget http://www.blacklistsite.com/blacklist.tar
tar -cxf blacklist.tar
cd BL

find ./ -type d -maxdepth 1 | while read FOLDER; do
    SQUIDDB="usr/local/squidGuard/db/$FOLDER"
    sort_db($SQUIDDB)
    comm -3 $SQUIDDB/domains $FOLDER/domains > $SQUIDDB/domains.missing
    comm -3 $SQUIDDB/urls $FOLDER/urls > $SQUIDDB/urls.missing
    cat $SQUIDDB/domains.missing >> $SQUIDDB/domains
    cat $SQUIDDB/urls.missing >> $SQUIDDB/urls
    rm $SQUIDDB/domains.missing
    rm $SQUIDDB/urls.missing
    sort_db($SQUIDDB)
done

sort_db(){
    sort -f $1/domains > $1/domains.sorted
    sort -f $1/urls > $1/urls.sorted
    rm $1/domains
    rm $1/urls
    mv $1/doamins.sorted $1/domains
    mv $1/urls.sorted $1/urls
}
Is it obvious I'm new to this? Hehe. I would also love to hear how people would do this in a more efficient manner, because obviously this is pretty sloppy, and as I said I haven't tested it yet, so it might not even run?!
Thanks, James ;)
-----BEGIN GEEK CODE BLOCK----- Version: 3.1 GIT/MU/U dpu s: a--> C++>$ U+> L++> B-> P+> E?> W+++>$ N K W++ O M++>$ V- PS+++ PE++ Y+ PGP t 5 X+ R- tv+ b+> DI D+++ G+ e(+++++) h--(++) r++ z++ ------END GEEK CODE BLOCK------
on 5-13-2009 4:21 AM James Bensley spake the following:
Are you looking to have a custom blacklist, or do you just want to know what changed?
to run nightly, it will download the latest blacklist tarball, un tar it and then add any new entries to the existing black list. The
if you're already going to the effort of downloading the entire blacklist every night, why not dump the old database, and just insert the newly downloaded one?
tar -cxf blacklist.tar
this will suck your computer into a vortex of doom. I recommend either creating a tarball, or extracting one, but not both at the same time. :)
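In case it helps, here is the create/extract distinction demonstrated as two separate steps. This is a sketch on a throwaway directory; the tarball name and the "BL" directory are borrowed from the original script and paths here are illustrative only:

```shell
#!/bin/sh
# Create a tarball and extract it as two separate steps,
# using a disposable demo directory (paths are illustrative).
mkdir -p demo/BL
echo "example.com" > demo/BL/domains
tar -cf blacklist.tar -C demo BL   # -c creates the tarball
rm -r demo
tar -xf blacklist.tar              # -x extracts it; never combine -c and -x
cat BL/domains
```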
In all honesty, you might be better targeting this query to squidGuard users, as this may be something they do regularly.
if you're already going to the effort of downloading the entire blacklist every night, why not dump the old database, and just insert the newly downloaded one?
Because we also add our own entries to the current blacklist, so we are just adding any new entries from the nightly updates of our blacklist provider.
tar -cxf blacklist.tar
this will suck your computer into a vortex of doom. I recommend either creating a tarball, or extracting one, but not both at the same time. :)
It's OK, the blacklist is text, so it's a 10 MB tarball of text. Takes about 30 seconds to download, and it will take about 2 minutes for the script to run ;)
In all honesty, you might be better targeting this query to squidGuard users, as this may be something they do regularly.
Should be simple text manipulation :( Nonetheless, a good idea; I will post my question there. Thanks!
I have written my script, but I wanted to add this on before and after the update to see the difference, and all it returns is zeros. Anyone have any idea why?
#!/bin/sh
f=0 #Folder count
d=0 #Domains count (one per line in each file)
u=0 #Url count (one per line in each file)
t=0 #Total of domains and urls
x=0 #Temporary variable for calculations

find /usr/local/squidGuard/db -maxdepth 1 -type d | while read FOLDER; do
    f=`expr $f + 1`
    if [ -f $FOLDER/domains ]; then
        x=`wc -l $FOLDER/domains | awk '{print $1}'`
        d=`expr $d + 1`
    fi
    if [ -f $FOLDER/urls ]; then
        x=`wc -l $FOLDER/urls | awk '{print $1}'`
        u=`expr $u + 1`
    fi
done

t=`expr $d + $u`

echo "Number of categories: $f"
echo "Number of domains: $d"
echo "Number of URLs: $u"
echo "Total entries: $t"
echo "$x"
This is the output:
[hades@hades ~]$ sh tester
Number of categories: 0
Number of domains: 0
Number of URLs: 0
Total entries: 0
0
[hades@hades ~]$
Many thanks, James ;)
Update: these lines should be + $X instead of + 1:

d=`expr $d + 1`

and <snip>

u=`expr $u + 1`
fi
done
James ;)
On Thu, 14 May 2009 12:35:13 +0100 James Bensley jwbensley@gmail.com wrote:
Update: these lines should be:
+ $X

That should be lower case.
My guess is that because your variables all equal zero, it's possible that something is wrong with:
find /usr/local/squidGuard/db -maxdepth 1 -type d | while read FOLDER;
stick "set -x" under your #!/bin/sh to see what's running and what's not.
On Fri, May 15, 2009 at 10:17:21AM +1200, Spiro Harvey wrote:
My guess is that because your variables all equal zero, it's possible that something is wrong with:
find /usr/local/squidGuard/db -maxdepth 1 -type d | while read FOLDER;
More likely he's using a shell that runs the "while" loop in a subshell.
What does this print?

a=bad
echo good | read a
echo a is $a
For ksh88, ksh93, zsh it's "good"; for pdksh, bash it's "bad".
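For completeness, one way to keep the counters alive in bash is to feed the loop from process substitution instead of a pipe, so the while body runs in the current shell. This is a sketch against a made-up directory tree; $DB stands in for the /usr/local/squidGuard/db path from the original script:

```shell
#!/bin/bash
# Process substitution keeps the while loop in the current shell,
# so $f survives past "done". $DB is a throwaway stand-in for
# /usr/local/squidGuard/db.
DB=$(mktemp -d)
mkdir "$DB/adult" "$DB/gambling"

f=0
while read -r FOLDER; do
    f=$((f + 1))
done < <(find "$DB" -maxdepth 1 -type d)

# find prints the top-level directory too, so this reports 3
echo "Number of categories: $f"
```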
On Thu, May 14, 2009 at 1:31 AM, James Bensley jwbensley@gmail.com wrote:
Why not have your custom blacklist separate? Then when doing updates, remove the current list, concatenate the update with the custom one, and put that in place.
This would presumably make it easier for you to manage your custom stuff, but still be up to date.
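As a sketch of that idea (the file layout here is hypothetical, not from the original post): keep the local additions in their own file, and rebuild the live list from the nightly update plus the custom file on every run:

```shell
#!/bin/sh
# Rebuild one category's live "domains" list from the freshly
# downloaded update plus a separately maintained custom file.
# All three directory names are illustrative.
mkdir -p update/adult custom/adult db/adult
printf 'betting.example\nshared.example\n' > update/adult/domains
printf 'local.example\nshared.example\n'   > custom/adult/domains

# -u drops duplicates; -f matches the case-insensitive sort
# the original script already used
sort -fu update/adult/domains custom/adult/domains > db/adult/domains
cat db/adult/domains
```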
Also you could try
for FOLDER in `find /usr/local/squidGuard/db -maxdepth 1 -type d`; do
instead of find /usr/local/squidGuard/db -maxdepth 1 -type d | while read FOLDER; do
-- Eric
On Tue, 2009-05-19 at 22:06 -0400, Eric Sisolak wrote:
Also you could try
for FOLDER in `find /usr/local/squidGuard/db -maxdepth 1 -type d`; do
This is a classic mistake. It has two problems: 1) The list of files created by the embedded find can exceed the maximum command length. 2) Directories with spaces in their name will be split by the tokenizer, resulting in $FOLDER containing invalid or dangerous paths.
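A quick demonstration of problem 2, using a throwaway directory rather than the real db path:

```shell
#!/bin/bash
# One directory named "two words" becomes two tokens in the for
# loop, because the command substitution output is split on
# whitespace before the loop ever sees it.
D=$(mktemp -d)
mkdir "$D/two words"

for FOLDER in `find "$D" -maxdepth 1 -type d`; do
    echo "got: $FOLDER"
done
# Three "got:" lines for two directories: $D itself, then
# "$D/two" and "words" as separate, invalid paths.
```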
instead of find /usr/local/squidGuard/db -maxdepth 1 -type d | while read FOLDER; do
This is the correct way to combine a shell loop with a program that creates a list of files.
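If the directory names might contain spaces, the same pattern can be hardened further with NUL delimiters (bash syntax; the directory below is a stand-in for the real db path):

```shell
#!/bin/bash
# NUL-delimited variant of find | while read: names containing
# spaces (or even newlines) arrive intact in $FOLDER.
D=$(mktemp -d)            # stands in for /usr/local/squidGuard/db
mkdir "$D/two words"

find "$D" -maxdepth 1 -type d -print0 |
while IFS= read -r -d '' FOLDER; do
    echo "got: $FOLDER"
done
# Exactly two "got:" lines, one per directory.
```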