[CentOS] e-mail serving
R P Herrold
herrold at owlriver.com
Wed Aug 3 18:55:18 UTC 2011
On Wed, 3 Aug 2011, Always Learning wrote:
> On Wed, 2011-08-03 at 11:03 -0700, Todd wrote:
>
>> indeed no, but I want to work on some pattern matching, analysis for a
>> piece of software I have wanted to write for years..
>
> Lots of success and good luck. Do let us know how it goes.
Umm, high-speed automated harvesting of email, then running a
regex against the corpus to yield, say, a list of currently
live addresses, seems to fit the problem description. Why
would you wish good luck to yet another such spammer
tool? ;)
That said, procmail can do this easily: single-pass
filtering of a million pieces a day is trivial. The bandwidth
needed to get that volume to a single machine, however, is
rather high for a residential link, though trivial in a colo.
Let's do some science:
In my mail spool, I have 6124 pieces taking up 139,083,522
bytes just now:
[herrold at centos-5 ~]$ echo "( 139083522 / 6124 ) " | bc
22711
So about 22 KB per piece x 1 million pieces ~= 22 GB per day.
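The same estimate can be scripted so it is easy to rerun against another sample. This is a minimal sketch, assuming the spool is a single mbox file in which every message begins with a line starting "From "; the function names (spool_stats, avg_bytes_per_msg, daily_bytes) are hypothetical helpers, and the 1 million pieces per day is the target load discussed above.

```shell
#!/bin/sh
# Sketch: estimate daily mail volume from a sample spool.
# Assumption: the spool is one mbox file where each message starts
# with a line beginning "From ". Needs 64-bit shell arithmetic.

spool_stats() {
    # $1 = path to an mbox spool; prints "<messages> <bytes>"
    msgs=$(grep -c '^From ' "$1")
    bytes=$(wc -c < "$1" | tr -d ' ')
    echo "$msgs $bytes"
}

avg_bytes_per_msg() {
    # $1 = total bytes, $2 = message count (must be > 0); integer division
    echo $(( $1 / $2 ))
}

daily_bytes() {
    # $1 = average bytes per message, $2 = messages per day
    echo $(( $1 * $2 ))
}

# Figures quoted in the post: 6124 messages, 139,083,522 bytes.
avg=$(avg_bytes_per_msg 139083522 6124)
echo "average bytes per message: $avg"
echo "bytes per day at 1M messages: $(daily_bytes "$avg" 1000000)"
```

To measure a live spool instead, something like `spool_stats /var/spool/mail/$USER` (the usual CentOS location, assumed here) supplies the two inputs for avg_bytes_per_msg.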
There are 86,400 seconds in a day. Assume, for simplicity, a
level steady-state load (which could be achieved by setting up
a peripheral MX unit to absorb the inbound load). I was handling
750k pieces a day with a central unit and two MX satellites
running RHL 7 on 200 MHz Pentiums with perhaps 64 MB of RAM each.
[herrold at centos-5 ~]$ echo "22000000000 / 86400" | bc
254629 bytes per second
So roughly a T-1, or a bit more: 254,629 bytes/s x 8 ~= 2.04
Mbit/s, against 1.544 Mbit/s for a T-1.
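The byte-to-line-rate conversion can be sketched the same way. This assumes, as the post does, a perfectly level load over the whole day, and counts 8 bits per byte with no protocol overhead; the helper names are hypothetical, and 1,544,000 bit/s is the standard T-1 rate.

```shell
#!/bin/sh
# Sketch: turn a daily byte volume into a sustained line rate,
# assuming a perfectly level load spread over the whole day.
# Needs 64-bit shell arithmetic (e.g. bash, or dash on 64-bit systems).

SECONDS_PER_DAY=86400
T1_BITS_PER_SEC=1544000    # T-1 line rate: 1.544 Mbit/s

bytes_per_sec() {
    # $1 = bytes per day; integer division, like bc at its default scale
    echo $(( $1 / $SECONDS_PER_DAY ))
}

bits_per_sec() {
    # $1 = bytes per second; 8 bits per byte, ignoring protocol overhead
    echo $(( $1 * 8 ))
}

percent_of_t1() {
    # $1 = bits per second; whole percent of one T-1 (truncated)
    echo $(( $1 * 100 / $T1_BITS_PER_SEC ))
}

bps=$(bytes_per_sec 22000000000)
bits=$(bits_per_sec "$bps")
echo "bytes/s: $bps"
echo "bits/s:  $bits"
echo "share of one T-1: $(percent_of_t1 "$bits")%"
```

For 22 GB/day this works out to about 131% of one T-1, which is why "roughly a T-1" is a slight understatement once real-world overhead and peaks are added.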
A single Linux box, a 386 with 16 MB of RAM running RHL 4.0,
had no problem with such loads a decade ago. Getting an
efficient regex algorithm would be the choke point.
-- Russ herrold