[CentOS] Squid and HTTPS interception on CentOS 7?

Mon Mar 5 17:57:39 UTC 2018
Leon Fauster <leonfauster at googlemail.com>

> On 05.03.2018 at 15:34, Bill Gee <bgee at campercaver.net> wrote:
> 
> 
> On Monday, March 5, 2018 7:23:53 AM CST Leon Fauster wrote:
>> On 05.03.2018 at 13:04, Nicolas Kovacs <info at microlinux.fr> wrote:
>>> On 28/02/2018 at 22:23, Nicolas Kovacs wrote:
>>>> So far, I've only been able to filter HTTP.
>>>> 
>>>> Do any of you do transparent HTTPS filtering? Any suggestions,
>>>> advice, caveats, do's and don'ts?
>>> 
>>> After a week of trial and error, transparent HTTPS filtering works
>>> perfectly. I wrote a detailed blog article about it.
>>> 
>>> https://blog.microlinux.fr/squid-https-centos/
>> 
>> I wonder if this works with all HTTPS-enabled sites? Chrome has
>> hard-coded pins for Google certificates. Certificate Transparency,
>> HTTP Public Key Pinning and CAA DNS records also help the end node
>> identify a MITM. I hope that such a setup will be impractical in the
>> near future.
>> 
>> About your legal requirements: weighing competing interests is what
>> courts do daily. So, such requirements are not asking you to destroy
>> the integrity and confidentiality of >95% of users' activity. Blocking
>> at the routing, DNS, IP or port level is the way to go.
>> 
>> --
>> LF
> 
> Although not really related to CentOS, I do have some thoughts on this.  I 
> used to work in the IT department of a public library.  One of the big 
> considerations at a library is patron privacy.  We went to great lengths
> NOT to record which websites our patrons visited.  We also deny requests
> from anyone to find out what books a patron has checked out.
> 
> The library is required by law to provide web filtering, mainly because we 
> have public-use computers which are used by children.  For HTTP this is easy.
> HTTPS is, as this discussion reveals, a different animal.
> 
> We started to set up a filter which would run directly on our router (Juniper 
> SRX-series) using EWF software.  It quickly became apparent that any kind of 
> HTTPS filtering requires a MITM attack.  We were basically decrypting the 
> patron's web traffic on our router, then encrypting it again with a different 
> cert.  
> 
> When we realized what it would take, we had a HUGE internal discussion about 
> how to proceed.  Yeah, the lawyers were all over it!  In the end we decided
> not to attempt to filter HTTPS traffic except by whatever information is not
> encrypted.  Basically that means web site names.
> 
> Our test case was the Playboy web site.  They are available on HTTPS, but they 
> do not automatically redirect HTTP to HTTPS.  If you open playboy [dot] com 
> with no protocol specified, it goes over HTTP.  Our existing filter blocked 
> that.  However, if you open https[colon]// playboy [dot] com, it goes straight 
> in.  The traffic never goes over HTTP, so the filter on the router never 
> processes it.
> 
> Security by obscurity ...  It was the best we could do without violating our 
> own policies on patron privacy.


All browsers send "server_name" [*] in their HTTPS requests: it is the domain 
part of the URI, carried in cleartext in the TLS ClientHello. So you can identify 
the requested HTTPS site without decrypting anything (because this "let's call it 
a header" is not encrypted) and without damaging privacy.

[*] https://tools.ietf.org/html/rfc6066
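
To illustrate, here is a minimal sketch in Python 3 (standard library only)
that pulls the server_name out of a raw ClientHello. It is only a sketch: it
assumes the whole hello fits in a single TLS record, and the function name
extract_sni and the hostname "playboy.com" (borrowed from your test case) are
just for illustration.

import ssl
import struct

def extract_sni(record):
    # TLS record header: content type (1), version (2), length (2).
    # 0x16 = handshake.
    if len(record) < 5 or record[0] != 0x16:
        return None
    pos = 5
    # Handshake header: type (1) -- 0x01 = ClientHello -- plus length (3).
    if record[pos] != 0x01:
        return None
    pos += 4
    pos += 2 + 32                                         # client version + random
    pos += 1 + record[pos]                                # session id
    pos += 2 + struct.unpack_from(">H", record, pos)[0]   # cipher suites
    pos += 1 + record[pos]                                # compression methods
    ext_end = pos + 2 + struct.unpack_from(">H", record, pos)[0]
    pos += 2
    while pos + 4 <= ext_end:
        ext_type, ext_len = struct.unpack_from(">HH", record, pos)
        pos += 4
        if ext_type == 0:                                 # 0 = server_name, RFC 6066
            # list length (2), name type (1), name length (2), then the name
            name_len = struct.unpack_from(">H", record, pos + 3)[0]
            return record[pos + 5:pos + 5 + name_len].decode("ascii")
        pos += ext_len
    return None

if __name__ == "__main__":
    # Generate a real ClientHello without any network I/O.
    ctx = ssl.create_default_context()
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    conn = ctx.wrap_bio(incoming, outgoing, server_hostname="playboy.com")
    try:
        conn.do_handshake()          # no peer, so it stops after writing the hello
    except ssl.SSLWantReadError:
        pass
    print(extract_sni(outgoing.read()))    # prints: playboy.com

Squid >= 3.5 can act on exactly this field without bumping the connection: an
"ssl::server_name" ACL combined with "ssl_bump peek" and "ssl_bump
splice"/"ssl_bump terminate" filters by SNI while leaving the encryption intact.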

--
LF