[CentOS] script to make webpage snapshot

Thu Aug 11 22:13:28 UTC 2016
Dave Stevens <geek at uniserve.com>

Quoting Valeri Galtsev <galtsev at kicp.uchicago.edu>:

> On Thu, August 11, 2016 5:02 pm, John R Pierce wrote:
>> On 8/11/2016 1:46 PM, Valeri Galtsev wrote:
>>> Could someone recommend a script or utility one can run from the
>>> command line on a Linux or UNIX machine to make a snapshot of a
>>> webpage?
>>> We have signage (Xibo), and whoever creates or changes content likes
>>> to add URLs of some webpages there. All works well if these are
>>> webpages on our servers (which are pretty fast), but some external
>>> servers often take time to respond and to assemble the page. In
>>> addition, these servers sometimes get really busy, and when the
>>> response takes longer than the time allotted to that content in the
>>> signage window, the window hangs forever with a blank white field
>>> until you restart the client. A trivial workaround: take a snapshot
>>> (as, say, a daily cron job) and point the signage client to that
>>> snapshot. That will definitely solve it, and at the same time we
>>> will stop hitting other people's servers without much need for it.
>>> But when I tried to search for a utility or script that makes a
>>> webpage snapshot, I discovered that my ability to search had
>>> degraded somehow...
>> Many, if not most, webpages these days are heavily dynamic content; a
>> static snapshot would likely break. Plus, any site-relative links on
>> that snapshot would point to your server, not the original, and any
>> AJAX code on that webpage would try to interact with your server,
>> which won't be running the right back-end stuff, etc.
> I usually am not good at explaining what I need. I really only need an
> image of what one would see in a web browser if one pointed it to that
> URL. I do not care whether it is interactive. I also don't want to get
> the content ("mirror") of the stuff that URL points to at a variety of
> "depths", so I don't want to use wget or curl. That is what I tried
> first, and it breaks with at least one of the web sites: they do seem
> to protect themselves from "robots" or similar. And we don't need it.
> We just need to show what the page shows today, that's all.
> Valeri

why not File -> Print -> .pdf?
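
If that has to happen from cron rather than by hand, a headless browser
can do roughly the same thing and write a PNG instead. Below is a minimal
sketch, not a tested recipe. It assumes a Chromium-based browser is
installed under the name "chromium-browser" (the binary name varies by
distribution) and that it accepts the --headless, --screenshot and
--window-size options. The function names, the timeout and the window
size are placeholders chosen for illustration.

```python
#!/usr/bin/env python3
"""Render a web page to a PNG with a headless Chromium-based browser.

Assumptions (not from the thread): the browser binary is named
"chromium-browser" and supports --headless, --screenshot and
--window-size. All names and defaults here are illustrative.
"""
import os
import subprocess
import tempfile


def build_snapshot_command(url, out_path, width=1920, height=1080,
                           browser="chromium-browser"):
    """Return the argv list for one headless screenshot run."""
    return [
        browser,
        "--headless",
        "--disable-gpu",
        "--hide-scrollbars",
        "--window-size={},{}".format(width, height),
        "--screenshot={}".format(out_path),
        url,
    ]


def take_snapshot(url, final_path, timeout=60, **kwargs):
    """Write the screenshot to a temp file, then swap it into place.

    If the remote site is slow or the browser fails, the previous
    snapshot is left untouched, so the signage client always has
    some image to show.
    """
    out_dir = os.path.dirname(os.path.abspath(final_path))
    fd, tmp_path = tempfile.mkstemp(suffix=".png", dir=out_dir)
    os.close(fd)
    try:
        # Give up if the page has not rendered within `timeout` seconds.
        subprocess.run(build_snapshot_command(url, tmp_path, **kwargs),
                       check=True, timeout=timeout)
        if os.path.getsize(tmp_path) == 0:
            raise RuntimeError("browser produced an empty file")
        # mkstemp creates the file as 0600; make it readable by a web server.
        os.chmod(tmp_path, 0o644)
        # Atomic when source and target are on the same filesystem.
        os.replace(tmp_path, final_path)
    except Exception:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Calling take_snapshot() from a small wrapper in a daily cron job and
pointing the signage layout at the resulting file would give the
behaviour Valeri describes. Because a real browser renders the page,
sites that turn away wget or curl may still load, though that is not
guaranteed.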


>> --
>> john r pierce, recycling bits in santa cruz
> ++++++++++++++++++++++++++++++++++++++++
> Valeri Galtsev
> Sr System Administrator
> Department of Astronomy and Astrophysics
> Kavli Institute for Cosmological Physics
> University of Chicago
> Phone: 773-702-4247
> ++++++++++++++++++++++++++++++++++++++++

"As long as politics is the shadow cast on society by big business,
the attenuation of the shadow will not change the substance."

-- John Dewey