[CentOS] script to make webpage snapshot

Thu Aug 11 22:10:33 UTC 2016
Valeri Galtsev <galtsev at kicp.uchicago.edu>

On Thu, August 11, 2016 5:02 pm, John R Pierce wrote:
> On 8/11/2016 1:46 PM, Valeri Galtsev wrote:
>> Could someone recommend a script or utility one can run from command line
>> on Linux or UNIX machine to make a snapshot of webpage?
>> We have a signage (xibo) and whoever creates/changes content, likes to add
>> URLs of some webpages there. All works well if these are webpages on our
>> servers (which are pretty fast), but some external servers often take time
>> to respond and take time to assemble the page, in addition these servers
>> sometimes get really busy, and when response is longer than time devoted
>> for that content in signage window, this window hangs forever with blank
>> white field until you restart client. Trivial workaround: just to get
>> snapshot (as, say daily cron job), and point signage client to that
>> snapshot definitely will solve it, and simultaneously we will stop bugging
>> other people servers often without much need for it.
>> But when I tried to search for some utility or script that makes webpage
>> snapshot, I discovered that my ability to search degraded somehow...
> many/most webpages these days are heavily dynamic content, a static
> snapshot would likely break.  plus any site-relative links on that
> snapshot would be pointing to your server, not the original, any ajax
> code on that webpage would try to interact with your server which won't
> be running the right back end stuff, etcetc.

I am usually not good at explaining what I need. I really only need an
image of what one would see in a web browser if one pointed it at that URL.
I do not need it to be interactive. I also don't want to fetch the content
("mirror") of whatever that URL points to, to any "depth", which is why I
don't want to use wget or curl. That is what I tried first, and it breaks
with at least one of the web sites: they seem to protect themselves from
"robots" or similar. And we don't need that anyway. We just need to
show what the page shows today, that's all.
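
To make the request concrete, here is a rough sketch of the kind of script
I have in mind. It is only an illustration under assumptions, not a tested
solution: it assumes a Chromium-based browser with headless mode is
installed under the name "chromium-browser" (the binary name varies by
distribution), and the function names, window size, and timeout are my own
placeholders. It renders the page to a PNG and replaces the old image only
if the browser finished in time, so the signage never sees a half-written
file.

```python
import os
import subprocess
import tempfile


def build_command(url, out_path, browser="chromium-browser",
                  width=1920, height=1080):
    """Return the argument list for a headless screenshot run.

    The browser name and window size are assumptions; adjust them locally.
    """
    return [
        browser,
        "--headless",
        "--disable-gpu",
        f"--window-size={width},{height}",
        f"--screenshot={out_path}",
        url,
    ]


def snapshot(url, dest, timeout=60, **kwargs):
    """Render url to a PNG at dest, replacing dest only on success."""
    dest = os.path.abspath(dest)
    # Write to a temporary file in the same directory so the final
    # rename stays on one filesystem and is atomic.
    fd, tmp = tempfile.mkstemp(suffix=".png", dir=os.path.dirname(dest))
    os.close(fd)
    try:
        # A slow or hung remote server ends in TimeoutExpired here
        # instead of blocking forever; the old snapshot stays in place.
        subprocess.run(build_command(url, tmp, **kwargs),
                       check=True, timeout=timeout)
        if os.path.getsize(tmp) == 0:
            raise RuntimeError("browser produced an empty image")
        # mkstemp creates the file readable by its owner only; open it
        # up so a web server or the signage client can read it.
        os.chmod(tmp, 0o644)
        os.replace(tmp, dest)
    finally:
        # Remove the temporary file if it was not moved into place.
        if os.path.exists(tmp):
            os.remove(tmp)
```

A daily cron job could then call something like
snapshot("http://example.com/", "/var/www/html/signage/page.png"), where
both the URL and the destination path are made-up examples, and the signage
client would point at the resulting image instead of the live page.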


> --
> john r pierce, recycling bits in santa cruz

Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247