Dear Experts,
Could someone recommend a script or utility that can be run from the command line on a Linux or UNIX machine to make a snapshot of a webpage?
We have digital signage (Xibo), and whoever creates or changes content likes to add URLs of webpages to it. All works well when the pages are on our own servers (which are pretty fast), but some external servers are slow to respond and slow to assemble the page, and they sometimes get really busy. When the response takes longer than the time allotted to that content in the signage window, the window hangs forever with a blank white field until you restart the client. A trivial workaround would be to take a snapshot (say, as a daily cron job) and point the signage client at that snapshot. That should solve the problem, and we would also stop hitting other people's servers more often than necessary.
But when I tried to search for a utility or script that makes a webpage snapshot, I discovered that my ability to search has somehow degraded...
Thanks for all your pointers!
Valeri ++++++++++++++++++++++++++++++++++++++++ Valeri Galtsev Sr System Administrator Department of Astronomy and Astrophysics Kavli Institute for Cosmological Physics University of Chicago Phone: 773-702-4247 ++++++++++++++++++++++++++++++++++++++++
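The cron-job workaround described above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not a tested deployment: `fetch_url` and `refresh_snapshot` are hypothetical helper names, the 20-second timeout is arbitrary, and `fetch` can be any callable that returns the snapshot bytes (a plain download, or a screenshot tool). The point it shows is that a slow or failing server leaves the previous snapshot in place, so the signage client never sees a blank page.

```python
import os
import tempfile
import urllib.request


def fetch_url(url, timeout=20):
    """Download the raw bytes at url, giving up after timeout seconds."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def refresh_snapshot(fetch, dest):
    """Replace dest with fresh content; keep the old file if fetching fails.

    fetch is any zero-argument callable returning bytes.
    Returns True when dest was updated, False when the old copy was kept.
    """
    try:
        data = fetch()
    except Exception:
        return False  # slow or unreachable server: keep the previous snapshot
    if not data:
        return False  # treat an empty response as a failure too
    directory = os.path.dirname(os.path.abspath(dest))
    # Write to a temporary file in the same directory, then rename over dest,
    # so a reader never sees a half-written snapshot.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(data)
    os.chmod(tmp_path, 0o644)  # mkstemp creates 0600; let the web server read it
    os.replace(tmp_path, dest)
    return True
```

A daily cron entry would then call something like `refresh_snapshot(lambda: fetch_url(url), dest)` for each external URL, with the signage client pointed at the local copy of `dest`.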
On Thu, 11 Aug 2016 15:46:42 -0500 (CDT) Valeri Galtsev wrote:
Could someone recommend a script or utility that can be run from the command line on a Linux or UNIX machine to make a snapshot of a webpage?
wget? httrack?
On 2016-08-11, Valeri Galtsev galtsev@kicp.uchicago.edu wrote:
Dear Experts,
Could someone recommend a script or utility that can be run from the command line on a Linux or UNIX machine to make a snapshot of a webpage?
Not an answer to the question you asked, but maybe this is a job for a caching proxy server like squid?
On Thu, August 11, 2016 4:13 pm, Liam O'Toole wrote:
Not an answer to the question you asked, but maybe this is a job for a caching proxy server like squid?
Thanks! That hadn't occurred to me. It will be much more sophisticated than just an image "snapshot" of the webpage, but it should solve our problem. If I don't find anything that does a "snapshot" successfully, this is what I will do.
Valeri
On 8/11/2016 1:46 PM, Valeri Galtsev wrote:
Could someone recommend a script or utility that can be run from the command line on a Linux or UNIX machine to make a snapshot of a webpage?
Many, if not most, webpages these days are heavily dynamic, so a static snapshot would likely break. Also, any site-relative links in that snapshot would point to your server, not the original, and any AJAX code on the page would try to interact with your server, which won't be running the right back-end code, etc.
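The point about site-relative links can be illustrated with Python's standard `urllib.parse.urljoin`, which resolves a link the way a browser would against the page's own address. The host names and paths below are made up for the example; only the resolution behaviour matters.

```python
from urllib.parse import urljoin

# Hypothetical addresses, used only for illustration.
ORIGINAL_PAGE = "https://news.example.org/weather/today.html"
SNAPSHOT_PAGE = "http://signage.example.edu/snapshots/today.html"

# Three kinds of link a saved page might contain.
links = ["/static/logo.png", "radar.gif", "https://cdn.example.net/app.js"]


def resolve_all(base, hrefs):
    """Resolve each href against base, as a browser would."""
    return [urljoin(base, href) for href in hrefs]


for href, original, snapshot in zip(links,
                                    resolve_all(ORIGINAL_PAGE, links),
                                    resolve_all(SNAPSHOT_PAGE, links)):
    print(f"{href}\n  on original: {original}\n  on snapshot: {snapshot}")
```

Only the absolute link still resolves to the same place; the two relative ones now point at the signage server, which does not have those files unless they were saved and rewritten along with the page.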
On Thu, August 11, 2016 5:02 pm, John R Pierce wrote:
Many, if not most, webpages these days are heavily dynamic, so a static snapshot would likely break. Also, any site-relative links in that snapshot would point to your server, not the original, and any AJAX code on the page would try to interact with your server, which won't be running the right back-end code, etc.
I am usually not good at explaining what I need. I really only need an image of what one would see in a web browser when pointing it at that URL. I do not need it to be interactive. I also don't want to fetch ("mirror") the content that URL links to at various depths, which is why I don't want to use wget or curl. That is what I tried first, and it breaks with at least one of the websites: they do seem to protect themselves from "robots" or similar. And we don't need that. We just need to show what the page shows today, that's all.
Valeri
-- john r pierce, recycling bits in santa cruz
Quoting Valeri Galtsev galtsev@kicp.uchicago.edu:
I am usually not good at explaining what I need. I really only need an image of what one would see in a web browser when pointing it at that URL. I do not need it to be interactive. I also don't want to fetch ("mirror") the content that URL links to at various depths, which is why I don't want to use wget or curl. That is what I tried first, and it breaks with at least one of the websites: they do seem to protect themselves from "robots" or similar. And we don't need that. We just need to show what the page shows today, that's all.
Valeri
why not File -> Print -> .pdf?
D
_______________________________________________ CentOS mailing list CentOS@centos.org https://lists.centos.org/mailman/listinfo/centos
On 8/11/2016 3:10 PM, Valeri Galtsev wrote:
I am usually not good at explaining what I need. I really only need an image of what one would see in a web browser when pointing it at that URL. I do not need it to be interactive. I also don't want to fetch ("mirror") the content that URL links to at various depths, which is why I don't want to use wget or curl. That is what I tried first, and it breaks with at least one of the websites: they do seem to protect themselves from "robots" or similar. And we don't need that. We just need to show what the page shows today, that's all.
Then screen capture is about it. On too many sites, ALL the content is dynamic; for instance, https://www.google.com/maps/@36.9460899,-122.0268105,664a,20y,41.31t/data=!3...
That page is composed of tiles of image data, superimposed on the fly by AJAX code running in the browser that fetches the layers displayed.
You simply can't fetch the HTML and make any sense out of it; the browser is running a complex application to display that.
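The screen-capture route could be scripted by letting a real browser engine render the page and save an image. The following is a minimal sketch in Python, assuming a Chromium build with headless mode is installed; the binary name `chromium-browser`, the helper names, the window size, and the 60-second timeout are all assumptions for illustration and will differ between systems. Nobody in the thread reported using this exact approach.

```python
import subprocess


def build_screenshot_command(url, out_path, width=1920, height=1080,
                             browser="chromium-browser"):
    """Return the argument list for a one-shot headless screenshot."""
    return [
        browser,                          # assumed binary name; varies by distro
        "--headless",                     # render without a visible window
        "--disable-gpu",
        "--hide-scrollbars",
        f"--window-size={width},{height}",
        f"--screenshot={out_path}",       # write a PNG of the rendered page
        url,
    ]


def take_screenshot(url, out_path, timeout=60, **kwargs):
    """Run the browser; return True on success, False on any failure."""
    command = build_screenshot_command(url, out_path, **kwargs)
    try:
        subprocess.run(command, check=True, timeout=timeout,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except (OSError, subprocess.SubprocessError):
        # Missing binary, non-zero exit status, or timeout: report failure
        # so the caller can keep showing the previous image.
        return False
    return True
```

Because the browser runs the page's JavaScript before the image is written, this captures what a visitor would see rather than the raw HTML, though pages that keep loading content after the initial render may still come out incomplete.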
On Thu, August 11, 2016 5:27 pm, John R Pierce wrote:
You simply can't fetch the HTML and make any sense out of it; the browser is running a complex application to display that.
Yes, I understand as much, thanks. I'm still sure it is not a hopeless task.
Valeri
On 12/08/16 06:46, Valeri Galtsev wrote:
Dear Experts,
Could someone recommend a script or utility that can be run from the command line on a Linux or UNIX machine to make a snapshot of a webpage?
Looks like this *[0]* is what you are after:
CmdShots is a FireFox add-on that takes full-page screenshots through the Command-line.
On 12/08/16 19:55, Anthony K wrote:
Looks like this *[0]* is what you are after:
CmdShots is a FireFox add-on that takes full-page screenshots through the Command-line.
For completeness' sake: the author first sought a solution on Stack Overflow [1]. When none was forthcoming, he created his own.
[1] http://stackoverflow.com/questions/13158083/take-a-full-page-screenshot-with...