Some time ago I also tried to write a scraper script in Python to get all the errata. My major complaint at the time was:
- No fixed format for the subject line, which made it difficult to
scrape the package name
I think this is something that CentOS could improve in general about how their errata are made available. Karanbir Singh recently mentioned on #pulp that if there is something that could be done to help integrate CentOS better into pulp, it should be brought up with the CentOS developers. (If I understood his comment correctly...)
So this is imho something that could be brought up on the centos-devel list, or on a more appropriate one if it doesn't fit there.
Hence, I am CCing this to the centos-devel list. I hope this is fine with everybody.
@centos-devel: Thread: https://www.redhat.com/archives/pulp-list/2011-November/msg00110.html
- Red Hat would announce the errata, and the CentOS errata would come
out only a few days later. The time gap between the two left the server vulnerable.
This is the price you pay for CentOS. If you don't like that, you should buy Red Hat.
~pete
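For what it's worth, the subject-line problem mentioned above can be sketched like this. This is only a rough illustration: the subject shape in the comment is an assumption about how announce mails tend to look, not a guaranteed format, and parse_subject and the field names are made up for this example. Subjects that do not fit simply return None, which is exactly the weak point of scraping.

```python
import re
from typing import Optional

# Assumed (not guaranteed) subject shape, for example:
#   "[CentOS-announce] CESA-2011:1508 Moderate CentOS 5 x86_64 cyrus-imapd Update"
# Severity and architecture are treated as optional because the format is not fixed.
SUBJECT_RE = re.compile(
    r"""
    (?P<advisory>CE[SBE]A-\d{4}:\d+)                            # advisory id
    \s+(?:(?P<severity>Critical|Important|Moderate|Low)\s+)?    # optional severity
    CentOS\s+(?P<release>\d+)                                   # major release
    \s+(?:(?P<arch>i386|x86_64)\s+)?                            # optional architecture
    (?P<package>\S+)                                            # package name
    \s+(?:Security\s+|BugFix\s+|Enhancement\s+)?Update          # trailing "Update"
    """,
    re.VERBOSE | re.IGNORECASE,
)


def parse_subject(subject: str) -> Optional[dict]:
    """Return the parsed fields of a subject line, or None if it does not fit."""
    match = SUBJECT_RE.search(subject)
    return match.groupdict() if match else None
```

Any subject that deviates from the assumed pattern has to be handled by hand, which is why a structured feed would be preferable to this kind of guessing.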
On 11/29/2011 06:24 AM, Peter Meier wrote:
Some time ago I also tried to write a scraper script in Python to get all the errata. My major complaint at the time was:
- No fixed format for the subject line, which made it difficult to
scrape the package name
I think this is something that CentOS could improve in general about how their errata are made available. Karanbir Singh recently mentioned on #pulp that if there is something that could be done to help integrate CentOS better into pulp, it should be brought up with the CentOS developers. (If I understood his comment correctly...)
So this is imho something that could be brought up on the centos-devel list, or on a more appropriate one if it doesn't fit there.
Hence, I am CCing this to the centos-devel list. I hope this is fine with everybody.
We cannot "copy" what they have; this would be a terms-of-service breach of their "Errata Portal".
We can point to things, we can't copy them ... hence the links and the announce list.
@centos-devel: Thread: https://www.redhat.com/archives/pulp-list/2011-November/msg00110.html
- Red Hat would announce the errata, and the CentOS errata would come
out only a few days later. The time gap between the two left the server vulnerable.
This is the price you pay for CentOS. If you don't like that, you should buy Red Hat.
We cannot "copy" what they have; this would be a terms-of-service breach of their "Errata Portal".
We can point to things, we can't copy them ... hence the links and the announce list.
But couldn't the errata also be made available in a format other than mail? That would not involve "copying" them, would it?
~pete
On Tuesday, November 29, 2011 04:16:32 PM Peter Meier wrote:
But couldn't the errata also be made available in a format other than mail? That would not involve "copying" them, would it?
Perhaps I'm speaking out of turn, but this seems to be the sort of thing RSS was designed to do.
On 11/29/2011 09:29 PM, Lamar Owen wrote:
Perhaps I'm speaking out of turn, but this seems to be the sort of thing RSS was designed to do.
RSS works as a trigger, so we could feed in an RSS document with the last 500 RPMs that were updated, and have an API call of sorts that people can make to retrieve metadata about the updates.
The reason why doing this inside RSS is a problem is that there is no way to tell what point in time the client machine is at. Feeding that info back up to the app so that it can do a diff and feed in the relevant metadata is going to be super-expensive: a massive text document, and then we end up trying to re-implement 'yum list updates', only it would need to work across every point release.
- KB
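A minimal sketch of the "RSS as a trigger" idea described above, under stated assumptions: the feed layout, the package names and the example.invalid links are all made up for illustration, and new_items is a hypothetical helper, not an existing CentOS or pulp interface. The client keeps its own record of what it has already seen, so the server never needs to know the client's point in time; the feed only signals that something changed, and the metadata would be fetched separately via the per-item link.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed; a real one might list the last 500 updated RPMs.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>Recent updates (hypothetical)</title>
  <item>
    <title>foo-1.0-1</title>
    <guid>foo-1.0-1</guid>
    <link>https://example.invalid/updates/foo-1.0-1.json</link>
  </item>
  <item>
    <title>bar-2.0-3</title>
    <guid>bar-2.0-3</guid>
    <link>https://example.invalid/updates/bar-2.0-3.json</link>
  </item>
</channel></rss>"""


def new_items(rss_text, seen_guids):
    """Return feed items whose guid the client has not processed yet."""
    root = ET.fromstring(rss_text)
    fresh = []
    for item in root.iter("item"):
        # Fall back to the title when an item carries no guid.
        guid = item.findtext("guid") or item.findtext("title")
        if guid and guid not in seen_guids:
            fresh.append({
                "guid": guid,
                "title": item.findtext("title"),
                "link": item.findtext("link"),  # where metadata would be fetched
            })
    return fresh
```

A client would call new_items(feed_text, seen) on each poll, fetch metadata for the returned links, and then add the guids to its local seen set.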
Hi Pete,
On 11/29/2011 09:16 PM, Peter Meier wrote:
But couldn't the errata also be made available in a format other than mail? That would not involve "copying" them, would it?
The issue is that we can't use or consume info published inside RHN, any of it. Doing so puts not only us in violation of their AUP and Terms, but everyone who uses CentOS. It's too big a deal to risk.
Having said that, I think the issue really boils down to:
1) What should be included in a release announcement?
2) Can this info be made available in a format that is consumable without the need to scrape?
So we really just need to work out (1); doing (2) is now quite easy, since the entire buildsystem uses a JSON/REST API that can be extended in either direction (both into the system and outside the system to user/consumer stacks). An example would be getting a callback when a package one subscribes to gets an update, along with a JSON object that contains metadata about the update.
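To make the callback idea concrete, here is a rough sketch. The thread does not describe the buildsystem's actual API, so everything here is an assumption: UpdateNotifier, its methods, and the JSON fields ("package", "version", "release", "advisory") are hypothetical names chosen for illustration only.

```python
import json


class UpdateNotifier:
    """Toy in-process registry: package name -> list of subscriber callbacks."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, package, callback):
        """Register a callback to run when the given package is updated."""
        self._subscribers.setdefault(package, []).append(callback)

    def publish(self, payload_json):
        """Decode one update payload and notify subscribers of that package.

        Returns the number of callbacks that were invoked.
        """
        update = json.loads(payload_json)
        callbacks = self._subscribers.get(update["package"], [])
        for callback in callbacks:
            callback(update)  # each subscriber receives the decoded metadata
        return len(callbacks)
```

In a real deployment the callback would more likely be an HTTP request to a URL the subscriber registered; the in-process version above only shows the shape of the exchange.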
I believe pulp is getting functionality that allows arbitrary content consumption, so a plugin that could map centos-errata-metadata into a format pulp can consume would be the way to go. Ideally the Spacewalk people will stop scraping the mailing lists as well.
Btw, I'm not on the pulp list, so let's trust you to bridge.
- KB