I've written a small utility (brand-hunter.py) to help with tracking down branding issues. It does the following:
* accepts a list of srpms to search (if no srpms are listed, all srpms are searched)
* downloads the srpms from ftp.redhat.com (across all rhel7 srpm repos)
* extracts srpm content, including any bz2 or gz tarfiles
* searches text files (in multiline mode) for the pattern '[Rr][Ee][Dd]\s?[Hh][Aa][Tt]'
* searches for any binary files
* writes a list of issues (by file and line) to an issues.txt file (per srpm)
* writes a noissues.txt file listing any srpms for which no issues were found
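A minimal sketch of the text-search step, assuming the intent is to match 'Red Hat' in any capitalisation with at most one whitespace character between the words. It searches each file's contents as a single string (one reading of "multiline mode"), so \s? can also match a line break. `find_branding` and its return format are hypothetical, not taken from brand-hunter.py.

```python
import re

# "Red Hat" in any capitalisation; \s? allows at most one whitespace
# character (a space, tab or newline) between the two words.
BRAND_RE = re.compile(r"red\s?hat", re.IGNORECASE)


def find_branding(text):
    """Return a (line_number, line_text) tuple for each match in `text`.

    `text` is the whole contents of one file, so a match split across a
    line break ('Red' ending one line, 'Hat' starting the next) is still
    found; it is reported on the line where the match starts.
    Line numbers start at 1.
    """
    issues = []
    for match in BRAND_RE.finditer(text):
        start = match.start()
        line_no = text.count("\n", 0, start) + 1
        line_start = text.rfind("\n", 0, start) + 1
        line_end = text.find("\n", start)
        if line_end == -1:
            line_end = len(text)
        issues.append((line_no, text[line_start:line_end]))
    return issues
```

For example, `find_branding("a\nfoo@redhat.com\n")` returns `[(2, "foo@redhat.com")]`; writing those tuples out per file would give the by-file-and-line listing described above.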
If folks are interested, please let me know a location where I can make the utility available.
I ran it across the first 100 srpms (by yum sort order) and found only 7 srpms with no issues:
GreSQL-4.0-9.el7.src
SOAPpy-0.11.6-17.el7.src
akonadi-1.9.2-4.el7.src
ant-antunit-1.2-10.el7.src
aopalliance-1.0-8.el7.src
apache-commons-exec-1.1-11.el7.src
apache-parent-10-14.el7.src
The other 93 srpms had a range of issues, the most common of which are:
* redhat.com email addresses in patch files or author lists
* Red Hat copyright statements
* binary files (of any kind; right now the utility flags them all as potential issues)
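For the binary-file check, one common heuristic is to treat a file as binary if its first few kilobytes contain a NUL byte. The sketch below assumes that heuristic; it is not necessarily how brand-hunter.py decides, and `is_binary` and the 8192-byte sample size are hypothetical.

```python
import os


def is_binary(path, sample_size=8192):
    """Guess whether `path` is a binary file.

    Only regular files are considered: directories and other non-regular
    entries return False, so they are never reported as binary. A NUL
    byte in the first `sample_size` bytes is taken as a sign of binary
    content (text in ASCII or UTF-8 does not contain NUL bytes).
    """
    if not os.path.isfile(path):
        return False
    with open(path, "rb") as handle:
        chunk = handle.read(sample_size)
    return b"\0" in chunk
```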
The utility can be made smarter, for example to ignore srpms whose only matches are redhat.com email addresses or copyright statements. But I imagine exclusions like these would need to be decided case by case.
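One possible shape for that kind of filtering, sketched under the assumption that matches are classified line by line: a line whose only 'Red Hat' text is inside redhat.com addresses counts as an email match, a remaining line mentioning 'copyright' counts as a copyright match, and anything else is 'other'. The category names and both functions are hypothetical, and which categories are safe to ignore would still be a per-package decision.

```python
import re

BRAND_RE = re.compile(r"red\s?hat", re.IGNORECASE)
EMAIL_RE = re.compile(r"[\w.+-]+@redhat\.com", re.IGNORECASE)
COPYRIGHT_RE = re.compile(r"copyright", re.IGNORECASE)


def classify(line):
    """Put one matched line into a coarse category.

    `line` is assumed to contain at least one branding match.
    """
    # If removing redhat.com addresses leaves no "Red Hat" text,
    # the addresses were the only reason the line matched.
    if not BRAND_RE.search(EMAIL_RE.sub("", line)):
        return "email"
    if COPYRIGHT_RE.search(line):
        return "copyright"
    return "other"


def only_benign(matched_lines, benign=("email", "copyright")):
    """True if every matched line of an srpm falls in a `benign` category."""
    return all(classify(line) in benign for line in matched_lines)
```

A line such as "Copyright (C) 2013 Red Hat, Inc. <jdoe@redhat.com>" still counts as a copyright match, because the brand text survives once the address is removed.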
Kay
On 06/04/2014 10:36 AM, Kay Williams wrote:
> If folks are interested, please let me know a location where I can make the utility available.
I'm quite interested, thanks! Would you consider putting it up as a Git repository on one of the Git services, such as gitorious.org or github.com?
- Karsten
-- 
Karsten 'quaid' Wade, CentOS Doer of Stuff
http://TheOpenSourceWay.org
http://community.redhat.com
@quaid (identi.ca/twitter/IRC)
gpg: AD0E0C41
On Wednesday, June 04, 2014 3:11 PM, Karsten Wade wrote:
> I'm quite interested, thanks! Would you consider putting it up as a Git repository on one of the Git services, such as gitorious.org or github.com?
OK, it now can be cloned from https://github.com/kaywilliams/brand-hunter.git
On Wed, 2014-06-04 at 16:57 -0700, Kay Williams wrote:
> On Wednesday, June 04, 2014 3:11 PM, Karsten Wade wrote:
>> I'm quite interested, thanks! Would you consider putting it up as a Git repository on one of the Git services, such as gitorious.org or github.com?
> OK, it now can be cloned from https://github.com/kaywilliams/brand-hunter.git
A useful tool indeed. From a quick look at the script, it searches all files within an SRPM. Shouldn't the tool be tweaked to ignore/exclude files containing historical data that should never be removed but would certainly be flagged in a search, e.g. '.spec' files? Feature request. :-)
Regards
Phil
On Wednesday, June 04, 2014 5:21 PM, Phil Wyett wrote:
> A useful tool indeed. From a quick look at the script, it searches all files within an SRPM. Shouldn't the tool be tweaked to ignore/exclude files containing historical data that should never be removed but would certainly be flagged in a search, e.g. '.spec' files? Feature request. :-)
Actually, it does an 'rpm -i' on the srpm and then searches the SOURCES folder (this is all done in a separate rpmbuild/_topdir folder per SRPM). Because the .spec file is saved in the SPECS folder, the tool doesn't search it.
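The per-srpm layout can be sketched as follows, assuming Python 3 and the rpm command on the PATH. Installing a source package with rpm -i writes its spec file to %{_topdir}/SPECS and its sources and patches to %{_topdir}/SOURCES by default, so overriding _topdir with --define gives each srpm its own tree. The function names and the directory naming scheme are hypothetical.

```python
import os
import subprocess


def topdir_for(srpm_path, work_root):
    """Per-srpm _topdir, e.g. work_root/ant-antunit-1.2-10.el7."""
    name = os.path.basename(srpm_path)
    if name.endswith(".src.rpm"):
        name = name[: -len(".src.rpm")]
    return os.path.abspath(os.path.join(work_root, name))


def rpm_install_command(srpm_path, topdir):
    """Build the 'rpm -i' command that unpacks an srpm into `topdir`."""
    return ["rpm", "-i", "--define", "_topdir " + topdir, srpm_path]


def unpack_srpm(srpm_path, work_root):
    """Unpack one srpm; return its SPECS and SOURCES directories."""
    topdir = topdir_for(srpm_path, work_root)
    os.makedirs(topdir, exist_ok=True)
    subprocess.run(rpm_install_command(srpm_path, topdir), check=True)
    return os.path.join(topdir, "SPECS"), os.path.join(topdir, "SOURCES")
```

Keeping the command construction separate from the subprocess call makes it easy to print or test the exact rpm invocation without running it.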
But there may be other reliable heuristics for files that should be ignored. Perhaps patch files (i.e. files with the mime type 'text/x-diff') at the root of the SOURCES folder? Other candidates?
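A rough sketch of that patch-file exclusion. It assumes a .patch or .diff suffix is a good enough stand-in for the 'text/x-diff' mime type (a stricter check could ask file --mime-type instead), and it only skips patches at the top level of SOURCES, so patch files in subdirectories (for example, inside extracted tarballs) are still searched. Both function names are hypothetical.

```python
import os

# Suffixes used as a stand-in for the 'text/x-diff' mime type check.
PATCH_SUFFIXES = (".patch", ".diff")


def is_root_patch(path, sources_dir):
    """True if `path` is a patch file sitting directly in `sources_dir`."""
    parent = os.path.dirname(os.path.abspath(path))
    return (parent == os.path.abspath(sources_dir)
            and path.lower().endswith(PATCH_SUFFIXES))


def files_to_search(sources_dir):
    """Yield files under `sources_dir`, skipping patches at its root."""
    for dirpath, _dirnames, filenames in os.walk(sources_dir):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if not is_root_patch(path, sources_dir):
                yield path
```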
Kay
> On Wednesday, June 04, 2014 5:21 PM, Phil Wyett wrote:
>> A useful tool indeed. From a quick look at the script, it searches all files within an SRPM. Shouldn't the tool be tweaked to ignore/exclude files containing historical data that should never be removed but would certainly be flagged in a search, e.g. '.spec' files? Feature request. :-)
> Actually, it does an 'rpm -i' on the srpm and then searches the SOURCES folder (this is all done in a separate rpmbuild/_topdir folder per SRPM). Because the .spec file is saved in the SPECS folder, the tool doesn't search it.
But I think there are cases where the .spec file would also need to be changed?
Simon
On 06/05/2014 07:03 AM, Simon Matter wrote:
>> On Wednesday, June 04, 2014 5:21 PM, Phil Wyett wrote:
>>> A useful tool indeed. From a quick look at the script, it searches all files within an SRPM. Shouldn't the tool be tweaked to ignore/exclude files containing historical data that should never be removed but would certainly be flagged in a search, e.g. '.spec' files? Feature request. :-)
>> Actually, it does an 'rpm -i' on the srpm and then searches the SOURCES folder (this is all done in a separate rpmbuild/_topdir folder per SRPM). Because the .spec file is saved in the SPECS folder, the tool doesn't search it.
> But I think there are cases where the .spec file would also need to be changed?
Traditionally, .spec files have sometimes mentioned things like 'BlahBlah tool for Red Hat Enterprise Linux', and we need to change those to say 'for CentOS Linux'. These are typically found in the %description section, but could be in other places.

Also, httpd, for example, sets a Vendor string in its .spec file; we need to change that to CentOS as well (but I am struggling to think of a second example off the top of my head, so it might just be a case of doing it manually once and forgetting about it).
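Since these cases tend to sit in particular parts of the spec file, reporting which section each match falls in might help separate %description text that needs changing from %changelog entries that should stay. A minimal sketch, assuming a plain line-by-line scan is good enough; `spec_findings`, the 'preamble' label and the section list are hypothetical, and rpm conditionals are not interpreted.

```python
import re

BRAND_RE = re.compile(r"red\s?hat", re.IGNORECASE)

# Section markers recognised by this sketch (not an exhaustive list).
SECTIONS = {
    "%package", "%description", "%prep", "%build", "%install", "%check",
    "%clean", "%files", "%changelog", "%pre", "%post", "%preun",
    "%postun", "%pretrans", "%posttrans", "%verifyscript",
}


def spec_findings(spec_text):
    """Return (line_number, section, line) for each branding match.

    `section` is "preamble" before the first recognised section marker,
    otherwise the most recent marker seen (e.g. "%description").
    Conditionals such as %if/%endif are not interpreted.
    """
    findings = []
    section = "preamble"
    for line_no, line in enumerate(spec_text.splitlines(), start=1):
        words = line.split()
        if words and words[0] in SECTIONS:
            section = words[0]
        if BRAND_RE.search(line):
            findings.append((line_no, section, line))
    return findings
```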
- KB
On Thu, Jun 05, 2014 at 12:42:38PM +0100, Karanbir Singh wrote:
> Traditionally, .spec files have sometimes mentioned things like 'BlahBlah tool for Red Hat Enterprise Linux', and we need to change those to say 'for CentOS Linux'. These are typically found in the %description section, but could be in other places.
If you find these, and they're also that way in Fedora, could you please file bugs for them?
> Also, httpd, for example, sets a Vendor string in its .spec file; we need to change that to CentOS as well (but I am struggling to think of a second example off the top of my head, so it might just be a case of doing it manually once and forgetting about it).
Packages in Fedora definitely should not have Vendor set to anything, so if you find any, yeah, bugs. :)
On 06/05/2014 01:16 PM, Matthew Miller wrote:
> On Thu, Jun 05, 2014 at 12:42:38PM +0100, Karanbir Singh wrote:
>> Traditionally, .spec files have sometimes mentioned things like 'BlahBlah tool for Red Hat Enterprise Linux', and we need to change those to say 'for CentOS Linux'. These are typically found in the %description section, but could be in other places.
> If you find these, and they're also that way in Fedora, could you please file bugs for them?
sure
>> Also, httpd, for example, sets a Vendor string in its .spec file; we need to change that to CentOS as well (but I am struggling to think of a second example off the top of my head, so it might just be a case of doing it manually once and forgetting about it).
> Packages in Fedora definitely should not have Vendor set to anything, so if you find any, yeah, bugs. :)
This is actually the Vendor string that goes into httpd's version string. We like to have it say 'httpd 2.4.xxx (CentOS)' instead of '(Red Hat)'.
On Thu, Jun 05, 2014 at 01:25:45PM +0100, Karanbir Singh wrote:
>> Packages in Fedora definitely should not have Vendor set to anything, so if you find any, yeah, bugs. :)
> This is actually the Vendor string that goes into httpd's version string. We like to have it say 'httpd 2.4.xxx (CentOS)' instead of '(Red Hat)'.
Oh, yeah. Sorry :)
On 06/04/2014 11:10 PM, Karsten Wade wrote:
> On 06/04/2014 10:36 AM, Kay Williams wrote:
>> If folks are interested, please let me know a location where I can make the utility available.
> I'm quite interested, thanks! Would you consider putting it up as a Git repository on one of the Git services, such as gitorious.org or github.com?
Depending on how this is licensed, I am happy to host it right inside the common repos on git.centos.org.
Kay, this is absolutely brilliant!
- KB
On Thursday, June 05, 2014 4:41 AM, Karanbir Singh wrote:
> Depending on how this is licensed, I am happy to host it right inside the common repos on git.centos.org.
GPLv2. Feel free to move it and I'll take down the other repo.
Also, I just posted a commit with a few changes:
* Search files in the SPECS folder (i.e. the specfile) in addition to SOURCES.
* Add a command line option '--ignore-email' to filter out text that matches the pattern 'email@redhat.com'. Text matching this pattern is common in changelogs (e.g. in the specfile) and is not usually a branding issue.
* Fix a bug on el7 that was reporting directories as binary files.
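A minimal sketch of how these three changes could fit together, assuming Python 3's argparse. Apart from the '--ignore-email' option name, which comes from the list above, the names (parse_args, line_has_branding, searchable_files, REDHAT_EMAIL_RE) and the positional srpms argument are hypothetical, and the actual commit may implement these differently.

```python
import argparse
import os
import re

BRAND_RE = re.compile(r"red\s?hat", re.IGNORECASE)
# Hypothetical stand-in for the 'email@redhat.com' pattern.
REDHAT_EMAIL_RE = re.compile(r"[\w.+-]+@redhat\.com", re.IGNORECASE)


def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        description="Search srpm content for branding issues.")
    parser.add_argument(
        "--ignore-email", action="store_true",
        help="ignore matches that are part of a redhat.com email address")
    parser.add_argument(
        "srpms", nargs="*", help="srpms to search (default: all)")
    return parser.parse_args(argv)


def line_has_branding(line, ignore_email=False):
    """True if `line` mentions Red Hat, optionally ignoring email addresses."""
    if ignore_email:
        line = REDHAT_EMAIL_RE.sub("", line)
    return BRAND_RE.search(line) is not None


def searchable_files(topdir):
    """Yield regular files under topdir/SPECS and topdir/SOURCES.

    os.walk already lists directories separately from files, and the
    isfile() check also drops broken symlinks and special files, so no
    directory is passed on to a later binary-file check.
    """
    for sub in ("SPECS", "SOURCES"):
        for dirpath, _dirnames, filenames in os.walk(os.path.join(topdir, sub)):
            for filename in filenames:
                path = os.path.join(dirpath, filename)
                if os.path.isfile(path):
                    yield path
```

With --ignore-email, a changelog line like "- Jane Doe <jdoe@redhat.com> - 1.0-1" is no longer reported, while "Copyright Red Hat, Inc. <jdoe@redhat.com>" still is.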
Kay