On Thu, Dec 30, 2010 at 08:19:00AM -0500, ken wrote:

It isn't perl, but does 'tr' exist in CentOS (it does in FreeBSD)?
It would do it.

////jerry

>
> Given an HTML file which looks like this:
>
> --------- begin snippet ---------
> <HTML
> ><HEAD
> ><TITLE
> >We've Lied to You…</TITLE
> ><META
> NAME="GENERATOR"
> CONTENT="Modular DocBook HTML Stylesheet Version 1.79"><LINK
> REL="HOME"
> TITLE="Maximum RPM"
> HREF="index.html"><LINK
> REL="UP"
> TITLE="Using RPM to Verify Installed Packages"
> HREF="ch-rpm-verify.html"><LINK
> ...
> --------- end snippet ---------
>
> I'm coding some perl to make it look something like this:
>
> --------- begin snippet ---------
> <html>
> <head>
> <title>We've Lied to You…</title>
>
> <meta name="generator" content="Modular DocBook HTML Stylesheet Version
> 1.79">
>
> <link rel="HOME" title="Maximum RPM" href="index.html">
>
> <line rel="UP" title="Using RPM to Verify Installed Packages"
> href="ch-rpm-verify.html">
>
> <link ....
> --------- end snippet ---------
>
> I've hit a wall trying to remove all the newlines. I've tried it
> several ways... here's just one:
>
> --------- begin snippet ---------
> while (<$in>)
> {
>     s/<(\w*\W)/<\L$1/g;       # Downcase XXX in "<XXX".
>     s/<\/(\w*\W)/<\/\L$1/g;   # Downcase XXX in "</XXX".
>     if(/^>/)                  # if this line starts with '>'
>     {                         # then
>         $curr = tell $in;     # Note current file position,
>         seek $in, $prev, 0;   # go back to previous line,
>         chomp;                # remove its trailing newline char,
>         seek $in, $curr, 0;   # and reset position to current line.
>     }
>     else
>     {
>         $curr = tell $in;     # Note current file position,
>         seek $in, $prev, 0;   # go back to previous line
>         s/\n/ /;              # Append a space,
>         chop;                 # and then chomp.
>         seek $in, $curr, 0;   # and reset position to current line.
>     }
>     print;
>     print $out;
>     $prev = tell $in;         # Location of previous line.
> }
> --------- end snippet ---------
>
> When I cat the output file, it looks like this:
>
> --------- begin snippet ---------
> GLOB(0x9fd587c)<htmlGLOB(0x9fd587c)><headGLOB(0x9fd587c)><titleGLOB(0x9fd587c)>We've
> Lied to
> You…</titleGLOB(0x9fd587c)><metaGLOB(0x9fd587c)NAME="GENERATOR"GLOB(0x9fd587c)CONTENT="Modular
> DocBook HTML Stylesheet Version
> 1.79"><linkGLOB(0x9fd587c)REL="HOME"GLOB(0x9fd587c)TITLE="Maximum
> RPM"GLOB(0x9fd587c)HREF="index.html"><linkGLOB(0x9fd587c)REL="UP"GLOB(0x9fd587c)TITLE="Using
> RPM to Verify Installed
> Packages"GLOB(0x9fd587c)HREF="ch-rpm-verify.html"><linkGLOB(0x9fd587c)....
> --------- end snippet ---------
>
> The output I should say *is* all on one line, not line-wrapped the way
> you see it above. I have a hunch as to why there are the
> "GLOB(0x9fd587c)" thingies everywhere the newlines or spaces (' ')
> should be. If some expert here could explain them, that would be really
> good. More importantly though would be some instruction as to how to
> remove the newlines without creating all the GLOB(...) garbage. Might I
> have to rewrite the script so to open the file in binary mode... or what?
>
>
> Maximum thanks for your assistance.
>
>
>
> _______________________________________________
> CentOS mailing list
> CentOS at centos.org
> http://lists.centos.org/mailman/listinfo/centos
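
For what it's worth, here is a minimal sketch of the 'tr' idea. The sample
input is a made-up fragment modeled on the quoted HTML, and the variable
name `joined` is only for illustration. It turns each newline into a space
rather than deleting it, so that attributes which sat on separate lines do
not run together.

```shell
# Hypothetical sample input: tags split across lines, as in the quoted file.
# tr replaces every newline with a single space, joining it all on one line.
joined=$(printf '<HTML\n><HEAD\n><META\nNAME="GENERATOR"\n' | tr '\n' ' ')

# Show the joined result.
printf '%s\n' "$joined"
```

On a real file that would be something like `tr '\n' ' ' < in.html > out.html`
(file names assumed). Note that this leaves a space before each '>' and does
nothing about downcasing the tag names; that part would still need perl or
similar.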