[CentOS] perl code to remove newlines

Fri Dec 31 03:52:30 UTC 2010
Bart Schaefer <barton.schaefer at gmail.com>

On Thu, Dec 30, 2010 at 5:19 AM, ken <gebser at mousecar.com> wrote:
>
>
> --------- begin snippet ---------
> while (<$in>)
> {
>    s/<(\w*\W)/<\L$1/g;         # Downcase XXX in "<XXX".
>    s/<\/(\w*\W)/<\/\L$1/g;     # Downcase XXX in "</XXX".

chomp;  # Always remove the newline from the current line ($_)
unless (/<html/) {
   # Not the first line (the one with <html), so


>    if(/^>/)                    # if this line starts with '>'
>    {                           # then
>        $curr = tell $in;       # Note current file position,
>        seek $in, $prev, 0;     # go back to previous line,
>        chomp;                  # remove its trailing newline char,
>        seek $in, $curr, 0;     # and reset position to current line.
>    }
>    else
>    {
>        $curr = tell $in;       # Note current file position,
>        seek $in, $prev, 0;     # go back to previous line
>        s/\n/ /;                # Append a space,
>        chop;                   # and then chomp.
>        seek $in, $curr, 0;     # and reset position to current line.
>    }
>    print;
>    print $out;
>    $prev = tell $in;           # Location of previous line.
> }
> --------- end snippet ---------
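
The seek/tell calls in the loop above cannot do what their comments say: chomp, chop and s/// all act on $_, the line just read, so moving the file position of $in never changes a line that has already been printed. A minimal sketch of the same idea without seeking, assuming the input splits tags across lines the way the posted output suggests (a line beginning with '>' closes a tag opened on the previous line). The sub name join_lines and the in-memory filehandles are made up for illustration, and the separator is decided before each line is printed rather than afterwards:

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in to $out as one long line.
# A line starting with '>' is glued directly to the previous line;
# any other line (except the first) is preceded by a single space.
sub join_lines {
    my ($in, $out) = @_;
    my $first = 1;
    while (my $line = <$in>) {
        chomp $line;                          # always drop the newline
        $line =~ s/<(\/?)(\w+)/<$1\L$2/g;     # downcase XXX in "<XXX" and "</XXX"
        print {$out} ' ' unless $first || $line =~ /^>/;
        print {$out} $line;                   # explicit handle and explicit data
        $first = 0;
    }
    print {$out} "\n";                        # terminate the single output line
}

# Demo on in-memory filehandles (stand-ins for the real files).
my $html = "<HTML\n><HEAD\n><TITLE\n>Maximum\nRPM</TITLE\n>\n";
open my $in,  '<', \$html      or die "open in: $!";
open my $out, '>', \my $result or die "open out: $!";
join_lines($in, $out);
close $out;
print $result;
```

With real files, the two open calls would take filenames instead of scalar references; the rest stays the same.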
>
> When I cat the output file, it looks like this:
>
> --------- begin snippet ---------
> GLOB(0x9fd587c)<htmlGLOB(0x9fd587c)><headGLOB(0x9fd587c)><titleGLOB(0x9fd587c)>We've
> Lied to
> You…</titleGLOB(0x9fd587c)><metaGLOB(0x9fd587c)NAME="GENERATOR"GLOB(0x9fd587c)CONTENT="Modular
> DocBook HTML Stylesheet Version
> 1.79"><linkGLOB(0x9fd587c)REL="HOME"GLOB(0x9fd587c)TITLE="Maximum
> RPM"GLOB(0x9fd587c)HREF="index.html"><linkGLOB(0x9fd587c)REL="UP"GLOB(0x9fd587c)TITLE="Using
> RPM to Verify Installed
> Packages"GLOB(0x9fd587c)HREF="ch-rpm-verify.html"><linkGLOB(0x9fd587c)....
> --------- end snippet ---------
>
> The output I should say *is* all on one line, not line-wrapped the way
> you see it above.  I have a hunch as to why there are the
> "GLOB(0x9fd587c)" thingies everywhere the newlines or spaces (' ')
> should be.  If some expert here could explain them, that would be really
> good.  More importantly though would be some instruction as to how to
> remove the newlines without creating all the GLOB(...) garbage.  Might I
> have to rewrite the script so to open the file in binary mode... or what?
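
On the GLOB(...) question: binary mode is not the issue. As the perlfunc entry for print describes, a lone scalar such as $out is taken as the thing to print, not as the handle, so "print $out;" writes the stringified handle, GLOB(0x...), to the currently selected handle (normally STDOUT). That would give one GLOB(...) per input line, which matches the output shown if STDOUT is what ended up in that file (an assumption, since the invocation is not shown). To send $_ to the file, name both the handle and the data. A small demonstration, with in-memory filehandles and variable names invented for the example:

```perl
use strict;
use warnings;

# In-memory stand-ins for the real output file and for STDOUT.
open my $out,    '>', \my $file_text   or die "open: $!";
open my $stdout, '>', \my $screen_text or die "open: $!";
my $old = select $stdout;   # make $stdout the default handle for print

$_ = "<html";
print $out;                 # $out is the LIST here: "GLOB(0x...)" goes to the default handle
print {$out} $_;            # handle and data both explicit: "<html" goes to the file

select $old;                # restore the previous default handle
close $out;
close $stdout;

print "file:   $file_text\n";
print "screen: $screen_text\n";
```

"print $out $_;" is equivalent to the braced form; the braces just make it obvious which argument is the handle.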
>
>
> Maximum thanks for your assistance.