[CentOS] OT: grep regex pointer appreciated

Sat Mar 5 23:18:10 UTC 2011
Nico Kadel-Garcia <nkadel at gmail.com>

On Sat, Mar 5, 2011 at 5:13 PM, Patrick Lists
<centos-list at puzzled.xs4all.nl> wrote:
> Hi,
>
> My grep regex-fu is not very good and googling is getting me nowhere, so
> hopefully someone is kind enough to give me some pointers.
>
> Goal: extract the (non-.dbg) filenames and versions from an FTP directory
> listing and a raw HTML file:
>
> $ wget --no-remove-listing -O ftp-index.txt ftp://127.0.0.1/test/
> $ wget --no-remove-listing -O index.html http://127.0.0.1/test/
>
> The relevant parts of the files above (the first part is the FTP listing,
> the second is the HTML file; both were copied to test_regex.txt) are:
>
> 2011 Jan 28 21:25  File  <a
> href="ftp://127.0.0.1/bar-4.5.6.i686.dbg.tgz">bar-4.5.6.i686.dbg.tgz</a>
>  (5551274 bytes)
> 2011 Jan 28 21:25  File  <a
> href="ftp://127.0.0.1/bar-4.5.6.i686.tgz">bar-4.5.6.i686.tgz</a>
> (5551274 bytes)
> 2011 Jan 28 21:25  File  <a
> href="ftp://127.0.0.1/bar-4.5.6.x86_64.dbg.tgz">bar-4.5.6.x86_64.dbg.tgz</a>
>  (5551274 bytes)
> 2011 Jan 28 21:25  File  <a
> href="ftp://127.0.0.1/bar-4.5.6.x86_64.tgz">bar-4.5.6.x86_64.tgz</a>
> (5551274 bytes)
>
> <tr><td><a
> href="foo-bar-1.2.3+1.2.3.tar.gz">foo-bar-1.2.3+1.2.3.tar.gz</td></tr>
>
> This is what I now have (improvements most welcome):
>
> $ egrep -o \
>   ">([A-Za-z_-]+)([[:digit:]]{1,3}(\.[[:digit:]]{1,3})*).+(.|t)gz" \
>   ./test_regex.txt | grep -v ".dbg" | tr -d '>'
>
> Output:
>
> foo-bar-1.2.3+1.2.3.tar.gz
> bar-4.5.6.i686.tgz
> bar-4.5.6.x86_64.tgz
>
> So far so good, but now I also want to get the version numbers, which I
> can't figure out. Does anyone have a pointer on how to get the version
> number from these filenames (1.2.3+1.2.3 and 4.5.6)?

Separate the architecture part (as in ".i686.tgz") from the version with
something like a '-' or '_', not a dot, and, if possible, be consistent
about using .tar.gz instead of mixing .tar.gz and .tgz.
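
If the files cannot be renamed, the version can probably still be pulled
out with a sed pass over the filenames. What follows is a minimal sketch,
not something tested against the real listing. extract_versions is a
hypothetical helper name, and it assumes that every name looks like
<name>-<version>[.<arch>].tgz or .tar.gz, that <name> contains no digits,
and that <version> consists only of digits joined by '.' or '+'.

```shell
# Hypothetical helper: read a listing on stdin and print
# "<filename> <version>" for every non-.dbg archive found in it.
extract_versions() {
    # 1. Take the link text between '>' and '<' that ends in .tgz or .tar.gz.
    # 2. Strip the surrounding '<' and '>'.
    # 3. Drop the .dbg archives.
    # 4. Capture the version: digits joined by '.' or '+', optionally
    #    followed by an architecture part that starts with a letter.
    #    Names that do not fit this assumed pattern are silently skipped.
    # Note: older GNU sed spells the -E option as -r.
    grep -Eo '>[^<>"]+\.(tgz|tar\.gz)<' |
        tr -d '<>' |
        grep -v '\.dbg\.' |
        sed -nE 's/^[A-Za-z_-]+-([0-9]+([.+][0-9]+)*)(\.[A-Za-z][A-Za-z0-9_]*)?\.(tgz|tar\.gz)$/& \1/p'
}
```

Called as "extract_versions < test_regex.txt", this should print lines such
as "bar-4.5.6.i686.tgz 4.5.6" and "foo-bar-1.2.3+1.2.3.tar.gz 1.2.3+1.2.3".
The pattern has to guess where the version ends and the architecture
begins, which is exactly the ambiguity a '-' or '_' separator would remove.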