On Sat, Mar 5, 2011 at 5:13 PM, Patrick Lists <centos-list at puzzled.xs4all.nl> wrote:
> Hi,
>
> My grep regex foo is not very good and googling is getting me nowhere so
> hopefully someone is kind enough to give me some pointers.
>
> Goal: grep (non .dbg) filenames and versions from a ftp dir listing and
> a raw html file:
>
> $ wget --no-remove-listing -O ftp-index.txt ftp://127.0.0.1/test/
> $ wget --no-remove-listing -O index.html http://127.0.0.1/test/
>
> The relevant parts of the files above (first one is ftp listing, second
> part is the html file, both copied to test_regex.txt) are:
>
> 2011 Jan 28 21:25 File <a
> href="ftp://127.0.0.1/bar-4.5.6.i686.dbg.tgz">bar-4.5.6.i686.dbg.tgz</a>
> (5551274 bytes)
> 2011 Jan 28 21:25 File <a
> href="ftp://127.0.0.1/bar-4.5.6.i686.tgz">bar-4.5.6.i686.tgz</a>
> (5551274 bytes)
> 2011 Jan 28 21:25 File <a
> href="ftp://127.0.0.1/bar-4.5.6.x86_64.dbg.tgz">bar-4.5.6.x86_64.dbg.tgz</a>
> (5551274 bytes)
> 2011 Jan 28 21:25 File <a
> href="ftp://127.0.0.1/bar-4.5.6.x86_64.tgz">bar-4.5.6.x86_64.tgz</a>
> (5551274 bytes)
>
> <tr><td><a
> href="foo-bar-1.2.3+1.2.3.tar.gz">foo-bar-1.2.3+1.2.3.tar.gz</td></tr>
>
> This is what I now have (improvements most welcome):
>
> $ egrep -o
> ">([A-Za-z_-]+)([[:digit:]]{1,3}(\.[[:digit:]]{1,3})*).+(.|t)gz"
> ./test_regex.txt | grep -v ".dbg" | tr -d '>'
>
> Output:
>
> foo-bar-1.2.3+1.2.3.tar.gz
> baz-4.5.6.i686.tgz
> baz-4.5.6.x86_64.tgz
>
> So far so good but now I also want to get the version numbers which I
> can't figure out. Anyone have a pointer how to get the version number
> from these filenames (1.2.3+1.2.3 and 4.5.6)?

Separate the ".i686.tgz" part from the version with something like a '-'
or '_', not a dot, so it is clear where the version number ends. And be
consistent about using .tar.gz instead of mixing .tar.gz and .tgz, if
possible.
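
If renaming the files is not an option, here is a rough sketch of one way to
pull the version out of the names as they are now. It is not a tested
production script. The function names sample_listing and extract_versions are
made up for this example, and the sample data is the listing quoted above with
each wrapped entry rejoined onto one line. It assumes a grep that supports -o
(as the egrep command above already does) and a sed that accepts -E (older GNU
sed spells it -r).

```shell
#!/bin/sh

# Hypothetical helper: prints the sample entries from the thread,
# one entry per line (the mail had wrapped them across lines).
sample_listing() {
cat <<'EOF'
2011 Jan 28 21:25 File <a href="ftp://127.0.0.1/bar-4.5.6.i686.dbg.tgz">bar-4.5.6.i686.dbg.tgz</a> (5551274 bytes)
2011 Jan 28 21:25 File <a href="ftp://127.0.0.1/bar-4.5.6.i686.tgz">bar-4.5.6.i686.tgz</a> (5551274 bytes)
2011 Jan 28 21:25 File <a href="ftp://127.0.0.1/bar-4.5.6.x86_64.dbg.tgz">bar-4.5.6.x86_64.dbg.tgz</a> (5551274 bytes)
2011 Jan 28 21:25 File <a href="ftp://127.0.0.1/bar-4.5.6.x86_64.tgz">bar-4.5.6.x86_64.tgz</a> (5551274 bytes)
<tr><td><a href="foo-bar-1.2.3+1.2.3.tar.gz">foo-bar-1.2.3+1.2.3.tar.gz</td></tr>
EOF
}

# Hypothetical helper: reads a listing on stdin and prints
# "<filename> <version>" for every non-.dbg tarball.
extract_versions() {
    # 1. take the link text between '>' and '<' that ends in .tgz or .tar.gz
    # 2. strip the angle brackets
    # 3. drop the .dbg archives
    # 4. append the version: digit groups joined by '.' or '+',
    #    starting after the last '-' that is followed by a digit
    grep -oE '>[^<>]+\.(tgz|tar\.gz)<' |
        tr -d '<>' |
        grep -v '\.dbg\.' |
        sed -E 's/^[A-Za-z_-]+-([0-9]+([.+][0-9]+)*)\..*$/& \1/'
}

sample_listing | extract_versions
```

On the sample data this should print each filename followed by its version,
4.5.6 for the two bar archives and 1.2.3+1.2.3 for foo-bar. Note that it only
works because the architecture strings here (i686, x86_64) start with a letter;
a suffix beginning with a digit would be swallowed into the version, which is
the ambiguity a '-' or '_' separator would avoid.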