Comments on: Fetch and Parse HTML Web Page Content From Bash. Wow. https://blog.mattwynne.net/2008/04/26/fetch-and-parse-html-web-page-content-from-bash-wow/ Matt Wynne taking it one tea at a time Wed, 21 Aug 2019 13:05:19 +0000 hourly 1 https://wordpress.org/?v=6.2
By: thomas https://blog.mattwynne.net/2008/04/26/fetch-and-parse-html-web-page-content-from-bash-wow/comment-page-1/#comment-1905 Tue, 15 Jan 2013 19:02:49 +0000 http://blog.mattwynne.net/2008/04/26/fetch-and-parse-html-web-page-content-from-bash-wow/#comment-1905 You can retrieve an HTML page (with the HTML stripped off) with lynx as well, by using the -dump option. Syntax: lynx -dump http://u.rl

just for that person’s information…
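
A minimal sketch of the lynx -dump approach wrapped in a function, assuming lynx is installed. The name page_text is made up for illustration and is not part of lynx.

```shell
#!/usr/bin/env bash
# page_text is a hypothetical helper name, not something lynx provides.
# It prints the rendered text of a page, with the HTML stripped off.
page_text() {
  local url=$1
  if [ -z "$url" ]; then
    echo "usage: page_text URL" >&2
    return 2
  fi
  # -dump writes the formatted page to stdout and exits;
  # -nolist leaves out the numbered list of links at the end of the dump.
  lynx -dump -nolist "$url"
}

# Usage: page_text http://u.rl | less
```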

By: thomas https://blog.mattwynne.net/2008/04/26/fetch-and-parse-html-web-page-content-from-bash-wow/comment-page-1/#comment-1904 Tue, 15 Jan 2013 18:46:31 +0000 http://blog.mattwynne.net/2008/04/26/fetch-and-parse-html-web-page-content-from-bash-wow/#comment-1904 timeless! i wonder if anyone tried this with lynx. @ pete: thnx for the dictionaries and what not!

By: jaybee https://blog.mattwynne.net/2008/04/26/fetch-and-parse-html-web-page-content-from-bash-wow/comment-page-1/#comment-1901 Sat, 29 Dec 2012 08:02:09 +0000 http://blog.mattwynne.net/2008/04/26/fetch-and-parse-html-web-page-content-from-bash-wow/#comment-1901 this doesn’t work for me. I am trying to pull descriptions of episodes of a TV show but when I use your syntax I am presented with:

>

as if bash is waiting for a terminating symbol
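
For what it's worth, a lone > is bash's continuation prompt (PS2). It shows up when the command typed so far is incomplete, most often because of an unclosed quote. Whether that is the cause here is only a guess, since the exact command isn't shown. A small sketch that checks a command string with bash -n before running it (is_complete is a made-up name):

```shell
#!/usr/bin/env bash
# is_complete is a hypothetical helper: it succeeds when bash can parse the
# command text, and fails when the text is unfinished or otherwise malformed.
is_complete() {
  # bash -n reads the commands and checks syntax without executing them.
  printf '%s\n' "$1" | bash -n 2>/dev/null
}

is_complete 'echo "closed quote"' && echo "parses fine"
is_complete 'echo "unclosed quote' || echo "incomplete: an interactive bash would show > and wait"
```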

By: Greg https://blog.mattwynne.net/2008/04/26/fetch-and-parse-html-web-page-content-from-bash-wow/comment-page-1/#comment-1780 Mon, 29 Oct 2012 00:16:57 +0000 http://blog.mattwynne.net/2008/04/26/fetch-and-parse-html-web-page-content-from-bash-wow/#comment-1780 Awesome!

By: dbm https://blog.mattwynne.net/2008/04/26/fetch-and-parse-html-web-page-content-from-bash-wow/comment-page-1/#comment-1751 Sun, 16 Sep 2012 04:49:58 +0000 http://blog.mattwynne.net/2008/04/26/fetch-and-parse-html-web-page-content-from-bash-wow/#comment-1751 I am that person. Thank you so much for this. I didn’t want to ask the dev of the app I use to add this feature. Now I can just change /my/ copy.

By: Mike https://blog.mattwynne.net/2008/04/26/fetch-and-parse-html-web-page-content-from-bash-wow/comment-page-1/#comment-1561 Mon, 13 Feb 2012 15:07:38 +0000 http://blog.mattwynne.net/2008/04/26/fetch-and-parse-html-web-page-content-from-bash-wow/#comment-1561 A four year old blog post that STILL sticks out. If only more people had the same approach to the web, the life and everything – “what a wonderful place this would be” !

Thanks.

cURL is a neat tool btw 😉

//MK

By: Wim https://blog.mattwynne.net/2008/04/26/fetch-and-parse-html-web-page-content-from-bash-wow/comment-page-1/#comment-1235 Thu, 17 Feb 2011 18:59:20 +0000 http://blog.mattwynne.net/2008/04/26/fetch-and-parse-html-web-page-content-from-bash-wow/#comment-1235 Great stuff, it was exactly what I needed for a script I am working on right now!

see: http://forum.nedlinux.nl/viewtopic.php?pid=354120#p354120

By: Eoin McCarthy https://blog.mattwynne.net/2008/04/26/fetch-and-parse-html-web-page-content-from-bash-wow/comment-page-1/#comment-834 Mon, 19 Oct 2009 01:19:03 +0000 http://blog.mattwynne.net/2008/04/26/fetch-and-parse-html-web-page-content-from-bash-wow/#comment-834 Great stuff! I have to admit I wasn’t aware of curl before. However I couldn’t track down w3m for the mac and ended up downloading lynx instead.

lynx -dump gives you very similar results.

By: Pete https://blog.mattwynne.net/2008/04/26/fetch-and-parse-html-web-page-content-from-bash-wow/comment-page-1/#comment-820 Mon, 05 Oct 2009 06:39:34 +0000 http://blog.mattwynne.net/2008/04/26/fetch-and-parse-html-web-page-content-from-bash-wow/#comment-820 Geez… my formatting is horrible…

On the *nix boxes… I got the scripts combined… Windows… well… uhhh… batch files lost their appeal many moons ago.

By: Pete https://blog.mattwynne.net/2008/04/26/fetch-and-parse-html-web-page-content-from-bash-wow/comment-page-1/#comment-819 Mon, 05 Oct 2009 06:36:56 +0000 http://blog.mattwynne.net/2008/04/26/fetch-and-parse-html-web-page-content-from-bash-wow/#comment-819 Well, there are a few things you can do with curl, w3m, and html2text.

Currently I have a few things in my bag o’ scripts:

1) Command line Wikipedia script
2) Zip code lookup script
3) NPA, i.e. area code, lookup script (telco thing)
4) NPA-NXX area code and prefix lookup script (telco thing)
5) NPA-NXX-XXXX phone number lookup script (telco thing)
6) Dictionary lookup script using curl, since curl understands the dict protocol
7) Acronym lookup script (curl again)
8) soundslike script that looks words up in the soundex database (helps with misspelled words that your word processor or text editor doesn’t catch)

Ahhhh what the heck; here is the windows version of curl with the dictionary script:

curl -s dict://dict.org/d:%1:gcide &
curl -s dict://dict.org/d:%1:wn &
curl -s dict://dict.org/d:%1:web1913 &

Replace the %1 on Windows with $1 on OS X, Linux, BSD, or Solaris and you’re golden.
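
A rough bash version of the three lookups above, as a sketch only. The names dict_url and define are made up. One thing to watch when porting: a trailing & in a Unix shell puts the command in the background, so the three outputs can interleave. The loop below runs the lookups one after another instead.

```shell
#!/usr/bin/env bash
# dict_url and define are hypothetical helper names.
# dict_url builds a DICT URL of the form dict://dict.org/d:WORD:DATABASE.
dict_url() {
  local word=$1 db=$2
  printf 'dict://dict.org/d:%s:%s\n' "$word" "$db"
}

# define looks a word up in the three databases named in the batch file,
# in order, without backgrounding anything.
define() {
  local word=$1 db
  for db in gcide wn web1913; do
    curl -s "$(dict_url "$word" "$db")"
  done
}

# Usage: define serendipity
```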

Here are the scripts for the soundex database (i.e. words that sound like the one you typed):

curl -s dict://dict.org/m:%1::soundex
curl -s dict://dict.org/m:%1::soundex:1
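
The m: form differs from d: in that it asks the server for matching headwords rather than definitions. The field after the word is the database (left empty in the commands above) and the next one is the match strategy. A sketch in bash, with match_url and soundslike as made-up names:

```shell
#!/usr/bin/env bash
# match_url and soundslike are hypothetical helper names.
# match_url builds dict://dict.org/m:WORD::STRATEGY, leaving the database
# field empty as in the batch file; the strategy defaults to soundex.
match_url() {
  local word=$1 strategy=${2:-soundex}
  printf 'dict://dict.org/m:%s::%s\n' "$word" "$strategy"
}

# soundslike lists headwords that sound like the given word.
soundslike() {
  curl -s "$(match_url "$1")"
}

# Usage: soundslike recieve
```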

Script for a thesaurus

curl -s dict://dict.org/d:%1:moby-thes

Check out dict.org for the protocol and a list of the databases they make accessible to the public.

Here is the script for the Wiktionary lookup. Check out the database to see what other data formats can be output: text, HTML, etc. Wiktionary is the dictionary side of the house of Wikipedia. They are making a free dictionary.

curl -s dict://dict.hewgill.com/d:%1:en-brief &

curl --manual | grep dict

Don’t forget to read the RFC on the dict protocol.

I would post the others, but those scripts are quite long and they’re on my *nix boxes.

PS: You can use the dump option in w3m and look into the column option. On a related note, check out “html2text”.
Options: -ascii -style pretty

html2text is part curl and part w3m. More options exist when using curl and html2text, than html2text alone.
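
A sketch of the w3m dump and column options in a pipeline, assuming w3m is installed. The name html_to_text and the example URL are placeholders.

```shell
#!/usr/bin/env bash
# html_to_text is a hypothetical helper name.
# It reads HTML on stdin and prints plain text wrapped to the given width.
html_to_text() {
  local cols=${1:-80}
  # -dump prints the rendered page to stdout;
  # -T text/html tells w3m to treat stdin as HTML;
  # -cols sets the line width used for the dump.
  w3m -dump -T text/html -cols "$cols"
}

# Usage (placeholder URL):
#   curl -s http://example.com/ | html_to_text 72
```

If html2text is what you have installed, it could stand in for the w3m line, with the -ascii -style pretty options mentioned above.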

PPS: Good luck on your job search.
Regards.
