Comments on: Fetch and Parse HTML Web Page Content From Bash. Wow.

By: thomas

thomas — Tue, 15 Jan 2013 19:02:49 +0000

You can retrieve an HTML page (with the HTML stripped off) with lynx as well, by using the -dump option. Syntax: lynx -dump http://u.rl

Just for that person’s information…

By: thomas

thomas — Tue, 15 Jan 2013 18:46:31 +0000

Timeless! I wonder if anyone has tried this with lynx. @Pete: thanks for the dictionaries and whatnot!

By: jaybee

jaybee — Sat, 29 Dec 2012 08:02:09 +0000

This doesn't work for me. I am trying to pull descriptions of episodes of a TV show, but when I use your syntax I am presented with:

as if bash is waiting for a terminating symbol.

By: Greg

Greg — Mon, 29 Oct 2012 00:16:57 +0000

Awesome!

By: dbm

dbm — Sun, 16 Sep 2012 04:49:58 +0000

I am that person. Thank you so much for this. I didn't want to ask the developer of the app I use to add this feature. Now I can just change /my/ copy.

By: Mike

Mike — Mon, 13 Feb 2012 15:07:38 +0000

A four-year-old blog post that STILL sticks out. If only more people had the same approach to the web, life, and everything: "what a wonderful place this would be"!

Thanks.

cURL is a neat tool btw ;)

//MK

By: Wim

Wim — Thu, 17 Feb 2011 18:59:20 +0000

Great stuff; it was exactly what I needed for a script I'm working on right now!

see: http://forum.nedlinux.nl/viewtopic.php?pid=354120#p354120

By: Eoin McCarthy

Eoin McCarthy — Mon, 19 Oct 2009 01:19:03 +0000

Great stuff! I have to admit I wasn't aware of curl before. However, I couldn't track down w3m for the Mac and ended up downloading lynx instead.

lynx --dump gives you very similar results.
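Since both text browsers do the same job through their -dump option, a minimal sketch that uses w3m when it is installed and falls back to lynx otherwise might look like this. The function name dump_page is hypothetical, and preferring w3m over lynx is an arbitrary choice:

```shell
# dump_page URL
# Sketch: print the rendered plain text of URL with w3m if available,
# otherwise with lynx. Both tools write the -dump output to stdout.
dump_page() {
  if [ -z "$1" ]; then
    echo "usage: dump_page URL" >&2
    return 2
  fi
  if command -v w3m >/dev/null 2>&1; then
    w3m -dump "$1"
  elif command -v lynx >/dev/null 2>&1; then
    lynx -dump "$1"
  else
    echo "dump_page: neither w3m nor lynx is installed" >&2
    return 127
  fi
}
```

Usage: dump_page http://u.rl | less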

By: Pete

Pete — Mon, 05 Oct 2009 06:39:34 +0000

Geez… my formatting is horrible…

On the *nix boxes I've got the scripts combined… Windows… well… uhhh… batch files lost their appeal many moons ago.

By: Pete

Pete — Mon, 05 Oct 2009 06:36:56 +0000

Well, there are a few things you can do with curl, w3m, and html2text.

Currently I have a few things in my bag o' scripts:

1) Command-line Wikipedia script
2) Zip code lookup script
3) NPA (i.e., area code) lookup script (telco thing)
4) NPA-NXX area code and prefix lookup script (telco thing)
5) NPA-NXX-XXXX phone number lookup script (telco thing)
6) Dictionary lookup script using curl, since curl understands the DICT protocol
7) Acronym lookup script (curl again)
8) Sounds-like script that looks words up by soundex (helps with misspelled words that your word processor or text editor doesn't catch)

Ahhhh, what the heck; here is the Windows version of the dictionary script using curl:

curl -s dict://dict.org/d:%1:gcide
curl -s dict://dict.org/d:%1:wn
curl -s dict://dict.org/d:%1:web1913

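On OS X, Linux, BSD, or Solaris, change the %1 to $1 and you're golden. If you put the commands on one line there, separate them with ; rather than &, because & in a Unix shell runs the preceding command in the background.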
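As a rough illustration, here is a minimal sketch of the same three lookups as a POSIX shell function. It assumes curl is on your PATH and that dict.org still answers DICT requests. The names dict_define and dict_define_url are hypothetical, not taken from Pete's own scripts. The loop runs the queries one after another, so the three replies print in order instead of interleaving:

```shell
# Sketch only: a Unix take on the Windows dictionary batch file.
# Hypothetical helper names; assumes curl is installed and dict.org is reachable.

# dict_define_url WORD DATABASE
# Builds a DICT "define" URL of the form dict://dict.org/d:WORD:DATABASE
dict_define_url() {
  printf 'dict://dict.org/d:%s:%s\n' "$1" "$2"
}

# dict_define WORD
# Queries the gcide, wn and web1913 databases in turn (sequentially, so the
# replies are not mixed together the way backgrounded "&" jobs could be).
dict_define() {
  if [ -z "$1" ]; then
    echo "usage: dict_define WORD" >&2
    return 2
  fi
  for db in gcide wn web1913; do
    curl -s "$(dict_define_url "$1" "$db")"
  done
}
```

Usage: dict_define curl. The thesaurus and Wiktionary lookups are the same d: form with a different database (and, for Wiktionary, a different host).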

Here are the scripts for the soundex matching strategy (i.e., words that sound like the one you typed):

curl -s dict://dict.org/m:%1::soundex
curl -s dict://dict.org/m:%1::soundex:1
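The raw reply to an m: (MATCH) request wraps the matches in the server's numeric status lines. Each match line is a database name followed by the word in double quotes, for example gcide "Hello". The following minimal sketch pulls out just the distinct words. It assumes that reply layout and DICT's CRLF line endings, and the function names are hypothetical:

```shell
# Sketch: list the distinct words from a DICT MATCH reply read on stdin.
# Assumes match lines look like:  gcide "Hello"
# Status lines (220 ..., 152 ..., 250 ...) and the lone "." terminator do not
# have a quoted second field, so the sed pattern skips them.
dict_match_words() {
  tr -d '\r' | sed -n 's/^[^ ]* "\(.*\)"$/\1/p' | sort -u
}

# soundslike WORD
# Same soundex query as above, with the reply reduced to a word list.
soundslike() {
  curl -s "dict://dict.org/m:$1::soundex" | dict_match_words
}
```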

Script for a thesaurus:

curl -s dict://dict.org/d:%1:moby-thes

Check out dict.org for the protocol and a list of the databases they make available to the public.

Here is the Wiktionary lookup script. Check out the database to see what other output formats are available (text, HTML, etc.). Wiktionary is the dictionary side of the Wikipedia family; they are building a free dictionary.

curl -s dict://dict.hewgill.com/d:%1:en-brief

curl --manual | grep dict

Don't forget to read the RFC on the DICT protocol (RFC 2229).

I would post the others, but those scripts are quite long and they're on my *nix boxes.

PS: You can use the -dump option in w3m, and look into its -cols (column width) option too. On a related note, check out html2text with the options -ascii -style pretty.

html2text is part curl and part w3m: it can fetch a URL and render the HTML as text. You get more options by fetching with curl and piping into html2text than by using html2text alone.
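Here is a minimal sketch of the curl-plus-html2text combination. It assumes the classic html2text (the one with -ascii, -style and -width options), not the Python package of the same name, whose flags differ. page_text is a hypothetical name. Fetching with curl is what lets you add curl's own options, here -L to follow redirects:

```shell
# Sketch: fetch a page with curl and render it as plain text with html2text.
# Assumes the classic html2text (-ascii, -style, -width); page_text is hypothetical.

# page_text URL [WIDTH]
page_text() {
  if [ -z "$1" ]; then
    echo "usage: page_text URL [WIDTH]" >&2
    return 2
  fi
  for tool in curl html2text; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "page_text: $tool not found" >&2
      return 127
    fi
  done
  # -s: no progress meter; -L: follow redirects before handing HTML to html2text,
  # which reads the page from stdin when no input file is given.
  curl -s -L "$1" | html2text -ascii -style pretty -width "${2:-80}"
}
```

Usage: page_text http://u.rl 72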

PPS: Good luck on your job search. Regards.