Lua HTTP/HTTPS client with XML and XML/RDF parser
Copyright © 2010 StrixDB. Freely available under the terms of the StrixDB license.
Overview
Status
Download and Installation
History
Manual
Reference
Methods
Parser functions
Helper functions
webGet is a regular Lua 5.1 module that can be used to download files
with HTTP/HTTPS and FTP requests.
webGet can use cookies and a
download cache.
webGet also provides a simple SAX XML
parser to process files during download. XML/RDF files are parsed and returned
as triples.
webGet can submit data to a web server with a POST request.
The current version is 1.21. It was developed for Lua 5.1.
The current version is
only available for Windows.
The current version is a beta version. It can be
used free of charge and is available under the terms of the
StrixDB license for
beta versions (Perl Artistic License).
webGet uses the
Expat library and the
WinInet API of Windows.
C:\StrixDB\release>lua
Lua 5.1.4  Copyright (C) 1994-2008 Lua.org, PUC-Rio
> require('webGet')
>
The following commented Lua code is self-explanatory (and shows the zen simplicity of the API):
cli = webGet.new{}
local address = 'http://opoirel.free.fr/strixDB/samples/animals.rdf'

local rc,err = cli:GET(address, print)   -- print the file
assert(rc==200)

rc,err = cli:GETXML(address, print)      -- print each start tag, its attributes (a table) and the XML depth

function printAsNT(subject,predicate,object,datatype,lang)
  -- build printable subject (byte 95 is '_', the first character of a blank node ID)
  if subject:byte(1)~=95 then subject='<'..subject..'>' end
  -- build printable predicate
  predicate = '<'..predicate..'>'
  -- build printable object: a resource when there is no datatype, a literal otherwise
  if datatype==nil then
    object = '<'..object..'>'
  else
    object = '"'..object..'"'
    if lang~=nil then object = object..'@'..lang end
    if datatype~='http://www.w3.org/2001/XMLSchema#string' then
      -- N-Triples writes the datatype IRI between angle brackets
      object = object..'^^<'..datatype..'>'
    end
  end
  print(subject,predicate,object,'.')
end

rc,err = cli:GETtriples(address, printAsNT)
cli:close()
Using the HTTPS protocol is very simple. Certificates are shared with your
current Microsoft Internet Explorer browser.
If a webGet HTTPS connection fails because of an invalid certificate
(check the returned error message), just open the same URL in Internet Explorer and
follow the wizard to accept (or reject) the certificate. An example of an HTTPS
connection:
rc,err = cli:GET(
  { url='https://store.in.one.click/basket/',
    referer='http://store.in.one.click/android/killed_iPod/' },
  print )
NOTE: the url argument of a connection method can be a string or a table describing the URL, the referer and the allowed types.
webGet.new returns a WinInet Internet session. By default, the
session uses the download cache and the Internet Explorer proxy settings. The
default user agent is “Mozilla/4.0”.
The options are a table with the
following optional entries:
local cookies = webGet.getCookie('http://www.google.com')
table.foreach(cookies, print)
The following methods are available for an internet session:
cli = webGet.new {}
cli:close()
Closes the session, freeing all memory.
rc,str = cli:GET(<url>)
rc,str = cli:GET(<url>, function f(txt) end)
The first method parameter <url> can be a Lua string (example: 'http://opoirel.free.fr/strixDB' ) or a table with the following entries:
For all the methods GET, POST, GETXML, POSTXML, GETtriples and POSTtriples, the first parameter can be such a string or a Lua table.
This method returns the HTTP status code (for example rc=200 if everything is OK, rc=404 for file not found, etc.) and a string containing the file content, or an error message if rc is not 200.
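Because the second return value is either the content or an error message, callers usually need to branch on the status code. The following is a minimal sketch of one way to do that; `checkStatus` is a hypothetical helper that is not part of webGet, and the status/message pairs are made-up stand-ins for real responses.

```lua
-- Hypothetical helper (not part of webGet): turn the (rc, str) pair returned
-- by a request method into the usual Lua "value, or nil plus message" form.
local function checkStatus(rc, str)
  if rc == 200 then
    return str
  end
  return nil, 'HTTP ' .. tostring(rc) .. ': ' .. tostring(str)
end

-- With webGet this would read: local body, err = checkStatus(cli:GET(address))
-- The calls below use made-up values instead of a real connection.
local body, err = checkStatus(200, 'file content')
local missing, why = checkStatus(404, 'Not Found')
print(body, missing, why)
```

This keeps the status test in one place, so the rest of a script can rely on the common `if not body then ... end` idiom.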
The second form of this method calls the function for each block of data
retrieved from the internet.
cli = webGet.new {}

-- first form: the whole file is returned as a string
local rc,str = cli:GET('http://opoirel.free.fr/strixDB/2010/parentShip.rlog')
assert(rc==200)
-- now str contains the targeted file

-- second form: each downloaded block is written to a local file
local out = io.open('samesRules.rlog',"w")
rc,str = cli:GET('http://opoirel.free.fr/strixDB/2010/parentShip.rlog',
  function(buffer) out:write(buffer) end )
assert(rc==200)
out:close()
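The block callback does not have to write to a file. As a sketch only, the callback below accumulates blocks in memory and counts the bytes received; `makeSink` is a hypothetical name, and the blocks are fed by hand here because the sketch does not open a real session.

```lua
-- Hypothetical helper: build a block callback that accumulates data in memory.
local function makeSink()
  local parts, size = {}, 0
  local function onBlock(buffer)
    parts[#parts + 1] = buffer      -- keep each block; concatenate once at the end
    size = size + #buffer
  end
  local function result()
    return table.concat(parts), size
  end
  return onBlock, result
end

local onBlock, result = makeSink()
-- With webGet this would read: cli:GET(address, onBlock)
-- Here two made-up blocks stand in for the downloaded data.
onBlock('first block, ')
onBlock('second block')
local content, total = result()
print(content, total)
```

Collecting the blocks in a table and calling `table.concat` once avoids building many intermediate strings during a large download.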
rc,str = cli:GETXML(<url> [,<startFunc> [,<dataFunc> [,<endFunc>]]] )
The function parameters are the classical SAX callbacks.
The following example analyzes a remote XML file:
cli = webGet.new {}
-- called for each start tag: print the tag name followed by its attributes
function startF(tag,attrs,depth)
  local x = tag
  for i,v in pairs(attrs) do x = x..' '..i..'='..v end
  print(x)
end
-- called for character data
function dataF(txt) print('txt=',txt) end
-- called for each end tag
function endF(txt) print('end tag=',txt) end
local rc,str = cli:GETXML('http://someFile/',startF,dataF,endF)
assert(rc==200)
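SAX callbacks usually need some state to know where they are in the document. The sketch below keeps a stack of open tags and records the text found under each element path. `makeHandlers` is a hypothetical name, the events are simulated by hand, and the depth values passed are guesses because the numbering used by webGet is not stated here.

```lua
-- Hypothetical handler set: records the text found under each element path.
local function makeHandlers()
  local stack, texts = {}, {}
  local function startF(tag, attrs, depth)
    stack[#stack + 1] = tag                      -- entering an element
  end
  local function dataF(txt)
    local path = table.concat(stack, '/')
    texts[path] = (texts[path] or '') .. txt     -- text may arrive in several pieces
  end
  local function endF(tag)
    stack[#stack] = nil                          -- leaving the element
  end
  return texts, startF, dataF, endF
end

local texts, startF, dataF, endF = makeHandlers()
-- With webGet this would read: cli:GETXML(address, startF, dataF, endF)
-- Simulated events for <animals><name>cat</name></animals>:
startF('animals', {}, 1)
startF('name', {}, 2)
dataF('c'); dataF('at')
endF('name')
endF('animals')
print(texts['animals/name'])
```

Keeping the state inside a closure lets several downloads run with independent handler sets.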
rc,errmsg = cli:GETtriples(<url>, tripleCallback)
This function calls the Lua function tripleCallback for each RDF triple found in the file at the <url> address. The signature of the callback function is:
function (subject, predicate, object, datatype, lang)
where subject contains a resource or a blank node ID, predicate is a resource and object can be:
The printAsNT example above shows such a callback function printing triples in N-Triples format.
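A triple callback can also store the triples instead of printing them. The following sketch groups them by subject; `makeCollector` is a hypothetical helper, and the triples are made-up values passed by hand in place of a real RDF download.

```lua
-- Hypothetical helper: build a triple callback that groups triples by subject.
local function makeCollector()
  local bySubject = {}
  local function onTriple(subject, predicate, object, datatype, lang)
    local list = bySubject[subject]
    if not list then
      list = {}
      bySubject[subject] = list
    end
    list[#list + 1] = {
      predicate = predicate, object = object,
      datatype = datatype, lang = lang,
    }
  end
  return bySubject, onTriple
end

local bySubject, onTriple = makeCollector()
-- With webGet this would read: cli:GETtriples(address, onTriple)
-- Made-up triples stand in for the parsed file.
onTriple('http://example.org/cat',
         'http://www.w3.org/1999/02/22-rdf-syntax-ns#type',
         'http://example.org/Animal')
onTriple('http://example.org/cat', 'http://example.org/name', 'Felix',
         'http://www.w3.org/2001/XMLSchema#string', 'en')
print(#bySubject['http://example.org/cat'])
```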
rc,str = cli:POST(<url>, <dataToSend> [,function f(txt)end] )
Works like cli:GET. The provided data (a Lua string) is sent to the <url> in form encoding.
cli:POSTXML(<url>,datas [,<startFunc> [,<dataFunc> [,<endFunc>]]] )
Works like cli:GETXML. The provided data (a Lua string) is sent to the <url> in form encoding.
rc,errmsg = cli:POSTtriples(<url>, <dataToSend> , tripleCallback)
Works like cli:POST but retrieves the RDF triples as with GETtriples.
cli:expandNS( true/false )
Flag telling the XML parser whether or not to expand namespaces.
cli:base( <base> )
Method to set the base URI of the XML parser.
cli:parse(<xmlstring> [,<startFunc> [,<dataFunc> [,<endFunc>]]] )
Parse the <xmlstring>.
cli:flag()
Returns a string describing the current WinInet session flags.
cli:flag(<flagName>, true/false)
Set the corresponding flag for the current WinInet session. See the WinInet documentation for more information. <flagName> must be one of:
cli:user(<userName>,<password>)
Set the user name and password for the session.
webGet.parse(<xmlstring> [,<startFunc> [,<dataFunc> [,<endFunc>]]] )
Parse the <xmlstring> without expanding namespaces. See method cli:GETXML for the other parameters.
webGet.parseNS(<xmlstring> [,<startFunc> [,<dataFunc> [,<endFunc>]]] )
Parse the <xmlstring> expanding namespaces. See method cli:GETXML for the other parameters.
webGet.time()
webGet.time( <integer> )
This helper function returns a date as a string in the standard Web time format (W3C DTF).
Without an argument, the current date and time is returned. The second form translates the integer (assumed to be a C time_t value).
> = webGet.time()
2010-07-14T10:22:16Z
>
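For comparison, the same W3C DTF layout can be produced with the standard Lua `os.date` function. This is only a sketch of the format, not a description of how webGet.time is implemented; `w3cTime` is a hypothetical name.

```lua
-- Hypothetical equivalent built on the standard library: format a time_t
-- value (or the current time when t is nil) as a UTC W3C DTF string.
local function w3cTime(t)
  -- The leading '!' asks os.date for UTC instead of local time.
  return os.date('!%Y-%m-%dT%H:%M:%SZ', t)
end

print(w3cTime())     -- current date and time
print(w3cTime(0))    -- start of the Unix epoch: 1970-01-01T00:00:00Z
```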
webGet.encode(<string>)
This function returns the URL encoding (a Lua string) of the argument string.
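URL encoding is typically used to build the data string sent by the POST methods. The sketch below shows the idea in plain Lua. `urlEncode` and `formBody` are hypothetical helpers, and the exact output of webGet.encode may differ (for example, it may encode a space as `+` instead of `%20`).

```lua
-- Hypothetical stand-in for webGet.encode: keep the RFC 3986 unreserved
-- characters and escape every other byte as %XX.
local function urlEncode(s)
  return (s:gsub('[^%w%-%._~]', function(c)
    return string.format('%%%02X', c:byte())
  end))
end

-- Hypothetical helper: build a form-encoded body from a table of fields.
local function formBody(fields)
  local keys = {}
  for k in pairs(fields) do keys[#keys + 1] = k end
  table.sort(keys)                  -- fixed order, so the result is reproducible
  local parts = {}
  for _, k in ipairs(keys) do
    parts[#parts + 1] = urlEncode(k) .. '=' .. urlEncode(tostring(fields[k]))
  end
  return table.concat(parts, '&')
end

print(urlEncode('a b&c'))                        -- a%20b%26c
print(formBody{ q = 'lua & rdf', page = 2 })     -- page=2&q=lua%20%26%20rdf
```

A string built this way could then be passed as the data argument of cli:POST.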
webGet.checkUTF8(<path>)
This function scans a directory (<path>) or a file (<path>) and checks whether
the files are correctly encoded in UTF-8 (with or without a UTF-8 BOM).
OK means
a file with no Unicode characters or with Unicode characters encoded in UTF-8.
BOM means that
the specific UTF-8 BOM sequence was found.
This function can be helpful to check directories of Lua scripts.
>webGet.checkUTF8('pub/')
BOM pub/index.html
BOM pub/res\datalog.css
    pub/res\energy.gif
OK  pub/res\foafGraph.rdf
OK  pub/res\graphDefinitions.xml
    pub/res\hierar.gif
    pub/res\ortho.gif
    pub/res\persons\bart.gif
    pub/res\persons\brockman.gif
......
    pub/res\persons\troy_mcclure.gif
OK  pub/res\test.html
OK  pub/scripts\script1.lua
BOM pub/scripts\test1.html
BOM pub/sparql.html
OK  pub/testform.html
>
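To illustrate what such a check involves, here is a simplified sketch in plain Lua that classifies a string the way the listing above labels files. `classify` is a hypothetical function, and it is not the algorithm used by webGet.checkUTF8: it only verifies lead and continuation bytes and does not reject every overlong or surrogate sequence.

```lua
-- Hypothetical, simplified classifier: returns 'BOM', 'OK', or nil when the
-- string is not valid UTF-8 (as happens with most binary files).
local function classify(s)
  local i, n, status = 1, #s, 'OK'
  if s:sub(1, 3) == '\239\187\191' then   -- EF BB BF, the UTF-8 byte order mark
    status = 'BOM'
    i = 4
  end
  while i <= n do
    local b = s:byte(i)
    local extra                           -- number of continuation bytes expected
    if b < 0x80 then extra = 0
    elseif b >= 0xC2 and b <= 0xDF then extra = 1
    elseif b >= 0xE0 and b <= 0xEF then extra = 2
    elseif b >= 0xF0 and b <= 0xF4 then extra = 3
    else return nil end                   -- invalid lead byte
    for j = i + 1, i + extra do
      local c = s:byte(j)
      if not c or c < 0x80 or c > 0xBF then return nil end
    end
    i = i + extra + 1
  end
  return status
end

print(classify('plain ASCII'))            -- OK
print(classify('\239\187\191with BOM'))   -- BOM
print(classify('caf\233'))                -- nil (Latin-1 byte, not UTF-8)
```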
webGet.dirent( <path> [,<options> [,<match>] ] )
This function scans a directory (<path>) and returns a table. The options are characters inside a string (like the options of the C fopen function). Available options are:
If a match string is provided (third argument), only the files whose name contains the match string are put in the result table.
Examples of use:
> table.foreach( webGet.dirent("E:/RDF/RDF_TEST_CASES",'rd'), print )
1 E:/RDF/RDF_TEST_CASES/ALL/amp-in-url
2 E:/RDF/RDF_TEST_CASES/ALL/datatypes
3 E:/RDF/RDF_TEST_CASES/ALL/datatypes-intensional
4 E:/RDF/RDF_TEST_CASES/ALL/horst-01
5 E:/RDF/RDF_TEST_CASES/ALL/pfps-10
6 E:/RDF/RDF_TEST_CASES/ALL/rdf-charmod-literals
.......
> table.foreach( webGet.dirent("E:/RDF/RDF_TEST_CASES",'sr'), function(_,v) print(v.volume, v.dir, v.name, v.ext) end )
E: /RDF/RDF_TEST_CASES/ALL/amp-in-url/ test001 nt
E: /RDF/RDF_TEST_CASES/ALL/amp-in-url/ test001 rdf
E: /RDF/RDF_TEST_CASES/ALL/datatypes/ test001 nt
E: /RDF/RDF_TEST_CASES/ALL/datatypes/ test001 rdf
E: /RDF/RDF_TEST_CASES/ALL/datatypes/ test002 nt
E: /RDF/RDF_TEST_CASES/ALL/datatypes/ test002 rdf
E: /RDF/RDF_TEST_CASES/ALL/datatypes/ test002b nt
....
The same example with a match string:
> table.foreach( webGet.dirent("E:/RDF/RDF_TEST_CASES",'sr','.nt'), function(_,v) print(v.volume, v.dir, v.name, v.ext) end )
E: /RDF/RDF_TEST_CASES/ALL/amp-in-url/ test001 nt
E: /RDF/RDF_TEST_CASES/ALL/datatypes/ test001 nt
E: /RDF/RDF_TEST_CASES/ALL/datatypes/ test002 nt
E: /RDF/RDF_TEST_CASES/ALL/datatypes/ test002b nt
....
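The volume, dir, name and ext fields printed above can be pictured as a split of the full path. The sketch below reproduces that split with Lua patterns, together with a plain substring test like the one the match argument is described as doing. `splitPath` and `nameContains` are hypothetical helpers, not webGet functions, and the real dirent may treat edge cases differently.

```lua
-- Hypothetical helper: split a path into the four fields shown above.
local function splitPath(path)
  local volume, rest = path:match('^(%a:)(.*)$')        -- drive letter, if any
  if not volume then volume, rest = '', path end
  local dir, file = rest:match('^(.*[/\\])([^/\\]*)$')  -- up to the last separator
  if not dir then dir, file = '', rest end
  local name, ext = file:match('^(.*)%.([^%.]*)$')      -- split on the last dot
  if not name then name, ext = file, '' end
  return { volume = volume, dir = dir, name = name, ext = ext }
end

-- Hypothetical helper: plain (non-pattern) substring test on a file name.
local function nameContains(fileName, match)
  return fileName:find(match, 1, true) ~= nil
end

local v = splitPath('E:/RDF/RDF_TEST_CASES/ALL/datatypes/test001.nt')
print(v.volume, v.dir, v.name, v.ext)
print(nameContains('test001.nt', '.nt'), nameContains('test001.rdf', '.nt'))
```

The fourth argument of `find` is set to true so that the dot in '.nt' is matched literally instead of acting as a pattern wildcard.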