An RDF store supporting transactions and Datalog rule inference
Copyright © 2010 StrixDB. Freely available under the terms of the StrixDB license.
StrixStore is a disk-based RDF graph database implementing the SPARQL and SPARQL/Update standards. It supports transactions with a one-writer/multiple-readers paradigm. StrixStore integrates Datalog rule inference with the SPARQL query language.
StrixStore can be used as a SPARQL and SPARQL/Update server together with the Apache HTTP server (an Apache httpd module is provided). It can also be used as an embedded RDF database, either loaded from Lua (a Lua 5.1 module interface is provided) or loaded as a standard Windows DLL from C/C++/Java (a C DLL interface is also provided).
This document focuses on StrixStore-specific features (Lua bindings) and on its different interfaces. It is not intended as a SPARQL or SPARQL/Update tutorial.
This documentation is based on StrixStore version 1.0.
The current version is a beta release and can be used free of charge for any purpose. It is governed by the terms of the StrixDB license for beta releases.
C:\StrixDB\release>lua.exe
Lua 5.1.4  Copyright (C) 1994-2008 Lua.org, PUC-Rio
> require('StrixStore')
>
StrixStore provides three different APIs:
Notes:
Copy the file mod_strixdb.so from the StrixDB distribution into the modules folder of the Apache server (with a standard installation, this folder is C:\Program Files\Apache Software Foundation\Apache2.2\modules).
You can find more information in the document Using StrixDB with Apache Server.
Modify Apache's httpd.conf configuration file as below:
LoadModule strixdb_module modules/mod_strixdb.so
StrixRoot "C:/Program Files/StrixDB/"
StrixFilename "D:/RDF/strix.db"
StrixDefaultURI "http://mydefault/graph/uri/"
<Location /strixdb>
    SetHandler strix-db-handler
</Location>
Explanations: LoadModule tells Apache that we want to use StrixDB, StrixRoot refers to the StrixDB installation folder, StrixFilename is the file used by our RDF store, and StrixDefaultURI is the default graph URI.
StrixStore can be used from a Lua script in the following ways:
More information is available in the document Using StrixDB with Apache Server.
You can also use StrixStore as an embedded RDF store from a C/C++/Java program. The API is declared in the StrixStore.h file. The exported functions are:
test_embed.cpp illustrates the use of these functions.
Used with the Lua scripting language, you can get the version number with the command:
> assert(require 'StrixStore')
> print(rdf._VERSION)
0.94.3
Used as a SPARQL server, you can get the version with the URL query ?infos, as in http://myserver/sparql?infos (assuming Apache serves myserver and /sparql has been defined as the StrixDB handler):
<Location /sparql>
    SetHandler strix-db-handler
</Location>
All requests to the store are made inside a transaction. If a request fails (for example, with a syntax error in a graph creation or graph update), a rollback is performed. Starting from version 0.94, ACID transactions are supported. This means that an error in SPARQL or in a graph update, an OS failure, or a physical error (for example, a power failure during a write transaction) will leave the database in a coherent state. ACID transactions are unfortunately time-expensive: each database (graph) modification needs a disk write synchronization. StrixDB supports delayed commits to reduce the cost of disk synchronization.
StrixDB follows the one-writer/multiple-readers paradigm. Most requests only need to read the RDF store: they are made with read rights, and multiple read transactions can run concurrently. But only one write transaction (a transaction modifying the database) is allowed at a given time: write transactions are exclusive.
About concurrency: in all the APIs (C API, Apache module, or Lua module), each call to StrixDB is made inside the calling thread's context. A call blocks if: (1) it needs write access and other transactions are not finished, or (2) it needs read access and a write transaction is not finished.
All the APIs are thread-safe.
Starting from version 0.94, delayed commits are supported. When a delayed commit is specified (commit delay timeout > 0), disk synchronization (required when a transaction ends) is not performed immediately after the transaction but is done:
By default, delayed commits are disabled.
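As a sketch (the file name and URI shown here are illustrative), a delayed commit is enabled through the delay parameter of rdf.open described below:

```lua
-- Open the store with a 5-second commit delay: disk synchronization is
-- batched instead of being performed at the end of every write transaction.
assert(require 'StrixStore')
rdf.open{ file  = 'D:/RDF/strix.db'
        , uri   = 'http://mydefault/graph/uri/'
        , delay = 5   -- seconds; 0 (the default) disables delayed commits
        }
```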
All the functions loaded with require('StrixStore') are in a table named rdf.
Prints a help summary of all available functions.
Takes as argument a table specifying the RDF store parameters. Returns true if the database was successfully opened or created. The parameters are:
| Parameter | Description | Default |
|---|---|---|
| file=<string> | the database file on disk | REQUIRED |
| uri=<string> | the URI of the default graph (as specified by SPARQL) | REQUIRED for a new database |
| initFile=<string> | a Lua script to execute at each start of StrixStore | |
| truncate=<boolean> | if true, the database is deleted just before each start | default=false |
| backupFile=<string> | the backup file name | |
| backupPeriod=<integer> | the backup period in hours | |
| delay=<integer> | delayed transaction commit time in seconds (delayed disk synchronization) | default=0 |
| functions=<lua table> | a Lua table specifying the DLLs that can be used in SPARQL FILTER (details in the chapter on user functions) | |

| Advanced parameters | | |
|---|---|---|
| poolSize=<integer> | the number of pages cached in the pool; each page is 4 KB. A big poolSize improves speed but consumes memory | default=100*1024 |
| initIndex=<integer> | the size of the initial index (StrixStore is a bitmap database) | default=8*1024*1024 |
| quantum=<integer> | the size of the newly allocated quantum (bitmap) when the allocated file is full | default=512*1024*1024 |
| safe=<boolean> | | default=false |
| noBuffer=<boolean> | | default=false |
| writeThrough=<boolean> | if true, wait for the disk write acknowledge event for each write transaction (safer but slower with SPARQL/Update transactions) | default=false |
Example:
rdf.open{ file = 'D:/DATAS/RDF/rdfStore.db'
        , uri  = 'http://myURIroot/'
        , truncate = true
        , backupFile = 'D:/DATAS/RDF/strix.backup.db'
        , backupPeriod = 1
        , functions = { 'C:/Program Files/StrixDB/Plugins/soundex.dll'
                      , 'C:/Program Files/StrixDB/Plugins/stemm.dll' }
        }
The RDF store will be created as a single file (D:/DATAS/RDF/rdfStore.db). If the database already exists, it will be destroyed first (truncate=true). Each hour, a backup of the database will be made and stored as D:/DATAS/RDF/strix.backup.db. The user functions declared in the two DLLs soundex.dll and stemm.dll will be loaded.
A shortcut command used to open a database; it is equivalent to rdf.open{ file=<string>, uri='http://<hostname>' }.
If the database does not already exist, this command creates a new one. The created database has default parameters and uses the hostname as its default URI.
Without comment... closes the RDF store.
Returns true if the RDF store is open, false otherwise.
TODO Returns information about memory usage.
TODO Returns information about memory usage.
TODO Returns information about memory usage.
TODO Returns information about memory usage.
Takes a SPARQL query (a Lua string) as first argument. Without a second argument, it returns a Lua iterator. Example:
local query = [[PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?r ?name FROM <test/friends/>
WHERE { ?r foaf:name ?name }]]
for resource, name in rdf.sparql(query) do
    print(resource, name)
end
The variables used in the SELECT clause of the SPARQL query are bound to the iterator variables by their order of declaration (regardless of their names).
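As an illustrative sketch (using the same hypothetical graph as above), swapping the variables in the SELECT clause swaps the values received by the loop, whatever the Lua variables are called:

```lua
-- Positional binding: the first loop variable receives the first SELECT
-- variable (?name here), regardless of the Lua variable names used.
local q = [[PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?r FROM <test/friends/>
WHERE { ?r foaf:name ?name }]]
for name, resource in rdf.sparql(q) do
    print(name, resource)
end
```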
Three options can be used with this function:
Works as previous function but each object
Submits the SPARQL/Update query (a Lua string).
The only possible option is 'explain'. With explicit add/retract operations in the query, there is nothing to explain. Only the use of WHERE may require a bytecode compilation and produce results with the 'explain' option.
TODO
TODO
Without comment (prints help).
Returns a useful path location.
Examples:
> print( rdf.path('loader') )
C:/Program Files/Apache Software Foundation/Apache2.2/bin/
...meaning that the loader is in the Apache httpd folder.
For dynamic Lua pages (used with the Apache server), see also the apache.root() function.
If the URL of the file is not local, the file is automatically downloaded from the internet using HTTP GET (with the default MS Internet Explorer proxy settings).
The format is not required (the file extension is used to determine the format).
Creates a graph with the given <URI> (or replaces it if already present) and puts the data inside.
The data MUST be in TURTLE format.
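A minimal sketch of a graph import under these rules (the URI and file name are illustrative, and the table-style parameter names mirror those of rdf.graph.export, so they are assumptions):

```lua
-- Hypothetical: load a local Turtle file into the graph <http://MyStore/friends/>,
-- replacing the graph if it already exists.
rdf.graph.import{ uri  = 'http://MyStore/friends/'
                , file = 'D:/RDF/friends.ttl'  -- the .ttl extension selects TURTLE
                }
```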
Removes the graph with the given URI from the RDF store.
Exports the graph with the given URI.
If no file is specified, this function uses Lua's print function. This means:
If no format is provided, the file extension is used to determine the format (see rdf.graph.import).
If compact=1, the export will use the current namespaces (rdf, rdfs, owl, foaf, xsd).
If compact=2, the export will create all the namespaces needed to make the output smaller (but this two-pass process takes more time).
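A minimal sketch combining these options (the URI and file name are illustrative):

```lua
-- Export the graph to a file; the .ttl extension selects the format,
-- compact=2 asks for the smaller (but slower) two-pass namespace output.
rdf.graph.export{ uri     = 'http://MyStore/modeles/'
                , file    = 'D:/RDF/modeles.ttl'
                , compact = 2
                }
```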
Prints the graph metadata of all graphs stored in the RDF store. Example:
> rdf.graph.print()
 Triples | Time Stamp           | URI (source)
---------+----------------------+--------------------
       0 | 2010-11-23T00:36:54Z | http://MyStore/
     374 | 2010-11-23T00:36:54Z | http://MyStore/modeles/ (E:/SOMEGRAPH/RDF/modeles-tome4.ttl)
      28 | 2010-11-23T00:36:54Z | http://MyStore/schema/ (E:/SOMEGRAPH/RDF/SOMEGRAPH-schema.ttl)
The metadata are the number of triples, the time stamp (Unix time of the last graph update or modification), the URI of the graph (this is not the URL; the URL is the RDF command used to get the graph), and the source file.
Returns a Lua table of graph metadata. The provided data are the same as for the rdf.graph.print command. Example:
> table.foreach(rdf.graph.list(), function(k,v)
    print('graph=', k)
    table.foreach(v, print)
    print()
  end )
graph=  http://MyStore/
source
tripleCount     0
blankCount      0
DEFAULT_GRAPH   true
uri     http://MyStore/
timeStamp       2010-11-23T00:36:54Z

graph=  http://MyStore/modeles/
source  E:/SOMEGRAPH/RDF/modeles-tome4.ttl
tripleCount     374
blankCount      48
uri     http://MyStore/modeles/
timeStamp       2010-11-23T00:36:54Z

graph=  http://MyStore/schema/
source  E:/SOMEGRAPH/RDF/SOMEGRAPH-schema.ttl
tripleCount     28
blankCount      0
uri     http://MyStore/schema/
timeStamp       2010-11-23T00:36:54Z
Renames the graph with the first <URI> to the second <URI>. The destination <URI> must not be an existing graph.
Copies the graph with the first <URI> into the second <URI>. The destination <URI> must not be an existing graph.
Removes all triples from the graph with the given <URI>.
Returns true if the graph with the given <URI> exists.
Prints the triples of the graph with the given <URI>. Has the same result as rdf.graph.export{ uri=<URI>, format='triples' }.
Returns true if the graph with the first <URI> is equivalent to the graph with the second <URI>.
Equivalence is computed with a graph homomorphism algorithm over all triples using blank nodes.
Avoid using it with graphs having thousands of blank nodes (it could take a lot of time).
This function updates the graph with the first <URI1>.
All resources of the graph that are children of <URI2> will be relocated to <URI3>.
Example: rdf.graph.relocate(... , 'http://bad/person', 'http://good/people') will change the triple
http://bad/person/Neron rdfs:label "Emperor Neron"
into
http://good/people/Neron rdfs:label "Emperor Neron"
This function is provided to update multiple graphs inside the same transaction (to avoid semantic inconsistencies between graphs if an error occurs).
This function was created before the implementation of SPARQL/Update. Using SPARQL/Update is recommended instead (even if its syntax is more complex).
TO DO EXAMPLE
This function was created before the implementation of SPARQL/Update. Using SPARQL/Update is recommended instead (even if its syntax is more complex).
TO DO EXAMPLE
This function was created before the implementation of SPARQL/Update. Using SPARQL/Update is recommended instead (even if its syntax is more complex).
TO DO EXAMPLE
TO DO
TO DO
TO DO
TO DO
TO DO
TO DO
TO DO
TO DO
TODO