imported>Admin: Last updated at: 2014-03-19 04:23:08 By user: VictorChernov

2014-04-28T02:59:11Z

= [[OntologySummit2014_Hackathon]] - Project: =

== Optimized SPARQL performance management via native API ==

Project roster page: [[OntologySummit2014_Hackathon_ReferenceDataForSPARQLPeformanceBenchmarking]] (this page).

Team lead: [[VictorChernov]] (MSK, UTC+4) vchernov at nitrosbase.com

The event starts on 29 March 2014 at 14:00 MSK / 10:00 UTC / 03:00 PDT and is held online worldwide via mikogo.com (the session number will be announced later).

----

The goal of the project is to study which kinds of queries reveal the advantages of one RDF database over another. This involves:

* Selecting a SPARQL subset from SP2Bench
* Forming a dataset and loading it into all triplestores
* Implementing and testing measurement aids
* Measuring time accurately and obtaining min, max, average and median times
* Reflecting on the results and on the advantages and disadvantages of each triplestore for each selected query

The following triplestores will be compared:

* [http://virtuoso.openlinksw.com/ Virtuoso]
* [http://stardog.com/ Stardog]
* [http://nitrosbase.com/ NitrosBase]

The triplestores have the following important advantages:

* Very high performance, as demonstrated on the SP2Bench benchmark
* Linux and Windows versions
* Native API for fast query processing

Using a native API is important for fast query execution. All three tools provide one:

; Virtuoso : Jena, Sesame and Virtuoso ODBC RDF Extensions for SPASQL
; Stardog : the core SNARL (Stardog Native API for the RDF Language) classes and interfaces
; [[NitrosBase]] : C++ and .NET native API

We expect to write additional code needed for accurate testing:

* Accurate time measurement;
* Functions for getting min, max, average and median times;
* Functions for getting time of scanning through the whole query result;
* Functions for getting time of retrieving the first several records (for example, the first page of a web grid);
* Etc.
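
The measurement aids listed above could be sketched roughly as follows. This is a minimal illustration in Python, not the project's actual code: the names <code>summarize</code>, <code>time_full_scan</code>, <code>time_first_rows</code> and <code>benchmark</code> are hypothetical, and <code>run_query</code> stands for any callable that executes a query through a store's native API and returns an iterable of result rows.

```python
import statistics
import time
from itertools import islice
from typing import Callable, Dict, Iterable, List


def summarize(samples: List[float]) -> Dict[str, float]:
    """Return min, max, average and median of a list of timings (seconds)."""
    if not samples:
        raise ValueError("at least one timing sample is required")
    return {
        "min": min(samples),
        "max": max(samples),
        "average": statistics.mean(samples),
        "median": statistics.median(samples),
    }


def time_full_scan(run_query: Callable[[], Iterable]) -> float:
    """Time executing the query and iterating over the whole result."""
    start = time.perf_counter()
    for _ in run_query():
        pass
    return time.perf_counter() - start


def time_first_rows(run_query: Callable[[], Iterable], n: int) -> float:
    """Time executing the query and fetching only the first n rows."""
    start = time.perf_counter()
    for _ in islice(run_query(), n):
        pass
    return time.perf_counter() - start


def benchmark(run_query: Callable[[], Iterable],
              repeats: int = 10,
              first_n: int = 20) -> Dict[str, Dict[str, float]]:
    """Repeat both measurements and summarize each series of timings."""
    full = [time_full_scan(run_query) for _ in range(repeats)]
    first = [time_first_rows(run_query, first_n) for _ in range(repeats)]
    return {"full_scan": summarize(full), "first_rows": summarize(first)}


if __name__ == "__main__":
    # Stand-in for a real query: a plain iterator over 100000 fake rows.
    print(benchmark(lambda: iter(range(100000)), repeats=5, first_n=20))
```

<code>time.perf_counter</code> is used because it is a monotonic, high-resolution clock. Reporting the median alongside the average helps when a few runs are distorted by caching or other system activity.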

The following steps are needed for loading the test dataset:

* Selecting a data subset from the SP2Bench benchmark
* Measuring data loading time

'''''Note:'''''
''Data are considered loaded as soon as the system is ready to perform the simplest search query. This is done to eliminate the influence of background processes (e.g. indexing).''
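
One way to apply this rule is to start the load and then poll with a trivial query until it first succeeds. The sketch below is an assumption about how this could be done, not the project's implementation: <code>measure_load_time</code>, <code>start_load</code> and <code>probe_query</code> are hypothetical names, and the two callables would have to be written against each store's own loader and native API.

```python
import time
from typing import Callable


def measure_load_time(start_load: Callable[[], None],
                      probe_query: Callable[[], bool],
                      poll_interval: float = 0.1,
                      timeout: float = 3600.0) -> float:
    """Seconds from starting the load until probe_query first succeeds.

    start_load  -- hypothetical callable that triggers the bulk load
    probe_query -- hypothetical callable running the simplest search query;
                   returns True when the store answers it
    """
    start = time.perf_counter()
    start_load()
    while True:
        try:
            if probe_query():
                return time.perf_counter() - start
        except Exception:
            pass  # assumption: any error means "not ready yet"
        if time.perf_counter() - start > timeout:
            raise TimeoutError("store did not become ready in time")
        time.sleep(poll_interval)
```

The polling interval limits the resolution of the measurement, so it should be small compared with the expected loading time.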

We are going to explore the query execution performance of the databases under consideration (Virtuoso, Stardog, [[NitrosBase]]).

The queries should be fairly simple and cover different techniques, for example:

* Searching a small range of values
* Searching a big range of values
* Sorting
* Aggregation
* Several different join queries
* Retrieving part of the result
* Retrieving the whole result
* Etc.
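
To make these categories concrete, a small query catalogue might look like the sketch below. The <code>ex:</code> prefix, its properties and the year boundaries are hypothetical placeholders, not the SP2Bench schema or the queries actually selected for the project. Retrieving part or all of a result depends on how the client consumes it, so it is not a separate query here.

```python
# Hypothetical vocabulary: the ex: namespace and its properties are
# placeholders for illustration, not the actual SP2Bench schema.
PREFIX = "PREFIX ex: <http://example.org/schema#>\n"

QUERIES = {
    # Narrow filter: few matching values expected.
    "small_range": PREFIX + (
        "SELECT ?doc WHERE { ?doc ex:year ?y . "
        "FILTER(?y >= 2000 && ?y <= 2001) }"
    ),
    # Wide filter: many matching values expected.
    "big_range": PREFIX + (
        "SELECT ?doc WHERE { ?doc ex:year ?y . "
        "FILTER(?y >= 1950 && ?y <= 2010) }"
    ),
    "sorting": PREFIX + (
        "SELECT ?doc ?y WHERE { ?doc ex:year ?y } ORDER BY ?y"
    ),
    # Aggregates require SPARQL 1.1.
    "aggregation": PREFIX + (
        "SELECT ?y (COUNT(?doc) AS ?n) WHERE { ?doc ex:year ?y } GROUP BY ?y"
    ),
    # Join of two triple patterns on the shared variable ?a.
    "join": PREFIX + (
        "SELECT ?doc ?name WHERE { ?doc ex:author ?a . ?a ex:name ?name }"
    ),
}

if __name__ == "__main__":
    for category, query in QUERIES.items():
        print("--", category)
        print(query)
```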

'''''Note:'''''
''During testing, each database may allocate a lot of resources, which can affect the performance of the other databases. That is why each test should be started after a system reboot.''
