Chris Bizer

Andreas Schultz

Contents

  1. Intro
  2. Benchmark Dataset
  3. Benchmark Machine
  4. Benchmark Results
    1. Jena TDB
    2. Jena SDB
    3. Sesame (Native)
    4. Virtuoso - Triple Store
    5. D2R Server
    6. Virtuoso - RDF Views
    7. MySQL (SQL)
    8. Virtuoso (SQL)
  5. Store Comparison (Single Client)
  6. Store Comparison (Multiple Clients)
  7. Qualification
  8. Thanks


Document Version: 1.2
Publication Date: 03/23/2009


 

1. Introduction

The Berlin SPARQL Benchmark (BSBM) is a benchmark for comparing the performance of storage systems that expose SPARQL endpoints. Such systems include native RDF stores, Named Graph stores, systems that map relational databases into RDF, and SPARQL wrappers around other kinds of data sources. The benchmark is built around an e-commerce use case, where a set of products is offered by different vendors and consumers have posted reviews about products.

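This document presents the results of running the Berlin SPARQL Benchmark against the following systems: Jena TDB, Jena SDB, Sesame (Native), Virtuoso Open-Source Edition v5.0.10 (Triple Store), D2R Server 0.6 and Virtuoso Open-Source Edition v5.0.10 (RDF Views).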

The stores were benchmarked with datasets ranging from 1,000,000 triples to 100,000,000 triples.

In order to set the SPARQL results into context we also report the results of running the SQL version of the benchmark against two relational database management systems (MySQL 5.1.26 and Virtuoso - RDBMS Version 5.0.10).

 


 

2. Benchmark Dataset

We ran the benchmark using the triple version and the relational version of the BSBM dataset (benchmark scenario NTR). The benchmark was run for different dataset sizes. The datasets were generated using the BSBM data generator and fulfill the characteristics described in the BSBM specification.

Details about the benchmark datasets are summarized in the following table:

Number of Triples               1M          25M          100M
Number of Products              2,785       70,812       284,826
Number of Producers             60          1,422        5,618
Number of Product Features      4,745       23,833       47,884
Number of Product Types         151         731          2,011
Number of Vendors               34          722          2,854
Number of Offers                55,700      1,416,240    5,696,520
Number of Reviewers             1,432       36,249       146,054
Number of Reviews               27,850      708,120      2,848,260
Total Number of Instances       92,757      2,258,129    9,034,027
Exact Total Number of Triples   1,000,313   25,000,244   100,000,112
File Size Turtle (unzipped)     86 MB       2.1 GB       8.5 GB

Note: All datasets were generated with the -fc option for forward chaining.

The benchmark datasets are available in an RDF triple representation and in a relational representation. Both representations can be downloaded below:

Download Turtle Representation of the Benchmark Datasets

Download MySQL dump of the Benchmark Datasets

Important: Test Driver data for all datasets:
(If you generate the datasets yourself, the Test Driver data is generated automatically in the directory "td_data".)

Download Test Driver data

 


 

3. Benchmark Machine

The benchmarks were run on a machine with the following specification:

 


 

4. Benchmark Results

This section reports the results of running the BSBM benchmark against four RDF stores, two relational-database-to-RDF wrappers and two relational database systems.

Test Procedure

The load performance of the systems was measured by loading the Turtle representation of the BSBM datasets into the triple stores and by loading the relational representation in the form of MySQL dumps into the RDBMS behind D2R Server. The loaded datasets were forward chained and contained all rdf:type statements for product types. Thus the systems under test did not have to do any inferencing.

The query performance of the systems was measured by running 500 BSBM query mixes (altogether 12,500 queries) against the systems over the SPARQL protocol. The test driver and the system under test (SUT) were running on the same machine in order to reduce the influence of network latency. In order to measure sustainable performance of the SUTs, a ramp-up period is executed before the actual test runs.
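The figure of 12,500 queries follows from the composition of a single query mix. The following is a minimal Python sketch of that arithmetic; the per-query counts are the bracketed counts given in Section 5.3 of this document, and the names `QUERIES_PER_MIX` and `total_queries` are chosen here purely for illustration.

```python
# Number of times each query type occurs in one BSBM query mix
# (taken from the bracketed counts in Section 5.3 of this document).
QUERIES_PER_MIX = {
    1: 1, 2: 6, 3: 1, 4: 1, 5: 1, 6: 1,
    7: 4, 8: 2, 9: 4, 10: 2, 11: 1, 12: 1,
}


def total_queries(mixes: int, counts=QUERIES_PER_MIX) -> int:
    """Return the number of queries executed for the given number of mixes."""
    return mixes * sum(counts.values())


print(sum(QUERIES_PER_MIX.values()))  # 25 queries per mix
print(total_queries(500))             # 12500 queries per test run
```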

We applied the following test procedure to each store:

  1. Load data into the store.
  2. Shut down store, clear OS caches, restart store.
  3. Run ramp-up.
  4. Execute single-client test run (500 mixes performance measurement, randomizer seed: 808080)
  5. Execute multiple-client test runs (2, 4, 8 and 64 clients; 500 query mixes per run; randomizer seeds: 863528, 888326, 975932, 487411)
  6. Execute test run with reduced query mix. (repeat steps 2 to 5 with reduced query mix and different randomizer seeds)

The different runs use distinct randomizer seeds for choosing query parameters. This ensures that the test driver produces distinctly parameterized queries over all runs and makes it harder for the stores to apply query caching.
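The effect of the seeds can be illustrated with a small Python sketch. It does not reproduce the BSBM test driver: `pick_product_ids` and the way parameters are drawn are hypothetical, and only the seed values and the product count of the 1M dataset are taken from this document.

```python
import random


def pick_product_ids(seed: int, n_products: int, k: int) -> list[int]:
    """Draw k pseudo-random product numbers from a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator, independent of global state
    return [rng.randrange(1, n_products + 1) for _ in range(k)]


single_client = pick_product_ids(808080, 2785, 5)
two_clients = pick_product_ids(863528, 2785, 5)

# The same seed reproduces the same parameter sequence, so a run can be repeated;
# a different seed starts a different sequence.
print(single_client == pick_product_ids(808080, 2785, 5))
print(single_client, two_clients)
```

Because each run draws from its own sequence, a store cannot simply replay cached answers from an earlier run.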

For all test runs, we also recorded the performance of the SUTs after the first query mix. The following table gives an overview of the performance increase of the SUTs between the second query mix and the average query mix in steady state.

SUT             1M        25M       100M
Sesame          15.61     3.98      0.75
Jena TDB        3.03      0.52      0.00
Jena SDB        0.97      2.68      0.64
Virtuoso TS     0.47      26.14     46.65
Virtuoso RV     0.15      1.98      100.09
D2R Server      0.67      0.03      0.04
MySQL           26.30     17.37     8.49
Virtuoso SQL    1.03      13.58     247.20

 

An overview of the load times for the SUTs and the different datasets is given in the following table (in [day:]hh:min:sec):

SUT             1M          25M         100M
Sesame          00:02:59    12:17:05    3:06:27:35
Jena TDB        00:00:49    00:16:53    01:34:14
Jena SDB        00:02:09    04:04:38    1:14:53:08
Virtuoso TS     00:00:23    00:39:24    07:56:47
Virtuoso RV     00:00:34    00:17:15    01:03:53
D2R Server      00:00:06    00:02:03    00:11:45
MySQL           00:00:06    00:02:03    00:11:45
Virtuoso SQL    00:00:34    00:17:15    01:03:53
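To compare load times numerically, the [day:]hh:min:sec strings first have to be converted into seconds. The following Python sketch shows one way to do this; the function name `load_time_seconds` is an assumption made for illustration, and the two sample values are the 100M load times of Sesame and Jena TDB from the table above.

```python
def load_time_seconds(text: str) -> int:
    """Convert a '[day:]hh:min:sec' string into seconds."""
    parts = [int(p) for p in text.strip().split(":")]
    if len(parts) == 3:  # no day component given
        parts = [0] + parts
    if len(parts) != 4:
        raise ValueError(f"unexpected load time format: {text!r}")
    days, hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


sesame_100m = load_time_seconds("3:06:27:35")
tdb_100m = load_time_seconds("01:34:14")
print(sesame_100m, tdb_100m)
print(round(sesame_100m / tdb_100m, 1))  # ratio of the two load times
```

With these values, loading the 100M dataset into Sesame took roughly 50 times as long as loading it into Jena TDB.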

4.1 TDB over Joseki


Jena TDB homepage

4.1.1 Configuration

The following changes were made to the default configuration of the software:


4.1.2 Load Time

The table below summarizes the load times of the Turtle files (in hh:mm:ss):

1M          25M         100M
00:00:49    00:16:53    01:34:14



4.1.3 Benchmark Query results: QpS (Queries per Second)

The table below summarizes the query throughput for each type of query over all 500 runs (in QpS):


            1M         25M        100M
Query 1     494.3      164.8      34.9
Query 2     60.9       50.8       38.2
Query 3     451.3      140.8      28.4
Query 4     428.6      116.3      24.8
Query 5     1.8        0.1        0.04
Query 6     59.5       2.4        0.1
Query 7     188.8      28.1       6.3
Query 8     158.8      27.0       8.4
Query 9     57.2       2.9        0.7
Query 10    428.8      61.7       18.8
Query 11    376.4      45.2       23.9
Query 12    52.6       2.8        0.7

 

4.1.4 Benchmark Overall results: QMpH for the 1M and 25M datasets for all runs

For the 1M and 25M datasets we ran tests with multiple clients (1, 2, 4, 8 and 64 clients). The results are in Query Mixes per Hour (QMpH) meaning that larger numbers are better.

Clients    1          2          4          8          64
1M         4,450      6,752      9,429      8,453      8,664
25M        353        513        694        536        555

 

4.1.5 Result Summaries


4.1.6 Run Logs (detailed information)

 

4.2 SDB over Joseki3


Jena SDB homepage

4.2.1 Configuration

The following changes were made to the default configuration of the software:


4.2.2 Load Time

The table below summarizes the load times of the Turtle files (in [d:]hh:mm:ss):

1M          25M         100M
00:02:09    04:04:38    1:14:53:08

 

4.2.3 Benchmark Query results: QpS (Queries per Second)

The table below summarizes the query throughput for each type of query over all 500 runs (in QpS):


            1M         25M        100M
Query 1     373.8      197.8      12.0
Query 2     49.6       46.9       35.2
Query 3     283.3      150.5      8.0
Query 4     240.2      132.0      7.1
Query 5     18.2       1.0        0.5
Query 6     16.8       0.6        0.1
Query 7     112.4      27.4       1.5
Query 8     134.1      30.5       2.7
Query 9     129.0      9.3        2.4
Query 10    289.4      40.0       2.4
Query 11    350.5      97.4       23.4
Query 12    118.9      9.2        2.5

 

4.2.4 Benchmark Overall results: QMpH for the 1M and 25M datasets for all runs

For the 1M and 25M datasets we ran tests with multiple clients (1, 2, 4, 8 and 64 clients). The results are in Query Mixes per Hour (QMpH) meaning that larger numbers are better.

Clients    1          2          4          8          64
1M         10,421     17,280     23,433     24,959     23,478
25M        968        1,346      1,021      883        927

 

4.2.5 Result Summaries


4.2.6 Run Logs (detailed information)

 

4.3 Sesame (Native) over Tomcat


Sesame homepage

4.3.1 Configuration

Sesame was configured to use the "Native" storage schema. The performance figures for other internal storage schemata will differ from the reported figures.

The following changes were made to the default configuration of the software:

Store Type: Native
Indexes: spoc, posc, opsc
JAVA_OPTS = ... -Xmx6144m ...

 

4.3.2 Load Time

The table below summarizes the load times of the Turtle files (in [d:]hh:mm:ss):

1M          25M         100M
00:02:59    12:17:05    3:06:27:35

 


4.3.3 Benchmark Query results: QpS (Queries per Second)

The table below summarizes the query throughput for each type of query over all 500 runs (in QpS):

            1M         25M        100M
Query 1     661.8      200.0      14.7
Query 2     251.2      168.3      32.5
Query 3     504.8      139.9      12.8
Query 4     452.5      127.9      10.2
Query 5     29.6       1.7        0.5
Query 6     14.1       0.5        0.1
Query 7     86.6       56.7       1.8
Query 8     297.0      90.3       4.2
Query 9     924.2      128.3      19.1
Query 10    428.6      92.6       2.2
Query 11    652.3      98.3       13.3
Query 12    797.4      349.7      17.7

 

4.3.4 Benchmark Overall results: QMpH for the 1M and 25M datasets for all runs

For the 1M and 25M datasets we ran tests with multiple clients (1, 2, 4, 8 and 64 clients). The results are in Query Mixes per Hour (QMpH) meaning that larger numbers are better.

Clients    1          2          4          8          64
1M         18,094     19,057     16,460     18,295     16,517
25M        1,343      1,485      1,204      1,300      1,271

 

4.3.5 Result Summaries

4.3.6 Run Logs (detailed information)

 

4.4 Virtuoso Open-Source Edition v5.0.10 (Triple Store)


Virtuoso homepage


4.4.1 Configuration

The following changes were made to the default configuration of the software:

MaxCheckpointRemap              = 1000000
NumberOfBuffers = 520000
MaxMemPoolSize = 0
StopCompilerWhenXOverRunTime = 1
None

4.4.2 Load Time

The table below summarizes the load times of the Turtle files (in hh:mm:ss):

1M          25M         100M
00:00:23    00:39:24    07:56:47

4.4.3 Benchmark Query results: QpS (Queries per Second)

The table below summarizes the query throughput for each type of query over all 500 runs (in QpS):


            1M         25M        100M
Query 1     202.3      192.0      132.1
Query 2     47.3       45.6       39.5
Query 3     175.8      165.2      136.0
Query 4     92.1       86.1       53.7
Query 5     76.1       13.6       5.9
Query 6     55.0       2.1        0.5
Query 7     72.0       35.7       4.7
Query 8     115.7      113.0      11.6
Query 9     540.8      532.8      53.1
Query 10    95.1       75.2       7.6
Query 11    361.4      342.0      43.6
Query 12    133.2      128.8      39.4

 

4.4.4 Benchmark Overall results: QMpH for the 1M and 25M datasets for all runs

For the 1M and 25M datasets we ran tests with multiple clients (1, 2, 4, 8 and 64 clients). The results are in Query Mixes per Hour (QMpH) meaning that larger numbers are better.

Clients    1          2          4          8          64
1M         12,360     21,356     32,513     29,448     29,483
25M        4,123      7,610      9,491      5,901      5,400


4.4.5 Result Summaries


4.4.6 Run Logs (detailed information)

4.5 D2R Server 0.6


D2R Server is a relational-database-to-RDF wrapper that rewrites SPARQL queries into SQL queries against application-specific relational schemata based on a mapping. For the experiment, we used D2R Server together with a MySQL database into which we loaded the relational representation of the benchmark dataset.


4.5.1 Configuration

The following changes were made to the default configuration of the software:

4.5.2 Load Time

The table below summarizes the load times of the SQL dump files (in hh:mm:ss):

1M          25M         100M
00:00:06    00:02:03    00:11:45

 

4.5.3 Benchmark Query results: QpS (Queries per Second)

The table below summarizes the query throughput for each type of query over all 500 runs (in QpS):

            1M         25M        100M
Query 1     327.7      235.6      78.8
Query 2     40.9       35.8       39.8
Query 3     226.2      115.4      56.0
Query 4     224.0      167.2      72.3
Query 5     1.1        0.04       0.01
Query 6     26.2       1.0        0.23
Query 7     122.6      97.2       11.6
Query 8     71.7       62.0       12.3
Query 9     81.5       73.2       33.0
Query 10    218.3      200.4      77.1
Query 11    33.4       1.5        0.4
Query 12    202.9      162.2      169.6


4.5.4 Benchmark Overall results: QMpH for the 1M and 25M datasets for all runs

For the 1M and 25M datasets we ran tests with multiple clients (1, 2, 4, 8 and 64 clients). The results are in Query Mixes per Hour (QMpH) meaning that larger numbers are better.

Clients    1          2          4          8          64
1M         2,828      3,861      3,140      2,960      2,938
25M        140        187        160        146        143


4.5.5 Result Summaries


4.5.6 Run Logs (detailed information)

4.6 Virtuoso Open-Source Edition v5.0.10 (RDF Views)


Virtuoso RDF Views is a relational-database-to-RDF wrapper that rewrites SPARQL queries into SQL queries against application-specific relational schemata based on a mapping. Virtuoso RDF Views works together with the Virtuoso RDBMS, into which we loaded the relational representation of the benchmark dataset.

Virtuoso homepage


4.6.1 Configuration

The following changes were made to the default configuration of the software:

MaxCheckpointRemap              = 1000000
NumberOfBuffers = 520000
MaxMemPoolSize = 0
StopCompilerWhenXOverRunTime = 1
No supplementary indexes.
insert into SYS_SPARQL_HOST (SH_HOST, SH_DEFINES)
        values ('%', 'define sql:describe-mode "SPO" ');

4.6.2 Load Time

The table below summarizes the load times of the SQL dump files (in hh:mm:ss):

1M          25M         100M
00:00:34    00:17:15    01:03:53

4.6.3 Benchmark Query results: QpS (Queries per Second)

The table below summarizes the query throughput for each type of query over all 500 runs (in QpS):


            1M         25M        100M
Query 1     198.7      173.3      122.5
Query 2     78.2       75.4       64.1
Query 3     182.2      167.0      128.5
Query 4     106.1      95.6       83.8
Query 5     118.2      30.4       13.6
Query 6     275.0      24.6       6.0
Query 7     81.0       76.4       14.5
Query 8     131.8      128.5      22.0
Query 9     505.6      481.9      164.3
Query 10    224.1      219.9      67.4
Query 11    102.2      99.6       41.2
Query 12    151.0      147.8      90.7

 

4.6.4 Benchmark Overall results: QMpH for the 1M and 25M datasets for all runs

For the 1M and 25M datasets we ran tests with multiple clients (1, 2, 4, 8 and 64 clients). The results are in Query Mixes per Hour (QMpH) meaning that larger numbers are better.

Clients    1          2          4          8          64
1M         17,424     28,985     34,836     32,668     33,339
25M        12,972     22,552     30,387     28,261     28,748


4.6.5 Result Summaries


4.6.6 Run Logs (detailed information)

4.7 MySQL 5.1.26


In order to put the performance figures of the RDF stores into context and to be able to calculate the overhead that is produced by rewriting SPARQL queries into SQL queries, we also ran the SQL version of the benchmark queries against the relational representation of the benchmark dataset on MySQL. The semantics of some queries (9, 11, 12) cannot be translated exactly into SQL. Although the corresponding SQL queries give similar results, they are semantically not as complex as the SPARQL queries. Thus the SQL results should only be used for general orientation.

4.7.1 Configuration

The following changes were made to the default configuration of the software:

4.7.2 Load Time

The table below summarizes the load times of the SQL dump files (in hh:mm:ss):

1M          25M         100M
00:00:06    00:02:03    00:11:45


4.7.3 Benchmark Query results: QpS (Queries per Second)

The table below summarizes the query throughput for each type of query over all 500 runs (in QpS):


            1M         25M        100M
Query 1     3,021.1    955.1      475.7
Query 2     4,524.9    3,333.3    3,268.0
Query 3     2,832.9    919.1      458.5
Query 4     2,652.5    919.1      428.1
Query 5     396.4      24.9       7.9
Query 6     163.9      7.2        1.9
Query 7     1,912.0    1,369.9    407.2
Query 8     3,496.5    601.0      62.7
Query 9     4,255.3    2,849.0    1,369.9
Query 10    4,444.4    3,355.7    1,883.2
Query 11    9,174.3    4,366.8    455.8
Query 12    7,246.4    2,570.7    538.5

 

4.7.4 Benchmark Overall results: QMpH for the 1M and 25M datasets for all runs

For the 1M and 25M datasets we ran tests with multiple clients (1, 2, 4, 8 and 64 clients). The results are in Query Mixes per Hour (QMpH) meaning that larger numbers are better.

Clients    1          2          4          8          64
1M         235,066    318,071    472,502    442,282    454,563
25M        18,578     31,093     39,647     40,599     40,470

 

4.7.5 Result Summaries


4.7.6 Run Logs (detailed information)

4.8 Virtuoso Open-Source Edition v5.0.10 (SQL)


In order to put the performance figures of the RDF stores into context and to be able to calculate the overhead that is produced by rewriting SPARQL queries into SQL queries, we also ran the SQL version of the benchmark queries against the relational representation of the benchmark dataset on Virtuoso's relational engine. The semantics of some queries (9, 11, 12) cannot be translated exactly into SQL. Although the corresponding SQL queries give similar results, they are semantically not as complex as the SPARQL queries. Thus the SQL results should only be used for general orientation.

Virtuoso homepage


4.8.1 Configuration

The following changes were made to the default configuration of the software:

MaxCheckpointRemap              = 1000000
NumberOfBuffers = 520000
MaxMemPoolSize = 0
StopCompilerWhenXOverRunTime = 1

4.8.2 Load Time

The table below summarizes the load times of the SQL dump files (in hh:mm:ss):

1M          25M         100M
00:00:34    00:17:15    01:03:53

4.8.3 Benchmark Query results: QpS (Queries per Second)

The table below summarizes the query throughput for each type of query over all 500 runs (in QpS):


            1M         25M        100M
Query 1     1,194.7    833.3      469.9
Query 2     1,592.4    1,455.6    991.1
Query 3     1,078.7    838.2      455.6
Query 4     1,097.7    759.3      443.3
Query 5     410.5      42.6       12.2
Query 6     1,605.1    97.3       21.7
Query 7     831.3      733.1      26.0
Query 8     1,715.3    1,602.6    31.0
Query 9     2,638.5    2,638.5    144.6
Query 10    2,004.0    1,587.3    267.5
Query 11    2,493.8    3,194.9    1,248.4
Query 12    2,801.1    2,985.1    1,524.4

 

4.8.4 Benchmark Overall results: QMpH for the 1M and 25M datasets for all runs

For the 1M and 25M datasets we ran tests with multiple clients (1, 2, 4, 8 and 64 clients). The results are in Query Mixes per Hour (QMpH) meaning that larger numbers are better.

Clients    1          2          4          8          64
1M         192,013    199,205    274,796    357,316    306,172
25M        69,585     85,146     135,097    173,665    148,813


4.8.5 Result Summaries


4.8.6 Run Logs (detailed information)

 


 

5. Store Comparison (Single Client)

This section compares the SPARQL query performance of the different stores and relates it to the SQL query performance of MySQL and Virtuoso's relational engine. The SQL performance figures also make it possible to calculate the overhead that the relational-database-to-RDF wrappers produce when rewriting SPARQL queries into SQL queries against the underlying RDBMS.
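One simple way to express this overhead is the ratio between the SQL throughput of the underlying RDBMS and the SPARQL throughput of the wrapper on top of it. The Python sketch below makes that assumption explicit; the document itself does not define a formula, and `overhead_factor` is a name chosen for illustration. The input values are the single-client QMpH figures for the 1M dataset and the complete query mix reported in Section 5.1.1.

```python
def overhead_factor(sql_qmph: float, wrapper_qmph: float) -> float:
    """Ratio of native SQL throughput to SPARQL-via-wrapper throughput."""
    if wrapper_qmph <= 0:
        raise ValueError("wrapper throughput must be positive")
    return sql_qmph / wrapper_qmph


# Single-client QMpH on the 1M dataset, complete query mix (Section 5.1.1).
d2r = overhead_factor(235_066, 2_828)        # D2R Server on top of MySQL
rdf_views = overhead_factor(192_013, 17_424) # Virtuoso RDF Views on Virtuoso SQL
print(round(d2r, 1), round(rdf_views, 1))
```

Since the SQL versions of queries 9, 11 and 12 are not semantically identical to their SPARQL counterparts, such a factor should be read as a rough orientation only.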

5.1 Query Mixes per Hour

Running 500 query mixes against the different stores resulted in the following performance numbers (in QMpH). The best performance figure for each dataset size is set bold in the tables. This comparison excludes MySQL (SQL mix) and Virtuoso (SQL mix).


5.1.1 QMpH: Complete Query Mix

The complete query mix is given here.

        Sesame Native  Jena TDB  Jena SDB  Virtuoso TS  Virtuoso RV  D2R Server  MySQL SQL  Virtuoso SQL
1M      18,094         4,450     10,421    12,360       17,424       2,828       235,066    192,013
25M     1,343          353       968       4,123        12,972       140         18,578     69,585
100M    254            81        211       954          4,407        35          4,991      9,102

 

5.1.2 QMpH: Reduced Query Mix

The reduced query mix consists of the same query sequence as the complete mix but without queries 5 and 6. The two queries were excluded as they alone consumed a large portion of the overall query execution time for bigger dataset sizes.

        Sesame Native  Jena TDB  Jena SDB  Virtuoso TS  Virtuoso RV  D2R Server  MySQL SQL  Virtuoso SQL
1M      38,727         15,842    15,692    13,759       18,516       11,520      516,271    219,616
25M     39,059         1,856     4,877     10,718       17,529       3,780       280,993    195,647
100M    3,116          459       584       2,166        6,293        1,261       84,797     14,400
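How much queries 5 and 6 weigh on each store can be estimated by dividing the reduced-mix QMpH by the complete-mix QMpH. The Python sketch below does this for the 100M dataset, using the values from Sections 5.1.1 and 5.1.2; the dictionary and variable names are illustrative assumptions. Note that a reduced mix contains 23 instead of 25 queries, so a small part of the gain comes simply from the shorter mix.

```python
# QMpH on the 100M dataset, single client (Sections 5.1.1 and 5.1.2).
COMPLETE_100M = {
    "Sesame Native": 254, "Jena TDB": 81, "Jena SDB": 211,
    "Virtuoso TS": 954, "Virtuoso RV": 4_407, "D2R Server": 35,
}
REDUCED_100M = {
    "Sesame Native": 3_116, "Jena TDB": 459, "Jena SDB": 584,
    "Virtuoso TS": 2_166, "Virtuoso RV": 6_293, "D2R Server": 1_261,
}

# Throughput gain when queries 5 and 6 are left out of the mix.
gain = {sut: REDUCED_100M[sut] / COMPLETE_100M[sut] for sut in COMPLETE_100M}

for sut, factor in sorted(gain.items(), key=lambda item: item[1], reverse=True):
    print(f"{sut:14s} {factor:6.1f}x")
```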


5.2 Queries per Second by Query and Dataset Size

Running 500 query mixes against the different stores led to the following query throughput for each type of query over all 500 runs (in Queries per Second). The best performance figure for each dataset size is set bold in the tables. For comparison, the MySQL and Virtuoso results for the SQL queries are also included in the tables but are not considered when determining the best performance figure.

Query 1

1M 25M 100M
Sesame 662 200 15
Jena TDB 494 165 35
Jena SDB 374 198 12
Virtuoso TS 202 192 132
Virtuoso RV 199 173 122
D2R Server 328 236 79
MySQL 3,021 955 476
Virtuoso SQL 1,195 833 470

Query 2

1M 25M 100M
Sesame 251 168 32
Jena TDB 61 51 38
Jena SDB 50 47 35
Virtuoso TS 47 46 39
Virtuoso RV 78 75 64
D2R Server 41 36 40
MySQL 4,525 3,333 3,268
Virtuoso SQL 1,592 1,456 991

Query 3

1M 25M 100M
Sesame 505 140 13
Jena TDB 451 141 28
Jena SDB 283 151 8
Virtuoso TS 176 165 136
Virtuoso RV 182 167 129
D2R Server 226 115 56
MySQL 2,833 919 459
Virtuoso SQL 1,079 838 456

Query 4

1M 25M 100M
Sesame 452 128 10
Jena TDB 429 116 25
Jena SDB 240 132 7
Virtuoso TS 92 86 54
Virtuoso RV 106 96 84
D2R Server 224 167 72
MySQL 2,653 919 428
Virtuoso SQL 1,098 759 443

Query 5

1M 25M 100M
Sesame 29 1.69 0.52
Jena TDB 1 0.13 0.04
Jena SDB 18 1.05 0.46
Virtuoso TS 76 13.59 5.86
Virtuoso RV 118 30.44 13.57
D2R Server 1 0.04 0.01
MySQL 396 24.92 7.92
Virtuoso SQL 410 42.56 12.22

Query 6

1M 25M 100M
Sesame 14 0.53 0.13
Jena TDB 59 2.40 0.09
Jena SDB 16 0.55 0.12
Virtuoso TS 55 2.12 0.50
Virtuoso RV 274 24.59 6.03
D2R Server 26 1.00 0.23
MySQL 163 7.15 1.86
Virtuoso SQL 1,605 97.29 21.67

Query 7

1M 25M 100M
Sesame 87 57 2
Jena TDB 189 28 6
Jena SDB 112 27 2
Virtuoso TS 72 36 5
Virtuoso RV 81 76 15
D2R Server 123 97 12
MySQL 1,912 1,370 407
Virtuoso SQL 831 733 26

Query 8

1M 25M 100M
Sesame 297 90 4
Jena TDB 159 27 8
Jena SDB 134 30 3
Virtuoso TS 116 113 12
Virtuoso RV 132 129 22
D2R Server 72 62 12
MySQL 3,497 601 63
Virtuoso SQL 1,715 1,603 31

Query 9

1M 25M 100M
Sesame 924 128 19
Jena TDB 57 3 1
Jena SDB 129 9 2
Virtuoso TS 541 533 53
Virtuoso RV 506 482 164
D2R Server 81 73 33
MySQL 4,255 2,849 1,370
Virtuoso SQL 2,639 2,639 145

Query 10

1M 25M 100M
Sesame 429 93 2
Jena TDB 429 62 19
Jena SDB 289 40 2
Virtuoso TS 95 75 8
Virtuoso RV 224 220 67
D2R Server 218 200 77
MySQL 4,444 3,356 1,883
Virtuoso SQL 2,004 1,587 267

Query 11

1M 25M 100M
Sesame 652 98 13
Jena TDB 376 45 24
Jena SDB 351 97 23
Virtuoso TS 361 342 44
Virtuoso RV 102 100 41
D2R Server 33 2 0.4
MySQL 9,174 4,367 456
Virtuoso SQL 2,494 3,195 1,248

Query 12

1M 25M 100M
Sesame 797 350 18
Jena TDB 53 3 1
Jena SDB 119 9 2
Virtuoso TS 133 129 39
Virtuoso RV 151 148 91
D2R Server 203 162 170
MySQL 7,246 2,571 539
Virtuoso SQL 2,801 2,985 1,524

 

5.3 Proportions of Queries to the complete Query Mix

To give a better feeling for how strongly each query influences the overall performance, the following tables show the percentage of the overall execution time that is spent on each query type.

Note: The numbers are for all queries of one type. The number of times a query occurs in the query mix is given in brackets and can be used to calculate the percentage for a single query of a type.
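The calculation mentioned in the note is a plain division of the share of a query type by its number of occurrences in the mix. A minimal Python sketch, with the function name `share_per_single_query` chosen for illustration and the Sesame figure for Query 2 on the 1M dataset as sample input:

```python
def share_per_single_query(type_share_percent: float, count_in_mix: int) -> float:
    """Average share of the mix execution time spent on one query of a type."""
    if count_in_mix < 1:
        raise ValueError("count_in_mix must be at least 1")
    return type_share_percent / count_in_mix


# Query 2 occurs 6 times per mix; Sesame spends 12.01% of the 1M mix time on it.
print(round(share_per_single_query(12.01, 6), 2))
```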

Query 1 (1)

                1M        25M       100M
Sesame          0.76%     0.19%     0.48%
Jena TDB        0.25%     0.06%     0.06%
Jena SDB        0.77%     0.14%     0.49%
Virtuoso TS     1.70%     0.60%     0.20%
Virtuoso RV     2.44%     2.08%     1.00%
D2R Server      0.24%     0.02%     0.01%
MySQL           2.16%     0.54%     0.29%
Virtuoso SQL    4.46%     2.32%     0.54%

Query 2 (6)

                1M        25M       100M
Sesame          12.01%    1.33%     1.30%
Jena TDB        12.19%    1.16%     0.35%
Jena SDB        35.00%    3.44%     1.00%
Virtuoso TS     43.58%    15.08%    4.02%
Virtuoso RV     37.14%    28.69%    11.46%
D2R Server      11.51%    0.65%     0.15%
MySQL           8.66%     0.93%     0.25%
Virtuoso SQL    20.10%    7.97%     1.53%

Query 3 (1)

                1M        25M       100M
Sesame          1.00%     0.27%     0.55%
Jena TDB        0.27%     0.07%     0.08%
Jena SDB        1.02%     0.18%     0.73%
Virtuoso TS     1.95%     0.69%     0.19%
Virtuoso RV     2.66%     2.16%     0.95%
D2R Server      0.35%     0.03%     0.02%
MySQL           2.30%     0.56%     0.30%
Virtuoso SQL    4.94%     2.31%     0.55%

Query 4 (1)

                1M        25M       100M
Sesame          1.11%     0.29%     0.69%
Jena TDB        0.29%     0.08%     0.09%
Jena SDB        1.21%     0.20%     0.82%
Virtuoso TS     3.73%     1.33%     0.49%
Virtuoso RV     4.56%     3.77%     1.46%
D2R Server      0.35%     0.02%     0.01%
MySQL           2.46%     0.56%     0.32%
Virtuoso SQL    4.86%     2.55%     0.57%

Query 5 (1)

                1M        25M       100M
Sesame          17.01%    22.09%    13.52%
Jena TDB        68.85%    75.04%    55.70%
Jena SDB        15.91%    25.63%    12.64%
Virtuoso TS     4.51%     8.43%     4.52%
Virtuoso RV     4.10%     11.84%    9.02%
D2R Server      72.48%    92.29%    92.25%
MySQL           16.47%    20.70%    17.50%
Virtuoso SQL    12.99%    45.41%    20.70%

Query 6 (1)

                1M        25M       100M
Sesame          35.60%    69.92%    54.84%
Jena TDB        2.08%     4.07%     25.79%
Jena SDB        17.23%    48.63%    47.07%
Virtuoso TS     6.24%     53.90%    53.08%
Virtuoso RV     1.76%     14.66%    20.30%
D2R Server      3.00%     3.88%     4.34%
MySQL           39.84%    72.13%    74.43%
Virtuoso SQL    3.32%     19.87%    11.67%

Query 7 (4)

                1M        25M       100M
Sesame          23.22%    2.63%     15.97%
Jena TDB        2.62%     1.39%     1.44%
Jena SDB        10.30%    3.93%     15.57%
Virtuoso TS     19.07%    12.82%    22.63%
Virtuoso RV     23.91%    18.87%    33.72%
D2R Server      2.56%     0.16%     0.34%
MySQL           13.66%    1.51%     1.36%
Virtuoso SQL    25.67%    10.55%    38.89%

Query 8 (2)

                1M        25M       100M
Sesame          3.38%     0.83%     3.40%
Jena TDB        1.56%     0.72%     0.53%
Jena SDB        4.32%     1.77%     4.35%
Virtuoso TS     5.94%     2.03%     4.58%
Virtuoso RV     7.35%     5.61%     11.15%
D2R Server      2.19%     0.13%     0.16%
MySQL           3.73%     1.72%     4.42%
Virtuoso SQL    6.22%     2.41%     16.29%

Query 9 (4)

                1M        25M       100M
Sesame          2.18%     1.16%     1.48%
Jena TDB        8.64%     13.35%    12.50%
Jena SDB        8.97%     11.56%    9.88%
Virtuoso TS     2.54%     0.86%     1.99%
Virtuoso RV     3.83%     2.99%     2.98%
D2R Server      3.86%     0.21%     0.12%
MySQL           6.14%     0.72%     0.40%
Virtuoso SQL    8.09%     2.93%     6.99%

Query 10 (2)

                1M        25M       100M
Sesame          2.35%     0.81%     6.54%
Jena TDB        0.58%     0.32%     0.24%
Jena SDB        2.00%     1.35%     4.82%
Virtuoso TS     7.22%     3.05%     7.00%
Virtuoso RV     4.32%     3.28%     3.63%
D2R Server      0.72%     0.04%     0.03%
MySQL           2.94%     0.31%     0.15%
Virtuoso SQL    5.32%     2.44%     1.89%

Query 11 (1)

                1M        25M       100M
Sesame          0.77%     0.38%     0.53%
Jena TDB        0.33%     0.22%     0.09%
Jena SDB        0.83%     0.28%     0.25%
Virtuoso TS     0.95%     0.33%     0.61%
Virtuoso RV     4.74%     3.62%     2.97%
D2R Server      2.35%     2.54%     2.56%
MySQL           0.71%     0.12%     0.30%
Virtuoso SQL    2.14%     0.61%     0.20%

Query 12 (1)

                1M        25M       100M
Sesame          0.63%     0.11%     0.40%
Jena TDB        2.35%     3.51%     3.11%
Jena SDB        2.43%     2.92%     2.39%
Virtuoso TS     2.58%     0.89%     0.67%
Virtuoso RV     3.21%     2.44%     1.35%
D2R Server      0.39%     0.02%     0.01%
MySQL           0.90%     0.20%     0.26%
Virtuoso SQL    1.90%     0.65%     0.17%

 

5.4 Queries per Second by Dataset Size and Query

Running 500 query mixes against the different stores led to the following query throughput for each type of query over all 500 runs (in Queries per Second). The best performance figure for each query is set bold in the tables. For comparison, the MySQL and Virtuoso results for the SQL queries are also included in the tables but are not considered when determining the best performance figure.

 

1M Triple Dataset

          Sesame Native  Jena TDB  Jena SDB  Virtuoso TS  Virtuoso RV  D2R Server  MySQL SQL  Virtuoso SQL
Query 1   662            494       374       202          199          328         3,021      1,195
Query 2   251            61        50        47           78           41          4,525      1,592
Query 3   505            451       283       176          182          226         2,833      1,079
Query 4   452            429       240       92           106          224         2,653      1,098
Query 5   30             2         18        76           118          1           396        411
Query 6   14             60        17        55           275          26          164        1,605
Query 7   87             189       112       72           81           123         1,912      831
Query 8   297            159       134       116          132          72          3,497      1,715
Query 9   924            57        129       541          506          81          4,255      2,639
Query 10  429            429       289       95           224          218         4,444      2,004
Query 11  652            376       351       361          102          33          9,174      2,494
Query 12  797            53        119       133          151          203         7,246      2,801

25M Triple Dataset

          Sesame Native  Jena TDB  Jena SDB  Virtuoso TS  Virtuoso RV  D2R Server  MySQL SQL  Virtuoso SQL
Query 1   200            165       198       192          173          236         955        833
Query 2   168            51        47        46           75           36          3,333      1,456
Query 3   140            141       151       165          167          115         919        838
Query 4   128            116       132       86           96           167         919        759
Query 5   2              0.1       1         14           30           0.04        25         43
Query 6   1              2         1         2            25           1           7          97
Query 7   57             28        27        36           76           97          1,370      733
Query 8   90             27        30        113          129          62          601        1,603
Query 9   128            3         9         533          482          73          2,849      2,639
Query 10  93             62        40        75           220          200         3,356      1,587
Query 11  98             45        97        342          100          2           4,367      3,195
Query 12  350            3         9         129          148          162         2,571      2,985

100M Triple Dataset

          Sesame Native  Jena TDB  Jena SDB  Virtuoso TS  Virtuoso RV  D2R Server  MySQL SQL  Virtuoso SQL
Query 1   15             35        12        132          122          79          476        470
Query 2   32             38        35        39           64           40          3,268      991
Query 3   13             28        8         136          129          56          459        456
Query 4   10             25        7         54           84           72          428        443
Query 5   0.5            0.04      0.5       5.9          13.6         0.01        7.9        12.2
Query 6   0.1            0.1       0.1       0.5          6.0          0.2         1.9        21.7
Query 7   2              6         2         5            15           12          407        26
Query 8   4              8         3         12           22           12          63         31
Query 9   19             1         2         53           164          33          1,370      145
Query 10  2              19        2         8            67           77          1,883      267
Query 11  13             24        23        44           41           0.4         456        1,248
Query 12  18             1         2         39           91           170         539        1,524

 


 

6. Store Comparison (Multiple Clients)


In real-world situations there are usually multiple clients working against a SPARQL endpoint. Thus we have also benchmarked how the SUTs reacted to multiple clients simultaneously executing query mixes against the SUTs.

The numbers are the query mixes per hour (QMpH) that were executed by all clients together, so bigger numbers are better.
For comparison, the MySQL and Virtuoso SQL results for the SQL queries are also included in the tables. Note that the QMpH values were extrapolated from the time it took all clients together to execute 500 query mixes (see test procedure).
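The extrapolation and the resulting scaling behaviour can be sketched in Python as follows. This is an illustration under stated assumptions, not the test driver's code: the function names are invented here, the 100-second run is a made-up example, and the scaling example uses the Virtuoso TS figures for the 1M dataset from the table below.

```python
def query_mixes_per_hour(mixes: int, elapsed_seconds: float) -> float:
    """Extrapolate the number of query mixes executed per hour."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return mixes * 3600.0 / elapsed_seconds


def scaling_factor(single_client_qmph: float, multi_client_qmph: float) -> float:
    """Throughput with several clients relative to the single-client run."""
    return multi_client_qmph / single_client_qmph


# Hypothetical run: all clients together need 100 seconds for 500 query mixes.
print(query_mixes_per_hour(500, 100.0))  # 18000.0

# Virtuoso TS, 1M dataset: 1 client vs. 4 clients.
print(round(scaling_factor(12_360, 32_513), 2))
```

A scaling factor close to the number of clients would indicate near-linear scaling; a factor near 1 means that additional clients do not increase the overall throughput.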

Dataset Size 1M

Number of clients    1          2          4          8          64
Sesame               18,094     19,057     16,460     18,295     16,517
Jena TDB             4,450      6,752      9,429      8,453      8,664
Jena SDB             10,421     17,280     23,433     24,959     23,478
Virtuoso TS          12,360     21,356     32,513     29,448     29,483
Virtuoso RV          17,424     28,985     34,836     32,668     33,339
D2R Server           2,828      3,861      3,140      2,960      2,938
MySQL                235,066    318,071    472,502    442,282    454,563
Virtuoso SQL         192,013    199,205    274,796    357,316    306,172

 

 

Dataset Size 25M

Number of clients    1          2          4          8          64
Sesame               1,343      1,485      1,204      1,300      1,271
Jena TDB             353        513        694        536        555
Jena SDB             968        1,346      1,021      883        927
Virtuoso TS          4,123      7,610      9,491      5,901      5,400
Virtuoso RV          12,972     22,552     30,387     28,261     28,748
D2R Server           140        187        160        146        143
MySQL                18,578     31,093     39,647     40,599     40,470
Virtuoso SQL         69,585     85,146     135,097    173,665    148,813

 

 


 

7. Qualification

A precondition for comparing the performance of different storage systems is to verify that all systems work correctly and return the expected query results.

Thus, before we measured the performance of the SUTs, we verified that the SUTs return correct results for the benchmark queries using the BSBM qualification dataset and qualification tool. For more information about the qualification test, please refer to the qualification chapter of the BSBM specification.

7.1 Results

We ran qualification tests for the following stores: Sesame, SDB, TDB, Virtuoso Triple Store, Virtuoso RDF Views and D2R Server. All stores passed the test.

 

7.2 Detailed logs

D2R:
SDB:

TDB:

Sesame:
Virtuoso TS:

Virtuoso RV:


 

8. Thanks

Lots of thanks to

Please send comments and feedback about the benchmark to Chris Bizer and Andreas Schultz.