One for the Embedded Systems Conference in Santa Clara in December and the other for the ARM Conference in the autumn. Slide decks are on the Silver Wolf Wushu web site in the Presentations section. The Embedded Systems Conference presentation was titled “Wind Sweeps Fallen Blossoms” – a reference to a movement in New Frame Road #2 and a movement in Single Saber.
We give three scenarios: inserts of table rows with natural keys (two integers plus a datetime); inserts of table rows with GUIDs as the keys; and archiving, which can be thought of as a SQL SELECT of many rows, writing the data to a text file, deleting the rows, and then inserting corresponding rows in a table stored on a disk. We show results using a Microsoft SQL Server 2014 database.
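The archiving scenario can be sketched in a few lines. This is only an illustration, not the SAITO code: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the table names (`measurement`, `measurement_archive`), the column names and the function `archive_before` are all hypothetical. The natural key follows the description above (two integers plus a datetime).

```python
import csv
import sqlite3

# Hypothetical schema: the key is two integers plus a datetime (stored as ISO text).
SCHEMA = """
CREATE TABLE measurement (
    sensor_id INTEGER, student_id INTEGER, taken_at TEXT, value REAL,
    PRIMARY KEY (sensor_id, student_id, taken_at)
);
CREATE TABLE measurement_archive (
    sensor_id INTEGER, student_id INTEGER, taken_at TEXT, value REAL,
    PRIMARY KEY (sensor_id, student_id, taken_at)
);
"""


def archive_before(conn, cutoff, text_path):
    """Archive every row taken before `cutoff`; return the number of rows moved."""
    # 1. SELECT the rows to be archived.
    rows = conn.execute(
        "SELECT sensor_id, student_id, taken_at, value FROM measurement "
        "WHERE taken_at < ?",
        (cutoff,),
    ).fetchall()

    # 2. Write the data to a text file (CSV here).
    with open(text_path, "w", newline="") as fh:
        csv.writer(fh).writerows(rows)

    # 3 and 4. Delete the rows and insert the corresponding archive rows.
    # Doing both inside one transaction means a failure leaves the data untouched.
    with conn:
        conn.execute("DELETE FROM measurement WHERE taken_at < ?", (cutoff,))
        conn.executemany(
            "INSERT INTO measurement_archive VALUES (?, ?, ?, ?)", rows
        )
    return len(rows)
```

A caller would open a connection, run `conn.executescript(SCHEMA)` once, and then call `archive_before(conn, "2017-05-02", "archive.csv")`. Wrapping the delete and the insert in a single transaction is a design choice of this sketch, not something the measurements above depend on.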
Timings are in seconds (3600 = 1 hour) for approximately 16 million rows.
|Insert natural keys||1 thread||16 threads||32 threads|
|Insert GUID keys||1 thread||16 threads||32 threads|
|Archive||1 thread||16 threads||32 threads|
We would summarize as follows: the Optane was roughly FOUR to SIX times faster than a very good hard disk and ALMOST THREE times faster than a modern solid state drive.
Everyone should obtain an Optane.
We note that GUIDs should not be a designer’s first choice for a key and that 32 threads are very likely to be an over-subscription for many machines, so the measures using 16 threads are probably going to be more commonly encountered in practice.
One time-honored solution is to upgrade the processor. This can get rather complicated if motherboards, memory, fans and power supplies are in play. The Extreme Core i7 (rightmost) has 10 cores (20 threads with HyperThreading) and quad channel DDR4 memory with a huge 25 megabytes of cache. And, as advertised, the new Intel Turbo Boost Max Technology 3.0 does provide more than 10% improvement in single-thread work.
But we really had an input-output problem: the disk was the limiting factor.
There was strident insistence that the Intel Optane (above right) would only work on Windows 10 64-bit systems AND only with Intel’s seventh (7th) generation Kaby Lake processors. We were told only the H270, Q270, Z270, B250 and Q250 chipsets would work. So we were forced to use an excellent CPU.
What we generally encounter for database work, as opposed to purely computational work, is that without Intel’s HyperThreading (blue bars on the chart above) wall clock time for a quad core processor minimizes at about 10 threads. With HyperThreading (red bars on the chart on the previous page) the minimum is 15% to 20% lower (here a significant 60 seconds or so) and there are gains to be realized from scheduling 6 to 8 additional threads.
As the number of threads increases it is reasonable to expect a slight linear increase in the time needed to process a fixed number of transactions.
It seemed to us that with a reasonable CPU, a decent hard disk and some aggressive thread management, we could sort out the incoming devices and capture the sensor measurements. Then Professor Peter Wayne and others at Harvard Medical School e-wrote to tell us it was important to measure even during sitting and standing. The ideal in the WuJi style of meditation we teach is to be motionless. What they had found was that sway – how far and how quickly one’s head moved from an ideal position – was a powerful indicator.
So now we had much less time to do a great deal more work.
Running on an inexpensive quad core laptop (1 GHz processing speed) we obtained the following results (averages) when loading 231,440 sensor measurements:
- 712 seconds – debug mode; database on the C drive
- 622 seconds – release mode; database on the C drive
- 682 seconds – debug mode; database on an external drive (USB connection)
- 590 seconds – release mode; database on an external drive
The laptop was largely cleared of any other applications, so the primary contenders for CPU and other resources were Windows 10 and the AVG anti-virus software. Debug is about 15% slower than release, but neither one on either disk is fast enough. We need processing to keep pace with the class.
Running on a very expensive desktop (quad core; an Intel Core i7 with 3.4 GHz processing speed) we obtained:
- 343 seconds – debug mode; database on the C drive
- 257 seconds – release mode; database on the C drive
Note the increase in velocity and the larger gap between debug and release: release takes about 25% less time than debug (257 versus 343 seconds).
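The debug-versus-release percentages can be computed against either baseline, and the two give different numbers. The short sketch below works both out from the timings listed above; the function names are ours, chosen for illustration.

```python
def pct_slower(slow, fast):
    """Extra time taken by the slow run, as a percentage of the fast run."""
    return 100.0 * (slow - fast) / fast


def pct_saved(slow, fast):
    """Time saved by the fast run, as a percentage of the slow run."""
    return 100.0 * (slow - fast) / slow


# (debug seconds, release seconds) taken from the lists above.
timings = {
    "laptop, C drive": (712, 622),
    "laptop, external drive": (682, 590),
    "desktop, C drive": (343, 257),
}

for label, (debug, release) in timings.items():
    print(
        f"{label}: debug is {pct_slower(debug, release):.1f}% slower, "
        f"release saves {pct_saved(debug, release):.1f}%"
    )
```

On the laptop, the "about 15% slower" figure corresponds to `pct_slower`. On the desktop, the 25% figure corresponds to `pct_saved`; measured the other way, debug takes roughly a third longer than release.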
For purposes of discussion our SAITO application software in these scenarios has two database tables of interest: one, which we will call Table N, has a natural primary key; the other table, Table G, uses a GUID as the primary key. The measures below are seconds of wall-clock time for 320,000 rows. The figures are an average of five runs – the individual runs did not vary much as we (by intent) “only” had anti-virus and the operating system running.
We had to come in well under 240 seconds, so multi-threading was required.
|Description||1 thread insert||1 thread select||16 thread insert||16 thread select|
|Table N disk (*)||514||292||306||213|
|Table G disk||398||366||212||236|
|Table N Optane||146||96||81||63|
|Table G Optane||116||157||55||77|
* = we measured on a Western Digital Passport Ultra (external), a Maxtor Personal Storage 3100 drive (external), a Seagate SRD0NF2 drive (external), two internal hard disk drives, and an Intel 520 Solid State Drive. The figures above are for a 5400 RPM 500 gigabyte internal drive. Our full statistics have 1, 2, 4, 8, and 16 threads for each of the tables and the six drives.
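For readers who want the ratios rather than the raw seconds, the table above can be reduced to disk-to-Optane speedups and checked against the 240-second target. The numbers are copied from the table; the variable and function names are illustrative only.

```python
COLUMNS = ("1 thread insert", "1 thread select", "16 thread insert", "16 thread select")

# Wall-clock seconds for 320,000 rows, copied from the table above.
DISK = {"Table N": (514, 292, 306, 213), "Table G": (398, 366, 212, 236)}
OPTANE = {"Table N": (146, 96, 81, 63), "Table G": (116, 157, 55, 77)}

BUDGET_SECONDS = 240  # the target mentioned in the text


def speedups(disk, optane):
    """Return disk time divided by Optane time for each table and column."""
    return {
        table: tuple(round(d / o, 2) for d, o in zip(disk[table], optane[table]))
        for table in disk
    }


def over_budget(timings, budget=BUDGET_SECONDS):
    """List the (table, column) pairs whose time is not under the budget."""
    return [
        (table, column)
        for table, row in timings.items()
        for column, seconds in zip(COLUMNS, row)
        if seconds >= budget
    ]


for table, ratios in speedups(DISK, OPTANE).items():
    print(table, dict(zip(COLUMNS, ratios)))
print("Disk over budget:", over_budget(DISK))
print("Optane over budget:", over_budget(OPTANE))
```

These ratios apply only to this 320,000-row test on the 5400 RPM internal drive; they say nothing about the other five drives, whose figures are not reproduced here.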
Database Table Row Keys
SAITO has grown over time – it contains 390 Windows forms, over 150,000 lines of code and the executable (EXE) is approaching 15 megabytes in size. There are well over 100 tables in its database. Historically, the key to a row in a table in a database has been strongly preferred to be a unique value. That has meant database architects have either chosen some natural combination of columns, like sensor ID and timestamp in our case, or used a synthetic value such as an automatically assigned integer. The latter is usually fast and simple to understand, there are plenty of integers, and the contents of the key more or less reflect the order in which the associated rows were inserted into the table. A second synthetic value is the UUID, which is an abbreviation for Universally Unique Identifier. Microsoft’s implementation of these is the GUID, where the G stands for ‘globally’. A GUID is a 128-bit value, written as one group of 8 hexadecimal digits, followed by three groups of 4 hexadecimal digits each, followed by one group of 12 hexadecimal digits. Here’s an example: 6B29FC40-CA47-1067-B31D-00DD010662DA.
The good thing about GUIDs is that they are easy to generate: one might code something like lszMyGUID = Guid.NewGuid().ToString() to load a GUID into a string variable called lszMyGUID.
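The same idea can be shown outside .NET. The sketch below uses Python's standard `uuid` module, which is not what SAITO uses, to generate a random UUID and confirm the 8-4-4-4-12 layout and 128-bit size described above. The function name `new_guid_string` is made up for this example.

```python
import uuid


def new_guid_string():
    """Return a freshly generated random (version 4) UUID as a string."""
    return str(uuid.uuid4())


guid = new_guid_string()
print(guid)

# Five groups of hexadecimal digits: 8, 4, 4, 4 and 12 characters long.
print([len(group) for group in guid.split("-")])

# 32 hexadecimal digits at 4 bits each make 128 bits.
hex_digits = guid.replace("-", "")
print(len(hex_digits) * 4)

# The example value quoted in the text parses as a valid UUID too.
example = uuid.UUID("6B29FC40-CA47-1067-B31D-00DD010662DA")
print(example.int < 2 ** 128)
```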
Millions of programming years ago (in the 1990s) disks were getting progressively larger and the MBR scheme for partitioning a disk imposed limits on disk size and on database size. A planet-wide standard called UEFI = Unified Extensible Firmware Interface was agreed to, and Intel developed a new scheme today known as GPT = GUID Partition Table that removed these limits.
It turns out that auto-numbering can have all sorts of subtle problems in a heavily multi-threaded environment. So we like auto-numbering, but only for some tables. Typically, these would be tables where rows are added by humans or at least are added at relatively low velocities.
In our SAITO application, tables where we expect intense input-output activity have either natural column combinations or (second choice) GUIDs for the primary key. Among the challenges with GUIDs are that they are bulky, they don’t (on purpose) have any meaning and they scatter the data all over the disk. The hub on which the data wheel of SAITO spins would be the daily sensor measurements. These need to be collected quickly and accurately at nearly the speed that they are generated and then eventually archived.
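The scattering effect is easy to see in miniature. The sketch below assumes rows are stored in key order (as with a clustered primary key) and compares arrival order with key order for auto-numbered keys and for random GUIDs. It is an illustration only: Python's byte-wise sort stands in for the database's own GUID comparison rules, which differ in detail, and `displaced_fraction` is a name invented here.

```python
import uuid


def displaced_fraction(keys):
    """Fraction of keys whose position in key order differs from arrival order."""
    key_order = sorted(range(len(keys)), key=lambda i: keys[i])
    moved = sum(1 for position, arrival in enumerate(key_order) if position != arrival)
    return moved / len(keys)


n = 10_000
auto_numbers = list(range(1, n + 1))            # keys assigned 1, 2, 3, ...
guids = [uuid.uuid4().bytes for _ in range(n)]  # random 128-bit keys

print("auto-number keys displaced:", displaced_fraction(auto_numbers))
print("GUID keys displaced:", displaced_fraction(guids))
```

With auto-numbering every new row lands at the end of the key order, so nothing is displaced. With random GUIDs nearly every row belongs somewhere in the middle of what is already stored, which is the scattering described above.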
Most aides carry devices like an iPad and some students also have smart phones. The SAITO software has to sort out what a device does and who it belongs to. Class starts with a formal bow and salute, followed by five minutes of sitting meditation and then five minutes of standing meditation. Then come several minutes of centuries-old Chen family warm-up exercises, so we had thought we had a comfortable amount of time until the first Tai Chi Chuan set to perform this identification process. That was until Professor Peter Wayne and others at Harvard Medical School pointed out it was useful to measure movements during sitting and standing. We’ll see what the upcoming Internet of Things Conference and the Sensors Expo (both in San Jose, California, in May and June, respectively) showcase in terms of hardware, but we are leaning toward pressure sensors embedded in chair seats and personal foot mats.
The shortest and simplest (and, therefore, the first taught) of the Chen Family style sets is known by the precise but not especially imaginative name of 18 Movements. Once they learn this set, students perform it twice per class forever. The students can see a canonical video of Grandmaster Chen Zhenglei, who choreographed 18 Movements, projected either on large mirrors or on smart glasses. Sixteen students times 20 sensors times several readings per second gets to be a lot of measurements to store in a database very quickly: well over 100,000 sustained database inserts per minute. And we have to extract the raw sensor data from the Internet of Things hub where it is stored.
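The insert rate follows from simple multiplication. The sketch below works it out for a few sampling rates; the rates themselves are assumptions, since the text only says "several times per second".

```python
def inserts_per_minute(students, sensors_per_student, samples_per_second):
    """Sustained database inserts per minute for one class."""
    return students * sensors_per_student * samples_per_second * 60


STUDENTS = 16
SENSORS_PER_STUDENT = 20

# Hypothetical sampling rates, in readings per sensor per second.
for rate in (2, 5, 6, 10):
    total = inserts_per_minute(STUDENTS, SENSORS_PER_STUDENT, rate)
    print(f"{rate} readings/second -> {total:,} inserts per minute")
```

Each reading per second adds 19,200 inserts per minute (320 sensors times 60 seconds), so the 100,000 mark is passed once every sensor reports six or more times per second.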
A Typical Class
On an average day we hold three two-hour classes, each with 16 students, most of whom have autism spectrum disabilities. That means they often have expressive language disabilities (cannot speak), behavioral issues and may have medical challenges like seizures, tachycardia (heart rate suddenly triples) or overheating. Before class starts the teacher places a tub file for each student on each table and checks that necessary clothing and objects are available. Things start when a bus or van arrives and we get a head count of students and their aides from the driver. We use biometrics to check everyone in – we currently use multiple fingertip readers to keep the bottleneck to a minimum.
The SAITO software sends emails to designated parents, schools or other third parties indicating the student did (or did not) arrive.
If we are expecting a guest viewer or teacher there will have been a poster of him or her in view on the way to the practice area. Students have added the habit of touching a portrait of Grandmaster Chen Zhenglei. The significance remains elusive.
Four days a week students dress informally – that usually means a school t-shirt and traditional black pants. Once a week or so students dress in semi-formal black cotton uniforms (leftmost of the images below) for film that will be sent to outside reviewers. If we have a guest, or there is a dress rehearsal or an exhibition, then everyone dresses in full formal silks (center and rightmost of the images below).
11. Return to the Performance Monitor window. When ready to begin logging ReadyBoost activity, just click the green Play icon.
12. After the test interval save the logged data to a file.
13. Click the Stop icon.
14. Select Performance Monitor in the navigation pane.
15. Click the View Log Data icon.
16. When the Performance Monitor Properties dialog box appears, click the Add button.
17. Locate and select your log file, as shown previously.
In our case, for a typical slice of time, ReadyBoost was largely useless, as we expected. The configuration was a server that was intended to run SAITO, and the ReadyBoost drive had Windows, our anti-malware software and the SAITO executable. We prefer to load Windows as infrequently as possible, which means the anti-virus software gets loaded infrequently as well. Similarly, SAITO tends to be kept running, so there would be little use in optimizing program loads.
Our database was on a hard disk.
That was going to change.