Wednesday, September 14, 2016

C++ Compile, Part 1: Vectors vs Native Arrays in x86-64 Debug Builds

Learning to love assembly language

Have you ever looked at the assembly output of your program in a debugger? That assembly is what your program really does, and understanding it is crucial to getting the greatest performance out of your tools. Each short little symbol directly corresponds to an instruction on the CPU. Behind each is a number. And these days, those little numbers can actually be pretty powerful. Some of the things that the new Intel Broadwell or new AMD CPUs do are just remarkable.

"But the compiler will optimize it!" You might say. "I don't need to do this!" Not so fast! C++, and particularly the C it descended from, was a language designed to be a sort of shorthand for assembler. Back in the old days of C you could even say "use this variable in a register", and the compiler would try to oblige. Almost 50 years ago, some genuinely brilliant guys at AT&T decided they wanted to write a space war game, and along the way, they invented not only C, but Unix, to do it. C came about because they wanted something performant but also portable. All this business about pointers is an abstraction very close to how chips actually work - the developer thinks about addresses and arrays and math, and the compiler spits out the exact sequence of instructions in that architecture. So addresses and variables in memory are actually just numbers too and you can do math on them. But because of this, this shorthand heritage, C++ has a lot of rules about what it actually can and can't touch when it optimizes.

OverSAT is a good stomping ground for this kind of performance self-education. In SAT solvers, everything is crazy huge and everything needs to be crazy fast. The problem teaches us about both tight code and clever algorithms. The problem is well bounded, and it's real-world applicable. In the case of OverSAT, the string handling and conflict management have direct applications in big data, from geospatial analysis to complex financial models and even joins in in-memory databases. At the center of this, understanding how arrays and vectors work in C++ is beneficial. As I studied the problem, a big question emerged.

Question: Does it make a difference which style of looping I choose?

Answer: Yes, it does, in debug.

Viewing the Assembly Language For Your Code

You can view the assembly language for your application in Visual Studio 2015 by right-clicking over your code while you are stopped at a breakpoint in the debugger and choosing Go To Disassembly.

Comparing Styles of Looping

In all of the examples, we'll notice a few things. Register use is simplistic, owing to the debug build. The code you write is what you get - there's no optimization at all, and in fact there is extra code to take your results from registers and put them back into variables so you can step through your code and look at it. You won't find that in release builds, except when the C++ rules require it. Also, you can see in a debug build the extra penalty paid for having relocatable code - that is, the OS can move your stuff around when it loads it so it can squeeze your code in with other libraries. There's an extra jump when calling a static method. My guess is that's a jump table for relocation, but I reserve the right to be entirely wrong! Either way, there's no simple call instruction to call a method. It's a jump and then a call, and it's certainly not optimal. Debug is meant to be debug, but it is surprising how slow it really is.

Filling an Integer Array Using an Array Index

In this first example, we fill a simple integer array using an array index and a counter in a loop. The compiler output is vaguely reminiscent of the sort of code that a beginning assembly developer might write. It's a bit much code, but it is reasonably quick. You can see instructions there to increment the array index, and you can also see that Intel has a swank instruction for getting an address from a simple array index. That particular instruction, lea, has historically been so peppy that you will sometimes see it used for simple math. But we're all exploring 64-bit assembly together, and that may not be the case today.
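
For reference, the style of loop under discussion looks roughly like this (a minimal reconstruction of mine, not the exact test case from the OverSAT project):

  #include <cstddef>

  int main()
  {
   const std::size_t count = 100;
   int values[count];

   // Index-and-counter fill: in a debug build you can watch the index being
   // incremented and an lea forming the element address.
   for (std::size_t i = 0; i < count; ++i)
   {
    values[i] = static_cast<int>(i);
   }
   return values[count - 1];
  }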

Filling an Integer Array Using a Pointer

In this example, we will fill an array using a pointer. If we look at the generated assembly, we can see right away that the use of registers is not optimal. This will be the case with all the debug samples. What I believe is happening is that the compiler always generates code to write the register contents back to the variable as this simplifies viewing variable contents in the debugger. That the optimizer uses registers more aggressively is probably tied to why you generally cannot view variables when looking at release code. Visual C++ probably doesn't have the mechanism needed to tie register graphs to variables for debugging purposes.
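
Again for reference, the pointer-based version of the fill looks roughly like this (my sketch, not the exact test case):

  int main()
  {
   const int count = 100;
   int values[count];

   // Fill via a moving pointer; in debug the pointer is still written back to
   // its stack slot each iteration so the debugger can display it.
   for (int* p = values; p != values + count; ++p)
   {
    *p = static_cast<int>(p - values);
   }
   return values[count - 1];
  }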

Filling an Array Using STL Iterators

Now, let's see how a loop on a vector looks when using standard STL iterators. It's a mess! Look at all that slow code. The begin, end and operator++ in the STL are designed to mimic the "feel" of working with pointer types, except that, at least in debug, they all resolve to function calls. Instead of a speedy pointer manipulation in a register, you get slow function calls.
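
The iterator version being described is along these lines (my sketch):

  #include <vector>

  int main()
  {
   std::vector<int> values(100);

   // begin(), end() and operator++ are all real function calls in a debug
   // build, which is where the extra cost comes from.
   int n = 0;
   for (std::vector<int>::iterator it = values.begin(); it != values.end(); ++it)
   {
    *it = n++;
   }
   return values.back();
  }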

Filling an Array Using STL Auto

C++11 has a brilliant new for (auto& v : collection) syntax. This syntax is designed to be a shortcut for the usual invocation of C++ iterators, and in that it works brilliantly. But how good is it for working with vectors? It is interesting to see that there is no debug shortcut: because the standard specifies range-for in terms of begin() and end(), it resolves to the same STL iterator calls when dealing with C++ vectors. Clearly though, we would prefer vector iteration to resolve directly to a pointer- or index-based implementation. We shall see how this turns out once we investigate release builds in a subsequent post.
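
The range-for version is along these lines (my sketch):

  #include <vector>

  int main()
  {
   std::vector<int> values(100);

   // The standard specifies range-for in terms of begin()/end(), so in debug
   // this expands to essentially the same iterator calls as the loop above.
   int n = 0;
   for (auto& v : values)
   {
    v = n++;
   }
   return values.back();
  }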

Filling an Integer Array Using STL Indices

While the iterators are designed for most kinds of containers, the C++ vector supports random access, as it is meant to be used in a fashion similar to a native C array indexed with []. Calling size() on the vector is a method call, and while it may be optimized away in release, in debug, assigning it to a temporary before the loop pays off. Direct vector indexing produces the cleanest code of any of the vector iteration methods. The only call that is generated is the element access operator. That is the price you pay for bounds checking in debug mode in the C++ STL.
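
The indexed version, with size() hoisted into a temporary, is along these lines (my sketch):

  #include <cstddef>
  #include <vector>

  int main()
  {
   std::vector<int> values(100);

   // Calling size() once before the loop avoids a method call per iteration
   // in debug; operator[] remains the one call left in the loop body.
   const std::size_t count = values.size();
   for (std::size_t i = 0; i < count; ++i)
   {
    values[i] = static_cast<int>(i);
   }
   return values.back();
  }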

Conclusion

In this post, we took a first crack at what 64-bit assembly looks like under the hood in debug builds of a simple set of test cases in C++. These are attached to the OverSAT project, and we'll post the link so you can download them from CodePlex and check them out for yourself. Next stop, we'll have a look at what release does. These experiments and others will help guide our thinking for the next round of OverSAT changes, and even help us with the game we shall build.

Now, tech leaders routinely preach that assembly language is a thing of the past and a dying art, the province of compiler writers that the rest of us need not be concerned with. But there is still room for performance-minded code, especially when considering a choice between C++ and C-style implementations. Programming in assembly language is not for everyone; it's even weirder than C. Perhaps for many developers, the old refrain "a compiler can generate better code than you can" might well be true.

But we don't want to be like any developers on this blog! We're looking for an extra edge and want to understand the hardware so that we can tell our compiler how to use it.

Thursday, September 8, 2016

A look at OverSAT X and a passing note about NP Completeness.

Here is a look at OverSAT X running in Windows 10.

The look is designed to be reminiscent of an old school computer. The CRT look is evocative of the 1970s, and the push buttons mimic those I have seen on mission control panels from the Apollo program of the 1960s. It's an homage to the era when the problem of satisfiability first began to be understood.

OverSAT X solves CNF files. It will either produce a decision string that satisfies all clauses or find that no such string can be created, with a proof of contradictions showing this is impossible. At some point, it will also see when it must simply give up. Here, the user selects a CNF file from my collection. I downloaded mine from FSU (but the link is dead), SATLIB.org and also the SAT Competition.

Once the file is loaded, a user may either single step through the SAT problem, or simply run it. Running the problem runs the solver on the background thread, as I wrote in a previous post.

If a solution exists, OverSAT X will eventually find it. Note though, in this screenshot, there is a bug. The particular CNF being solved had a symbol --1, and we treated that as -1 being a separate name from 1, which is, well, a mistake.

OverSATX works by attempting to build a non-conflicting string of variable assignments. For example, given a system (A or B) and (not A or not B), a decision string cannot contain both A and not A. The solution string could either be A not B, or B not A. If, during the building of this string, OverSATX detects a clause that cannot be satisfied because all of the possible choices would be in conflict with the existing string, the clause is said to be conflicted, and the existing string is the conflict string.
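
For reference, that little (A or B) and (not A or not B) system looks like this in the DIMACS CNF format used by the downloaded files, with A numbered 1 and B numbered 2 (a hand-written example of mine, not one of the benchmark files):

  c (A or B) and (not A or not B)
  p cnf 2 2
  1 2 0
  -1 -2 0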

This conflict producing string is analyzed in an attempt to find a shorter string. It turns out that the shorter a conflict string is, the more bits are removed from the search space of the overall problem. However, any decision string that keeps a conflict is useful to have around. OverSAT X is also smart enough to replace larger strings with shorter ones that completely contain the space represented by the longer string. By accumulating conflicts and using new decision strings to avoid them, OverSAT X seeks to ever constrain the search space until it arrives at a solution.

It is interesting then, that the overall search space is constricted at each pass, always and relentlessly by some amount. But by how much?

At the moment, not enough. OverSAT X is a bit of a slow learner; there are holes in how it comes up with that conflict string, and its processing of it is far from efficient.

But to formally determine the performance of OverSAT X in Big O notation, we shall some day wish to know how many bits of the solution space are removed at each pass, on average and, above all, in the worst case. That hasn't been done yet, and characterizing the search space, and improving that contraction performance, is part of what this particular bit of madness is about.

C++ Background Threads on Windows Universal

Solving a SAT problem can take a long time. For a console application this is not too horrific, but in a Windows forms application of any kind, the user interface can and will time out. The correct way to solve this problem is by putting the processing of the SAT problem on a background thread. The user interface spawns a thread, puts the processing on it, and the thread in turn notifies the user interface as to its progress.

Now, in classic Windows Forms C++, using the SDK, a simple mechanism for using a background thread is to create a worker thread using either CreateThread or the thread pool APIs. The Windows SDK is seemingly forgiving when invoking Windows objects on background threads. Many Windows user interface calls use SendMessage behind the scenes, and SendMessage performs a context switch on your behalf. While this can work, the most effective way to manage thread communication to a user interface is with PostMessage. We allocate a message on the background thread, PostMessage it to our recipient window, process that message, and then free it, on the UI thread. Windows has worked this way since Windows 95, but it was really Windows NT and finally Windows 2000 where Microsoft put this model together.
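
For that classic pattern, a minimal Win32-style sketch might look like this (the message ID, the ProgressReport struct and the window procedure are all made up for illustration):

  #include <windows.h>

  // Hypothetical app-defined message and payload, used only for this sketch.
  const UINT WM_APP_PROGRESS = WM_APP + 1;
  struct ProgressReport { int trialsCompleted; };

  // Background thread: allocate the payload and hand ownership to the UI thread.
  DWORD WINAPI WorkerThread(LPVOID param)
  {
   HWND hwndUI = static_cast<HWND>(param);
   ProgressReport* report = new ProgressReport{ 42 };
   ::PostMessage(hwndUI, WM_APP_PROGRESS, 0, reinterpret_cast<LPARAM>(report));
   return 0;
  }

  // UI thread window procedure: consume the payload, then free it.
  LRESULT CALLBACK WndProc(HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam)
  {
   if (msg == WM_APP_PROGRESS)
   {
    ProgressReport* report = reinterpret_cast<ProgressReport*>(lParam);
    // ... update controls from report->trialsCompleted ...
    delete report;
    return 0;
   }
   return ::DefWindowProc(hwnd, msg, wParam, lParam);
  }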

So much is different in Windows Universal, yet this basic model remains in the Windows Universal API. In Windows 10, using the Windows Runtime, the process is similar. The names, however, have been changed to protect the innocent. The good news is, we can actually use a lot of the SDK thread primitives in C++ Windows Store applications. I was able to use both CreateThread and CreateMutex with WaitForSingleObject, seemingly successfully. However, the matter of PostMessage is quite different. In any case, a good pattern is to use ThreadPool::RunAsync to queue a job to the background thread. Then, the body of the thread uses Dispatcher->RunAsync to essentially post a message back to the UI thread.
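
Stripped of the solver details, that pattern reads roughly like this (a bare sketch; the full RunBackground listing appears further down):

  using namespace Windows::System::Threading;
  using namespace Windows::UI::Core;

  void DoWorkInBackground()
  {
   // Queue the long-running work to a thread pool thread.
   ThreadPool::RunAsync(ref new WorkItemHandler([](Windows::Foundation::IAsyncAction^ action)
   {
    // ... long-running work happens here, off the UI thread ...

    // Marshal the results back to the UI thread, much like a PostMessage.
    Windows::ApplicationModel::Core::CoreApplication::MainView->CoreWindow->Dispatcher->RunAsync(
     CoreDispatcherPriority::Normal,
     ref new DispatchedHandler([]()
    {
     // ... back on the UI thread: safe to touch controls here ...
    }));
   }));
  }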

Now, for OverSAT X, the solver objects must be accessed in both the foreground thread and the worker thread. The first answer, of course, is to use locks to manage that. It turns out that our trusty CreateMutex/WaitForSingleObject combination actually works, but locks are messy and increase the risk of bugs, not to mention undermine performance. Fortunately, in OverSAT X, it turns out that this is not necessary. Since I always have the solution state, I let the background thread run for a second and then post the results of its work to the foreground thread. The foreground thread then queues another job to the background thread. This granularity works rather well. Even with the feeble conflict resolution engine of OverSAT X, I can, for some problems, perform between 100 and 1,000 trials on the SAT problem in that second when doing a release build. A future change to the conflict engine should improve that performance dramatically.

How to Get the Current Time in C++ using the Windows Universal Runtime

You can obtain the current time as follows:

  Windows::Globalization::Calendar^ endTime = ref new Windows::Globalization::Calendar();
  endTime->SetToNow();

Running something for a set period of time in C++ in Windows Universal

You can get the current time, add a few seconds to it, and then compare that to the present time in your processing loop:


  Windows::Globalization::Calendar^ currentTime = ref new Windows::Globalization::Calendar();
  Windows::Globalization::Calendar^ endTime = ref new Windows::Globalization::Calendar();
  endTime->SetToNow();
  endTime->AddSeconds(1);
  Windows::Foundation::DateTime endDateTime = endTime->GetDateTime();
  sat::sat_solver_strategy strategy;
  bool finished = false;
  while (!finished && IsRunning)
  {
   if (sat_solver)
   {
    finished = sat_solver->step(strategy);
    currentTime->SetToNow();
    if (currentTime->CompareDateTime(endDateTime) > 0 || finished)
    {
     // Time slice expired or the solver finished: stop the loop and hand
     // the results back to the UI thread (see the full listing below).
     IsRunning = false;
     break;
    }
   }
  }

Background Thread in C++ Windows Universal

Notice that, in the code below, when we are inside the dispatched handler calling the UI thread back, we're actually in a function call that runs on the foreground thread. That is why setting UI elements works there. If we were to alter members of the SAT_Solver at that point, we could actually be introducing a race condition. We live seemingly riskily, but safely, handing off the work from one thread to another, like two ice skaters tossing a ball between them.

void OverSatX::MainPage::RunBackground()
{
 IsRunning = true;
 IsStopped = false;
 btnSATStep->IsEnabled = false;
 btnSATCalc->IsEnabled = false;
 btnCnfLoad->IsEnabled = false;
 btnSATStop->IsEnabled = true;
 SetStatus(L"RUNNING");

 ThreadPool::RunAsync(ref new WorkItemHandler([this](IAsyncAction ^async)
 {
  Windows::Globalization::Calendar^ currentTime = ref new Windows::Globalization::Calendar();
  Windows::Globalization::Calendar^ endTime = ref new Windows::Globalization::Calendar();
  endTime->SetToNow();
  endTime->AddSeconds(1);
  Windows::Foundation::DateTime endDateTime = endTime->GetDateTime();

  sat::sat_solver_strategy strategy;
  bool finished = false;
  while (!finished && IsRunning)
  {
   if (sat_solver)
   {
    finished = sat_solver->step(strategy);
    currentTime->SetToNow();
    if (currentTime->CompareDateTime(endDateTime) > 0 || finished)
    {
     IsRunning = false;
     IsStopped = true;
     Windows::ApplicationModel::Core::CoreApplication::MainView->CoreWindow->Dispatcher->RunAsync(
      Windows::UI::Core::CoreDispatcherPriority::High,
      ref new Windows::UI::Core::DispatchedHandler([this, finished]()
     {
      DrawEverything(finished);
      if (!finished) 
      {
       RunBackground();
      }
     }));
     return;
    }
   }
  }

  if (!finished)
  {
   Windows::ApplicationModel::Core::CoreApplication::MainView->CoreWindow->Dispatcher->RunAsync(
    Windows::UI::Core::CoreDispatcherPriority::High,
    ref new Windows::UI::Core::DispatchedHandler([this, finished]()
   {
    // nasty side effects driven business... but that's why this app is a learning experience.
    btnSATStep->IsEnabled = true;
    btnSATCalc->IsEnabled = true;
    btnCnfLoad->IsEnabled = true;
    btnSATStop->IsEnabled = false;
    SetStatus(L"STOPPED");
   }));
  }

  IsStopped = true;
 }));
}

Monday, September 5, 2016

Homemade Bread

This has been a year of firsts. First, I made my Mrs. a Dining Room table. The design is based on some online plans coupled with a few things we've noticed while walking through stores. The interesting thing about this is, my table has no screws in it whatsoever. Everything was dowel jointed together, and clamped and glued. My dowel jig is Made in the USA and has been invaluable.

Recently, I made my first loaf of homemade bread, ever. The Mrs. recommended to me an untried recipe from a cookbook she's been having luck with, and so I went with it.

It was a bit more complicated though, with the dough required to rise first in a lukewarm oven - set to 200 and then left to cool - and then on the counter for another half an hour. I had no idea how sticky bread dough is, and it certainly doesn't roll as easily as you might think.

I learned that bread dough is not like cookie dough. I fear my recipe was off slightly because I measured the temperature of the liquid ingredients more for speed than for accuracy.

I used all purpose flour instead of bread flour, and that worked.

Picking a File and Opening it on Windows 10 in C++.

Welcome to CountryBit. We're exploring making things ourselves. We explore new technology in software, woodworking, home improvement and even cooking, as doing so not only challenges us but improves us.

Right now I'm building OverSATX - a SAT solver for Windows 10. A SAT solver is our take on a fundamental problem of computer science: given a set of boolean expressions, figure out the variable assignment that makes them all true. There's a lot of interest in this problem because we can make software much more powerful as we get better at solving these problems. Some day, a general solution could even give us the computing power we need to cure most diseases, and perhaps even achieve immortality. It's a good hobby to work on.

Originally OverSAT was a console application, but I'm interested in Windows 10 and have it in mind to write for it, so I figured I'd start by migrating OverSAT to Windows 10. Then I can put it into the Windows Store, but it's open source and you can have a look here. It may not be much, but the sales pitch is immortality. Maybe I'll get some gas money for this, or enough to get the Mrs. some new dishes.

The challenge today is changing from a command line C++ application to a Windows Universal wrapper around a C++ program. That part actually proved not to be too bad. I recommend letting the wizard generate the vanilla application and then moving your C++ code into it. Working with C++ style pointers versus Windows Runtime references poses a few minor conversion challenges but ultimately comes out OK.

Some things I have learned:

How to convert a Windows Runtime string to a C++ string.

My file_data class is constructed from a Platform string. The Data() member of str gets you a wide string, and then you have to convert it to UTF-8, if that's what you want (the conversion below needs the <locale> and <codecvt> headers):

  file_data(Platform::String^ str) 
  {
   auto wdata = str->Data(); 
   using convert_type = std::codecvt_utf8<wchar_t>;
   std::wstring_convert<convert_type, wchar_t> converter;
   converted_str = converter.to_bytes(wdata);
   length = converted_str.size();
  }

How to do a file picker in Windows Runtime C++

The file picker is a bit of a head turner for those not used to Windows Universal's asynchronous ways. Ironically, JavaScript people will find this familiar. You have to chain a lot of async callbacks, but it turns out Microsoft provides a C++ library to do this.

Your pch.h file might wind up like this (essentially what the Visual Studio template generates):

#include <collection.h>
#include <ppltasks.h>

#include "App.xaml.h"

The key thing is to include ppltasks.h. Now we can write our handler for a button-clicked event. When the button is clicked, we will load up a new CNF file. Here are the namespaces in our MainPage.xaml.cpp file:

using namespace OverSatX;

using namespace concurrency;
using namespace Platform;
using namespace Windows::Storage;
using namespace Windows::Storage::Pickers;
using namespace Windows::Storage::Streams;
using namespace Windows::Foundation;
using namespace Windows::Foundation::Collections;
using namespace Windows::System::Threading;
using namespace Windows::UI::Xaml;
using namespace Windows::UI::Xaml::Controls;
using namespace Windows::UI::Xaml::Controls::Primitives;
using namespace Windows::UI::Xaml::Data;
using namespace Windows::UI::Xaml::Input;
using namespace Windows::UI::Xaml::Media;
using namespace Windows::UI::Xaml::Navigation;

And finally, our open handler. We open up a file open picker which, when the user clicks on something, will hand us a StorageFile. We open that asynchronously as well. Once opened, we can then use a DataReader to load the contents, read them into a string, and hand that to the solver:

void OverSatX::MainPage::btnCnfLoad_Click(Platform::Object^ sender, Windows::UI::Xaml::RoutedEventArgs^ e)
{
 FileOpenPicker^ openPicker = ref new FileOpenPicker();
 openPicker->ViewMode = PickerViewMode::List;
 openPicker->SuggestedStartLocation = PickerLocationId::DocumentsLibrary;
 openPicker->FileTypeFilter->Append(".cnf");

 auto dataReader = std::make_shared<DataReader^>(nullptr);

 create_task(openPicker->PickSingleFileAsync())
  .then([this](StorageFile^ file) 
 {
  if (file)
  {
   SetStatus(L"FOUND");
   return file->OpenAsync(FileAccessMode::Read);
  }
  else
  {
   SetStatus(L"CANCELLED");
   throw task_canceled();
  }
 })
 .then([this, dataReader](IRandomAccessStream^ stream) -> IAsyncOperation<unsigned int>^
 {
  SetStatus(L"READING");
  *dataReader = ref new DataReader(stream->GetInputStreamAt(0));
  return (*dataReader)->LoadAsync(stream->Size);
 })
 .then([this, dataReader](unsigned int nbytes)
 {
  SetStatus(L"READ");
  textFileContent = (*dataReader)->ReadString(nbytes);
  solver::file_data fd(textFileContent);
  SetStatus(L"PARSE");
  sat_solver = sat_solver_factory.create_problem(fd);
  spDecisionString->Children->Clear();
  spRootConflicts->Children->Clear();
  spNewSimplifiedConflicts->Children->Clear();
  spAllSimplifiedConflicts->Children->Clear();
  DrawProblem();
  this->btnSATStep->IsEnabled = true;
  this->btnSATCalc->IsEnabled = true;
  this->btnSATStop->IsEnabled = false;
  SetStatus(L"LOADED");
 });
}

Hope this helps! A few noteworthy things: it seems to be relatively OK to call UI objects on these async continuation threads. I'm not sure whether the traditional rule that Windows UI must be managed on the STA thread remains in effect, or if something evil is going on behind the scenes. We shall see when we start running long calculation threads in the background. I don't trust it!