Software Development

If you give a Dev a board game…

From my first lecture on C, I have been tinkering with side projects. I’ve done projects purely for exploration and entertainment, like a text-based adventure game. I’ve also done utility projects, like a script to correct QIF-formatted text. Recently, though, I took on a project of a larger scope.
 
A while back, I read an article about a simulation of Machi Koro. It is a ‘city-building game’ with rules that are easy to translate into code. In particular, the idea of using the simulator to ‘evolve’ an optimal strategy for the game captivated me. This was applying machine learning to a board game. I figured ‘I could do that’, and got to work. I encountered many distractions and setbacks, including a new baby. But this month I am pleased to report that I have hit a milestone.
 
To support the ‘evolution’ aspect, I had to be able to run thousands of simulations in a reasonable amount of time. After a bit over a month of concerted effort, I made it: I took my code from a loose collection of classes to a library and simulator able to run 1000 games in about 15 seconds.
 
I started back in December with classes to represent the deck of cards, a strategy for play, and a player state. The first step after this was to create a basic AI* to act upon the player state and a given strategy. Borrowing from the article I had found, I decided to make the Strategy static: the decision logic reduced to constant decisions like ‘always yes’ or ‘always the cheapest available’. The AI then only needed to consult the Strategy to answer queries from the Game.
*Note: I am capitalizing and italicizing Class names for ease of identification.
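
To make the static-Strategy idea concrete, here is a minimal sketch of how such constant decisions might be encoded. The names are illustrative, not the actual ones from my repo:

class Strategy:
    """A static strategy: every decision is a constant chosen up front."""

    def __init__(self, always_reroll=False, purchase_rule="cheapest"):
        self.always_reroll = always_reroll  # the 'always yes' / 'always no' style decision
        self.purchase_rule = purchase_rule  # e.g. 'cheapest' or 'dearest'

    def should_reroll(self):
        # No game-state inspection; the answer is fixed for the whole game.
        return self.always_reroll

    def choose_purchase(self, affordable_cards):
        # Deterministically pick from whatever the Game says is affordable.
        if not affordable_cards:
            return None
        if self.purchase_rule == "cheapest":
            return min(affordable_cards, key=lambda card: card.cost)
        return max(affordable_cards, key=lambda card: card.cost)

With the decisions fixed like this, a population of Strategies is just a set of constants, which is exactly the shape an evolutionary search wants.
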
After the simplified AI was complete, I got to work on the Game, which would simulate a single game. I decided that I wanted to use fluent APIs to instantiate a Game. I spent a good chunk of time getting these right, but it helped make the main routine clearer. While I developed the Game, I also decided to abstract the mechanisms of the game, which let me separate the calculations from the sequence in which they are applied. I extracted the Engine to handle things like calculating which AI, if any, has won, or how much money a given AI earns on a given dice roll. Meanwhile the Game manages whose turn it is and who rolls the dice.
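
As a rough illustration of the fluent style (again with invented names; Game and standard_deck stand in for my real classes), the trick is simply that each builder method returns self so the calls can chain:

class GameBuilder:
    def __init__(self):
        self._deck = None
        self._players = []

    def with_deck(self, deck):
        self._deck = deck
        return self  # returning self is what makes the API fluent

    def with_player(self, name, strategy):
        self._players.append((name, strategy))
        return self

    def build(self):
        if self._deck is None or len(self._players) < 2:
            raise ValueError("a game needs a deck and at least two players")
        return Game(self._deck, self._players)

# The main routine then reads almost like a sentence:
game = (GameBuilder()
        .with_deck(standard_deck())
        .with_player("Alice", Strategy(purchase_rule="cheapest"))
        .with_player("Bob", Strategy(always_reroll=True))
        .build())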
 
Testing both the Game and the Engine was somewhat arduous, but it was time well spent. I caught numerous bugs and infinite loops before I ever ran a full simulation. Thankfully the Deck, State, and AI were all similarly tested. But I do wish I had adhered more tightly to TDD; instead, I was overly eager to get the core functionality working.
 
Once these pieces were in place, I initiated my GitFlow, with Master and Dev branches and a new Feature branch. After pushing version 1.0, I started work on the new Feature: multi-game simulation! While I tinkered with a Simulator, I realized that my fluent APIs had a bug. So I went back to Dev, produced a Hotfix, and merged it into Master. From there I rebased the Feature and continued my work.
 
With the Simulator, I needed to initialize a Game, but also to be able to run it N times without interference from previous rounds. So I took a two-pronged approach: I would accumulate the results of each game, and I would allow a Game to be reset. Learning from my forebears, I was sure to randomize the first player on each reset, which removed the skew of first-move advantage from my results. With the core Game working and fluently initialized, I was able to simply inject it into a Simulator and run.
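
A sketch of that two-pronged approach might look like the following, assuming a Game that exposes reset, run, and player_count (my stand-in names, not necessarily the repo’s):

import random

class Simulator:
    def __init__(self, game):
        self.game = game
        self.results = {}  # winner name -> win count

    def run(self, n_games):
        for _ in range(n_games):
            # Resetting wipes the player state; randomizing the starting seat
            # keeps first-move advantage from skewing the tallies.
            first = random.randrange(self.game.player_count)
            self.game.reset(first_player=first)
            winner = self.game.run()
            self.results[winner] = self.results.get(winner, 0) + 1
        return self.results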
 
The original Simulator was able to run 1000 games in around 80 seconds. That performance is alright, but my personal dev box has 8 cores and the Simulator was maxing out just one. So, to improve performance, I began to look into Python’s concurrency options, and found two similar flavors of concurrent operation.
I elected to try Tasks first, as the model seemed similar to Microsoft’s Task Parallel Library. Sadly, I was not quite right about that. The BatchSimulator’s performance was terrible: it never used multiple cores, and its original time was 150 seconds for 1000 games. In hindsight, the likely culprit is Python’s Global Interpreter Lock, which keeps CPU-bound threads and tasks on a single core no matter how many you spawn, though some of it may well have been user error. Either way, it was enough to discourage me from pursuing Tasks further.
 
So I turned to the second flavor, concurrent sub-processes, and with these I had much better luck. I created the Coordinator to provide each fork with its own copy of the given Game and an assigned number of games to run. Each fork then created its own Simulator and ran its share of the games. As each Simulator completed, the Coordinator accumulated the results, and after all the forks finished, it calculated the final statistics, yielding an overall winner. To make this easier, I extracted the SimulationResults class and added public methods for merging and calculation. By leveraging sub-processes and existing code, the Coordinator was able to run at least 1000 games in ~16 seconds. I say ‘at least’ because the Coordinator divides the games evenly among the sub-processes; to ensure that at least 1000 games are run, it must round up on the division of games per sub-process. But having more data is never a bad thing.
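
The shape of the Coordinator was roughly as follows. This is a sketch using concurrent.futures.ProcessPoolExecutor with illustrative names; it reuses the Simulator sketched above and assumes a Game can be pickled across the process boundary:

import math
from concurrent.futures import ProcessPoolExecutor

def run_batch(game, n_games):
    # Each sub-process receives its own copy of the Game and
    # runs its share of the simulations with a private Simulator.
    return Simulator(game).run(n_games)

class Coordinator:
    def __init__(self, game, workers=8):
        self.game = game
        self.workers = workers

    def run(self, total_games):
        # Round up so that *at least* total_games are played overall.
        per_worker = math.ceil(total_games / self.workers)
        totals = {}
        with ProcessPoolExecutor(max_workers=self.workers) as pool:
            batches = pool.map(run_batch,
                               [self.game] * self.workers,
                               [per_worker] * self.workers)
            for batch in batches:
                for winner, wins in batch.items():
                    totals[winner] = totals.get(winner, 0) + wins
        return totals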
 
I was able to push and close this Feature recently, and I am very pleased with the progress. I went from single-game simulation to a rather performant 1000-game simulation in a month. I now have something to show for my ideas and my work. This milestone leaves me at a good break point: I can either continue working on the simulator to pursue the machine-learning angle, or change focus and return to this project later. At the moment I don’t know which direction I will turn. But I wanted to take a step back, look at what I have accomplished, and share my ‘geeking out’ a bit.
 
If anyone is interested in the source, you can find it here.
Perspective

‘Code is read more often than it is written’


At first glance, this would seem an obvious statement. And in a way it is. When Guido van Rossum created Python, he did so with this thought in mind. As a result, the culture of Python is partly molded around “readability counts”.

The more I thought about that statement, the more I realized the marvel it held. ‘Code is read more often than it is written.’ If asked to choose between reading and writing, I would have said the same. And yet I realize now that much of the code I have read, and some of the code I have written, does not show this. I wondered: why did my behavior, and that of my peers, not match what I knew to be true? If we believed that code was read more often, then why is so much of our code so hard to read?

At the core, our behavior remains unchanged because this quote is only an observation. There is no imperative contained in it. Without an imperative, the observation cannot turn into action. Instead the reader must derive ‘code ought to be easy to read’ from ‘code is read more often than it is written’. I trust most would be equal to the task, given a basic desire to optimize.

When I first discovered this, I did not pay it nearly enough attention. I went blithely on my way. Some time later, during the quiet of a vacation, the thought came storming back, and I was left dumbfounded. How could I have not seen it earlier? I realize now it was because I had not given myself enough time to think. With the lighter load of vacation, I was able to think, and so naturally the thought came.

This moment of serendipity also encouraged other considerations. Specifically, what other imperatives had I missed within casual observations? I quickly realized this is dark territory. It would be difficult to turn every observation into a possible imperative. Worse still, some observations might be biases, leading to bad imperatives. Or they might be too weak to lead to a meaningful imperative.

In all cases, the question remains: what have I missed? I believe, especially in software, that we are caught in a rush to develop, to implement, and to finish. As a result, we do not give ourselves time to ask, ‘Is this the best way?’ Business demands that we move with purpose, and that is a reasonable demand. But for the best results, we need time to consider whether our course will actually deliver us to the goal we seek. I will continue to look for easily missed observations that may turn out to change everything.

Addendum:

While drafting this article, two other examples of ‘observation leading to imperative’ appeared. The first was fictional, from Foundation and Earth by Isaac Asimov. In the book, the protagonist remarks with surprise at the neural interface to a computer: instead of being an over-the-head affair, it works through the hands. The realization was that humans sense and interact with the world through their hands. I may revisit this in a later branch of this discussion on design.

The second example sprouted from the first, specifically around interaction and design. Recently the IoT movement has brought integration to our homes, in particular voice interaction, such as Amazon’s Echo or Google Now. I observed that these devices extend a natural principle: ‘Humans use their voice to make their wishes known.’

Software Development, Work Projects

Pretty Good Privacy


Shortly after starting with my new company, I began work on a back-end infrastructure project. To be specific, I am working on an inter-process communication (hereafter IPC) layer. As the project developed, we realized the need to protect our data in transit, because we are working with Protected Health Information (hereafter PHI). It would be a disaster if the data became compromised.

To combat this, we are encrypting the data before it is sent through the IPC layer. There are many fine encryption schemes available, but many are difficult to implement. Moreover, it is not enough to just encrypt the data: one cannot keep using the same key for everything without risk. Given enough messages under the same key, and enough time, someone could learn it. They would then be free to read all our messages and any PHI contained within.

Our brilliant architect suggested that we use Pretty Good Privacy, or PGP for short. It is an easy-to-implement encryption scheme that combines many desirable features. PGP uses a new random session key for each message to encrypt the outbound data. This session key is itself encrypted before being sent along with the encrypted message; in real PGP that is done with the recipient’s public key, while my simplified example below uses a single shared private key.

Since the session key is random every time, an eavesdropper gets little leverage for guessing the private key. And without the private key, one cannot decrypt the transmitted key, so the message is reasonably safe.

To help explain this, I have crafted a simple example in Python, using a Vigenere cipher. You can find the entire example project on my GitHub repo, here. But the core of the example is as follows:

def encodePGP(self, plainMsg):
    # generate random key
    randKey = self._generateRandomKey()
    print("> Internal Random Key: " + randKey)

    # encrypt input with the random key
    cryptographer = Crypto()
    encryptedMsg = cryptographer.encode(randKey, plainMsg)

    # encrypt random key with priv. key
    pubKey = cryptographer.encode(self.privateKey, randKey)

    # return concat of encrypted key and input
    return pubKey + "_" + encryptedMsg

For those who prefer, a visual representation of this is available on the Wikipedia page for PGP. The algorithm is as I stated before:

  1. Generate a Random Key for the message
  2. Encrypt the message with the Random key
  3. Encrypt the Random Key with the Private Key, to form the public key
  4. Concatenate the Encrypted Message and Public Key

The code for Decoding is as follows:

def decodePGP(self, concatMsg):
    # parse encrypted pub key, encrypted message
    parsed = concatMsg.split("_")
    pubKey = parsed[0]
    encryptedMsg = parsed[1]

    # decrypt rand key with priv. key
    cryptographer = Crypto()
    randKey = cryptographer.decode(self.privateKey, pubKey)

    # decrypt message with rand key
    decryptedMsg = cryptographer.decode(randKey, encryptedMsg)

    # return message
    return decryptedMsg

In plain terms the decryption steps are:

  1. Parse the input message to get the Public Key and the Encrypted Message
  2. Decrypt the Public key with the Private key, to form the original Random Key
  3. Use the Random Key to Decrypt the Encrypted Message
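
To tie the two halves together, here is a self-contained round trip in the same spirit. The toy Crypto below is my own simplified Vigenere over uppercase letters, not the implementation from the repo, and the function names mirror the methods above:

import random
import string

class Crypto:
    def encode(self, key, text):
        # Shift each letter forward by the matching key letter (A=0 .. Z=25).
        return "".join(chr((ord(c) - 65 + ord(key[i % len(key)]) - 65) % 26 + 65)
                       for i, c in enumerate(text))

    def decode(self, key, text):
        # Shift each letter back by the matching key letter.
        return "".join(chr((ord(c) - 65 - (ord(key[i % len(key)]) - 65)) % 26 + 65)
                       for i, c in enumerate(text))

def encodePGP(privateKey, plainMsg):
    randKey = "".join(random.choices(string.ascii_uppercase, k=8))
    cryptographer = Crypto()
    encryptedMsg = cryptographer.encode(randKey, plainMsg)
    pubKey = cryptographer.encode(privateKey, randKey)
    return pubKey + "_" + encryptedMsg

def decodePGP(privateKey, concatMsg):
    pubKey, encryptedMsg = concatMsg.split("_")
    cryptographer = Crypto()
    randKey = cryptographer.decode(privateKey, pubKey)
    return cryptographer.decode(randKey, encryptedMsg)

wire = encodePGP("SECRET", "HELLOWORLD")
assert decodePGP("SECRET", wire) == "HELLOWORLD"

Because the toy cipher only emits uppercase letters, the underscore separator is unambiguous when the two halves are parsed back apart.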

Ridiculously simple, right?! However, this method can be rendered vulnerable by using a weak underlying cipher, such as the Vigenere cipher, as I have here. Though it should be clear that PGP-over-Vigenere is still stronger than Vigenere alone.

As you can see, with a strong underlying cipher, PGP adds a significant increase in security, at the cost of only a modest increase in complexity. Naturally, I will be adding this to my toolkit for future projects! I hope this explanation and example have been helpful. I admit the diagram on Wikipedia provides a good outline of the PGP scheme. For anyone interested, you can download the example and the Vigenere cipher implementation here.

Innovation Fridays, Software Development

Learn C# – Principles II

Last week, I posted the first part of my Learn C# principles discussion. There I covered those principles which I believe are less subjective and more widely held. This week, I am delving into some more personal principles which I find have improved my code greatly. To begin, though, I will speak on a concept that I believe many will again agree on.

One thing almost every new programmer should know is that many of the problems they will encounter have already been solved. Of course, as with any system with multiple solutions, some are better than others. The best of these have been codified into a series of Design Patterns produced by the “Gang of Four”. To be completely honest, I wish I had found out about these far earlier than I did, as they would have saved me a lot of frustration reinventing some of these patterns for myself. Patterns like the Adapter or the Facade I find myself using quite frequently, as shown below. More importantly, they changed how I look at the problems I am trying to solve: I spend more time thinking of responsibilities than of methods. As a result I have gotten better about following the Single Responsibility Principle, mentioned last week.
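
To give one quick taste of why these patterns earn their keep, here is the Adapter in miniature. I will use Python for consistency with the other examples on this blog; the names are invented for illustration:

class LegacyLogger:
    """An existing class we cannot change; it expects pre-formatted strings."""
    def write_line(self, text):
        print(text)

class LegacyLoggerAdapter:
    """Adapter: translates the interface we want into the one we have."""
    def __init__(self, legacy):
        self._legacy = legacy

    def log(self, level, message):
        self._legacy.write_line("[" + level.upper() + "] " + message)

# Client code depends only on log(level, message), never on LegacyLogger.
logger = LegacyLoggerAdapter(LegacyLogger())
logger.log("info", "old and new interfaces now cooperate")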

In general, I believe most developers would agree that these patterns are helpful, though some are more esoteric than others. Now, seeing as I am instructing the C# course, I thought it would be best to reveal any potential hidden biases I have, so that the students at least have a hope of separating what I personally believe to be good from what is generally held to be good. This is most likely an over-abundance of caution, but I truly wish to do right by those who have sought my instruction on the subject.

The first place where I recognized that I might be slightly biased was in my experience with Python. Python is a wonderful language to pick up, and a very powerful one, albeit not particularly performant. While I was learning to speak Python, I came across a peculiar text which espoused some principles for software development that I hold to this day. The text is called the Zen of Python, and goes like this:

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than right now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
-Tim Peters

Admittedly this may require some additional explanation. I have found it easiest to look at in terms of paired ideas. For example, take lines 2 through 7, which cover such pairs as Explicit/Implicit, Simple/Complex, and Sparse/Dense. All of these are talking about how our code ought to read. Referring back to the text, it is better to explicitly state what a function does than to have it act implicitly. This idea of explicit effects is not held merely for Python; it can also be heard as “have no side effects”. That is one possible interpretation of the line, but what makes it so powerful is that it embodies the principle rather than the specific case. Another instance where a principle carries its own refinement is repeated within the text itself: “Errors should never pass silently. Unless explicitly silenced.”
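
A tiny Python illustration of the Explicit/Implicit pair (my own example, not from the Zen):

# Implicit: the function quietly mutates state the caller cannot see.
_total = 0

def add_implicit(x):
    global _total
    _total += x  # hidden side effect at the call site

# Explicit: inputs and outputs are all visible in the signature.
def add_explicit(total, x):
    return total + x  # the caller owns the result; nothing is hidden

total = add_explicit(0, 5)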

So, as I have attempted to express, there are a couple of larger principles explained nicely here, with more specific guidelines toward their implementation. The easiest to see is that a developer ought to be expressive in their code; this theme runs throughout the poem. The second theme is that a developer ought to be pragmatic, shown in “practicality beats purity” and in “never is often better than right now”, which is another way of expressing YAGNI, or “you aren’t gonna need it”. These two are often held in a delicate tension: on one hand there is the pragmatic side, “sparse is better than dense”; but this is contrasted immediately with the expressive side, “readability counts”.

Python has been gifted with some humorous supporters throughout the years, including Tim Peters, who wrote the Zen. I have found kernels of truth laid underneath the clever words in several of the quips about Python. I would heartily encourage every developer to read through them and make their own judgement. But if I may, I would draw attention to one other Python quip, which I find very humorous and also truthful.

This quip is called “Python vs. Perl according to Yoda” and it goes something like this:

Subject: Python versus Perl: A humorous look
From: larry (funkster@midwinter.com)
Date: 10 Jul 1999 01:45:07 -0700

This has been percolating in the back of my mind for a while.
It's a scene from _The Empire Strikes Back_ reinterpreted to serve
a valuable moral lesson for aspiring programmers.

--

EXTERIOR: DAGOBAH -- DAY

  With Yoda strapped to his back, Luke climbs up one of
  the many thick vines that grow in the swamp until he
  reaches the Dagobah statistics lab. Panting heavily, he
  continues his exercises -- grepping, installing new
  packages, logging in as root, and writing replacements for
  two-year-old shell scripts in Python.

YODA: Code! Yes. A programmer's strength flows from code
  maintainability. But beware of Perl. Terse syntax... more
  than one way to do it... default variables. The dark side
  of code maintainability are they. Easily they flow, quick
  to join you when code you write. If once you start down the
  dark path, forever will it dominate your destiny, consume
  you it will.

LUKE: Is Perl better than Python?

YODA: No... no... no. Quicker, easier, more seductive.

LUKE: But how will I know why Python is better than Perl?

YODA: You will know. When your code you try to read six months
  from now.

To get to the meat of it: Perl, which came before Python, encouraged some habits, like default variables, which made it difficult to understand the code. Referring back to the Zen, it was not very expressive, since the syntax is terse. Additionally, the true path of execution is difficult to determine, since one cannot know how the code will execute without significant additional context, like the default variable values. And sadly, as Yoda describes, if a programmer falls into these traps in the early stage of a project, it becomes much more difficult to come back, if not completely impossible.

To cap all this, the script offers a humorous test of whether or not you are following good patterns: if you can read your code in six months and know what you were trying to do, then perhaps you have done well. This sits in a similar vein with the Zen’s “If the implementation is easy to explain…”, and I find the self-consistency of the Python supporters gratifying. I am happy to have both examples, to see the many facets of the delicate balance between expressive and pragmatic code.

This concludes the principles which I attempt to adhere to in my coding, and the ones which I will be using in the workshop. As I said before, I am trying to teach the principles first, to make the future coding and learning easier. Looking back, I realize that some of this desire to teach principles first comes from bad experiences I had during my early college development days. As Yoda says, “If once you start down the dark path, forever will it dominate your destiny…”. I am hoping to spare the participants that agony and frustration.

This covers the ideals that I hold for my software development and my general understanding of them. As always, I thank you for your time, and hope that you learned something! Let me know in the comments!

* – The C# logo was created by DevStickers

//Edits//
11JUN2016 – SpellChecking and Minor Grammar/Readability Refactor
Software Development

Development Tool: Atom

A few years ago, just before I left college, a friend introduced me to a funny little program called Atom. It was billed as a ‘hackable’ text editor. At the time I thought it was an interesting little toy, and tinkered with it for a while. But since I didn’t find any real use for it then, I was satisfied with just tinkering, and as classes became more demanding I left it behind. That is, until I found a convincing use-case for just such a program!

Recently, I picked up Atom again for a personal project with some church buddies of mine. We are working with an Arduino and several external components. Since there are three developers and two or three operating systems between us, I wanted a tool that we could all use with ease on any system. I settled on Atom after becoming frustrated with the existing Arduino IDE.

Since our project had three developers, we split the responsibilities into three primary areas and organized our project files accordingly. However, the Arduino IDE does not support a nested file structure; it needs all the files to be present at the top level. Not wanting to lose the project organization, I started dabbling with Atom, and found its support far superior to the Arduino IDE for this project.

Of course, nothing is perfect, and Atom does not ship with built-in support for the Arduino. Thankfully there are a couple of packages which provide the necessary components: PlatformIO and language-arduino. PlatformIO did require that we adjust our project structure so that the compiler could locate all our files, but this was a very small change and allowed us to continue more-or-less unfazed. The PlatformIO package also supports boards other than the Uno, which our project was using.

After playing with Atom for a week or so, purely for my Arduino project, I became more familiar with its various features and got comfortable with the shortcuts, among other things. Then I switched back to one of my Python projects and had a little shell shock. At present I am using PyCharm, which has served me well, and has the added benefit that one of its default settings enables the Microsoft Visual Studio shortcuts. It is quite polished, and provides solid support for most anything a developer could want to do in Python. But it’s not very easy to customize, at least not compared to Atom.

On the flip side, Atom doesn’t ship with support for running Python scripts from the IDE, though it does include syntax highlighting. Here again, the package system comes to save the day. With the Script package, Atom gains the ability to execute Python and other interpreted languages, like Julia, and can display the output in an in-IDE terminal window! Furthermore, Atom’s error highlighting is fairly descriptive, and shows the developer the breakages in the current document. So by switching between the various files in your project, you can see the pertinent errors in each file, without having to browse through an exhaustive list covering every file all together. Which, coming from a C++ project, is pretty great!

For a little icing on the cake, Atom also has a fair bit of Git integration. (I should hope so, considering it comes from GitHub.) The project view nicely highlights new and changed files in the current Git changeset, and the default settings reduce clutter in the project view by leaving out the various Git files, like the .gitignore. This is a pleasant feature, which I have enjoyed on my Arduino project.

Overall, Atom is a very impressive program. It can be as simple or as advanced as you need, and can change with ease to suit your needs through its robust package manager! With its wide community support base, I look forward to enjoying Atom for many years to come. For anyone interested in learning more, please check out Atom here!

*- Image borrowed from this source.

Software Development

Development Tool: Jupyter

Recently, one of my colleagues presented a prototype of a new feature that my team was going to implement. To be sure, the new feature was fascinating, both for its algorithmic complexity and its significance to our users. However, I was admittedly more caught by the tool he had used to develop and present the prototype. With this tool he was able to set up a development environment and test data, and demonstrate live, working code for us with ease! This tool was Jupyter.

*Jupyter Logo

I can best describe Jupyter as a web-hosted development and testing environment. The Jupyter application is installed on a server, which can then expose multiple notebooks wherein the development is done. More specifically, these notebooks are where the demonstration data is housed and the presentations are run. Moreover, each notebook can be hooked up to a different language kernel, allowing development to proceed in multiple languages!

This is profoundly useful, because it allows a prototype to be developed in the easiest language to program in, without having to pay for the overhead of a presentation layer. Thus demonstrating a feature to the PM/PO becomes much easier! Furthermore, when you are presenting to developers, they can adjust the code you are presenting and witness the changes’ effects in real time!

A Jupyter notebook’s structure is very similar, if not identical, to that of a Mathematica notebook. In Mathematica, the user creates a notebook and enters an equation, or series of equations, into an entry. The computation is then carried out for that entry, and the user can use the results in the next entry. This includes plotting as well as algorithmic analysis, which is especially useful for complex physics simulations.

In Jupyter, the user enters a series of functions, function calls, or classes into an entry, which can then be employed by future entries. One can execute an algorithm in one step and plot it in the next, or go on to use the results of the algorithm in another step.

Each entry’s results are calculated from the current state, so changes to entry 1 might affect entry 5’s results if entry 5 used entry 1’s results in its calculation. But as a benefit, if a mistake was made in entry N, one need only correct that entry and re-run the calculations for the entries which follow. Both Mathematica and Jupyter share this behavior.
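
Sketched in Python, a pair of entries might look like this (in a real notebook, each block below would be its own cell):

# Entry 1: define the sample data and the algorithm under test.
samples = [3, 1, 4, 1, 5, 9, 2, 6]

def running_mean(values):
    total = 0.0
    for i, v in enumerate(values, start=1):
        total += v
        yield total / i

# Entry 2: consume Entry 1's results. After correcting Entry 1,
# one re-runs it and then the entries that follow, such as this one.
means = list(running_mean(samples))
print(means[-1])  # overall mean of the samples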

In a corporate setting, Jupyter would excel in several use cases, including demonstrations to the PM/PO and to developers. Whether or not a team is co-located, a Jupyter notebook could be set up to let many users interact with prototypes in real time, allowing developers to review the functioning of a prototype even while developing the code in a different location or language.

Alternatively, it could be used to let the PM visualize what a new feature’s output will look like given some sample data, without having to ask the developers to run the simulation. This would allow the PM to quickly assess the accuracy of the algorithm. In the same vein, a QA could use the notebook to actively investigate a customer-reported error in the algorithm, so long as they have the relevant data and access to an updated algorithm. The QA would not need the entire user project, with all the sensitive information it might contain, which could make reproducing bugs much easier!

Finally, as was the case with my colleague’s work, Jupyter can be used as a rapid-prototyping environment. Since the language kernel is set with the notebook, and the presentation layer is already handled, the developer is much freer to pursue the real interest: the product algorithm. Since the language is not locked by previous work, the developer is free to choose whatever language they feel best suits the project. They could feasibly borrow data from other projects, or even simply generate it within the notebook!

Overall, Jupyter looks to be a very effective tool for sharing the development of algorithms, or other calculation-intensive features, in an accessible way with multiple parties within an organization. It provides a usable interface to developers and non-developers alike, in an approachable fashion. It provides the ability to modify the experimental data, giving users a more detailed understanding of the prototype. And if it were used to host the existing algorithms, it might also allow PMs to simulate the program well enough to trace bugs to the customer data or to the company’s algorithm, rather than wasting significant time in the back-and-forth as developers seek to understand the meaning behind the data and why a particular output is wrong.

For those interested in knowing more, you can find Jupyter at jupyter.org! Thank you for your time, and I hope that you find this tool to be useful in your endeavors!

* The image shown is the Jupyter logo found on the jupyter.org home page.
