The Application Development Experiences of an Enterprise Developer

Social Media

Posted by bsstahl on 2022-11-11 and Filed Under: general 

The implosion of Twitter and my subsequent move to the Fediverse has me reviewing all of my social media activity.

A few of the things I've looked at, and continue to investigate, include:

  • How and why I use each platform
  • How my activity has changed over time
  • What previous statements I've made should be corrected, amended, or otherwise revisited

The revisiting of previous statements will likely happen either on the platform where they originated, or via microblog commentary @bsstahl@fosstodon.org. The rest of the analysis can be found here for everyone's benefit and comment. Of course, all comments, as indicated below, should be directed to my microblog @bsstahl@fosstodon.org.

My platforms:

  • My Blog:
    • How I use it: Long form posts, usually technical in nature, that describe a concept or methodology.
    • Future Plans: I hope to continue to use this platform for a long time and would like to be more active. However, I have said that many times and never been able to keep up a good cadence.
  • Microblogging @bsstahl@fosstodon.org but previously on Twitter:
    • How I use it:
      • Real-time communication about events such as tech conferences with other attendees
      • Keeping in-touch with friends I met at events, usually without even having to directly interact
      • Asking for input on concepts or ideas on how to do/use a tool or technology
      • Asking for comments on my blog posts or presentations
      • Promoting my own posts and talks, or those of other speakers and writers, especially when I attend a talk at a conference
      • Publishing links to the code and slide-decks for my conference talks
      • Publicly whining about problems with commercial products or services
      • Making the occasional bad joke or snarky remark, usually at the expense of some celebrity, athlete or business
      • Posting individual photos of people I know or places I go
    • Future Plans: With the move to the Fediverse, I may try to focus more completely on technology on this platform. Perhaps sports-related stuff should go elsewhere, maybe a photo-blog site like PixelFed
  • Facebook:
    • How I use it:
      • Private to only family members or friends I actually know
      • Posting Photos of family and friends to a limited audience
      • Check-ins to places I'm at for future reference, especially restaurants
      • Posting political commentary and social memes
    • Future Plans: I want a place for this that is not a walled-garden like Facebook. I feel like private communities could be run on Mastodon or other Fediverse servers like PixelFed. There are a few possibilities I'm exploring.
  • Flickr:
    • How I use it:
      • Paid "professional" account where I keep my off-site backup of every digital photo I've ever taken, plus some scanned photos that were "born analog", in full-size
      • A public photostream of my favorite photos that are not family or friends
      • A restricted (to family and friends) photostream of photos of family or friends
      • Hosting of photos for my photoblog sites including GiveEmHellDevils.com
    • Future Plans: Most of this will remain, though I may siphon off specific elements to other, more federated communities. For example, the restricted photostream could move to a PixelFed server.
  • LinkedIn:
    • How I use it: A professional network of people I actually know in the technology space. I don't accept requests from people I have never met, including (especially?) recruiters. If I ever need to find a job again, it will be through referrals from people I know.
    • Future Plans: I'd like to do a better job of posting my appropriate content here, perhaps as links from my blog. Of course, that would require more posts on my blog (see above). Other than that, I don't expect any changes here.
  • YouTube:
    • How I use it:
      • In the past, I used it to post videos of family and friends, though now those are usually posted privately via Flickr or Facebook
      • Most of the time, I post videos of my technical presentations, or other presentations to the local user groups
    • Future Plans: Continue to share videos of technical content
  • Instagram
    • How I use it: To publish photos from my GiveEmHellDevils.com photoblog.
    • Future Plans: I would prefer to move this to a Fediverse service like PixelFed that is not a walled-garden. I may start by adding a second stream using the Fediverse, and see what happens. If things go in the right direction, I may be able to eliminate Instagram.
  • GitHub:
    • How I use it:
      • A public repository of my Open-Source (FOSS) projects and code samples.
      • A public repository of those FOSS projects that I contribute to via Pull-Request (PR)
      • The hosting platform for my Blog Site and my GiveEmHellDevils.com photoblog.
    • Future Plans: No changes expected
  • Azure DevOps
    • How I use it:
      • A private repository of my private code projects
      • A private repository of the source material for my presentation slides
      • A private repository of my many random experiments with code
    • Future Plans: No changes expected
  • Azure Websites
    • How I use it:
      • To publish the individual slide-decks for my presentations as listed on my blog site
    • Future Plans: No changes expected
  • TikTok
    • How I use it: I don't
    • Future Plans: None
Tags: social-media twitter mastodon fediverse 

Identifying the Extraneous Publishing AntiPattern

Posted by bsstahl on 2022-08-08 and Filed Under: development 

What do you do when a dependency of one of your components needs data, ostensibly from your component, that your component doesn't actually need itself?

Let's think about an example. Suppose our problem domain (the big black box in the drawings below) uses some data from 3 different data sources (labeled Source A, B & C in the drawings). There is also a downstream dependency that needs data from the problem domain, as well as from sources B & C. Some of the data required by the downstream dependency are not needed by, or owned by, the problem domain.

There are two common implementations, discussed next, and one slightly less obvious option discussed later in this article. We could:

  1. Pass-through the needed values on the output from our problem domain. This is the default option in many environments.
  2. Force the downstream to take additional dependencies on sources B & C

Note: In the worst of these cases, the data from one or more of these sources is not needed at all in the problem domain.

Option 1 - Increase Stamp Coupling

The most common choice is for the problem domain to publish all data that it is system of record for, as well as passing-through data needed by the downstream dependencies from the other sources. Since we know that a dependency needs the data, we simply provide it as part of the output of the problem domain system.

Coupled Data Feed

Option 1 Advantages

  • Each downstream system only needs to take a dependency on a single data source.

Option 1 Disadvantages

  • Violates the Single Responsibility Principle because the problem domain may need to change for reasons the system doesn't care about. This can occur if an upstream producer adds or changes data, or a downstream consumer needs additional or changed data.
  • The problem domain becomes the de-facto system of record for data it doesn't own. This may cause downstream consumers to be blocked by changes important to the consumers but not the problem domain. It also means that the provenance of the data is obscured from the consumer.
  • Problems incurred by upstream data sources are exposed in the problem domain rather than in the dependent systems, irrespective of where the problem occurs or whether that problem actually impacts the problem domain. That is, the owners of the system in the problem domain become the "one neck to wring" for problems with the data, regardless of whether the problem is theirs, or they even care about that data.

I refer to this option as an implementation of the Extraneous Publishing Antipattern (Thanks to John Nusz for the naming suggestion). When this antipattern is used it will eventually cause significant problems for both the problem domain and its consumers as they evolve independently and the needs of each system change. The problem domain will be stuck with both their own requirements, and the requirements of their dependencies. The dependent systems meanwhile will be stuck waiting for changes in the upstream data provider. These changes will have no priority in that system because the changes are not needed in that domain and are not cared about by that product's ownership.

The relationship between two components created by a shared data contract is known as stamp coupling. Like any form of coupling, we should attempt to minimize it as much as possible between components so that we don't create hard dependencies that reduce our agility.
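
To make the stamp coupling concrete, here is a minimal sketch of what an Option 1 output contract might look like. The type and property names are hypothetical; the point is that the problem domain's published message carries pass-through fields owned by Sources B and C alongside the data the domain actually owns.

using System;

// Hypothetical Option 1 contract: the problem domain publishes the data it
// owns plus pass-through values from Sources B and C that it does not need.
public class ProblemDomainFeedMessage
{
    // Data the problem domain actually owns (it is the system of record)
    public Guid EntityId { get; set; }
    public string DomainStatus { get; set; }

    // Pass-through data owned by Source B, published only because a
    // downstream consumer needs it (stamp coupling)
    public decimal SourceBAmount { get; set; }

    // Pass-through data owned by Source C
    public DateTime SourceCEffectiveDate { get; set; }
}

Any change to what Source B or Source C publishes now forces a change to this contract, which is exactly the Single Responsibility violation described above.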

Option 2 - Multiplicative Dependencies

This option requires each downstream system to take a dependency on every system of record whose data it needs, regardless of what upstream data systems may already be utilizing that data source.

Direct Dependencies

Option 2 Advantages

  • Each system publishes only that information for which it is system of record, along with any necessary identifiers.
  • Each dependency gets its data directly from the system of record without concern for intermediate actors.

Option 2 Disadvantages

  • A combinatorial explosion of dependencies is possible since each system has to take dependencies on every system it needs data from. In some cases, this means that the primary systems will have a huge number of dependencies.

While there is nothing inherently wrong with having a large number of repeated dependencies within the broader system, it can still cause difficulties in managing the various products when the dependency graph starts to get unwieldy. We've seen similar problems in package-management and other dependency models before. However, premature optimization carries its own, more common risk: we can create artifacts that we have to support forever and that add unnecessary complexity. As a result, I tend to use option 2 until the number of dependencies starts to grow. At that point, when the dependency graph starts to get out of control, we should look for another alternative.

Option 3 - Shared Aggregation Feed

Fortunately, there is a third option that may not be immediately apparent. We can get the best of both worlds, and limit the impact of the disadvantages described above, by moving the aggregation of the data to a separate system. In fact, depending on the technologies used, this aggregation may be possible using an infrastructure component, a low-code solution that is less likely to have reliability concerns.

In this option, each system publishes only the data for which it is system of record, as in option 1 above. However, instead of every system having to take a direct dependency on all of the upstream systems, a separate component is used to create a shared feed that represents the aggregation of the data from all of the sources.

Aggregated Data Feed

Option 3 Advantages

  • Each system publishes only that information for which it is system of record, along with any necessary identifiers.
  • Each downstream system only needs to take a dependency on a single data source.
  • A shared ownership can be arranged for the aggregation source that does not put the burden entirely on a single domain team.

Option 3 Disadvantages

  • The aggregation becomes the de-facto system of record for data it doesn't own, though that fact is anticipated and hopefully planned for. The ownership of this aggregation needs to be well-defined, potentially even shared among the teams that provide data for the aggregation. This still means though that the provenance of the data is obscured from the consumer.
  • Problems incurred by upstream data sources are exposed in the aggregator rather than in the dependent systems, irrespective of where the problem occurs. That is, the owners of the aggregation system become the "one neck to wring" for problems with the data. However, as described above, that ownership can be shared among the teams that own the data sources.

It should be noted that in any case, regardless of implementation, a mechanism for correlating data across the feeds will be required. That is, the entity being described will need either a common identifier, or a way to translate the identifiers from one system to the others so that the system can match the data for the same entities appropriately.
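
As a rough illustration of that correlation requirement, the sketch below shows a hypothetical aggregator that matches records from each feed on a shared identifier and publishes the combined result. The record types and the in-memory LINQ join are stand-ins for whatever infrastructure component actually performs the aggregation.

using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical feeds, each keyed by a common identifier
public record DomainRecord(Guid EntityId, string DomainStatus);
public record SourceBRecord(Guid EntityId, decimal Amount);
public record SourceCRecord(Guid EntityId, DateTime EffectiveDate);

// The combined message published on the shared aggregation feed
public record AggregatedRecord(Guid EntityId, string DomainStatus, decimal Amount, DateTime EffectiveDate);

public static class FeedAggregator
{
    // Join the three feeds on the shared identifier and publish one record
    // per entity that appears in all of them.
    public static IEnumerable<AggregatedRecord> Aggregate(
        IEnumerable<DomainRecord> domain,
        IEnumerable<SourceBRecord> sourceB,
        IEnumerable<SourceCRecord> sourceC)
        => from d in domain
           join b in sourceB on d.EntityId equals b.EntityId
           join c in sourceC on d.EntityId equals c.EntityId
           select new AggregatedRecord(d.EntityId, d.DomainStatus, b.Amount, c.EffectiveDate);
}

In a real system the join would more likely be performed by a streaming or database component, as discussed in the Implementation section below.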

You'll notice that the aggregation system described in this option suffers from some of the same disadvantages as the other two options. The biggest difference however is that the sole purpose of this tool is to provide this aggregation. As a result, we handle all of these drawbacks in a domain that is entirely built for this purpose. Our business services remain focused on our business problems, and we create a special domain for the purpose of this data aggregation, with development processes that serve that purpose. In other words, we avoid expanding the definition of our problem domain to include the data aggregation as well. By maintaining each component's single responsibility in this way, we have the best chance of remaining agile, and not losing velocity due to extraneous concerns like unnecessary data dependencies.

Implementation

There are a number of ways we can perform the aggregation described in option 3. Certain databases such as MongoDb and CosmosDb provide mechanisms that can be used to aggregate multiple data elements. There are also streaming data implementations which include tools for joining multiple streams, such as Apache Kafka's kSQL. In future articles, I will explore some of these methods for minimizing stamp coupling and avoiding the Extraneous Publishing AntiPattern.

Tags: agile antipattern apache-kafka coding-practices coupling data-structures database development ksql microservices 

Troubleshooting Information for Machinelearning-ModelBuilder Issue #1027

Posted by bsstahl on 2021-04-03 and Filed Under: tools 

Update: The issue has been resolved. There was an old version of the Extension installed on failing systems that was causing problems with Visual Studio Extensions. Even though the version of the Extension showed as the correct one, an old version was being used. A reinstall of Visual Studio was needed to fix the problem.

There appears to be a problem with the Preview version of the ModelBuilder tool for Visual Studio. This issue has been logged on GitHub and I am documenting my findings here in the hope that they will provide some insight into the problem. I will update this post when a solution or workaround is found.

I want to be clear that this problem is in a preview version, where problems like this are expected. I don't want the team working on this tooling to think that I am being reproachful of their work in any way. In fact, I want to compliment them and thank them for what is generally an extremely valuable tool.

To reproduce this problem, use this Data File to train an Issue Classification or Text Classification model in the ModelBuilder tool by using the Key column to predict the Value column. The keys have intelligence built into them that makes them valid predictors of the Value (I didn't design this stuff).

Machines that are unable to complete this task get an error stating Specified label column 'Value' was not found. with a stack trace similar to this.

This process seems to work fine on some machines and not on others. I have a machine that it works on, and one that it fails on, so I will attempt to document the differences here.

The first thing I noticed is that the experience within the tool is VERY DIFFERENT even though it is using the exact same version of the Model Builder.

From the machine that is able to train the model

Scenarios - Functional Machine

From the machine having the failure

Scenarios - Failing Machine

Everything seems to be different. The headline text, the options that can be chosen, and the graphics (or lack thereof). My first reaction when I saw this was to double-check that both machines are actually using the same version of the Model Builder tool.

Verifying the Version of the Tool

Spoiler alert: As best I am able to verify, both machines are using the same version of the tool.

From the machine that is able to train the model

ModelBuilder Tool Version - Functional Machine

From the machine having the failure

ModelBuilder Tool Version - Failing Machine

My next thought is that I'm not looking at the right thing. Perhaps, ML.NET Model Builder (Preview) is not the correct Extension, or maybe the UI for this Extension is loaded separately from the Extension. I can't be sure, but I can't find anything that suggests this is really the case. Perhaps the dev team can give me some insight here.

Verifying the Region Settings of the Machine

While these versions are clearly the same, it is obvious from the graphics that the machines have different default date formats. Even though there are no dates in this data file, and both machines were using US English, I changed the Region settings of the problem machine to match that of the functional machine. Predictably, this didn't solve the problem.

From the machine that is able to train the model

Region Settings - Functional Machine

From the machine having the failure - Original Settings

Region Settings - Problem Machine

From the machine having the failure - Updated Settings

Updated Region Settings - Problem Machine

Checking the Versions of Visual Studio

The biggest difference between the two machines that I can think of, now that the region settings match, is the exact version & configuration of Visual Studio. Both machines have Visual Studio Enterprise 2019 Preview versions, but the working machine has version 16.9.0 Preview 1.0 while the failing machine uses version 16.10.0 Preview 1.0. You'll have to forgive me for not wanting to "upgrade" my working machine to the latest preview of Visual Studio, just in case that actually is the problem, though I suspect that is not the issue.

From the machine that is able to train the model

Visual Studio Version - Functional Machine

From the machine having the failure

Visual Studio Version - Problem Machine

There are also differences in the installed payloads within Visual Studio between the 2 machines. Files containing information about the installations on each of the machines can be found below. These are the files produced when you click the Copy Info button from the Visual Studio About dialog.

From the machine that is able to train the model

Visual Studio Payloads - Functional Machine

From the machine having the failure

Visual Studio Payloads - Problem Machine

Windows Version

Another set of differences involve the machines themselves and the versions of Windows they are running. Both machines are running Windows 10, but the working machine runs a Pro sku, while the problem machine uses an Enterprise sku. Additionally, the machines have different specs, though they are consistent in that they are both underpowered for what I do. I'm going to have to remedy that.

I've included some of the key information about the machines and their OS installations in the files below. None of it seems particularly probative to me.

From the machine that is able to train the model

System and OS - Functional Machine

From the machine having the failure

System and OS - Problem Machine

Other Things to Check

There are probably quite a number of additional differences I could look at between the 2 machines. Do you have any ideas about what else I could check to give the dev team the tools they need to solve this problem?

Tags: ml modelbuilder 

Committing to Git from an Azure DevOps Pipeline

Posted by bsstahl on 2020-06-17 and Filed Under: tools 

There are occasions, such as when working with static website generators, that you'll want to push some changes made in an Azure DevOps pipeline, back into the source Git repository. This process is simple enough, but since I have struggled to get it configured twice now, I am documenting the process here for your use, and my future use.

Azure DevOps pipelines typically contain two parts, although other configurations are possible. The two standards are:

  • Get sources - gets the information to work with from a Git repository or other source control environment
  • Agent Job - holds the tasks required to complete the pipeline

You'll need to take the following steps to configure the interactions between your source control provider and your pipeline:

  • Configure the Get sources section of the pipeline by selecting your source control provider from the list of options and then choosing the repository from the list within that provider. For most providers, you will need to supply credentials with access to the repository, although the pipeline may already have the basic access it needs to read from an Azure DevOps Repo.

Azure DevOps Sources

  • [optional] Configure an Agent Job to perform any cleanup of the repo necessary. When building a static website, I first delete all files from the target directory (the old static website files) so that only the files that are still needed are included in the final deployment.

Note: for all Agent Job steps that involve scripting, I use the Command Line task which allows me to execute my scripts in one of the native OS shells (Bash on Linux and macOS and cmd.exe on Windows). You could just as easily use the Powershell task, which is cross-platform or any number of other options.

  • Execute your build process. This is the step that generates the new files that will eventually be committed back into the source repository. Each static website generator has its own method for creating the site; see the documentation for your tooling for specifics. You can also execute custom tools or scripts here that modify files in the repository any way you'd like.

  • Execute the commit back to the source repo. This is the money step, where everything that has been done to this point is saved in the repository. As with previous steps, I use the Command Line task to execute the needed commands. My script is shown below. It is written for the Windows cmd.exe shell, so commands that start with ECHO are log entries that will be included in the pipeline's execution log to help with troubleshooting and maintenance. This script uses a number of pipeline variables, which take the form $(variableName), to make configuration easier. The git.email and git.user variables were defined by me in the Variables section of the pipeline; you will need to either configure those variables yourself, or substitute their values in the script. The Build.SourceVersion and Build.SourceVersionMessage variables are supplied by the pipeline and no action was required on my part to create or enable them.

An interesting thing to note about this script is the git push command. The full command git push origin HEAD:master is required in this case, rather than just a simple git push, because the pipeline checks out the source commit in a detached HEAD state, so the local repository has no branch that tracks the remote. We have to tell Git explicitly to push the current local HEAD to the remote master branch, or else the push will fail. I suspect there is a way to tell the pipeline to check out a branch instead, but doing things this way, to my knowledge, has no ill-effects and is simple enough that it isn't really worth the effort for me to find out.

ECHO ** Starting "Git config for user: $(git.user)"
git config --global user.email "$(git.email)"
git config --global user.name "$(git.user)"

ECHO ** Starting "Git add..."
git add .

ECHO ** Starting "Git commit..."
git commit -m "Static site rebuild due to commit $(Build.SourceVersion) '$(Build.SourceVersionMessage)'"

ECHO ** Starting "Git push..."
git push origin HEAD:master

ECHO ** Ending Update remote git repo script

  • Finally, we need to give the pipeline user permission to write to the repository. This is where things can get a bit tricky and hard to find. The key configuration area within Azure DevOps can be found by going to the Project Settings.

    • For Azure DevOps repositories, select the Repositories tab and click on the [Projectname] Build Service User in the Users section. This will show a list of permissions that the build service user has. There are 4 that are of interest here: Contribute, Create branch, Create tag and Read. Some of those will probably already be enabled for you. I suspect only the Contribute needs to be added at this time, but I usually make sure all 4 are allowed.

    • For other repositories, things are even more complicated. Accounts need to be configured on the external source control service so that the Build Service User account has the permissions it needs to push to that repository. This may require an SSH configuration, an Auth token, or any number of other mechanisms depending on the source control provider. See that provider's documentation for the specifics.

Update 2022-07-13: There is an additional option that needs to be enabled in the pipeline's Agent Job: ☑ Allow scripts to access the OAuth token. This option must be checked so that the script can acquire the permissions it needs to update the repository.

There are other ways to do all of this of course. One idea that intrigues me that I haven't tried yet is to have the build service submit a pull-request to the remote git repo. This would require an additional approval step before the changes are merged into the repo. For static websites where merging into master is the equivalent of publishing the site, this might give me the opportunity to review the built site before it is actually deployed.

Have you tried this pull-request method, or used this kind of technique with a non-Azure DevOps repo? If so, please let me know about it @bsstahl@fosstodon.org.

Tags: ci_cd git azure devops azure devops 

Meta-Abstraction -- You Ain't Gonna Need It!

Posted by bsstahl on 2020-05-18 and Filed Under: development 

When we look at the abstractions in our applications, we should see a description of the capabilities of our applications, not the capabilities of the abstraction

Let’s start this discussion by looking at an example of a simple repository.

public interface IMeetingReadRepository
{
    IEnumerable<Meeting> GetMeetings(DateTime start, DateTime end);
}

It is easy to see the capability being described by this abstraction – any implementation of this interface will have the ability to load a collection of Meeting objects that occur within a given timeframe. There are still some unknown details of the implementation, but the capabilities are described reasonably well.

Now let’s look at a different implementation of the Repository pattern.

public interface IReadRepository<T>
{
    IEnumerable<T> Get(Func<T, bool> predicate);
}

We can still see that something is going to be loaded using this abstraction, we just don’t know what, and we don’t know what criteria will be used.

This 2nd implementation is a more flexible interface. That is, we can use this interface to describe many different repositories that do many different things. All we have described in this interface is that we have the ability to create something that will load an entity. In other words, we have described our abstraction but said very little about the capabilities of the application itself. In this case, we have to look at a specific implementation to see what it loads, but we still have no idea what criteria can be used to load it.

public class MeetingReadRepository : IReadRepository<Meeting>
{
    public IEnumerable<Meeting> Get(Func<Meeting, bool> predicate)
        => throw new NotImplementedException(); // implementation details omitted
}

We could extend this class with a method that specifically loads meetings by start and end date, but then that method is not on the abstraction so it cannot be used without leaking the details of the implementation to the application.  The only way to implement this pattern in a way that uses the generic interface, but still fully describes the capabilities of the application is to use both methods described above. That is, we implement the specific repository, using the generic repository – layering abstraction on top of abstraction, as shown below.

public interface IMeetingReadRepository : IReadRepository<Meeting>
{
    IEnumerable<Meeting> GetMeetings(DateTime start, DateTime end);
}

public class MeetingReadRepository : IMeetingReadRepository
{
    public IEnumerable<Meeting> GetMeetings(DateTime start, DateTime end)
        => Get(m => m.Start >= start && m.Start < end);

    // TODO: Implement
    public IEnumerable<Meeting> Get(Func<Meeting, bool> predicate)
        => throw new NotImplementedException();
}

Is this worth the added complexity? It seems to me that as application developers we should be concerned about describing and building our applications in the simplest, most maintainable and extensible way possible. To do so, we need seams in our applications in the form of abstractions. However, we generally do not need to build frameworks on which we build those abstractions. Framework creation is an entirely separate topic with an entirely different set of concerns.

I think it is easy to see how quickly things can get overly-complex when we start building abstractions on top of our own abstractions in our applications. Using Microsoft or 3rd party frameworks is fine when appropriate, but there is generally no need to build your own frameworks, especially within your applications. In the vast majority of cases, YAGNI.

Did I miss something here? Do you have a situation where you feel it is worth it to build a framework, or even part of a framework, within your applications? Please let me know about it @bsstahl@fosstodon.org.

Tags: abstraction apps coding practices development entity flexibility framework generics principle yagni interface 

South Florida Code Camp 2019

Posted by bsstahl on 2019-03-03 and Filed Under: event 

Thanks again to all the organizers, speakers and attendees of the 2019 South Florida Code Camp. As always, it was an amazing and fun experience.

The slides for my presentation, Intro to WebAssembly and Blazor, are online, and the Blazor Chutes & Ladders Simulation sample code can be found in my AIDemos GitHub Repo.

Tags: assembly blazor code camp code sample development framework introduction microsoft presentation 

Three Awesome Months

Posted by bsstahl on 2019-02-26 and Filed Under: event 

The next few months are going to be absolutely amazing. We've got some great events coming up in March and April right here in the Valley of the Sun. In addition, I currently have 4 conferences scheduled in 4 different countries on 2 continents.

AZGiveCamp IX - Presented by Quicken Loans - March 8th-10th

The most important occasion coming up is the 9th AZGiveCamp Hackathon of Help. This year, we're very fortunate to have Quicken Loans presenting our event and hosting it at their new facility in downtown Phoenix. At AZGiveCamp, Arizona's finest technologists will put their skills to work creating software for some great local charity organizations. We help them help our community by using our skills to create tools that help them further their mission.

Visual Studio 2019 Arizona Launch - April 16th

Another fun event for developers in the valley is the Visual Studio 2019 Arizona Launch event being hosted at Galvanize. We'll have some great speakers talking about how Visual Studio 2019 is a more productive, modern, and innovative environment for building software.

Around the World

In March, I'll be visiting opposite ends of the east coast of North America.

First, on March 2nd, I'll be attending the always amazing South Florida Code Camp in Fort Lauderdale.  This event is right up there with the biggest community conferences in the country and is always worth attending. This will be the 7th year I've presented at SoFlaCC. If you're in the area I hope you'll attend.

Later in March, I cross the border into Canada to attend ConFoo Montreal. This will be my first trip ever to Montreal so I hope the March weather is kind to this 35 year Phoenix resident.  The event runs from March 13th - 15th and there will be 2 Canadiens games during the time I am there so I should be able to get to at least one of them.

In May I get to do a short tour of Europe, spending 2 weeks at conferences in Budapest, Hungary (Craft Conference), and Marbella, Spain (J on the Beach).  While I have done some traveling in Europe before, I have never been to Spain or Hungary so I am really looking forward to experiencing the history and culture that these two cities have to offer.

Keep up With Me

I maintain a list of my presentations, both past and upcoming, on the Community Speaker page of this blog. I also try to document my conference experiences @bsstahl@fosstodon.org. If you are going to be attending any of these events, please be sure to ping me and let me know.

Tags: azgivecamp charity code camp conference givecamp microsoft nonprofit phoenix presentation schedule speaking user group visual studio 

The Value of Flexibility

Posted by bsstahl on 2019-02-14 and Filed Under: development 

Have you ever experienced that feeling you get when you need to extend an existing system and there is an extension point that is exactly what you need to build on?

For example, suppose I get a request to extend a system so that an additional action is taken whenever a new user signs up. The system already has an event message that is published whenever a new user signs up that contains all of the information I need for the new functionality. All I have to do is subscribe a new microservice to this event message, and have that service take the new action whenever it receives a message. Boom! Done.

Now think about the converse. The many situations we’ve all experienced where there is no extension point. Or maybe there is an extension mechanism in place but it isn’t quite right; perhaps an event that doesn’t fire on exactly the situation you need, or doesn’t contain the data you require for your use case and you have to build an entirely new data support mechanism to get access to the bits you need.

The cost to “go live” is only a small percentage of the lifetime total cost of ownership. – Andy Kyte for Gartner Research, 30 March 2010

There are some conflicting principles at work here, but for me, these situations expose the critical importance of flexibility and extensibility in our application architectures.  After all, maintenance and extension are the two greatest costs in a typical application’s life-cycle. I don’t want to build things that I don’t yet need because the likelihood is that I will never need them (see YAGNI). However, I don’t want to preclude myself from building things in the future by making decisions that cripple flexibility. I certainly don’t want to have to do a full system redesign every time I get a new requirement.

For me, this leads to a principle that I like to follow:

I value Flexibility over Optimization

As with the principles described in the Agile Manifesto that this is modeled after, this does not eliminate the item on the right in favor of the item on the left, it merely states that the item on the left is valued more highly.  This makes a ton of sense to me in this case because it is much easier to scale an application by adding instances, especially in these heady days of cloud computing, than it is to modify and extend it. I cannot add a feature by adding another instance of a service, but I can certainly overcome a minor or even moderate inefficiency by doing so. Of course, there is a cost to that as well, but typically that cost is far lower, especially in the short term, than the cost of maintenance and extension.

So, how does this manifest (see what I did there?) in practical terms?

For me, it means that I allow seams in my applications that I may not have a functional use for just yet. I may not build anything on those seams, but they exist and are available for use as needed. These include:

  • Separating the tiers of my applications for loose-coupling using the Strategy and Repository patterns
  • Publishing events in event-driven systems whenever it makes sense, regardless of the number of subscriptions to that event when it is created
  • Including all significant data in event messages rather than just keys (see the sketch below)
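
To illustrate that last item, here is a minimal sketch, assuming a hypothetical user sign-up event like the one described earlier. The event carries the significant data rather than just a key, so a future subscriber can act on the message without calling back into the publishing service. The type and handler names are mine, not from any particular framework.

using System;
using System.Threading.Tasks;

// Hypothetical domain event published whenever a new user signs up. It
// carries the significant data, not just a key, so a future subscriber can
// act on the message without calling back into the publishing service.
public record UserSignedUp(
    Guid UserId,
    string EmailAddress,
    string DisplayName,
    DateTimeOffset SignedUpAtUtc);

// A new capability can be added later simply by subscribing another handler
// (or another microservice) to the existing event.
public class WelcomeEmailHandler
{
    public Task HandleAsync(UserSignedUp message)
    {
        // Act using only the data carried by the message
        Console.WriteLine($"Sending welcome email to {message.EmailAddress}");
        return Task.CompletedTask;
    }
}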

There are, of course, dangers here as well. It can be easy to fire events whenever we would generally issue a logging message.  Events should be limited to those in the problem domain (Domain Events), not application events. We can also reach a level of absurdity with the weight of each message. As with all things, a balance needs to be struck. In determining that balance, I value Flexibility over Optimization whenever it is reasonable and possible to do so.

Do you feel differently? If so, let me know @bsstahl@fosstodon.org.

Tags: abstraction agile coding practices microservices optimization pattern principle flexibility yagni event driven 

Back to Basics–the Double Data Type

Posted by bsstahl on 2019-02-12 and Filed Under: development 

What is the result of converting a value that is close to, but not at, the maximum value of an Int64 from a double to a long (Int64)?  That is, what would be the result of an expression like:

(long)((double)(Int64.MaxValue - 1))

  1. 9223372036854775806 (2^63 - 2, the correct value numerically)
  2. -9223372036854775808 or another obviously incorrect value
  3. OverflowException
  4. Any of the above

Based on the framing of the question it is probably clear that the correct answer is "D". Depending on the hardware details and the current state of your system, any of the 3 possible outcomes can occur.  Why is this and what can we do to be sure that the results of our floating-point operations are what we expect them to be?

Before we go into the ways we can modify the behavior of our operations, let's take a look at the two data types in question, Int64 and Double.

An Int64 value, also known as a long, is a fairly straightforward storage mechanism that uses 63 bits for the value and 1 bit to represent the sign.  Negative numbers are stored in twos-complement form to make mathematical operations simpler.  The result is that the Int64 type can store, with perfect fidelity, any integral value between -9223372036854775808 and 9223372036854775807.

The Double data type on the other hand is far more complex. It requires storage for continuous values, not just integers. As a result, the Double data type uses 52 bits to store the mantissa (value), 11 bits to store the exponent (order of magnitude) and the remaining bit of the 64-bit structure to store the sign. Both the exponent and mantissa are shifted by a few bits based on some fairly safe assumptions.  This gives us a range of values for the exponent of -1023 to 1024 and a little more than 52 bits of fidelity in the mantissa.
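
For the curious, one way to see this layout from C# is to examine the raw bits of a Double. The following is a small sketch, not part of the original quiz, that pulls out the sign, exponent, and mantissa fields described above.

using System;

// Decompose a Double into its 1-bit sign, 11-bit exponent, and 52-bit
// mantissa fields to illustrate the layout described above.
double value = 9223372036854773765.0;
long bits = BitConverter.DoubleToInt64Bits(value);

int sign = (int)((bits >> 63) & 0x1);
int biasedExponent = (int)((bits >> 52) & 0x7FF);   // stored with a bias of 1023
long mantissa = bits & 0xFFFFFFFFFFFFFL;            // the low 52 bits

Console.WriteLine($"Sign: {sign}  Exponent: {biasedExponent - 1023}  Mantissa: 0x{mantissa:X}");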

It is this difference in fidelity (63 bits for Int64 and roughly 52 bits for Doubles) that can cause us problems when converting between the two types.  As long as the integer value can be stored in less than 52 bits (value < 4503599627370495) values can be converted back and forth between Int64 and Double without any data loss. However, as soon as the values cannot be represented completely in 52 bits, data loss is likely to occur.

To store such a value in a Double data type, the exponent is adjusted higher and the best available value for the mantissa is found.  When converted back to Int64, this value will be rounded automatically by the framework into the closest integer value. This resulting value may, or may not, be exactly the same as the original value.  To see an example of this, execute the following code in your favorite C# environment:

Console.WriteLine((long)9223372036854773765.0);

If your system is like mine, you’ll get an answer that is not the same as the original value. On my system, I get the result 9223372036854773760. It is said that this integer does not “round-trip” since it cannot be converted into a Double and then back to an integer.

To make matters worse, the rounding that is required for this conversion can be unsafe under certain conditions. On my machine, if the values get within 512 of Int64.MaxValue, even though they don’t exceed it, attempting the conversion may result in an invalid result, or an OverflowException. Even performing the operation without overflow checking using the unchecked keyword or compiler switch doesn't improve things since, if done unchecked, any overflow in the operation will result in an incorrect value rather than an exception. I prefer the exception in this kind of situation so I generally keep overflow checking on.
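
Here is a small example of the checked behavior described above. The specific literal is just one value close enough to Int64.MaxValue that its nearest Double representation rounds past the maximum; your results near the boundary may vary with your hardware and runtime.

using System;

// A value within a few hundred of Int64.MaxValue. Its nearest representable
// Double rounds up to 2^63, which is outside the range of an Int64.
double nearMax = 9223372036854775500.0;

try
{
    // In a checked context an out-of-range conversion throws rather than
    // silently producing an incorrect value.
    long converted = checked((long)nearMax);
    Console.WriteLine($"Converted: {converted}");
}
catch (OverflowException)
{
    Console.WriteLine("The Double rounded past Int64.MaxValue; the checked conversion overflowed.");
}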

The key takeaway for me is that just checking to make certain that a Double value is less than Int64.MaxValue is not enough to guarantee it will convert without error, and certainly does not guarantee the accuracy of any such conversion. Only integer values below 52 bits can be accurately converted into Int64 values.

It is always best to avoid type conversions if possible, but if you are in a situation where it is necessary to convert from large Double values into Integers, I recommend trying some experiments in your production environment to see what range of values will convert accurately. I also highly recommend including very large integers, approaching or at Int64.MaxValue as test data against any method that accepts Int64 values.  Values that are very large in the negative direction (nearing Int64.MinValue) are also good candidates to be used as test data in these methods.

I’ve attached a number of resources below that I used in my research to produce this article, and to fix the bug I caused doing this kind of conversion.  If you have run into this situation and come up with an interesting way of handling it, or if the results of your conversions are different than mine, please let me know about it @bsstahl@fosstodon.org.

Resources

Tags: type csharp clr data structures 

Developer on Fire

Posted by bsstahl on 2018-12-13 and Filed Under: general 

I was recently interviewed by Dave Rael (@raelyard) for his Developer on Fire Podcast.  I had a great time talking with Dave about a lot of different things, both professional and personal, and got to name-drop just a few of the many people who have been a part of my journey over the years.

I also took the opportunity to talk about a few things that have been on my mind:

I hope you enjoy this interview and find something of value in it. If so, please let me know about it @bsstahl@fosstodon.org.

Developer On Fire

Tags: ai azgivecamp community givecamp podcast blazor webassembly optimization algorithms 

SoCalCodeCamp Slide Decks

Posted by bsstahl on 2018-11-10 and Filed Under: event 

The slide decks for my two talks at SoCalCodeCamp USC from November 10, 2018 are below.

Thanks to all of the organizers and attendees of this always amazing event.

Tags: blazor cloud cloud foundry code camp community conference open source presentation slides speaking wasm webassembly 

AZGiveCamp IX-Save the Date

Posted by bsstahl on 2018-10-31 and Filed Under: event 

March 8th–10th 2019

Code Monkey Duckin-It

Mark your calendars to block out the weekend of March 8th 2019 for the next AZGiveCamp Hackathon-of-Help. More details will be coming very soon so keep an eye on AZGiveCamp.org and Meetup for all the particulars as soon as they are available.  I’m looking forward to seeing you all at our 9th event, helping those who help our community.

Tags: azgivecamp charity community givecamp nonprofit phoenix 

Intro to WebAssembly Using Blazor

Posted by bsstahl on 2018-09-26 and Filed Under: event 

I will be speaking tonight, 9/26/2018 at the Northwest Valley .NET User Group and tomorrow, 9/27/2018 at the Southeast Valley .NET User Group. I will be speaking on the subject of WebAssembly. The talk will go into what WebAssembly programs look and act like, and how they run, then explore how we as .NET developers can write WebAssembly programs with Microsoft’s experimental platform, Blazor.

Want to run your .NET Standard code directly in the browser on the client-side without the need for transpilers or browser plug-ins? Well, now you can with WebAssembly and Blazor.

WebAssembly (WASM) is the W3C specification that will be used to provide the next generation of development tools for the web and beyond. Blazor is Microsoft's experiment that allows ASP.Net developers to create web pages that do much of the scripting work in C# using WASM.

Come join us as we explore the basics of WebAssembly and how WASM can be used to run existing C# code client side in the browser. You will walk away with an understanding of what WebAssembly and Blazor can do for you and how to immediately get started running your own .NET code in the browser.

The slide deck for these presentations can be found here IntroToWasmAndBlazor-201809.pdf.

Tags: apps community csharp framework html5 introduction microsoft presentation phoenix speaking user group ux wasm webassembly w3c 

Programmers -- Take Responsibility for Your AI’s Output

Posted by bsstahl on 2018-03-16 and Filed Under: development 

plus ça change, plus c'est la même chose (The more that things change, the more they stay the same). – Rush (and others 😉)

In 2013 I wrote that programmers needed to take responsibility for the output of their computer programs.  In that article, I advised developers that the output of their system, no matter how “random” or “computer generated”, was still their responsibility. I suggested that we cannot cop out by claiming  that the output of our programs is not our fault simply because we didn’t directly instruct the computer to issue that specific result.

Today, we have a similar problem, only the stakes are much, much, higher.

In the world of 2018, our algorithms are being used in police work and inside other government agencies to know where and when to deploy resources, and to decide who is and isn’t worthy of an opportunity. Our programs are being used in the private sector to make decisions from trading stocks to hiring, sometimes at a scale and speed that puts us all at risk of economic events. These tools are being deployed by information brokers such as Facebook and Google to make predictions about how best to steal the most precious resource we have, our time.  Perhaps scariest of all, these algorithms may be being used to make decisions that have permanent and irreversible results, such as with drone strikes.  We  simply have no way of knowing the full breadth of decisions that AIs are making on our behalf today.  If those algorithms are biased in any way, the decisions made by these programs will be biased, potentially in very serious ways and with serious results.

If we take all available steps to recognize and eliminate the biases in our systems, we can minimize the likelihood of our tools producing output that we did not expect or that violates our principles.

All of the machines used to execute these algorithms are bias-free of course.  A computer has no prejudices and no desires of its own.  However, as we all know, decision-making tools learn what we teach them.  We cannot completely teach these algorithms free of our own biases.  It simply cannot be done since all of our data is colored by our existing biases.  Perhaps the best known example of bias in our data is in crime data used for policing. If we send police to where there is most often crime, we will be sending them to the same places we've sent them in the past, since generally, crime involves having a police officer in the location to make an arrest. Thus, any biases we may have had in the past about where to send police officers will be represented in our data sets about crime.

While we may never be able to eliminate biases completely, there are things that we can do to minimize the impact of the biases we are training into our algorithms.  If we take all available steps to recognize and eliminate the biases in our systems, we can minimize the likelihood of our tools producing output that we did not expect or that violates our principles.

Know that the algorithm is biased

We need to accept the fact that there is no way to create a completely bias-free algorithm.  Any dataset we provide to our tools will inherently have some bias in it.  This is the nature of our world.  We create our datasets based on history and our history, intentionally or not, is full of bias.  All of our perceptions and understandings are colored by our cognitive biases, and the same is true for the data we create as a result of our actions.  By knowing and accepting this fact, that our data is biased, and therefore our algorithms are biased, we take the first step toward neutralizing the impacts of those biases.

Predict the possible biases

We should do everything we can to predict what biases may have crept into our data and how they may impact the decisions the model is making, even if that bias is purely theoretical.  By considering what biases could potentially exist, we can watch for the results of those biases, both in an automated and manual fashion.

Train “fairness” into the model

If a bias is known to be present in the data, or even likely to be present, it can be accounted for by defining what an unbiased outcome might look like and making that a training feature of the algorithm.  If we can reasonably assume that an unbiased algorithm would distribute opportunities among male and female candidates at the same rate as they apply for the opportunity, then we can constrain the model with the expectation that the rate of  accepted male candidates should be within a statistical tolerance of  the rate of male applicants.  That is, if half of the applicants are men then men should receive roughly half of the opportunities.  Of course, it will not be nearly this simple to define fairness for most algorithms, however every effort should be made.
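
As a purely illustrative sketch, and not a real training constraint, the check below expresses that expectation as a simple tolerance test that could be run against a model's output during evaluation. The class, method, and parameter names are hypothetical.

using System;

public static class FairnessCheck
{
    // Illustrative only: compare a group's share of accepted candidates
    // against its share of applicants, within a stated tolerance.
    public static bool IsWithinTolerance(
        int groupApplicants, int totalApplicants,
        int groupAccepted, int totalAccepted,
        double tolerance = 0.05)
    {
        double applicantShare = (double)groupApplicants / totalApplicants;
        double acceptedShare = (double)groupAccepted / totalAccepted;
        return Math.Abs(applicantShare - acceptedShare) <= tolerance;
    }
}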

Be Open About What You’ve Built

The more people understand how you’ve examined your data, and the assumptions you’ve made, the more confident they can be that anomalies in the output are not a result of systemic bias. This is most critical when these decisions have significant consequences for people’s lives.  A good example is in prison sentencing. It is unconscionable to me that we allow black-box algorithms to make sentencing decisions on our behalf.  These models should be completely transparent and subject to our analysis and correction.  That they aren’t, but are still being used by our governments, represents a huge breakdown of the system, since these decisions MUST be made with the trust and at the will of the populace.

Build AIs that Provide Insight Into Results (when possible)

Many types of AI models are completely opaque when it comes to how decisions are reached.  This doesn’t mean however that all of our AIs must be complete black-boxes.  It is true that most of the common machine learning methods such as Deep-Neural-Networks (DNNs) are extremely difficult to analyze.  However, there are other types of models that are much more transparent when it comes to decision making.  Some model types will not be usable on all problems, but when the options exist, transparency should be a strong consideration.

There are also techniques that can be used to make even opaque models more transparent.  For example, a hybrid technique (AI That Can Explain Why & An Example of a Hybrid AI Implementation) can be used to run opaque models iteratively.  This can allow the developer to log key details at specific points in the process, making the decisions much more transparent.  There are also techniques to manipulate the data after a decision is made, to gain insight into the reasons for the decision.

Don’t Give the AI the Codes to the Nukes

Computers should never be allowed to make automated decisions that cannot be reversed by a human if necessary. Decisions like when to attack a target, execute a criminal, vent radioactive waste, or ditch an aircraft are all decisions that require human verification since they cannot be undone if the model has an error or is faced with  a completely unforeseen set of conditions. There are no circumstances where machines should be making such decisions for us without the opportunity for human intervention, and it is up to us, the programmers, to make sure that we don’t give them that capability.

Don’t Build it if it Can’t be Done Ethically

If we are unable to come up with an algorithm that is free from bias, perhaps the situation is not appropriate for an automated decision making process.  Not every situation will warrant an AI solution, and it is very likely that there are decisions that should always be made by a human in totality.  For those situations, a decision support system may be a better solution.

The Burden is Ours

As the creators of automated decision making systems, we have the responsibility to make sure that the decisions they make do not violate our standards or ethics.  We cannot depend on our AIs to make fair and reasonable decisions unless we program them to do so, and programming them to avoid inherent biases requires an awareness and openness that has not always been present.  By taking the steps outlined here to be aware of the dangers and to mitigate them wherever possible, we have a chance of making decisions that we can all be proud of, and have confidence in.

Tags: coding practices development enterprise responsibility testing ai algorithms 

On the Shoulders of Giants

Posted by bsstahl on 2018-03-11 and Filed Under: general 

I recently gave my very first Toastmasters speech. I’m rather proud of it. It certainly didn’t go perfectly but was a good introduction to Toastmasters for me, and a good introduction of me to my Toastmasters club.

For those who aren’t familiar with the process, everyone’s 1st Toastmaster speech is called an Icebreaker and is a way to introduce a new Toastmaster to the other members of the club. In my Icebreaker, I chose to introduce myself to my club by talking about just a few of the people who I feel made important historical contributions that paved my path to today.

The transcript and video of this presentation can be found below.

I like to describe myself as the kind of person who has a list of his favorite physicists and favorite mathematicians. The thought being that just knowing I have such a list tells you everything you really need to know about me. Today I'd like to tell you a little bit more about me, to go a little bit deeper, and tell you about me by telling you about just a few of the people on my list and why I find them so fascinating and so important.

We start in ancient Greece in the 4th century BCE. Democritus of Abdera develops a theory of the composition of matter in the universe that is based on what he calls "atoms". These atoms are physically indivisible, always in motion, and have a lot of empty space in between. He is the first person to develop a theory like this, of the creation of the universe and the existence of the universe in a way that is explainable, that is predictable, that we can understand. As such, many people consider him to be the first scientist. It is this reasoning, that the universe is knowable, that has made all technological advancement that we've had since, possible.

One such advancement came in 1842 so let's jump forward from the first scientist to the first computer programmer. Charles Babbage has created his Analytical Engine, and Ada, Countess of Lovelace, translates an article on using that machine to calculate the Bernoulli numbers which was a well known mathematical sequence. She created notes on this article that describes the inputs and instructions and the states of all the registers of the machine at each point in the process. This, deservingly so, is considered to be the first ever computer program. But more than even creating the first program, Ada Lovelace recognized the capabilities of these machines. She recognized that they could be more than just machines that analyze numbers, they could analyze anything that could be represented by numbers. She predicted that they could be used to compose music, create graphics, and even be usable in scientific experiments. This recognition of the computer as a general purpose tool, rather than just as a fancy calculator, is what made all of society's advancements that were based on computers and computer processing, possible.

There are many other people on my list that I'd like to talk about: Nikola Tesla and Alan Turing; Grace Hopper and Albert Einstein.

But there are really two modern physicists that played a greater role than any in my path to today.  The first of those is Carl Sagan.  Dr. Sagan had the ability to communicate in a very accessible way his almost childlike awe and wonder of the cosmos.  He combined the resources and knowledge of a respected scientist with the eloquence of a teacher and a poet, and made science and scientific education available to an entire generation as it never had been before.

Perhaps the most significant reason though that Carl Sagan has become important to me, especially in the last few years, is that he reminds me, quite powerfully, of number one on my list, my favorite physicist of all, my father Hal Stahl, who passed away on this very day, two years ago. Dad's specialty was optics, he loved to play with light and its properties. He also loved math and its power to explain the concepts in physics.  Like my father I love how math, especially calculus, makes the calculations of practical things feasible. So much so, that had I recognized the power of physics combined with calculus, before I learned to make computers do my bidding, my career might have taken a slightly different path.

I hope I have given you a few insights into my worldview through the lens of those I idolize.  I like to think that my list shows the value I place on education, especially STEM. It also shows that I recognize the value of collaboration and understand how much of what we do depends on those who came before us.  Isaac Newton famously stated, "If I have seen farther [than others] it is by standing on the shoulders of giants."  My list of the giants on whose shoulders I stand can be found on social media @bsstahl@fosstodon.org. To me that list represents just a few of the many without whom our work and our world would not be possible.

Video: On the Shoulders Of Giants

Tags: toastmasters introduction speaking 

About the Author

Barry S. Stahl Barry S. Stahl (him/his) - Barry is a .NET Software Engineer who has been creating business solutions for enterprise customers for more than 30 years. Barry is also an Election Integrity Activist, baseball and hockey fan, husband of one genius and father of another, and a 30+ year resident of Phoenix Arizona USA. When Barry is not traveling around the world to speak at Conferences, Code Camps and User Groups or to participate in GiveCamp events, he spends his days as a Solution Architect for Carvana in Tempe AZ and his nights thinking about the next AZGiveCamp event where software creators come together to build websites and apps for some great non-profit organizations.

Barry has started delivering in-person talks again now that numerous mechanisms for protecting our communities from Covid-19 are available. He will, of course, still entertain opportunities to speak online. Please contact him if you would like him to deliver one of his talks at your event, either online or in-person. Refer to his Community Speaker page for available options.
