Código Caótico

How I Integrated Hikvision Face Recognition into Our App in Just 2 Weeks

Have you ever been handed a project by your PM with an impossibly tight deadline, dangling the promise of extra pay at the end of the month? If you nodded along, you’re not alone! But before diving into the technical deets, let’s build some context so that our post is thread-safe.

Some Context

First things first, let’s establish some foundational knowledge about me and this project. I provide Software Engineering services for a company that has developed a parking lot app — an application that’s surprisingly robust and efficient given the company’s size. The app boasts a variety of features, including an online portal that serves as the entry point for their tenants, a desktop app that manages access control for the tenants' customers (the end users of the parking lot), and a suite of backend services that ensure real-time communication between the desktop app and the portal whenever needed.

Everything is seamlessly integrated and works perfectly. The portal is primarily used to register new users for specific parking lots and offers a range of subscription options tailored to various needs. Meanwhile, the desktop app handles the heavy lifting — quite literally, as it controls the opening and closing of heavy gate barriers. In addition to that, the desktop app is responsible for printing tickets for non-subscribers using the parking lot, while also shouldering the task of receiving signals from a card reader for subscribers and opening the gate for them.

But how does the desktop app know that a specific card is linked to a specific user? For those who might not find the answer immediately obvious: it’s through a database. However, it’s not the same database the portal uses. When a new subscriber is added in the portal, the backend synchronizers send a "beep-bop" message to all relevant desktop apps, signaling that the cloud database has been updated. The new data is then transferred to those apps over raw sockets in binary format, and each app updates its local database.

You Shall Pass (In 2 Weeks)

One of our tenants, however, faced an abrupt change in their business location. The building they were operating in had upgraded its pedestrian access control to a face recognition device and politely requested the parking lot manager to do the same — within a one-month time frame. Since the system isn't centralized, the tenant was in a rush to get this implemented. They promptly called my PM, requesting a quotation and an estimated timeline for the implementation. All of this unfolded a full week before I even began coding — actually, a week before I even knew about the project itself.

My PM was short on staff, as the parking lot solution hadn’t received any major updates since 2022. No new features were being added, just simple updates and maintenance of legacy code. However, this was a massive update, the kind of work that would typically require at least three developers — resources we simply didn’t have at the time. 

One week later, after exhausting every attempt to find developers who would accept such a risky task, he turned to me and asked, "Can you do it?" I said, "Yes." That "yes" wasn’t the response I wanted to give. I knew it meant countless hours of effort to meet the deadline. But I had never had the chance to work with face recognition technology before, let alone under such a tight deadline. That "yes" wasn’t a confident "I can do it"; it was an eager "I want to work on this."

I forgot to mention that the application was written in C# on .NET Framework 4.8, and the face recognition device was a Hikvision. Not going to lie, I had my fair share of experience with C# back when I was learning to code, but it’s far from the comfort zone of my beloved C++ and Golang. Plus, I hate working on Windows and with any associated Microsoft software like Visual Studio. But, well, I promised a production-ready solution, so dang it! And so the journey began, with me poring over thousands of lines of code daily, missing dinners with my beautiful wife (and getting rightfully smacked for it), all while attempting a completely new and risky implementation I had no idea how to start. Brace yourselves!

Digest Headers Deez Nuts

Let’s talk about digest headers. If you've worked with them before, you probably already know the joy (read: frustration) they bring to the table. For those unfamiliar, digest access authentication is an HTTP authentication scheme in which the client never sends the password itself; instead, it sends a hash computed from the credentials and a server-supplied nonce. Hikvision's device API requires it, meaning I had to work through the details of parsing the server's challenge and generating the matching header just to talk to the device. It’s safe to say this wasn’t the most exciting part of the project, but it had to be done.

When it came to handling Hikvision's digest authentication, I realized I needed a dedicated class to save myself from the chaos of writing repetitive code. Thus, DigestAuthFixer was born — a simple, no-frills helper to manage the complexities of HTTP digest authentication. Well, "simple" is relative when you're dealing with things like nonces, realms, and MD5 hashes, but you get the idea.

Once the digest header was built, the next step was ensuring that the request followed the proper flow for digest authentication. Here’s the deal: the first request is always expected to fail with a 401 Unauthorized. That’s not an error — it’s how the server tells you it wants digest authentication. My code would then extract the WWW-Authenticate header from the server’s response, parse out the nonce, realm, and other important variables, and use them to generate a proper digest header. Only after this dance of 401 first, authenticate later would the server accept the request and respond with the data I actually needed.
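A minimal sketch of that challenge-and-response computation, written in Go rather than the C# of the actual DigestAuthFixer class, might look like the following. It assumes the MD5 algorithm with qop=auth as defined in RFC 2617; the function names are made up for illustration, and the sample values are the RFC's own example, not anything from a Hikvision device.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"strings"
)

// md5Hex returns the lowercase hex MD5 of s.
func md5Hex(s string) string {
	sum := md5.Sum([]byte(s))
	return hex.EncodeToString(sum[:])
}

// parseChallenge pulls key="value" pairs out of a WWW-Authenticate header
// such as: Digest realm="x", nonce="y", qop="auth".
// Simplification: it assumes no commas inside quoted values.
func parseChallenge(header string) map[string]string {
	params := map[string]string{}
	header = strings.TrimPrefix(header, "Digest ")
	for _, part := range strings.Split(header, ",") {
		kv := strings.SplitN(strings.TrimSpace(part), "=", 2)
		if len(kv) == 2 {
			params[kv[0]] = strings.Trim(kv[1], `"`)
		}
	}
	return params
}

// digestResponse computes the "response" field for qop=auth (RFC 2617).
func digestResponse(user, pass, realm, nonce, method, uri, nc, cnonce string) string {
	ha1 := md5Hex(user + ":" + realm + ":" + pass)
	ha2 := md5Hex(method + ":" + uri)
	return md5Hex(strings.Join([]string{ha1, nonce, nc, cnonce, "auth", ha2}, ":"))
}

// authorizationHeader builds the Authorization value for the retried request,
// using the challenge taken from the initial 401 response.
func authorizationHeader(user, pass, method, uri, challenge, nc, cnonce string) string {
	c := parseChallenge(challenge)
	resp := digestResponse(user, pass, c["realm"], c["nonce"], method, uri, nc, cnonce)
	h := fmt.Sprintf(`Digest username="%s", realm="%s", nonce="%s", uri="%s", qop=auth, nc=%s, cnonce="%s", response="%s"`,
		user, c["realm"], c["nonce"], uri, nc, cnonce, resp)
	if opaque := c["opaque"]; opaque != "" {
		h += fmt.Sprintf(`, opaque="%s"`, opaque) // echo opaque back when the server sent one
	}
	return h
}

func main() {
	// Challenge and credentials from the RFC 2617 example.
	challenge := `Digest realm="testrealm@host.com", qop="auth", nonce="dcd98b7102dd2f0e8b11d0f600bfb0c093", opaque="5ccc069c403ebaf9f0171e9517f40e41"`
	fmt.Println(authorizationHeader("Mufasa", "Circle Of Life", "GET", "/dir/index.html", challenge, "00000001", "0a4f113b"))
}
```

In a real client, the cnonce would be random and the nonce count would increase with every request that reuses the same nonce.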

Why Always a Parser?

It took me a day and a half, countless cups of coffee, and a frustrating dive into Hikvision’s documentation to write my first .NET code capable of connecting to the event data stream on one of Hikvision’s endpoints. Once connected, I promptly registered my face and started observing the stream in action as it processed both registered and non-registered faces. The HTTP stream spewed a mix of random JSON data and binary responses for every activity on the device. After some time, I spotted a structured format for recognized faces, something that could actually be parsed! And just like that, I was off to tackle the next challenge: building a parser.

Writing a JSON parser isn’t particularly challenging if you’re only interested in a specific JSON structure and can safely ignore the rest. That turned out to be the case here, and it was surprisingly straightforward. In less than an hour, I had a working console app fetching Hikvision events and "imaginarily" opening gates — since this functionality hadn’t yet been integrated into the main application.
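The "only parse what you care about" idea can be sketched roughly as below, in Go for consistency with the local server rather than the C# console app. The JSON keys (`eventType`, `personId`, `deviceName`) and the `faceRecognized` value are hypothetical placeholders, not Hikvision's real schema; the sketch simply scans a raw chunk for balanced JSON objects and ignores everything else.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// faceEvent holds only the fields we care about. The JSON keys here are
// hypothetical placeholders, not Hikvision's real schema.
type faceEvent struct {
	EventType  string `json:"eventType"`
	PersonID   string `json:"personId"`
	DeviceName string `json:"deviceName"`
}

// objectEnd returns the index just past the JSON object starting at data[start]
// (which must be '{'), or -1 if the braces never balance.
func objectEnd(data []byte, start int) int {
	depth, inString, escaped := 0, false, false
	for i := start; i < len(data); i++ {
		c := data[i]
		switch {
		case escaped:
			escaped = false
		case inString && c == '\\':
			escaped = true
		case c == '"':
			inString = !inString
		case inString:
			// braces inside strings do not count
		case c == '{':
			depth++
		case c == '}':
			depth--
			if depth == 0 {
				return i + 1
			}
		}
	}
	return -1
}

// extractFaceEvents scans a raw chunk, decodes every valid JSON object it finds,
// and keeps only those that look like a recognized-face event.
func extractFaceEvents(data []byte) []faceEvent {
	var events []faceEvent
	for i := 0; i < len(data); i++ {
		if data[i] != '{' {
			continue
		}
		end := objectEnd(data, i)
		if end < 0 || !json.Valid(data[i:end]) {
			continue // stray brace in binary data; keep scanning
		}
		var ev faceEvent
		if err := json.Unmarshal(data[i:end], &ev); err == nil &&
			ev.EventType == "faceRecognized" && ev.PersonID != "" {
			events = append(events, ev)
		}
		i = end - 1 // skip past the object we just handled
	}
	return events
}

func main() {
	chunk := []byte("--boundary\r\n\x00\x01{\"eventType\":\"heartbeat\"}\xff" +
		"{\"eventType\":\"faceRecognized\",\"personId\":\"42\",\"deviceName\":\"G01\"}")
	for _, ev := range extractFaceEvents(chunk) {
		fmt.Printf("open gate for person %s seen by %s\n", ev.PersonID, ev.DeviceName)
	}
}
```

Rescanning after every stray brace is quadratic in the worst case, which is acceptable for a sketch but worth revisiting for a long-lived stream.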

Aight! Now I had just one week and five days to design and implement this into the main app, create models and migrations to store registered Hikvision faces, integrate with the automation system to open the gate barriers, and push it all into production.

The Three Tables

On to the design phase, where things got a little overcomplicated. We didn’t have a testing database to work on, so every change had to be made in production. That’s right: full-on cowboy coding mode, where each tweak felt like rolling the dice on whether something critical would break. Migrating new tables without accidentally breaking sensitive data is the stuff of nightmares, trust me.

But as I started designing the system, it hit me: the local database was really the only one that needed changes — at least for now. Since the plan was to add face recognition for just one parking lot, it made sense to leave the cloud database untouched and focus exclusively on the local one. This approach felt much more contained, and let’s be real, it lowered the chances of me setting off a chain reaction of database disasters.

Every one of our tenants has up to four gate barriers, which means we could be dealing with up to four Hikvision devices per tenant (one for each gate). Here’s the kicker: every new face registered would need to be added to up to four different devices. Why? Because each device operates independently and has its own internal database. That’s right — yet another set of databases to manage.

This meant one thing: since multi-dimensional tensorial databases aren’t a thing (yet), and I was thoroughly fed up with wrangling JSON objects, I decided to keep things simple (well, simple-ish) by creating three tables.

First, there was a table for user information. This stored which parking lot the user belonged to, a foreign key to another internal table that holds information such as whether they are an active user (don't want to open the gate for non-payers, am I right?), their Hikvision ID, and their profile picture, which is basically the face they’d use to open the gate. Next up was a table for the devices, which held each device’s local IP address, its name, and the parking lot it was assigned to.

Finally, I added a junction table. This one was the MVP, as it created a many-to-many relationship between users and up to four Hikvision devices. With this setup, each user could be mapped to multiple devices without duplicating data across rows. It wasn’t tensorial, but hey, it worked — and it meant I could finally stop battling with those never-ending JSON trees and parsers.
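To make the shape of those three tables concrete, here is a rough in-memory model in Go. Every field name is an illustrative guess, since the real column names aren't given here, and `missingRegistrations` is a hypothetical helper showing what the junction table buys you: a quick answer to "which devices still need this face?"

```go
package main

import "fmt"

// All field names below are illustrative; the real schema is not shown in the post.

// FaceUser mirrors the user table.
type FaceUser struct {
	ID           int
	ParkingLotID int
	SubscriberID int    // foreign key to the existing table that knows whether the user is active
	HikvisionID  string // ID the devices use for this person
	Picture      []byte // the face image
}

// Device mirrors the device table.
type Device struct {
	ID           int
	Name         string
	LocalIP      string
	ParkingLotID int
}

// UserDevice is one row of the junction table: "this user is registered on this device".
type UserDevice struct {
	UserID   int
	DeviceID int
}

// missingRegistrations lists the devices in the user's parking lot
// that do not yet have a junction row for that user.
func missingRegistrations(u FaceUser, devices []Device, links []UserDevice) []Device {
	registered := map[int]bool{}
	for _, l := range links {
		if l.UserID == u.ID {
			registered[l.DeviceID] = true
		}
	}
	var missing []Device
	for _, d := range devices {
		if d.ParkingLotID == u.ParkingLotID && !registered[d.ID] {
			missing = append(missing, d)
		}
	}
	return missing
}

func main() {
	devices := []Device{
		{ID: 1, Name: "G01", LocalIP: "192.168.0.11", ParkingLotID: 7},
		{ID: 2, Name: "G02", LocalIP: "192.168.0.12", ParkingLotID: 7},
	}
	user := FaceUser{ID: 100, ParkingLotID: 7, HikvisionID: "100"}
	links := []UserDevice{{UserID: 100, DeviceID: 1}}
	for _, d := range missingRegistrations(user, devices, links) {
		fmt.Println("still needs registering on", d.Name)
	}
}
```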

Let's Go(lang)

Since our tenants manage their users through our online portal, it was clear that I’d need to add Hikvision registration functionality there. Here’s the problem: this realization came about five days before the delivery deadline. And let’s be honest, front-end shenanigans are not exactly my forte. So, what did I do? I came up with another brilliant idea (or desperate hack, depending on how you see it).

Given the tight timeline and the fact that this solution was only for one of our tenants, I thought: why not create a local server in Go that handles Hikvision registration exclusively? It was logical, elegant, and most importantly, simple. No extra complexity bleeding into our main systems, and no messy front-end integration headaches. Just a clean, focused local service that did one thing and did it well.

Golang was perfect for the job, largely thanks to its goroutines. These made it a breeze to handle not just the registration process but also communication with our .NET app. All it took were a few TCP sockets to get my Go server sending commands to the .NET app for registration tasks.

Now, here’s how it all came together: the Go server’s primary role was to populate the three newly added tables in our local database with user data. It didn’t directly deal with adding users to the Hikvision devices. That responsibility fell to our main local app, which already had my digest header implementation in C# ready to go. This separation of responsibilities kept things clean and streamlined, while also leveraging the best tool for each part of the system: Golang for its concurrency and quick server setup, and .NET for the heavy lifting with Hikvision integration.
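As a rough illustration of the Go-to-.NET handoff, the sketch below sends a single command over TCP and waits for a one-line acknowledgement. The newline-delimited format, the `REGISTER 42` command, and the `OK` reply are all assumptions made for the example; the actual wire protocol isn't described here. `startFakeApp` is a stand-in for the desktop app so the example runs on its own.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strings"
	"time"
)

// sendCommand opens a TCP connection, writes one newline-terminated command,
// and waits for a one-line reply. The line-based format is an assumption.
func sendCommand(addr, cmd string) (string, error) {
	conn, err := net.DialTimeout("tcp", addr, 2*time.Second)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	conn.SetDeadline(time.Now().Add(5 * time.Second))
	if _, err := fmt.Fprintf(conn, "%s\n", cmd); err != nil {
		return "", err
	}
	reply, err := bufio.NewReader(conn).ReadString('\n')
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(reply), nil
}

// startFakeApp stands in for the desktop app so the sketch runs on its own:
// it acknowledges every command with "OK <command>".
func startFakeApp() (addr string, stop func(), err error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", nil, err
	}
	go func() {
		for {
			conn, err := ln.Accept()
			if err != nil {
				return // listener closed
			}
			go func(c net.Conn) {
				defer c.Close()
				line, err := bufio.NewReader(c).ReadString('\n')
				if err == nil {
					fmt.Fprintf(c, "OK %s\n", strings.TrimSpace(line))
				}
			}(conn)
		}
	}()
	return ln.Addr().String(), func() { ln.Close() }, nil
}

func main() {
	addr, stop, err := startFakeApp()
	if err != nil {
		panic(err)
	}
	defer stop()
	reply, err := sendCommand(addr, "REGISTER 42")
	fmt.Println(reply, err)
}
```

Opening one short-lived connection per command keeps the sketch simple; a long-lived connection with reconnect logic would be the alternative if commands were frequent.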

Testing

What is a developer without their test suite? That's right! The BEST of developers!

With two days left until delivery, I found myself standing in the middle of the parking lot alongside the support team, ready to test whether this whole patchwork of a solution would actually work. And let me tell you, I was sweating buckets — partly because I was nervous and partly because parking lots are ridiculously hot. It’s one thing to debug in the comfort of your office; it’s another thing entirely to troubleshoot on-site with the sun beating down on you and the clock ticking ominously in the background.

I updated the desktop app with my new .NET release and installed my Go local server on their computer. The support team ensured all four Hikvision devices were correctly set up with accessible IP addresses and unique names. Everything was in place, so I crossed my fingers and fired up the Go server.

To my surprise (and immense relief), it worked perfectly! I could add users to the local database, and — almost miraculously — those users were also being registered on the Hikvision devices. This meant the TCP socket communication was solid, the .NET registration process was flawless, and the digest headers I’d painstakingly written were doing their job. It was one of those rare moments where everything came together as planned, and I couldn’t help but feel a mix of pride and disbelief.

But, as with everything in life, there was a catch. While the Hikvision devices were successfully integrated into this tech stack mayhem, one critical piece wasn’t working: the gate barriers refused to open upon face recognition. The devices recognized faces just fine, but the signal to open the gates? Nowhere to be found. It was like building a car that starts perfectly but refuses to drive — frustrating, to say the least.

Time was up for the day, so I packed up, went home, and found myself staring at the ceiling like it held the answers to life’s greatest mysteries. Sleep? Who needs it when you can replay every decision you’ve ever made, wondering which one doomed you? My brain decided it was the perfect time for a while (true) loop of “What went wrong?”

You Really Shall Pass

The next day, there I was again. I had lunch with my support team, mustered my strength, and headed back to the parking lot. After a great night of sleep (or maybe not), I opened my laptop and saw it. Right there, staring me in the face, mocking me. The issue had been obvious the whole time: I forgot to call the open gate function after the face was recognized. How could I miss something so basic? I don’t know, but somehow, I managed it. Classic.

I wasted no time. I quickly implemented the missing call and drafted a new production-ready solution to test (I know, “production-ready for testing” sounds weird, but hey, software development is indeed a bed of roses). Then, I updated the desktop app yet again and went to one of the Hikvisions to test if it would finally open the gate.

And it worked! Well, sort of. The gate opened — just not the one I was standing at with my sweaty, hopeful face. Instead, it opened another gate entirely. Why? Why on earth was this happening? Of all the gates, why not my gate?

And, of course, the answer was obvious (again). The way the .NET app was processing the stream and deciding which gate to open was based on the Hikvision device name. I had written a simple parser that looked for, say, G01 and called the corresponding gate function. Logical, right? Well, it turned out the support team had swapped the device names during setup, meaning each device was happily opening a completely different gate from what it was supposed to.
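One way to make that name-to-gate coupling explicit is a small lookup table instead of logic scattered through the parser. This is only a sketch: the names `G01` to `G04` follow the example above, while `gateByDevice` and `gateFor` are hypothetical names, and the actual code lived in the C# app.

```go
package main

import (
	"fmt"
	"strings"
)

// gateByDevice maps a device name to the gate it physically stands at.
// Names and numbers are examples; keeping them in one table makes a swap easy to spot and fix.
var gateByDevice = map[string]int{"G01": 1, "G02": 2, "G03": 3, "G04": 4}

// gateFor returns the gate for a device name, or false for unknown devices,
// so an unexpected name never opens an arbitrary gate.
func gateFor(deviceName string) (int, bool) {
	gate, ok := gateByDevice[strings.ToUpper(strings.TrimSpace(deviceName))]
	return gate, ok
}

func main() {
	for _, name := range []string{"G01", " g03 ", "Lobby"} {
		if gate, ok := gateFor(name); ok {
			fmt.Printf("%q -> open gate %d\n", name, gate)
		} else {
			fmt.Printf("%q -> unknown device, ignoring\n", name)
		}
	}
}
```

A table like this does not prevent someone from naming a device wrongly during setup, but it does put the whole mapping in one place where it can be checked on-site.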

After renaming all the devices to match their correct spots and finally figuring out which gate was numbered 1, 2, 3, and 4, we went out to test again. And this time? It actually worked! My face opened the gate! I almost had an out-of-body experience. It was glorious. But, as with all fleeting moments of triumph, reality hit me right in the face again. Since the gate-opening function was running in a thread on the main app, and I hadn’t implemented any thread safety with mutexes or locks, triggering one gate effectively locked out the others.

To confirm my hunch, I restarted everything and tested on a different gate. The result was always the same: once one gate was triggered, the other three went on strike, refusing to open no matter what. It was as if they’d unionized against me.

Thread lightly... Or safely!

So there I was, with a couple of hours left before the next day, the grand delivery date, and my solution wasn't ready because it wasn't thread-safe. I opened my laptop once again under the frying sun and cooked up a 15-minute fix.

I added a lock to each thread and implemented a restart mechanism for when the program decided to crash or refused to connect to the other gates. This way, not only was I finally thread safe, but I was also ensuring that any disconnection triggered an automatic reconnection attempt.
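The general idea of giving each gate its own lock can be sketched in Go as below. This is not the C# fix itself, just an illustration under the assumption that each gate's state is independent: with one mutex per gate, work on one gate never blocks the other three. The `gate` type and `openAll` helper are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// gate guards its own state with its own mutex, so a slow or stuck
// operation on one gate never blocks the others.
type gate struct {
	mu    sync.Mutex
	opens int
}

// open simulates raising the barrier; the real hardware call would go here.
func (g *gate) open() {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.opens++
}

// openAll triggers every gate n times concurrently and returns the per-gate counts.
func openAll(gates []*gate, n int) []int {
	var wg sync.WaitGroup
	for _, g := range gates {
		for i := 0; i < n; i++ {
			wg.Add(1)
			go func(g *gate) {
				defer wg.Done()
				g.open()
			}(g)
		}
	}
	wg.Wait()
	counts := make([]int, len(gates))
	for i, g := range gates {
		g.mu.Lock()
		counts[i] = g.opens
		g.mu.Unlock()
	}
	return counts
}

func main() {
	gates := []*gate{{}, {}, {}, {}}
	fmt.Println(openAll(gates, 100)) // every gate ends up with all 100 opens
}
```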

Of course, I wasn’t keen on turning the solution into a time bomb of stack overflows with endless restart loops. So, I built in a little fail-safe: each time a restart was triggered, the delay before the next attempt would gradually increase — but only up to a set maximum. A little thread safety, a pinch of crash resilience, and a dash of logic to avoid infinite recursion. Voilà! A solution, duct-taped and ready for action.
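A capped, growing retry delay like the one described can be sketched as follows, again in Go rather than the C# of the real app. Doubling the delay is an assumption (the text only says the delay increases up to a maximum), and `runWithRestart`, `nextDelay`, and `backoffSchedule` are hypothetical names. Note the plain loop: retrying in a loop rather than by recursion is what keeps the stack from growing.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errUnreachable is a sample failure used by the demo in main.
var errUnreachable = errors.New("device unreachable")

// nextDelay doubles the delay and clamps it to max.
// Doubling is an assumption; the post only says the delay grows up to a cap.
func nextDelay(current, max time.Duration) time.Duration {
	next := current * 2
	if next > max {
		return max
	}
	return next
}

// backoffSchedule lists the first n delays, handy for checking the curve.
func backoffSchedule(base, max time.Duration, n int) []time.Duration {
	delays := make([]time.Duration, 0, n)
	for d := base; len(delays) < n; d = nextDelay(d, max) {
		delays = append(delays, d)
	}
	return delays
}

// runWithRestart calls connect in a plain loop (no recursion, so the stack
// never grows) and waits a little longer after each failure.
// maxAttempts must be at least 1.
func runWithRestart(connect func() error, base, max time.Duration, maxAttempts int) error {
	delay := base
	for attempt := 1; ; attempt++ {
		err := connect()
		if err == nil {
			return nil
		}
		if attempt >= maxAttempts {
			return fmt.Errorf("gave up after %d attempts: %w", attempt, err)
		}
		time.Sleep(delay)
		delay = nextDelay(delay, max)
	}
}

func main() {
	fmt.Println(backoffSchedule(time.Second, 30*time.Second, 7))

	calls := 0
	err := runWithRestart(func() error {
		calls++
		if calls < 3 {
			return errUnreachable // fail twice, then succeed
		}
		return nil
	}, time.Millisecond, 10*time.Millisecond, 5)
	fmt.Println(calls, err)
}
```

A production version would likely retry indefinitely and add some random jitter so that four devices dropping at once don't all reconnect in lockstep.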

I did it...

Well, let me just tell you this: my face had never felt so beautiful as the moment I got in and out of the parking lot using it.
