Affordable Computing Architecture for Deep CNNs (1)

At Odd Concepts, we do a lot of computer vision-related work. Our latest product is a fashion search engine - ShuqStyle. We ship this product to many B2B customers, and it’s been getting incredible feedback.

ShuqStyle Demo UI

Our demo page. The duplicate entries are not a bug; they come from different vendors selling the same product with the exact same image.

Under the hood, we use multiple GPU-powered deep convolutional neural networks to drive the search core. Deep CNNs have recently become one of the most groundbreaking developments in computer vision, and in many cases they provide significant improvements in meaningful feature extraction over traditional computer vision approaches.

The single biggest downside of CNNs from a business perspective is the total cost of ownership - training and refining a model requires people on your team to have access to powerful and expensive GPUs, in most cases a Titan X or a Tesla. For any startup, this will leave a noticeable dent in the equipment budget. (And to add insult to injury, if the Titan X is a screwdriver, its power drill equivalent, the DGX-1, costs an order of magnitude more.)

We are a company selling products and services powered by computer vision to other businesses. Our main business model is making computer vision products accessible to anyone, regardless of expertise in the field. To be competitive in the market we strive to make our offerings as affordable as possible without sacrificing power. In this article, we explain some things we did to make our product more affordable for our customers.

Our new ShuqStyle engine builds on top of a distributed computing architecture running on multiple nodes, and we also have GPU-less nodes for modules that do not need a GPU at all. Within the infrastructure, we have two types of nodes: buckets and sentinels. This may not make sense from the wording alone - we are getting to that.

Our product is offered mostly to business customers with a stable user base. In a world where growth hacking is a thing, this may look like a bad thing - but it's not. Our customers have predictable, organic active users, and the variance between a busy day and a lighter day is well understood. Hence, we have a rough idea of what loads to expect. This lets us avoid placing bandwidth or API call restrictions on our customers aside from their object store quota. In a nutshell, our policy is "within reason, we don't care".

If our customers have a spike in users, we won't have significant issues. For initial launch, our infrastructure has been carefully designed to handle in the ballpark of 1 billion hits over the span of two hours. We will obviously grow this as needed, and as we get more paying customers on board.
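To put that target in perspective, here is a quick back-of-the-envelope calculation; the per-node throughput figure is purely an illustrative assumption, not a measurement from our setup:

```python
# Back-of-the-envelope check of the launch capacity target:
# roughly 1 billion hits spread over a two-hour window.
TARGET_HITS = 1_000_000_000
WINDOW_SECONDS = 2 * 60 * 60

average_rps = TARGET_HITS / WINDOW_SECONDS
print(f"Average load: {average_rps:,.0f} requests/second")  # ~138,889 req/s

# Hypothetical sizing: if one GPU-less front node sustained 5,000 req/s
# (an assumed figure for illustration only), request handling alone
# would take on the order of 28 such nodes.
ASSUMED_RPS_PER_NODE = 5_000
nodes_needed = -(-TARGET_HITS // (ASSUMED_RPS_PER_NODE * WINDOW_SECONDS))
print(f"Nodes needed at {ASSUMED_RPS_PER_NODE:,} req/s each: {nodes_needed}")
```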

For new customers who require a hardware scale-up, we can prepare the hardware while the legal documents are being dealt with. We are based in the heart of Seoul, so we have the luxury of quick delivery from the Yongsan electronics market. A made-to-order rackmount server built from consumer-grade parts usually has a delivery time of less than 32 hours. Lawyers are slow, but the Yongsan electronics market isn't - and you can even negotiate for faster delivery, just be prepared to pay extra.

We call this "Yongsan scaling". It's an advantage we have by virtue of being in a highly populated city with reasonably low labor costs - while we have never bothered to investigate, Silicon Valley and NYC most likely do not have this luxury.

Many ask this question: "why not a cloud provider?"

No, we are not copying Google's first datacenter model.

Cloud providers are great. They are probably the best thing since bacon to happen to infrastructure engineering, but that comes at a cost. Especially in a country where bare metal hosting is quite often as cheap as EC2, pocketing margins for our work on top of already expensive EC2 simply does not fly. (Also, the disadvantage of being in Seoul is that real bacon is expensive, but we'll save that story for another day.)

Standard-issue four-GPU node

In short, we use bare metal because we are cheap. None of the cloud providers have GPU hosting offerings that would let us provide our software at a low enough price, especially in a market where competitors are willing to sell at a loss by burning VC investment money. Additionally, cloud GPUs are less predictable in terms of performance than bare metal.

EC2's g2.8xlarge costs 1,713.33 USD per month. That's roughly 1.98M KRW per month, which is about the same as the hardware cost of one of our standard four-GPU nodes.

If you could buy a server with four GPUs every month for less than the cost of running one on EC2 for a month, what would you do? Even accounting for maintenance labor, one year of operating on EC2 leaves you with no tangible assets aside from a thank-you mug from Amazon, while running metal leaves you with 48 GPUs in your infrastructure. Now think of how many dogecoins you can mine with that. (The answer is, not that many - but you get the idea.)
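To make that comparison concrete, here is a minimal cost sketch using the figures above; it assumes one bare metal four-GPU node costs roughly one month of g2.8xlarge rent, and it deliberately ignores labor, power, and colocation costs:

```python
# Cumulative cost sketch: renting one g2.8xlarge vs. buying one
# four-GPU bare metal node every month. Assumes a node costs about
# the same as a month of EC2 (per the figures above); labor, power,
# and colocation are intentionally left out of this toy comparison.
EC2_MONTHLY_USD = 1713.33   # g2.8xlarge, per month
METAL_NODE_USD = 1713.33    # assumption: one node ~= one month of EC2

months = 12
ec2_total = EC2_MONTHLY_USD * months    # rent; nothing owned at the end
metal_total = METAL_NODE_USD * months   # one new node purchased per month

print(f"EC2 after {months} months:   ${ec2_total:,.0f}, GPUs owned: 0")
print(f"Metal after {months} months: ${metal_total:,.0f}, GPUs owned: {months * 4}")
```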

Example of poor case matching

Not everything always works as planned. This "bug" has been fixed in the new standard node specification.

In the next article, we'll dive into what each hardware node and component in ShuqStyle does.