{recursion} because culture is recursive. by DeltaX Engineering Team

An EF memory optimization story

Optimizing Entity Framework memory usage to bring service memory usage down from 90% (> 8GB) to negligible. Most of the core services needed meaty VMs with more than 8GB of RAM, and they still under-performed.

Day 0 - July 3rd, 2018

The story started when I saw this ping on the Hyper-V channel.

[03/07/2018 1:11 PM] Ketan Jawahire: When service takes more than 90% of VMs memory then it becomes really difficult to work with that VMs.
PS takes minutes even to single simple statement. 😫  

[03/07/2018 1:29 PM] Suneel P: changed all Ad Data Download VMs memory config to Azure config

[03/07/2018 1:29 PM] Ketan Jawahire: I am going to install it on some azure VM & check for log files. Its really difficult to use PS with VMs using 90%+ memory

[03/07/2018 1:30 PM] Suneel P: try using Ad data download vms - 23,28,32,38. they have enough ram now

Here’s our story of a clever hack to reduce the memory footprint of EF’s cache objects (mapper objects, etc.).

Radar - June 2018 Digest

"Radar" is our monthly digest which features links that our engineering team found interesting

Heimdall goes Headless with Chrome

Headless Chrome has shipped with Google Chrome since version 59. It brings all the modern web platform features provided by Chromium and the Blink rendering engine to the command line, which opens up quite a few possibilities. This post discusses how we use it as part of our service monitoring bot, Heimdall.

Radar - May 2018 Digest

"Radar" is our monthly digest which features links that our engineering team found interesting

Hello Google Apps Script

The story of how I hacked together a simple hiring portal.

Git Workflow at DeltaX

Moving to git hasn't been easy. We at DeltaX moved from SVN to git over 6 months ago, and I am writing this to give an overview of, and our early reactions to, how it has been working for us.

Querying Hundreds of GBs of JSON data with Amazon Athena

At DeltaX, we have been using Amazon Athena as part of our data pipeline to run ad-hoc queries and analytic workloads on logs collected through our tracking and ad-serving system. Athena responds in anywhere from a few seconds to a few minutes on data that runs into hundreds of GBs, and its ease of use has pleasantly surprised us. In this blog post, I discuss how we went about setting up Athena to query our JSON data.

What we like about Amazon's Elasticsearch on AWS?

AWS Elasticsearch has its widely publicized woes; this blog post discusses why we use it anyway, and why you should probably consider it too.

Using Beacon API for Tracking Pixels

Tracking pixels, also referred to as 1x1 pixels, are a common way to track user activity in the analytics and ad-serving world. However, tracking pixels are flaky and constrained by the limitations of various browser environments and network connectivity. The Beacon API aims to address these concerns and to provide a streamlined API with predictable support across browsers.
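A minimal browser-side sketch of the idea, assuming the payload is JSON: try `navigator.sendBeacon` first and fall back to a classic 1x1 image pixel when the API is missing or the browser declines to queue the request. The function name `trackEvent`, the `d` query parameter, and passing `window` in as `env` (so the logic can be exercised outside a browser) are illustrative choices, not details from the post.

```javascript
// Send a tracking event, preferring the Beacon API and falling back to a 1x1 pixel.
// `env` is normally `window`; it is a parameter only so the logic is testable.
function trackEvent(url, payload, env) {
  const body = JSON.stringify(payload);

  // navigator.sendBeacon(url, data) returns true if the browser queued the request;
  // the request is sent asynchronously, even if the page is being unloaded.
  if (env.navigator && typeof env.navigator.sendBeacon === 'function') {
    if (env.navigator.sendBeacon(url, body)) {
      return 'beacon';
    }
  }

  // Fallback: a classic tracking pixel carrying the payload in the query string
  // (the `d` parameter name is hypothetical).
  const img = new env.Image(1, 1);
  img.src = url + '?d=' + encodeURIComponent(body);
  return 'pixel';
}

// In a browser: trackEvent('/track', { event: 'click', adId: 42 }, window);
```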

Building a Real-time Stream Processing Pipeline

The Big Data ecosystem has grown by leaps and bounds in the last 5 years, and it would be fair to say that in the last two years the noise and hype around it have matured as well. At DeltaX, we have been keenly following and experimenting with some of these technologies. This blog post describes how we built our real-time stream processing pipeline and all its moving parts.

An Introduction to Elasticsearch

A detailed introduction to Elasticsearch covering all its important aspects, including clusters, nodes, shards, indexes, inverted indexes and segments.
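To make the inverted-index idea concrete, here is a toy in-memory version: it maps each term to the set of documents containing it, so a query only touches the posting sets for its terms instead of scanning every document. This is a simplified sketch, not how Elasticsearch (or Lucene, which it builds on) actually stores data; the tokenizer and sample documents are made up for illustration.

```javascript
// Lowercase the text and split on anything that is not a letter or digit
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Build the inverted index: term -> Set of document ids containing that term
function buildIndex(docs) {
  const index = new Map();
  for (const [id, text] of Object.entries(docs)) {
    for (const term of tokenize(text)) {
      if (!index.has(term)) index.set(term, new Set());
      index.get(term).add(id);
    }
  }
  return index;
}

// AND query: ids of documents that contain every query term
function search(index, query) {
  const postings = tokenize(query).map((t) => index.get(t) || new Set());
  if (postings.length === 0) return [];
  const [first, ...rest] = postings;
  return [...first].filter((id) => rest.every((p) => p.has(id))).sort();
}

const docs = {
  d1: 'Shards hold segments',
  d2: 'Nodes form a cluster',
  d3: 'A cluster distributes shards across nodes',
};
const index = buildIndex(docs);
console.log(search(index, 'cluster nodes')); // [ 'd2', 'd3' ]
```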

Inside our new Identity

Redesigns are not easy. With over 5 years of emotions attached to our old identity, we knew this was never going to be an easy task. I must admit we tried undertaking this endeavor a year back and failed to see it through. We had our fair share of learnings from that experience. What we were certain of is that a lot has changed in the last few years: within the ecosystem, for us as a company, and for the partners that we work with. Our identity needs to reflect this and at the same time inspire us to lead the change.

HTTP/2 is here

[29/09/17, 1:32:45 AM] Amrith Yerramilli: (*) (*) (*) !!!
https://production1.adbox.pro/App/AdServer/Ads/New/1501

Holy f***! what a difference
[29/09/17, 1:32:53 AM] Amrith Yerramilli: HTTP2 !!
[29/09/17, 1:33:01 AM] Amrith Yerramilli: I’m literally screaming

My conversation with Akshay a couple of weeks ago.

Life In Recursion

Solving bigger problems by tackling smaller instances of the same problem.
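Merge sort is a classic instance of this idea: to sort an array, sort its two halves (smaller instances of the same problem) and merge the results, stopping at arrays too small to need sorting. A short sketch:

```javascript
// Merge sort: solve the big problem by recursively solving two smaller halves
function mergeSort(items) {
  // Base case: arrays of length 0 or 1 are already sorted
  if (items.length <= 1) return items;

  const mid = Math.floor(items.length / 2);
  const left = mergeSort(items.slice(0, mid));
  const right = mergeSort(items.slice(mid));

  // Combine step: merge two sorted arrays into one sorted array
  const merged = [];
  let i = 0;
  let j = 0;
  while (i < left.length && j < right.length) {
    merged.push(left[i] <= right[j] ? left[i++] : right[j++]);
  }
  return merged.concat(left.slice(i), right.slice(j));
}

console.log(mergeSort([5, 3, 8, 1, 9, 2])); // [ 1, 2, 3, 5, 8, 9 ]
```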

CDN for serving Dynamic Content

Using CDNs (Content Delivery Networks) for static content has long been a well-known best practice, and something we use across our platform and ad-server. I wanted to share a special use case where we use a CDN (AWS CloudFront) to serve dynamic requests on our ad-server and achieve sub-second response times.
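The excerpt does not say how the dynamic responses were made cacheable. One common approach, sketched here as an assumption, is to have the origin mark each dynamic response with a short shared-cache TTL via `Cache-Control: s-maxage`, which CloudFront honors (within the distribution's configured minimum and maximum TTLs), while `max-age=0` keeps browsers from reusing stale copies. The endpoint, the `placement` parameter and the 5-second TTL are hypothetical.

```javascript
// Cache-Control for a dynamic response that shared caches (the CDN edge) may keep
// for `edgeTtlSeconds`, while browsers must revalidate every time (max-age=0).
function dynamicCacheHeaders(edgeTtlSeconds) {
  return {
    'Content-Type': 'application/json',
    'Cache-Control': `public, max-age=0, s-maxage=${edgeTtlSeconds}`,
  };
}

// Hypothetical ad-decision endpoint: the response depends only on the URL
// (path + query string), so identical requests can be answered from the edge.
function handler(req, res) {
  const url = new URL(req.url, 'http://localhost');
  const body = JSON.stringify({ placement: url.searchParams.get('placement') });
  res.writeHead(200, dynamicCacheHeaders(5));
  res.end(body);
}

// Origin server behind the CDN:
// require('http').createServer(handler).listen(8080);
```

For this to work, the CDN's cache key must include whatever the response depends on (here, the query string); otherwise different requests would be served the same cached body.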

Hello Vue

A simple introduction to Vue.js

The Multifaceted Redis

At DeltaX, we have multiple use-cases of Redis.

  • As a Session Store
  • As an intermediate entity lookup cache store
  • As a key store to identify most used business profiles

This article gives a little background on how we latched onto Redis, some gotchas, and some free advice :)
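As an illustration of the lookup-cache use case, here is a cache-aside sketch: read from the cache, fall back to the database on a miss, then write the result back with a TTL. The `cache` object is a hypothetical interface with async `get`/`set`; against Redis it would wrap `GET` and `SET key value EX seconds`, and the in-memory stand-in only exists so the pattern can be tried without a server.

```javascript
// Cache-aside lookup. `cache` is any object with async get(key) / set(key, value, ttlSeconds);
// `loadFromDb` is the (hypothetical) async loader for the source of truth.
async function getEntity(cache, loadFromDb, id, ttlSeconds = 300) {
  const key = `entity:${id}`;
  const cached = await cache.get(key);
  if (cached !== null && cached !== undefined) {
    return JSON.parse(cached); // cache hit
  }
  const entity = await loadFromDb(id); // cache miss: read from the database
  if (entity !== null && entity !== undefined) {
    await cache.set(key, JSON.stringify(entity), ttlSeconds);
  }
  return entity;
}

// In-memory stand-in for Redis, for trying the pattern locally (ignores the TTL)
function makeMemoryCache() {
  const store = new Map();
  return {
    async get(key) { return store.has(key) ? store.get(key) : null; },
    async set(key, value) { store.set(key, value); },
  };
}
```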

Knockout.js

As web developers, we often have to work with different JavaScript frameworks. In this article, I briefly explain my experience with the KnockoutJS library and then go over the specifics of some of KO's components: observables, dependent observables and templates.
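To illustrate what observables and dependent observables do, here is a minimal re-implementation of the idea (not Knockout's code): reading an observable while a computed value is being evaluated registers that computed as a dependent, and writing the observable re-evaluates its dependents. In Knockout itself these are `ko.observable` and `ko.computed` (historically `ko.dependentObservable`); this sketch skips templates and does not chain computed values into other computeds.

```javascript
// The computed value currently being evaluated, if any
let currentComputation = null;

// An observable: call with no arguments to read, with one argument to write
function observable(initialValue) {
  let value = initialValue;
  const dependents = new Set();
  return function accessor(...args) {
    if (args.length === 0) {
      // Read: remember which computed depends on this value
      if (currentComputation) dependents.add(currentComputation);
      return value;
    }
    // Write: store the new value and re-evaluate everything that read it
    value = args[0];
    for (const dependent of Array.from(dependents)) dependent.reevaluate();
  };
}

// A computed (dependent) observable: re-runs `evaluator` when its inputs change
function computed(evaluator) {
  let value;
  const node = {
    reevaluate() {
      const previous = currentComputation;
      currentComputation = node;
      try {
        value = evaluator();
      } finally {
        currentComputation = previous;
      }
    },
  };
  node.reevaluate();
  return () => value;
}

const firstName = observable('Ada');
const lastName = observable('Lovelace');
const fullName = computed(() => `${firstName()} ${lastName()}`);
lastName('Byron');
console.log(fullName()); // Ada Byron
```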

Predict Ad Clicks

Can we create a model that predicts whether an ad will get clicked or not?

Internship Experience at DeltaX

Today happens to be the last day of my internship, so it is a good time to pause and reflect on the weeks that flew by. In this blog post, I plan to share what I worked on, my learnings and my overall experience.

1ppm Challenge - GTD Hack

The idea is to pick a project, get something done and have a presentable result within one month. The project could be anything (a software project, a hardware project, a book review, an article and so on) as long as there is an outcome and some learning involved.

IFrame Communications

There are times when a page has multiple iframes, all of which need to access or modify data on the parent page, and the change needs to happen as soon as the page loads, without any delay. For an ad-serving platform, being able to achieve this is one of the primary requirements.
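One standard way to do this is `window.postMessage`: each iframe posts a message to `window.parent` with the parent's origin as the target, and the parent applies it after checking `event.origin`. The post may use a different mechanism; the message shape (`{ type: 'set', key, value }`) and the origins used here are hypothetical.

```javascript
// Parent page: build a 'message' listener that applies updates sent by iframes.
// Register it with window.addEventListener('message', listener).
function createParentListener(allowedOrigins, sharedState, onChange) {
  return function (event) {
    // Only accept messages from origins we trust
    if (!allowedOrigins.includes(event.origin)) return;
    const msg = event.data;
    if (!msg || msg.type !== 'set' || typeof msg.key !== 'string') return;
    sharedState[msg.key] = msg.value;
    onChange(msg.key, msg.value);
  };
}

// Child iframe: ask the parent to update a value. The second argument to
// postMessage is the parent's origin, so the message is only delivered there.
function notifyParent(win, parentOrigin, key, value) {
  win.parent.postMessage({ type: 'set', key, value }, parentOrigin);
}
```

On the parent page the listener would be registered once, before the iframes load, and each iframe would call `notifyParent(window, parentOrigin, key, value)` as soon as it has data to share.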

A Primer on Angular 2

I have been trying my hand at Angular 2 applications as part of my learning and wanted to share my thoughts on my initial experience.

A Layman’s guide to Attribution Models

What are attribution and attribution modelling? What is the difference between single-touch and multi-touch models? Which model should you use?
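A small sketch of the difference between single-touch and multi-touch models, assuming a conversion path is simply an ordered list of channel names: first-touch and last-touch give all the credit to one touchpoint, while the linear multi-touch model splits it evenly. The channel names are illustrative.

```javascript
// Split one conversion's credit across the ordered touchpoints that preceded it
function attribute(touchpoints, model) {
  const credit = {};
  const add = (channel, share) => {
    credit[channel] = (credit[channel] || 0) + share;
  };
  if (touchpoints.length === 0) return credit;

  switch (model) {
    case 'first-touch': // single-touch: all credit to the first interaction
      add(touchpoints[0], 1);
      break;
    case 'last-touch': // single-touch: all credit to the last interaction
      add(touchpoints[touchpoints.length - 1], 1);
      break;
    case 'linear': // multi-touch: equal credit to every interaction
      for (const channel of touchpoints) add(channel, 1 / touchpoints.length);
      break;
    default:
      throw new Error('Unknown model: ' + model);
  }
  return credit;
}

const path = ['display', 'search', 'email', 'search'];
console.log(attribute(path, 'last-touch')); // { search: 1 }
console.log(attribute(path, 'linear'));     // { display: 0.25, search: 0.5, email: 0.25 }
```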

Functional Programming in C# 101

Functional programming is a programming paradigm (a style of building the structure and elements of computer programs) that treats computation as the evaluation of mathematical functions and avoids changing state and mutable data. It is a declarative programming paradigm, which means programming is done with expressions or declarations instead of statements.
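The post's examples are in C#; to keep this page's code in one language, here is the same contrast in JavaScript. The imperative version mutates an accumulator in a loop; the declarative version expresses the result as a composition of pure functions (`filter`, `map` and `reduce`, which correspond roughly to LINQ's `Where`, `Select` and `Aggregate` in C#). The campaign data is made up.

```javascript
// Imperative: a loop that reassigns an accumulator step by step
function totalSpendImperative(campaigns) {
  let total = 0;
  for (const c of campaigns) {
    if (c.active) total += c.spend;
  }
  return total;
}

// Declarative: the same result as a composition of pure expressions,
// with no variable reassigned and the input left untouched
const totalSpend = (campaigns) =>
  campaigns
    .filter((c) => c.active)
    .map((c) => c.spend)
    .reduce((sum, spend) => sum + spend, 0);

const campaigns = [
  { name: 'a', active: true, spend: 120 },
  { name: 'b', active: false, spend: 80 },
  { name: 'c', active: true, spend: 30 },
];
console.log(totalSpendImperative(campaigns), totalSpend(campaigns)); // 150 150
```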

A War Biopic - Migrating our Multi-tenant app to Azure

In the second half of 2016, we decided to migrate our multi-tenant app from bare-metal servers to Azure. While you can find numerous benchmarks for various cloud platforms, there are very few relatable drill-downs into the thought process behind such migrations. More importantly, this was not just a migration; it was literally a war with all hands on deck. Keeping existing usage, client data and growth intact, we successfully migrated over 1.4TB of data and our existing clients to the cloud. I thought this story needed to be told, and so I told it.

Getting addicted to the REPL

There. I’ve said it - I am addicted to the REPL.

Looking back, it has been one of my best learning tools, even though only now do I know it's called the REPL.

Node.Js - The Asynchronous Approach

Asynchronous programming is the approach of writing code whose actions can run concurrently, without waiting for one action to complete before moving on to the next. The idea is that actions which are independent of each other would waste precious time if executed in sequence. Node.js was created with this programming paradigm in mind.
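A minimal sketch of the difference, with `setTimeout` standing in for real I/O such as a database query or HTTP call: the sequential version waits for each action before starting the next, while the concurrent version starts all independent actions at once and waits for them together with `Promise.all`. JavaScript itself still runs on a single thread; what overlaps is the waiting. The `fakeIo` helper is hypothetical.

```javascript
// Simulated I/O: resolves after `ms` milliseconds, recording start/end in `log`
function fakeIo(name, ms, log) {
  log.push('start ' + name);
  return new Promise((resolve) => {
    setTimeout(() => {
      log.push('end ' + name);
      resolve(name);
    }, ms);
  });
}

// Sequential: each action waits for the previous one to finish
async function runSequentially(names, ms, log) {
  const results = [];
  for (const name of names) {
    results.push(await fakeIo(name, ms, log));
  }
  return results;
}

// Concurrent: start every independent action at once, then wait for all of them
function runConcurrently(names, ms, log) {
  return Promise.all(names.map((name) => fakeIo(name, ms, log)));
}
```

With two 100 ms actions, the sequential version takes roughly 200 ms and the concurrent one roughly 100 ms.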

Multi Channel Attribution

In digital advertising, attribution is the problem of assigning credit to one or more advertisements for driving the user to a desired action, such as making a purchase. This post discusses how one can model this process and its impact on budget allocation.

Learnings from building High Availability(HA) Services

When designing the architecture for mission-critical systems, the two most commonly discussed aspects are scalability and availability. More often than not, the two terms are used interchangeably. Scalability is about being able to handle increasing load, while availability is about keeping the system operational by minimizing downtime. Designing highly available systems means focusing on qualitative measures to reduce downtime and eliminating single points of failure (SPOFs). Here are some learnings and thoughts on what to consider while architecting an HA system.
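A back-of-the-envelope calculation shows why SPOFs matter, assuming component failures are independent: components that must all be up multiply their availabilities, while a redundant pair is down only when both replicas are down. The 99% figures are illustrative, not measurements.

```javascript
// Components that must ALL be up (a chain in series): A = A1 * A2 * ... * An
function seriesAvailability(parts) {
  return parts.reduce((a, p) => a * p, 1);
}

// Redundant replicas where ANY ONE being up is enough:
// A = 1 - (1 - A1)(1 - A2)...(1 - An), assuming independent failures
function parallelAvailability(parts) {
  return 1 - parts.reduce((down, p) => down * (1 - p), 1);
}

// Load balancer -> single app server -> single database, each 99% available
const withSpofs = seriesAvailability([0.99, 0.99, 0.99]);

// Same chain, but the app servers and databases each run as a redundant pair
const redundant = seriesAvailability([
  0.99,
  parallelAvailability([0.99, 0.99]),
  parallelAvailability([0.99, 0.99]),
]);

console.log(withSpofs.toFixed(4), redundant.toFixed(4)); // 0.9703 0.9898
```

Adding one replica each to the app and database tiers lifts the chain from about 97.0% to about 99.0%; the single load balancer is now the remaining SPOF capping availability.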

Is the Future of Application Architecture - Serverless?

Advancements by cloud-based IaaS providers (Amazon Web Services, Google Cloud and Azure) have made on-demand scale and flexibility a reality. Today, as a startup, you don't need to worry about over-provisioning resources, forecasting infrastructure growth or signing long-term infrastructure contracts to meet your demands. Interestingly, a new suite of cloud services, termed 'serverless', is challenging a core aspect of common application architectures: the server.

Video Transcoding on the AWS Cloud

Transcoding is the process of converting a media file from one format, resolution, quality and specification to another. In the past, a transcoding pipeline required a lot of heavy lifting on the software and hardware front. Today, using the cloud, you can set up a transcoding pipeline in a matter of minutes.

How long does it take to respond?

Can we calculate how long it takes someone to respond? TL;DR: In short, yes; we can use probability theory to quantify our response times.

Front End Optimization 101

Every now and then, we realize we need to go back to the basics of building websites (read as web apps).

Yes, FEO is a real term - Front End Optimization.

TL;DR: We’ll take a quick look at how we moved to using CDNs to serve our static assets. Oh yeah, it is that simple.

Feedback control for processes

In this post, we discuss feedback control and how this concept can be used in practice to stabilize a system. We look at a case study and see how simple techniques can control the metric of interest.
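The case study itself is not part of this excerpt. As a generic illustration of feedback control, here is a discrete proportional-integral (PI) controller steering a metric toward a setpoint; the toy process (metric = 2 × input) and the gains `kp = 0.1` and `ki = 0.2` are hypothetical values chosen so the loop is stable.

```javascript
// Discrete proportional-integral (PI) controller.
// Each step compares the measured metric with the setpoint and returns a new
// control input: kp * error reacts to the current error, and ki * (accumulated
// error) removes the steady-state offset a purely proportional controller leaves.
function makePIController(kp, ki, setpoint) {
  let integral = 0;
  return function step(measured) {
    const error = setpoint - measured;
    integral += error;
    return kp * error + ki * integral;
  };
}

// Toy process (hypothetical): the metric responds to the input with a gain of 2
const processGain = 2;
const controller = makePIController(0.1, 0.2, 10);
let metric = 0;
for (let k = 0; k < 50; k++) {
  const input = controller(metric);
  metric = processGain * input;
}
console.log(metric.toFixed(3)); // 10.000
```

With the integral term removed (`ki = 0`), this loop settles around 1.67 instead of 10: proportional control alone leaves a steady-state error, which the accumulated-error term eliminates.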

Understanding SQL Indexes

If you believe in 'Manners maketh Man', then you would agree when I say 'Indexes maketh the Query in SQL'. In this post, I plan to discuss the most basic types of indexes available: clustered and nonclustered. I also show, with examples, how they work individually and together, and compare each case to the real world.
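A toy in-memory model of the two index types (no B-trees, pages or disk involved): a clustered index is the table data itself kept in key order, while a nonclustered index is a separate sorted structure holding its key plus a locator back to the row. Seeking by the nonclustered key therefore takes a second search into the clustered index (what SQL Server calls a key lookup). The table contents are made up.

```javascript
// Binary search for `key` in an array sorted by keyOf(element)
function binarySearch(sorted, key, keyOf) {
  let lo = 0;
  let hi = sorted.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const k = keyOf(sorted[mid]);
    if (k === key) return sorted[mid];
    if (k < key) lo = mid + 1;
    else hi = mid - 1;
  }
  return undefined;
}

const rows = [
  { id: 3, email: 'c@x.com', name: 'Chitra' },
  { id: 1, email: 'b@x.com', name: 'Bala' },
  { id: 2, email: 'a@x.com', name: 'Anil' },
];

// "Clustered index" on id: the full rows themselves, stored in id order
const clustered = [...rows].sort((a, b) => a.id - b.id);

// "Nonclustered index" on email: just the key plus a locator (the clustered key)
const byEmail = rows
  .map((r) => ({ email: r.email, id: r.id }))
  .sort((a, b) => (a.email < b.email ? -1 : a.email > b.email ? 1 : 0));

// Seek on the clustered key: one search returns the whole row
function findById(id) {
  return binarySearch(clustered, id, (r) => r.id);
}

// Seek on the nonclustered key, then a key lookup into the clustered index
function findByEmail(email) {
  const entry = binarySearch(byEmail, email, (e) => e.email);
  return entry && findById(entry.id);
}

console.log(findByEmail('a@x.com').name); // Anil
```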

Optimal Bidding Strategies for Keyword Auctions

In the earlier blog post, we discussed sponsored search marketing and the mechanics of auction design. In this post, we look deeper into the challenges of online keyword advertising auctions among multiple bidders with limited budgets, and try to come up with a bidding strategy that increases the advertiser's expected utility.

Sponsored Search Market and Auction Mechanism Design

For a search engine, setting prices for different queries is pretty complicated. One possibility is simply to post prices, the way products in a store are sold. But with so many possible keywords and combinations of keywords, each appealing to a relatively small number of potential advertisers, it would be essentially hopeless for the search engine to maintain reasonable prices for each query in the face of changing demand from advertisers. Instead, search engines determine prices using an auction procedure in which they solicit bids from advertisers. There are multiple slots for displaying ads, and some are more valuable than others. In this post, we discuss the various dynamics of auction mechanism design for sponsored search.
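The generalized second-price (GSP) auction is the mechanism search engines have widely used for this. A simplified sketch, ignoring quality scores and reserve prices: rank advertisers by bid, give slot i to the i-th highest bidder, and charge each winner the next-highest bid per click. The bids are illustrative.

```javascript
// Generalized second-price (GSP) auction, ignoring quality scores and reserves.
// bids: { advertiser: bidPerClick }; returns one entry per filled slot.
function gspAuction(bids, slotCount) {
  const ranked = Object.entries(bids)
    .map(([advertiser, bid]) => ({ advertiser, bid }))
    .sort((a, b) => b.bid - a.bid);

  const results = [];
  for (let slot = 0; slot < Math.min(slotCount, ranked.length); slot++) {
    const next = ranked[slot + 1];
    results.push({
      slot: slot + 1,
      advertiser: ranked[slot].advertiser,
      // Pay the bid of the advertiser ranked just below (0 if nobody is below)
      pricePerClick: next ? next.bid : 0,
    });
  }
  return results;
}

console.log(gspAuction({ A: 4, B: 3, C: 1 }, 2));
// slot 1: A pays 3 per click; slot 2: B pays 1 per click
```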

Node library to communicate between processes

Since Node.js applications are single-threaded, we can create multiple processes that listen on the same port to make use of all the CPU cores available on our machines. To simplify this, Node comes with a module called cluster. This module has a number of handy functions to create and monitor workers. It also lets us send and receive messages between processes (similar to window.postMessage() in the browser).

One of the limitations of this module is that a worker is not aware of any other worker and can only send messages to the master process. If one worker has to send a message to another worker, the message must first be sent to the master, which then forwards it to the actual recipient.

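This results in a lot of boilerplate code that’s repeated across applications. Broadcasting a message looks roughly like this:

const cluster = require('cluster');
const os = require('os');

if (cluster.isMaster) {
    // Fork one worker per CPU core
    for (let i = 0; i < os.cpus().length; i++) {
        cluster.fork();
    }

    // In master: forward every message a worker sends to all workers
    cluster.on('message', (sender, message) => {
        for (const id in cluster.workers) {
            cluster.workers[id].send(message);
        }
    });
} else {
    // Workers send the message to the master
    const sendMessage = (message) => process.send(message);

    // In worker: dispatch on the message code
    // (function1/function2 stand for application-specific handlers)
    process.on('message', (message) => {
        switch (message.code) {
            case 'code1': function1(message); break;
            case 'code2': function2(message); break;
        }
    });
}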

You’ll also have to write additional code to keep track of the sender’s process id, send replies to just one worker, and so on. These are pretty easy to do, but they really belong in the library itself. So I went ahead and made a library that wraps the underlying cluster module and helps reduce boilerplate. :p It provides functions such as messageSiblings and messageWorkers, which do exactly what their names say.

The library can be found on npm under the MIT license. Feel free to fork it and add any functions you need. :)
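The library's internals are not shown here, so the following is only a hypothetical sketch of the master-side routing such a wrapper needs: workers send an envelope with a `scope`, and the master forwards the payload to all workers, to all siblings except the sender, or to a single worker id, tagging it with the sender's id so the recipient can reply. The envelope format and `routeMessage` are our own names, not the library's API.

```javascript
// Master-side routing for worker-to-worker messages (hypothetical envelope format).
// Workers send { scope, payload }:
//   scope 'workers'  -> every worker, including the sender
//   scope 'siblings' -> every worker except the sender
//   scope <number>   -> only the worker with that id (e.g. a reply)
function routeMessage(workers, senderId, envelope) {
  const delivered = [];
  for (const id of Object.keys(workers)) {
    const workerId = Number(id);
    const wanted =
      envelope.scope === 'workers' ||
      (envelope.scope === 'siblings' && workerId !== senderId) ||
      envelope.scope === workerId;
    if (wanted) {
      // Tag the message with the sender so the recipient can reply directly
      workers[id].send({ from: senderId, payload: envelope.payload });
      delivered.push(workerId);
    }
  }
  return delivered;
}

// Wiring it up in the master (cluster.workers is keyed by worker id):
// cluster.on('message', (worker, envelope) =>
//   routeMessage(cluster.workers, worker.id, envelope));
```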

Building a user scriptable decision engine in Node.js

As part of the revamp of the DCO engine, we have been adding support for quite a few highly customizable scenarios for dynamic creatives, including storyboarding, geo-location, weather, etc. If you haven’t seen them yet, take a look at the storyboard demo, the e-commerce demo and the geo-location demo.

Since we allow highly customized dynamic creatives, one of the challenges we faced was the trade-off between keeping our algorithms generic and being flexible enough to accommodate advertiser use cases.

Let’s take an example: advertiser ‘A1’ wants to show different creatives based on the current weather (sunny, cloudy, snow or rain), while advertiser ‘A2’ wants to show different creatives based on the current temperature (cold, pleasant or hot). Here is how these algorithms would look in pseudo-code.
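The pseudo-code itself is not part of this excerpt. As a hedged sketch of the user-scriptable idea, each advertiser's rule can be stored as a small JavaScript expression and evaluated per request with Node's `vm.runInNewContext`, which exposes only the request context and accepts a `timeout`. The scripts, creative ids and temperature thresholds are hypothetical. Node's documentation also states that the `vm` module is not a security mechanism, so truly untrusted scripts would need stronger isolation.

```javascript
const vm = require('vm');

// Hypothetical advertiser scripts: each is an expression evaluated against the
// request context that must yield a creative id.
const scripts = {
  // A1: one creative per weather condition
  A1: `({ sunny: 'sunny-creative', cloudy: 'cloudy-creative',
          snow: 'snow-creative', rain: 'rain-creative' })[weather] || 'default-creative'`,
  // A2: bucket the temperature (thresholds in °C are illustrative)
  A2: `temperature < 15 ? 'cold-creative'
       : temperature <= 28 ? 'pleasant-creative'
       : 'hot-creative'`,
};

function decideCreative(advertiserId, context) {
  const script = scripts[advertiserId];
  if (!script) throw new Error('No script for advertiser ' + advertiserId);
  // Run the script in a fresh context that only exposes the request data;
  // the timeout (ms) stops runaway loops. Not a security boundary.
  return vm.runInNewContext(script, { ...context }, { timeout: 50 });
}

// decideCreative('A1', { weather: 'rain' })   -> 'rain-creative'
// decideCreative('A2', { temperature: 32 })   -> 'hot-creative'
```

Keeping the engine generic (look up a script, evaluate it against the request context) while pushing advertiser-specific logic into the scripts is one way to handle the trade-off described above.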

A Race Of Threads

The unexpected side effects of converting a single threaded service into a multi-thread, multi-instance service.

We’re in the middle of one of our most critical migrations: moving to the cloud. One of the most frequently used terms around this shift is scale, the ability to run multiple instances of something without worrying about the operational overhead.

During this migration, we are looking at ways of parallelizing pretty much every background service. One such service is our External Clicks worker. Since we were in a hurry and needed to migrate ~500GB of data to the new servers, we decided to run multiple instances of this worker.

All was well. Well, almost.
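The excerpt does not say what the race was. As a generic illustration of how running multiple instances can break code that was safe single-threaded, here is a check-then-act race on a simulated shared job table: two workers both read "unclaimed" before either writes, so both claim the same job, while an atomic claim (check and set in one step, as a conditional UPDATE would do in a database) lets only one win. The store and job names are hypothetical.

```javascript
// Simulate a round trip to a shared store (e.g. a database call)
const tick = () => new Promise((resolve) => setImmediate(resolve));

// Simulated shared job table with a single unclaimed job
function makeStore() {
  const jobs = new Map([['job-1', { owner: null }]]);
  return {
    async read(id) { await tick(); return { ...jobs.get(id) }; },
    async write(id, job) { await tick(); jobs.set(id, job); },
    // Atomic claim: the check and the update happen in one step
    async claim(id, worker) {
      await tick();
      const job = jobs.get(id);
      if (job.owner !== null) return false;
      job.owner = worker;
      return true;
    },
  };
}

// Racy claim: check-then-act across two separate round trips.
// Another instance can read the same 'unclaimed' state in between.
async function claimRacy(store, id, worker) {
  const job = await store.read(id);
  if (job.owner !== null) return false;
  await store.write(id, { owner: worker });
  return true;
}
```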

Conquering callbacks in node.js

TL;DR: Use fibers to escape callback hell!

As someone whose only experience with asynchronous functions was jQuery’s $.ajax, I was in for a treat when I started working on a project using Node.js. Functions that took functions as arguments? “That’s okay,” I told myself, “I’ve used functions like map and filter.” Little did I know what a mess this could quickly turn into!

Constrained Optimization using Bayesian Bandit Algorithm

Finding the optimal distribution of budget among different ad groups to optimize one's business objective is a challenging and time-consuming task. One either has to constantly monitor changes and make decisions accordingly, or leave it to an intelligent system that looks at multiple features and trends in the data points and then comes up with suggestions.
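The excerpt does not spell out the algorithm. One common Bayesian bandit approach, sketched here as an assumption, is Thompson sampling: model each ad group's conversion rate with a Beta posterior, repeatedly sample from all posteriors, and split the budget in proportion to how often each group's sample wins, which estimates the probability that the group is best. The allocation always sums to the total budget (the constraint). The seeded PRNG, draw count and group statistics are illustrative.

```javascript
// Small seeded PRNG (mulberry32) so results are reproducible; returns [0, 1)
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Standard normal via Box-Muller (1 - rand() avoids log(0))
function normal(rand) {
  return Math.sqrt(-2 * Math.log(1 - rand())) * Math.cos(2 * Math.PI * rand());
}

// Gamma(shape, 1) for shape >= 1 (Marsaglia-Tsang method)
function gamma(shape, rand) {
  const d = shape - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  for (;;) {
    const x = normal(rand);
    const v = Math.pow(1 + c * x, 3);
    if (v <= 0) continue;
    if (Math.log(1 - rand()) < 0.5 * x * x + d - d * v + d * Math.log(v)) return d * v;
  }
}

// Beta(a, b) sample from two Gamma samples
function beta(a, b, rand) {
  const x = gamma(a, rand);
  return x / (x + gamma(b, rand));
}

// groups: [{ name, conversions, trials }]; returns { name: budget }
function allocateBudget(groups, totalBudget, draws = 2000, seed = 42) {
  const rand = mulberry32(seed);
  const wins = new Array(groups.length).fill(0);
  for (let i = 0; i < draws; i++) {
    // One Thompson-sampling round: sample every posterior, pick the best
    let best = 0;
    let bestSample = -1;
    groups.forEach((g, j) => {
      const s = beta(g.conversions + 1, g.trials - g.conversions + 1, rand);
      if (s > bestSample) { bestSample = s; best = j; }
    });
    wins[best]++;
  }
  // Budget share proportional to the estimated probability of being best
  const allocation = {};
  groups.forEach((g, j) => { allocation[g.name] = (totalBudget * wins[j]) / draws; });
  return allocation;
}
```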

Experimenting with Go

We have been using Node.js rather successfully on the DeltaX Tracking and Ad-serving side of things with it churning >10K requests/min without breaking a sweat. The inherent asynchronous event-driven nature of Node.js helps us keep the latencies low and the memory footprint small. 1

In recent times, Go (aka golang) has come close to challenging Node.js for building lightweight, high-performance micro-services, and it brings its own set of advantages to the table. 2

Hello World

Four years ago to this day, this journey began. I must admit that it took longer than expected to start this blog, but we are happy that we have finally taken the plunge.

It’s pretty easy to underestimate the power of small learnings while you are trying to make things work and get things rolling. We hope this blog acts as a journal celebrating small learnings and big achievements.

So, here we are starting with the customary hello world.