# Dealing with Memory Leaks from Anonymous Classes in Android

As anyone who has programmed in C can tell you, memory management can be a challenging task. In Java it is significantly easier, so much easier that it lulls many developers into a false sense of security. You can still leak memory in Java, and leaking memory leads to crashes and bugs that will sink your app.

In this post, I’m going to be looking specifically at memory leaks caused by anonymous classes. Anonymous classes are useful in Android programming as a way to avoid writing extra boilerplate. To quote the Oracle docs, they “enable you to make your code more concise. They enable you to declare and instantiate a class at the same time. They are like local classes except that they do not have a name. Use them if you need to use a local class only once.”

This is all well and good until they last longer than the context around them. Consider the following, rather benign-looking code:

This code may leak: if the activity is destroyed before the thread has finished executing, the activity cannot be garbage collected. See the following LeakCanary leak report:

Creating a thread in this way makes it an anonymous inner class, which gives it access to the outer class’s variables. To maintain this access, it holds an implicit reference to the outer class instance, in this case the activity. A running thread is always reachable, so if the thread is still active when the activity would normally be removed from memory, that reference keeps the activity reachable and the garbage collector cannot clean it up until the thread finishes. This is particularly problematic because activities tend to take up a large amount of memory with things such as view hierarchies.

Other asynchronous tasks are culprits for similar reasons: Handlers, AsyncTasks, and more keep a reference to the outer class that spawned them after it has reached the end of its lifecycle, preventing it from being garbage collected.

So how do we use these classes without hosing our memory usage and causing an eventual OutOfMemoryError? First, we have to de-anonymize them and make them static. This is the same code, but using static inner classes instead:

As you might have guessed from reading the class name, this still leaks:

Our inner class needs to hold references to the activity and the TextView so it can change the text as needed, but holding these as strong references causes the activity to leak if it’s destroyed while the thread is still running. To fix this, we need to use weak references. A weak reference is a reference that we tell the garbage collector we’re not particularly worried about holding on to: if an object is only weakly referenced, the garbage collector is free to reclaim it. For our purposes this is just what we want. Instead of passing the activity to the inner class as a strong reference, we pass it wrapped in a WeakReference. Then, when we go to use it within the inner class, we call get() and check whether the result is null (meaning the object was garbage collected) before we use it. In this way, we allow the activity to be released if it hits the destruction part of its lifecycle before our inner class has finished executing. This is the same code as earlier, but without memory leaks:
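Independent of Android, the weak-reference pattern itself can be sketched in a few lines. The following is a Python analogy using the standard `weakref` module, not the activity code from this post; the `Activity` and `Worker` classes are hypothetical stand-ins, and the immediate reclamation it relies on is CPython behavior.

```python
import gc
import weakref


class Activity:
    """Hypothetical stand-in for a heavyweight object such as an activity."""

    def __init__(self):
        self.text = ""


class Worker:
    """Hypothetical background task that must not keep the activity alive."""

    def __init__(self, activity):
        # Hold the activity weakly so this worker never prevents collection.
        self._activity_ref = weakref.ref(activity)

    def finish(self):
        # Calling the weak reference returns the target, or None once the
        # target has been garbage collected.
        activity = self._activity_ref()
        if activity is None:
            return False
        activity.text = "done"
        return True


activity = Activity()
worker = Worker(activity)
ran_while_alive = worker.finish()

del activity   # drop the only strong reference
gc.collect()   # make sure collection has happened
ran_after_release = worker.finish()
```

The check inside `finish()` plays the same role as the null check on a Java `WeakReference`: the worker does its update only if the target still exists, and otherwise quietly gives up.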

Now our app can happily move on without clogging memory with old, unused references, and be sure that the things it wants to forget remain forgotten. While I was learning about this, these two blogs were invaluable resources. I highly suggest you check them out if you’re interested in learning more:

# Area Under a Snowflake

Ever want to calculate the area under a curve in SQL? I know, I know... who hasn’t, right?

It’s pretty straightforward to approximate the area under a curve in SQL by computing a Riemann sum using the trapezoidal rule.

First, we’ll make up some coordinates that represent the curve we want to integrate. Let’s find the area under the square root function from 0 to 1. Just because I want to use $\LaTeX$ in this post (even though I can’t figure out how to make it display inline), we are approximating this integral:

$$\int_0^1 \sqrt{x} \, \mathrm{d}x$$

Boom.

Anyway, back to getting the coordinates: we will use 101 evenly spaced points on the curve.

    WITH coordinates AS
    (
        SELECT (ROW_NUMBER() OVER (ORDER BY 1) - 1) / 100 AS x,
               SQRT((ROW_NUMBER() OVER (ORDER BY 1) - 1) / 100) AS y
        FROM TABLE(GENERATOR(rowCount => 101))
    ),

Next, we compute the width of the bottom side of the trapezoid and the difference in height between the left and right sides.

    deltas AS
    (
        SELECT x,
               y,
               LEAD(x) OVER (ORDER BY x) - x AS delta_x,
               LEAD(y) OVER (ORDER BY x) - y AS delta_y
        FROM coordinates
    ),

Now we can compute the area of each trapezoid in the Riemann sum.

    partial_areas AS
    (
        SELECT *,
               (delta_x * y) + (0.5 * delta_x * delta_y) AS partial_area
        FROM deltas
    )

Here’s a look at what we have so far:

    SELECT *
    FROM partial_areas

| X | Y | DELTA_X | DELTA_Y | PARTIAL_AREA |
|---|---|---|---|---|
| 0.000 | 0 | 0.010 | 0.1 | 0.0005 |
| 0.010 | 0.1 | 0.010 | 0.04142135624 | 0.001207106781 |
| 0.020 | 0.1414213562 | 0.010 | 0.03178372452 | 0.001573132185 |
| 0.030 | 0.1732050808 | 0.010 | 0.02679491924 | 0.001866025404 |
| ... | ... | ... | ... | ... |
| 0.980 | 0.9899494937 | 0.010 | 0.005037943445 | 0.009924684654 |
| 0.990 | 0.9949874371 | 0.010 | 0.005012562893 | 0.009974937186 |
| 1.000 | 1 | NULL | NULL | NULL |

To get the final answer, we can just sum the area of all the small trapezoids.

    SELECT SUM(partial_area) AS area
    FROM partial_areas

If we evaluate the integral above, we get exactly 2/3. With this method, we get an answer of 0.6664629471, which is pretty close: close enough for what I’m using it for, and it may or may not be close enough for your needs.
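As a sanity check outside the database, the same trapezoid sum can be reproduced in a few lines of Python. This is only a cross-check sketch: it mirrors the `delta_x`, `delta_y`, and partial-area formula from the CTEs above, and it uses floating-point arithmetic, so the trailing digits may differ slightly from the SQL result.

```python
import math

n = 100  # number of trapezoids, giving 101 evenly spaced points on [0, 1]
xs = [i / n for i in range(n + 1)]
ys = [math.sqrt(x) for x in xs]

area = 0.0
for i in range(n):
    delta_x = xs[i + 1] - xs[i]  # width of the trapezoid's base
    delta_y = ys[i + 1] - ys[i]  # height difference between its two sides
    # A rectangle of height y plus a triangle on top, as in partial_areas.
    area += delta_x * ys[i] + 0.5 * delta_x * delta_y

print(area)
```

The loop stops at the second-to-last point, which plays the same role as the NULL deltas on the final row of the SQL output: the last point has no neighbor to its right, so it contributes no trapezoid.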

Here’s all the SQL together:

    WITH coordinates AS
    (
        SELECT (ROW_NUMBER() OVER (ORDER BY 1) - 1) / 100 AS x,
               SQRT((ROW_NUMBER() OVER (ORDER BY 1) - 1) / 100) AS y
        FROM TABLE(GENERATOR(rowCount => 101))
    ),
    deltas AS
    (
        SELECT x,
               y,
               LEAD(x) OVER (ORDER BY x) - x AS delta_x,
               LEAD(y) OVER (ORDER BY x) - y AS delta_y
        FROM coordinates
    ),
    partial_areas AS
    (
        SELECT *,
               (delta_x * y) + (0.5 * delta_x * delta_y) AS partial_area
        FROM deltas
    )
    SELECT SUM(partial_area) AS area
    FROM partial_areas
    ;

# A Stats Reporting WSGI Middleware

At Jana, we like to measure things. From high level business KPIs to low level operational metrics, our goal is to instrument it all.

One of the most important things for us to measure is API performance. When your architecture decomposes frontend API calls into more granular calls to other APIs, the downstream implications of a slow service call can be enormous: everything from a slow user experience waiting for operations to complete to failed recharge transactions due to service timeouts.

At a recent hackathon, I decided it’d be fun to measure our API performance by capturing how long a request takes to service once it has reached our Python HTTP servers (that is, excluding any frontend load balancing, intermediate proxies, etc.). Since we already had Graphite set up in our infrastructure, statsd seemed like a natural fit for this. Once we had that set up, the question became: how do we instrument our API code to measure timings in a minimally intrusive fashion but with maximum coverage?

This is where a WSGI middleware can help. Briefly: a middleware lets you “play both sides” of a request; that is, the server and the app. With a middleware, we’re able to intercept calls incoming to our Flask app, start a timer, dispatch the request, and emit data to statsd once the request has been completed.

However, one non-obvious wrinkle for those new to WSGI is buried in PEP 3333:

> When called by the server, the application object must return an iterable yielding zero or more bytestrings.

At first blush, you may be tempted to write code like the following handwave-y implementation to perform the timing:

    def __call__(self, environ, start_response):
        start = time.time()
        result = wrapped_app(environ, start_response)
        end = time.time()
        # Measures only the call itself, not the iteration over the body.
        emit_timing_data(end - start)
        return result

The issue is that an underlying WSGI-compliant app is permitted to return any iterable: this can be a concrete list of bytestrings, or it can be a generator (if the app wants to stream data back to a client, for example). With a generator, the call returns almost immediately and the real work happens while the server iterates over the result, so the timer above stops too early. Additionally, the PEP states:

> If the iterable returned by the application has a close() method, the server or gateway must call that method upon completion of the current request

Since we are “playing both sides” we should be sure to call close() on the iterable if it is present. Here is a better solution:

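    def __call__(self, environ, start_response):
        start = time.time()
        iterable = wrapped_app(environ, start_response)
        try:
            # Stream the body through so the timer covers the whole response.
            for result in iterable:
                yield result
        finally:
            # Honor the PEP 3333 close() contract, even if iteration fails.
            if hasattr(iterable, 'close') and callable(iterable.close):
                iterable.close()
            emit_timing_data(time.time() - start)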

A complete middleware implementation can be found here.
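For readers who want something they can run right away, here is a minimal self-contained sketch of the same idea. It is not the implementation referred to above: the `TimingMiddleware` class, the `emit` callback, and `demo_app` are hypothetical names chosen for illustration, and a real deployment would pass a function that sends the timing to statsd instead of appending to a list.

```python
import time


class TimingMiddleware:
    """Wrap a WSGI app and report how long each request takes to serve."""

    def __init__(self, app, emit):
        self.app = app    # the wrapped WSGI application
        self.emit = emit  # hypothetical callback receiving elapsed seconds

    def __call__(self, environ, start_response):
        start = time.time()
        iterable = self.app(environ, start_response)
        try:
            # Pass each chunk through so streaming responses are timed fully.
            for chunk in iterable:
                yield chunk
        finally:
            # PEP 3333: call close() on the app's iterable if it has one.
            close = getattr(iterable, "close", None)
            if callable(close):
                close()
            self.emit(time.time() - start)


def demo_app(environ, start_response):
    """Tiny streaming WSGI app used only to exercise the middleware."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    yield b"hello, "
    yield b"world"


timings = []
wrapped = TimingMiddleware(demo_app, timings.append)
# Stand in for a server: call the app and consume the whole response body.
body = b"".join(wrapped({}, lambda status, headers: None))
```

Because `__call__` is a generator, nothing runs until the server starts iterating, and the timing is emitted only once the body has been fully consumed or the iteration is abandoned.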

# As a mobile-app publisher, how to optimize attribution and get more revenue

As a publisher, do you want to understand why you’re not getting credit for all the installs you’re driving for your advertisers? At Jana, we’ve been experimenting with ways to optimize attribution. Optimizing attribution ensures that you’re not wasting impressions for users that won’t generate revenue. In this post, we’ll give you simple tips on how to maximize your attribution.

In mobile-app advertising, it is standard that the publisher that generated the last click before an install gets credit. By getting credit, we mean that, according to the attribution partner, you can charge your advertiser for the user that installed the app. When crediting the publisher, the attribution partner sends a postback, or more specifically, an HTTP POST (basically a message) stating what you need to know about the user. If you’re not credited, you won’t receive anything. Generally, information about events not attributed to you is not available. This is frustrating, especially if you want to find out why you weren’t credited.

We’ve learned through trial and error, data mining, and conversations with advertisers that you likely won’t get credited if a user has installed the advertised app more than once. The idea is that advertisers want to pay for new users, so filtering “multiple installs” makes sense from this perspective. The definition of a user is tricky and differs per attribution partner. Here are the main IDs the attribution partner may use to define a user:

• The device IMEI, a unique ID assigned to a device by its manufacturer and used to identify it on the telecom network.
• The Android ID, a software-generated ID assigned by the Android OS (deprecated in favor of the Google advertising ID).
• The Google advertising ID, a user-resettable, software-generated ID provided by Google Play services.

Many other IDs exist, depending on the phone network and manufacturer. Make sure you understand those.

You can maximize attribution by keeping a history of which IDs are associated with your installs, so that a user with a given ID only installs an app once. Certain attribution partners can also help you track these. Note that all of these IDs are easily changed by malicious users, so beware. In emerging markets such as Indonesia, it is also common for manufacturers to recycle IMEIs.

You may also notice that you get credited less often for users with rooted devices. However, we don’t recommend blocking those in emerging markets since these are common among good users, especially in China. This is a cost you may have to absorb.
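The bookkeeping described above can be sketched as a small in-memory structure. This is only an illustration under stated assumptions: the `InstallHistory` class, its method names, and the sample IDs are all hypothetical, and a production system would persist the history rather than keep it in a dictionary.

```python
from collections import defaultdict


class InstallHistory:
    """Remember which device IDs have already installed each advertised app."""

    def __init__(self):
        # Maps an app identifier to the set of device IDs seen installing it.
        self._seen = defaultdict(set)

    def should_show(self, app_id, device_ids):
        """Return True only if none of the device's IDs installed this app."""
        return self._seen[app_id].isdisjoint(device_ids)

    def record_install(self, app_id, device_ids):
        """Store every known ID (IMEI, Android ID, advertising ID, ...)."""
        self._seen[app_id].update(device_ids)


history = InstallHistory()
history.record_install("com.example.game", {"imei:1111", "gaid:aaaa"})

# Same advertising ID but a different IMEI: still treated as a repeat user.
repeat_user = history.should_show("com.example.game", {"imei:2222", "gaid:aaaa"})
# No ID in common with any earlier install: worth spending the impression.
new_user = history.should_show("com.example.game", {"imei:3333", "gaid:bbbb"})
```

Matching on any shared ID is a deliberately conservative choice: it skips more impressions, but it is less likely to spend one on an install the attribution partner would filter as a duplicate.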

## Fraud, Fraud, Fraud

App-install fraud is a problem in the industry. At Jana, we have a team of specialists who put measures in place to prevent fraud and ensure only humans download the apps we advertise. Attribution partners attempt to discredit installs that appear to be coming from bots. We use the verb “appear” here since this is probabilistic in nature. Obvious countermeasures are to block installs with software that detects duplicate IPs, known malicious servers, fraud patterns, and so on. At Jana, we invest heavily in fraud protection in order to provide the most ROI to our advertisers and at the same time optimize our attribution.

Fraud is a tricky problem to solve. For example, users in emerging markets may be sharing Wi-Fi networks, with similar IP addresses, and may be using VPNs in order to access content from abroad. Because of this, you may be discredited for real installs from real users.

At Jana, we solve fun problems every day in order to provide free Internet to the next billion. If you’re interested in working for us, contact us here. If you’re interested in becoming an advertiser, just fill in the form here.

# How to better facilitate engineers to develop an app for a market they are not familiar with?

There have been a few great posts, such as Handling your product manager and How to lead engineers through domination and fear, that share best practices for how engineers and product managers should work together. As the product manager of the China team, whose goal is to build products that users in this market really love, I noticed that it is critical for product managers not only to listen to engineers and provide context for proposed features, but also to create opportunities for engineers to learn more about the behaviors and thoughts of users in a market that the whole team may know nothing about. Below, I’d like to share three more tips that I found useful for enabling engineers to develop an app for a market that they are not familiar with:

First of all, invite engineers to listen to or conduct user interviews. The biggest challenge of developing apps for a new market is that the team may not understand the users well enough. Many assumptions we take for granted do not hold for users in different markets. User interviews can help the whole team, including product managers and engineers, engage with potential users and understand their pain points and what they are thinking when they use our apps.

Secondly, provide engineers with a smartphone that is popular in China. Make sure the phone is manufactured in China, has a local SIM card, is configured similarly to those of most local users, and has some popular apps installed. This can help engineers better understand local users and the context in which our apps will be used and observed. Also, due to differences in default settings and app behaviors, many actions that work on smartphones in the US do not necessarily work on local smartphones.

Last but not least, schedule in-market trips for engineers, if possible. There are at least two benefits. First of all, it is easier for engineers to fix issues that are hard to reproduce in the US by observing users’ behavior in person and debugging on the spot. Secondly, engineers can work on integrations with local partners more effectively.

Are you interested in developing an app that can serve a billion people in emerging markets? We are hiring.

# Drinking from the Firehose: Communication as your company grows

Here at Jana, we are growing fast. In 2016, we added ~20 new positions and moved into a new office to accommodate the growth. When you have just a handful of people, it’s easy to make sure everyone knows what they need to know. Cc-ing the entire company is normal. Making decisions in the hallway is fast and effective. However, when you start approaching 100…

At a startup, it’s easy to think that managing information won’t be a problem for a long, long time. Thinking about communication processes is for big corporations! In real life, however, the amount of communication required grows much faster than the number of people, because as your business gets more complex, the number of discussions per person grows along with the headcount. You can end up with lots of meetings, huge meetings that are unproductive, key people getting left out of decisions, and a general sense of needing to know everything all the time. All of these things are bad.

One way we’ve started to handle this at Jana is to move from a pull system to a push system. Previously, it was expected that people would read the appropriate Slack channels and talk with the right people often enough to be well apprised of what was going on. With more and more information out there, however, people can feel crushed under the workload of constantly consuming knowledge. We’ve increasingly switched to the “push” system, where the people who are putting information out there have the responsibility to push it to the right people. This has helped us reduce frenetic checking of Slack and email.

Another way to improve is to drop the expectation that everyone knows everything. As a company grows, it’s normal for people to not know every nitty gritty detail of what is going on elsewhere in the company. The important thing is to make sure the right people who do know the nitty gritty details are looped in when a decision or change is being considered. Again, it’s the responsibility of the people making the decision to loop in the correct parties.

Finally, it’s important to have distinct owners – people who are in charge of certain elements or results – to avoid requiring massive buy-in when decisions get made. Making sure smart decisions are made comes down to hiring smart and capable people, not asking for a 90-person sign-off on each tiny decision. Otherwise, it’s impossible to grow while staying agile.

It’s easy to think that information management is something to deal with later, or something that’s not for startups. That’s a huge mistake. Bad management of company information can lead to lots of unproductive time, and more importantly, unhappy and stressed colleagues. So, think about information management early! That way, your startup can grow from small and agile to large and agile while keeping everyone informed and happy.