Data Analysis Portfolio

Xizi (Allen) Huang · Senior Data Analyst

4 years at Garena Free Fire (~100M DAU) working across trust & safety, network experience, gameplay balance, and monetization in 15 international markets. Each project here shows not what I analyzed, but what I saw that changed the conclusion.

~100M Daily Active Users
4 Years of Experience
15 International Markets

My Approach

How I Think About Analysis

Each project follows the same internal logic: locate the real signal, question whether the measurement is trustworthy, then turn the finding into something the team can act on and reuse.

01
Detect the Signal

Start from anomaly detection and progressively localize the issue across the dimensions that actually matter, before jumping to solutions.

02
Question the Measurement

Most wrong decisions come from stopping too early: before asking whether the data is actually measuring what it claims to measure.

03
Build Something Reusable

Every non-trivial finding becomes a decision rule, a monitoring threshold, or a documented framework, so the team knows where to look next time.

Projects

Four Case Studies

Context

Operations wanted aggressive bans to clean up voice toxicity. Product pushed back: many flagged players were high-activity, high-value users. The two teams were stuck debating enforcement intensity before anyone had quantified the actual harm.

I stepped in to reframe the question: before deciding how hard to penalize, we needed to know whether our victim-side measurement was even capturing the right population. If the data pipeline was structurally incomplete, any causal estimate would be wrong from the start.

The Pivot

Initial causal matching showed almost no retention difference between exposed and non-exposed victims, and the team was ready to interpret this as "toxicity doesn't cause churn." I flagged it as a measurement artifact, not a true finding: only the most resilient players stay long enough to file a report, so users who quit immediately after a toxic encounter never appear in the reportable dataset. This is survivorship bias. "No effect" was evidence that the data pipeline was structurally incomplete, not that the problem was minor.
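The selection effect can be reproduced with a toy, fully synthetic population. Every count below is hypothetical and chosen only for illustration; this is not the matching model used in the project. Exposed players who quit right after the encounter never enter the report-based dataset, so comparing only the players that dataset can see hides a retention gap that exists in the full population.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Player:
    exposed: bool   # had a toxic voice encounter
    observed: bool  # appears in the report-based dataset
    retained: bool  # still active at day 7


def synthetic_population() -> list[Player]:
    """Hypothetical counts chosen only to illustrate the bias."""
    players = []
    # 1,000 unexposed players, 60% retained; all visible to the analysis.
    players += [Player(False, True, i < 600) for i in range(1000)]
    # 250 exposed players quit right after the encounter: never report, never seen.
    players += [Player(True, False, False) for _ in range(250)]
    # 750 exposed players stay long enough to report; 60% of them are retained.
    players += [Player(True, True, i < 450) for i in range(750)]
    return players


def retention(players: list[Player], exposed: bool, observed_only: bool) -> float:
    """Day-7 retention of one exposure group, optionally restricted to observed players."""
    group = [p for p in players
             if p.exposed == exposed and (p.observed or not observed_only)]
    return sum(p.retained for p in group) / len(group)


players = synthetic_population()
for observed_only in (True, False):
    gap = retention(players, True, observed_only) - retention(players, False, observed_only)
    label = "report-based dataset" if observed_only else "full population"
    print(f"{label}: retention gap = {gap:+.2f}")
```

With these numbers the report-based comparison shows no gap at all, while the full population shows a 15-point retention drop for exposed players.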

This reframing shifted the entire enforcement debate: the question was no longer "how many bans?" but "how do we reach users who never report?"

What Changed

I worked with product and engineering to design a graded penalty model based on the corrected measurement logic. High-confidence AI detections triggered automatic action, covering cases where the victim would never file a report. Lower-confidence cases still required a user report as a safeguard against false positives from dialect or in-game banter.
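A minimal sketch of the graded rule, assuming a detector that emits a confidence score in [0, 1]. The threshold values and the `decide` / `Action` names are hypothetical: the source does not give the production cut-offs, and the lower floor for report-confirmed action is an added assumption.

```python
from enum import Enum

# Hypothetical cut-offs for illustration only; real values would come from
# calibration against audited labels.
AUTO_ACTION_THRESHOLD = 0.95
MIN_DETECTION_THRESHOLD = 0.60


class Action(Enum):
    AUTO_PENALTY = "auto_penalty"                      # no report needed
    REPORT_CONFIRMED_PENALTY = "report_confirmed_penalty"
    NO_ACTION = "no_action"


def decide(model_confidence: float, has_victim_report: bool) -> Action:
    """Graded enforcement: the detector's confidence decides whether a report is required."""
    if model_confidence >= AUTO_ACTION_THRESHOLD:
        # High confidence: act even if the victim never files a report.
        return Action.AUTO_PENALTY
    if model_confidence >= MIN_DETECTION_THRESHOLD and has_victim_report:
        # Lower confidence: a user report guards against dialect/banter false positives.
        return Action.REPORT_CONFIRMED_PENALTY
    return Action.NO_ACTION


print(decide(0.98, has_victim_report=False))  # Action.AUTO_PENALTY
```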

The durable output was institutional: we established that passive reporting systematically undercounts harm among new users, and built that corrective logic into the governance design. The same principle now applies to any future trust-and-safety review on the platform.

Penalty Accuracy: 99%+
Recidivism: −0.5 pp
New User D7 Retention: +1–2%

Context

After server consolidation, back-end monitoring showed faster matchmaking, a surface-level success. The engineering team was ready to call it a win. But community complaints about lag were rising at the same time. Two contradictory signals in the same release window meant at least one metric was misleading.

I challenged the dashboard read: matchmaking speed measures entry, not experience. Optimizing the wrong metric had produced a misleading positive result that was masking real player harm.

The Pivot

An end-to-end diagnosis of the routing logs revealed that 80%+ of affected users were being sent to distant data centers while local servers still had capacity. Root cause: an IDC capacity threshold set too conservatively, which triggered the fallback logic prematurely. I presented this to LiveOps and engineering, and a configuration fix was deployed the same day. Abnormal cross-region matches dropped from 90% to under 5%, and overall latency recovered.
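The failure mode can be illustrated with a simplified fallback rule: a request goes to a remote IDC once local utilization reaches a threshold. The function names, utilization samples, and both threshold values are hypothetical, not the production configuration. A threshold set well below real capacity sends requests abroad even when the local IDC could have served them.

```python
def route(local_utilization: float, fallback_threshold: float) -> str:
    """Send a match request to a remote IDC once local utilization reaches the threshold."""
    return "remote" if local_utilization >= fallback_threshold else "local"


def premature_fallback_share(utilizations: list[float], fallback_threshold: float,
                             real_capacity_limit: float) -> float:
    """Share of requests routed remote although the local IDC was below its real limit."""
    premature = sum(
        1 for u in utilizations
        if route(u, fallback_threshold) == "remote" and u < real_capacity_limit
    )
    return premature / len(utilizations)


# Hypothetical local-IDC utilization at the moment of each match request.
samples = [0.40, 0.55, 0.60, 0.70, 0.80, 0.95]
print(premature_fallback_share(samples, fallback_threshold=0.50, real_capacity_limit=0.90))
print(premature_fallback_share(samples, fallback_threshold=0.90, real_capacity_limit=0.90))
```

With the conservative threshold, four of the six sample requests fall back prematurely; with the threshold aligned to real capacity, none do.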

What Changed

After the routing fix, a long-tail segment still had poor performance even on local servers. A further breakdown identified the new bottleneck: severe packet loss under weak-network conditions.

I partnered with product to design a dual-channel network feature and structured an A/B test to evaluate both its benefit and user tolerance. Severe packet loss dropped 30%+, and the feature rejection rate stayed well within acceptable bounds. The test gave us the evidence needed to move from a stalled pilot to a confident global rollout.
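One common way to read such a test is a pooled two-proportion z-test on the severe-packet-loss rate, combined with a guardrail on feature rejection. This is a sketch, not the analysis actually run: the counts, the 5% significance level, and the rejection tolerance are all hypothetical, and "rejection" is assumed here to mean users switching the feature off.

```python
import math


def two_proportion_ztest(events_c: int, n_c: int, events_t: int, n_t: int):
    """Two-sided pooled z-test for a difference in proportions (control vs treatment).

    Returns (relative change of treatment vs control, z statistic, p-value).
    """
    p_c, p_t = events_c / n_c, events_t / n_t
    pooled = (events_c + events_t) / (n_c + n_t)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_t))
    z = (p_t - p_c) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return p_t / p_c - 1, z, p_value


# Hypothetical counts: sessions with severe packet loss in each arm.
rel_change, z, p = two_proportion_ztest(1000, 20000, 680, 20000)

# Guardrail with a hypothetical tolerance for users who switch the feature off.
MAX_REJECTION_RATE = 0.05
rejections, exposed_users = 310, 20000
ship = p < 0.05 and rel_change < 0 and rejections / exposed_users <= MAX_REJECTION_RATE
print(f"relative change {rel_change:+.1%}, z={z:.2f}, p={p:.2g}, ship={ship}")
```

Requiring both the benefit test and the rejection guardrail to pass mirrors the two questions the test was designed to answer: does it help, and will users tolerate it.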

Cross-region match rate was added as a standing monitoring signal, so the same degradation surfaces earlier in any future consolidation.
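A standing cross-region signal can be as simple as a daily rate with an alert level. The `(day, player_region, server_region)` record shape, the region codes, and the 10% alert level are hypothetical assumptions for illustration.

```python
from collections import Counter

# Hypothetical alert level; in practice it would be set from the post-fix baseline.
CROSS_REGION_ALERT_RATE = 0.10


def cross_region_alerts(matches) -> dict[str, float]:
    """matches: iterable of (day, player_region, server_region).

    Returns {day: cross-region rate} for days above the alert level.
    """
    total, cross = Counter(), Counter()
    for day, player_region, server_region in matches:
        total[day] += 1
        if player_region != server_region:
            cross[day] += 1
    rates = {day: cross[day] / total[day] for day in total}
    return {day: r for day, r in rates.items() if r > CROSS_REGION_ALERT_RATE}


# Hypothetical data: day2 shows a spike of players matched outside their region.
matches = ([("day1", "VN", "VN")] * 19 + [("day1", "VN", "SG")]
           + [("day2", "VN", "VN")] * 6 + [("day2", "VN", "SG")] * 4)
print(cross_region_alerts(matches))  # {'day2': 0.4}
```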

Users Mis-routed (pre-fix): 80%+
Cross-region Matches: 90% → <5%
Severe Packet Loss: −30%+

Context

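The design team believed AUG was balanced based on aggregate KD (kill/death) stats. The community said otherwise. Each position was right in its own frame, but aggregate metrics are confounded by player-skill selection: if stronger or weaker players disproportionately pick a weapon, its aggregate KD reflects who uses it rather than how strong it is. Aggregate stats alone cannot settle a balance dispute.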

For the Wukong Skill Skin launch, the risk was different: monetization and fairness were in direct tension. The question was how to set pre-launch guardrails that let us capture revenue without crossing into pay-to-win territory, with those guardrails defined in advance rather than evaluated after the fact.

The Pivot

For AUG: I proposed a same-player controlled comparison, comparing each individual's KD with AUG against their own KD with a neutral AR baseline, to remove the between-player skill confound. AUG outperformed the baseline by ~28%, well above the threshold where competitive integrity breaks down. That number moved the design team from "community perception" to an evidence-backed balance fix.
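The same-player comparison can be sketched as follows: aggregate kills and deaths per player and weapon, keep players with enough deaths on both the test weapon and the baseline, and take the median of the per-player KD ratios. The row layout, the `AR_BASELINE` label, the `min_deaths` cutoff, and the choice of the median are assumptions made for this sketch, not details from the project.

```python
from collections import defaultdict
from statistics import median


def same_player_kd_uplift(rows, test_weapon: str, baseline_weapon: str, min_deaths: int = 10):
    """Median per-player KD ratio (test / baseline) minus 1, over players who used both.

    rows: iterable of (player_id, weapon, kills, deaths).
    Returns (uplift, number of players compared). Raises StatisticsError if none qualify.
    """
    totals = defaultdict(lambda: [0, 0])  # (player, weapon) -> [kills, deaths]
    for player, weapon, kills, deaths in rows:
        totals[(player, weapon)][0] += kills
        totals[(player, weapon)][1] += deaths

    ratios = []
    for player in {p for p, _ in totals}:
        test = totals.get((player, test_weapon))
        base = totals.get((player, baseline_weapon))
        if test is None or base is None:
            continue  # player did not use both weapons
        if test[1] < min_deaths or base[1] < min_deaths or base[0] == 0:
            continue  # too little data, or a zero baseline KD
        # Comparing each player with themselves removes between-player skill differences.
        ratios.append((test[0] / test[1]) / (base[0] / base[1]))
    return median(ratios) - 1, len(ratios)


rows = [
    ("A", "AUG", 25, 10), ("A", "AR_BASELINE", 20, 10),
    ("B", "AUG", 15, 12), ("B", "AR_BASELINE", 12, 12),
    ("C", "AUG", 30, 10),  # no baseline games: excluded
]
print(same_player_kd_uplift(rows, "AUG", "AR_BASELINE"))  # (0.25, 2)
```

Using the median keeps a few extreme players from dominating the estimate; a geometric mean of the ratios would be a reasonable alternative.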

For Wukong: I set the fairness constraint before launch as a hard ceiling on per-player KD uplift relative to the current meta top tier. Monetization success that violates that ceiling is not a win. The constraint was agreed with product upfront, not negotiated after results came in.
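The ordering of the rule (fairness first, revenue second) can be written as a launch gate. The function name, the revenue-target input, and the numbers in the example are hypothetical; the source states only that a KD-uplift ceiling relative to the meta top tier was agreed before launch.

```python
def launch_verdict(kd_uplift_vs_meta_top: float, revenue_vs_target: float,
                   max_kd_uplift: float) -> str:
    """Evaluate a monetized item: the fairness ceiling is a hard gate, checked first."""
    if kd_uplift_vs_meta_top > max_kd_uplift:
        # Revenue is irrelevant once the ceiling is breached.
        return "blocked: fairness ceiling exceeded"
    if revenue_vs_target < 1.0:
        return "fair, but below revenue target"
    return "pass"


# Hypothetical example: strong sales do not rescue a breached ceiling.
print(launch_verdict(0.08, revenue_vs_target=1.4, max_kd_uplift=0.05))
# prints "blocked: fairness ceiling exceeded"
```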

What Changed

These two cases produced a reusable pre-launch evaluation template: quantify gameplay impact via controlled comparison, define acceptable outcome ranges before release, and treat fairness guardrails as a launch gate rather than a post-hoc audit. Design and revenue teams now use this template as a standard checkpoint before any new weapon or monetized skill goes to QA.

AUG KD Gap (same-player): 28%
Wukong Pick Rate: +4.8%
New-to-Wukong Buyers: 29%

Context

Vietnam showed a multi-month revenue decline while DAU and session time stayed stable. The revenue team's initial read was weak player sentiment or market softness. I pushed back: a DAU–revenue divergence of this magnitude, with engagement holding steady, pointed away from an engagement problem and toward a structural cause inside the monetization system itself.

Revenue decomposition by payer tier confirmed it: the decline was concentrated in the high-value segment, not broad-based. That narrowed the diagnosis considerably.
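A payer-tier decomposition reduces to each tier's share of the total revenue change between two periods. The tier names and revenue figures are hypothetical; with these numbers the high-value tier accounts for 90% of the decline, the kind of concentration described above.

```python
def share_of_decline(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Each payer tier's share of the total revenue change between two periods."""
    total_change = sum(after.values()) - sum(before.values())
    if total_change == 0:
        raise ValueError("no net change to decompose")
    return {tier: (after[tier] - before[tier]) / total_change for tier in before}


# Hypothetical monthly revenue by payer tier (arbitrary units).
before = {"high_value": 600.0, "mid": 300.0, "low": 100.0}
after = {"high_value": 420.0, "mid": 290.0, "low": 90.0}
print(share_of_decline(before, after))
```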

The Pivot

I introduced an item ownership-rate metric: the share of featured items already owned by active high-value payers. The data showed that 80%+ of re-featured content was already in their inventories. These users had the budget to spend. They had no new items to spend it on. The problem was a content supply gap, not a demand problem, so the fix had to come from the content roadmap rather than from discounts or promotions.
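The ownership-rate metric can be sketched as the share of (payer, featured item) pairs where the payer already owns the item. The item names, inventories, and the 80% alert level are hypothetical; the source does not state the exact formula or any alert threshold.

```python
def ownership_rate(featured_items: set[str], inventories: list[set[str]]) -> float:
    """Share of (payer, featured item) pairs where the payer already owns the item."""
    if not featured_items or not inventories:
        raise ValueError("need at least one featured item and one payer")
    owned = sum(len(featured_items & inv) for inv in inventories)
    return owned / (len(featured_items) * len(inventories))


# Hypothetical alert level for content saturation.
SATURATION_ALERT = 0.80

featured = {"skin_a", "skin_b", "skin_c", "skin_d", "skin_e"}
high_value_inventories = [
    {"skin_a", "skin_b", "skin_c", "skin_d", "emote_x"},
    {"skin_a", "skin_b", "skin_c", "skin_d", "skin_e"},
]
rate = ownership_rate(featured, high_value_inventories)
print(rate, rate >= SATURATION_ALERT)  # 0.9 True
```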

I presented this diagnosis to the revenue and content planning teams, reframing the conversation from "how do we re-engage payers" to "how do we ensure high-value payers always have something new to buy."

What Changed

The content team adopted a revised pacing policy: enforced cooldown periods on re-featured items and a guaranteed monthly new SKU for the high-value segment. After the change, new-content revenue recovered to ~50% of total Vietnam revenue.
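The cooldown part of the pacing policy can be expressed as an eligibility filter on re-featured items. The 90-day cooldown, item names, and dates are hypothetical; the source does not state the actual cooldown length.

```python
from datetime import date, timedelta

# Hypothetical policy value, not the one used in production.
COOLDOWN = timedelta(days=90)


def eligible_for_refeature(last_featured: dict[str, date], today: date) -> list[str]:
    """Items whose last feature date is at least COOLDOWN before today, sorted by name."""
    return sorted(item for item, d in last_featured.items() if today - d >= COOLDOWN)


last = {"skin_a": date(2024, 1, 5), "skin_b": date(2024, 3, 20), "skin_c": date(2023, 12, 1)}
print(eligible_for_refeature(last, today=date(2024, 4, 10)))  # ['skin_a', 'skin_c']
```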

The ownership-rate metric is now a standing leading indicator in market monitoring: it flags content saturation risk before revenue drops, giving the content team enough lead time to adjust the roadmap proactively.

Activity Trend: Stable
Root Cause: Content Saturation
New Content Revenue: → ~50% of total
Skills

Technical Toolkit

Methods, tools, and cross-functional capabilities used across the projects above

Analysis Methods

Causal Inference · PSM · A/B Testing · Same-User Comparison · Cohort Analysis · Funnel Decomposition

Data Stack

SQL · Hive · Spark · Python · Tableau · Excel

Product Domains

Trust & Safety · Player Experience · Network Performance · Combat Balance · Monetization · Revenue Strategy

Cross-functional

Product Strategy · Engineering Partnership · Operations Enablement · 15 Markets · Monitoring & SOPs