
hkyyx/README.md

Hank Yan

Cloud Infrastructure / Site Reliability Engineering / Platform Operations
Shanghai, China

I build production infrastructure and operational tooling for Kubernetes and AWS environments, with a focus on reliability, observability, and safe automation. Recently, I have been working on AI-supported incident analysis systems that turn alerts, runbooks, and platform evidence into actionable operational context.

Focus areas

  • Kubernetes / EKS production operations: troubleshooting, release safety, cluster access patterns, and platform reliability
  • AWS infrastructure: IAM, networking, Bedrock AgentCore, MCP-backed tooling, and environment-aware automation
  • AI for operations: runbook retrieval, alert summarization, evidence collection, and human-in-the-loop recommendations
  • Incident workflows: Alertmanager intake, triage, evidence collection, escalation context, and post-incident improvements
  • Infrastructure systems: Terraform, Jsonnet, CI/CD, documentation, and operator-facing runbooks

Current work

  • Designing AI-supported incident analysis systems that prepare production context before responders engage
  • Connecting Kubernetes, AWS, and runbook evidence through controlled tool access
  • Building automation that stays auditable, reversible, and useful under pressure
  • Turning repeated operational lessons into durable platform defaults

Engineering principles

  • Reliability first: optimize for observable, understandable systems over clever automation
  • Least privilege by default: production access should be scoped, reviewed, and easy to audit
  • Human-in-the-loop operations: automation should explain, recommend, and reduce toil before it remediates
  • Incidents should compound into better tooling, better docs, and safer release paths

Tools I use often

AWS · Kubernetes · Bedrock AgentCore · MCP · Python · TypeScript · Shell · Docker · Terraform · Jsonnet · PostgreSQL · Redis

LLM Token Activity

Contact

Shanghai, China · yixingyan@gmail.com

Popular repositories

  1. hkyyx.github.io Public

    blog

    JavaScript 1

  2. git-recipes Public

    Forked from zhongyi-tong/git-recipes

    :octocat: Git recipes in Chinese. A high-quality Chinese-language Git tutorial.

  3. ceph-ansible Public

    Forked from ceph/ceph-ansible

    Ansible playbooks for Ceph

    Python 1

  4. ceph Public

    Forked from ceph/ceph

    Ceph is a distributed object, block, and file storage platform

    C++

  5. mirantis-cephlcm Public

    Forked from Mirantis/ceph-lcm

    Python

  6. quick-SQL-cheatsheet Public

    Forked from enochtangg/quick-SQL-cheatsheet

    A quick reminder of all SQL queries and examples on how to use them.