CV
Basics
Name | Tianyi Yang
Email | m0g1c14n [at] gmail [dot] com
Url | https://m0gician.github.io/
Summary | I
Publications
- 2024 GMorph: Accelerating Multiple DNN Inference via Cross-Task Computation Reuse
Proceedings of the 19th European Conference on Computer Systems
- 2024 Raccoon: Prompt Extraction Benchmark of LLM-Integrated Applications
Findings of the Association for Computational Linguistics: ACL 2024
- 2022 Event-Event Relation Extraction using Probabilistic Box Embedding
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Projects
- 2024.03 - Present
Unsupervised Mass Product Property Extraction
Develop an unsupervised pipeline that generates formatted properties for billions of goods listed on Pinduoduo & Temu.
- Developed a pipeline that converts property templates into finite-state machines to guide the LLM's decoding, ensuring valid JSON output for all LLM-generated product properties.
- Implemented an unsupervised data-cleaning process to filter out duplicate and low-quality product properties.
- Built a keyword and feature generation process to support billions of products.
- Designed a holistic evaluation pipeline for product properties, including format validation, keyword matching, and semantic similarity.
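A minimal sketch of template-guided constrained decoding, assuming a flat template whose values are all JSON strings. The template is expanded into a finite-state machine, and at each step every token that would leave the FSM is masked out before the highest-scoring token is chosen. The segment layout, `template_to_segments`, `constrained_greedy_decode`, the toy vocabulary, and the `score` callback (a stand-in for real LLM logits) are hypothetical illustrations, not the production pipeline.

```python
import json


def template_to_segments(keys):
    """Expand a flat property template into FSM segments (hypothetical format).

    ("lit", text) segments must be emitted verbatim; ("slot", "") segments
    accept JSON string characters until a closing quote is generated.
    """
    if not keys:
        return [("lit", "{}")]
    segments = []
    for i, key in enumerate(keys):
        opener = "{" if i == 0 else '", '
        segments.append(("lit", f'{opener}"{key}": "'))
        segments.append(("slot", ""))
    segments.append(("lit", '"}'))
    return segments


def step(segments, state, ch):
    """Advance the FSM by one character; return None if ch is illegal here."""
    idx, pos = state
    if idx >= len(segments):
        return None  # accepting state: nothing may follow the closing brace
    kind, text = segments[idx]
    if kind == "lit":
        if ch != text[pos]:
            return None
        pos += 1
        return (idx + 1, 0) if pos == len(text) else (idx, pos)
    if ch == '"':
        # A closing quote ends the value; it is the first char of the next literal.
        return step(segments, (idx + 1, 0), ch)
    if ch == "\\" or ord(ch) < 0x20:
        return None  # keep the sketch simple: no escapes or control characters
    return (idx, pos + 1)


def advance(segments, state, token):
    """Feed a multi-character token through the FSM; None if any char is illegal."""
    for ch in token:
        state = step(segments, state, ch)
        if state is None:
            return None
    return state


def constrained_greedy_decode(segments, vocab, score, max_steps=64):
    """Greedy decoding in which tokens that would leave the FSM are masked out."""
    state, out = (0, 0), ""
    for _ in range(max_steps):
        if state[0] == len(segments):
            return out  # template fully emitted, so the output parses as JSON
        legal = [tok for tok in vocab if advance(segments, state, tok) is not None]
        if not legal:
            raise ValueError("vocabulary cannot continue the template")
        best = max(legal, key=lambda tok: score(out, tok))
        state = advance(segments, state, best)
        out += best
    raise RuntimeError("step budget exhausted before the template was completed")


if __name__ == "__main__":
    segs = template_to_segments(["brand", "color"])
    vocab = ["{", '"', "brand", "color", ": ", ", ", "}", "Acme", "red", "\n"]
    # Toy stand-in for LLM logits: the "model" would append chatter after the JSON.
    target = '{"brand": "Acme", "color": "red"}\nHope this helps!'

    def score(prefix, tok):
        return float(target.startswith(prefix + tok))

    print(constrained_greedy_decode(segs, vocab, score))
```

Because decoding stops as soon as the FSM reaches its accepting state, the trailing chatter in the toy target is never emitted. A production system would more likely precompute the legal-token mask for each FSM state instead of rescanning the vocabulary at every step.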
- 2023.09 - 2024.02
Red-teaming Benchmark for LLM-Integrated Apps
Analyze prompt injection attacks, develop compound attack strategies, and propose defense templates to enhance LLM safety through in-context learning.
- Collected 200+ system prompts and instructions from user-created, ChatGPT-based applications on OpenAI's GPT Store.
- Summarized state-of-the-art prompt injection paradigms into 10+ categories and used them to construct compound attacks.
- Derived 20+ prompt templates for in-context defense against prompt injections.
- Built the Raccoon benchmark to evaluate the safety of LLM-integrated applications across models under complex attack scenarios.
- 2022.02 - 2022.10
DNN Acceleration via Graph Mutation
Improve DNN inference efficiency for multi-task systems.
- Led the design and implementation of a parser that converts PyTorch models to intermediate graph representations.
- Designed a graph compiler that reuses layers across tasks and estimates the inference time of new models via sampling.
- Designed a simulated-annealing-based algorithm that balances exploration and exploitation when searching over graph merges.
- Used TensorRT and TVM to further reduce model inference time.
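The simulated-annealing idea can be sketched on a toy merge-selection problem. Everything here is an assumption for illustration: a handful of hypothetical candidate merges with invented latency savings and accuracy costs, a soft penalty for exceeding an accuracy budget, and a neighbor move that toggles one merge. It is not GMorph's actual search procedure or cost model.

```python
import math
import random

# Hypothetical candidate merges (invented numbers, for illustration only):
# latency saved in ms and accuracy cost in units of 0.1 points.
SAVINGS = [4.0, 3.0, 2.5, 1.0]
ACC_DROP = [6, 5, 3, 1]
BASE_LATENCY = 20.0
ACC_BUDGET = 10


def cost(merges):
    """Estimated latency of a merge plan plus a soft penalty for exceeding the accuracy budget."""
    latency = BASE_LATENCY - sum(SAVINGS[i] for i in merges)
    overshoot = max(0, sum(ACC_DROP[i] for i in merges) - ACC_BUDGET)
    return latency + 10.0 * overshoot


def neighbor(merges, rng):
    """Toggle one randomly chosen candidate merge on or off."""
    return merges ^ {rng.randrange(len(SAVINGS))}


def simulated_annealing(init, t0=1.0, alpha=0.99, steps=2000, seed=0):
    rng = random.Random(seed)
    cur, cur_cost = init, cost(init)
    best, best_cost = cur, cur_cost
    t = t0
    for _ in range(steps):
        cand = neighbor(cur, rng)
        cand_cost = cost(cand)
        delta = cand_cost - cur_cost
        # Exploitation: always accept improvements.
        # Exploration: accept a worse plan with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cur, cur_cost = cand, cand_cost
            if cur_cost < best_cost:
                best, best_cost = cur, cur_cost
        t = max(t * alpha, 1e-9)  # geometric cooling, floored to avoid division by zero
    return best, best_cost


if __name__ == "__main__":
    plan, latency = simulated_annealing(frozenset())
    print(sorted(plan), latency)
```

The temperature `t` carries the exploration/exploitation trade-off: early on, worse plans are accepted fairly often, and as `t` decays by `alpha` the search turns into greedy descent. On this toy instance a purely greedy search can stall at the plan `{1, 2, 3}`, which is why the best plan seen so far is tracked separately.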
- 2021.06 - 2022.01
Learning Joint Event Relations with Boxes
Ensure logical consistency across narrative events.
- Designed an XML parser and a data loader that extract labeled relations and reduce data-loading overhead.
- Aggregated data with pandas and visualized principal components alongside the existing labels.
- Designed and implemented experiments that use Longformer to handle text inputs exceeding RoBERTa's maximum input length.
- 2021.06 - 2022.01
RL Algorithms with High Safety Guarantees
Safe ML algorithms with mathematical guarantees.
- Designed and developed machine learning algorithms with high-confidence safety constraints using the Seldonian framework.
- Derived concentration inequalities and implemented various importance sampling estimators for high-confidence policy improvement.
- Developed a library for the Seldonian framework, optimized with NumPy and Numba.
- Open-sourced the library on GitHub with full documentation.
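A minimal sketch of a high-confidence safety test in the Seldonian style, assuming tabular policies given as `policy[state][action]` probabilities, non-negative rewards such that every importance-weighted estimate lies in a known range `[0, b]`, and Hoeffding's inequality as the concentration bound. The function names and data layout are hypothetical; the project's own estimators and bounds may differ.

```python
import math


def is_estimate(traj, pi_e, pi_b, gamma=1.0):
    """Ordinary (trajectory-wise) importance sampling estimate of pi_e's return.

    traj is a list of (state, action, reward) tuples collected under pi_b;
    policies are tables with policy[state][action] = probability.
    """
    weight, ret = 1.0, 0.0
    for t, (s, a, r) in enumerate(traj):
        weight *= pi_e[s][a] / pi_b[s][a]  # requires pi_b[s][a] > 0
        ret += (gamma ** t) * r
    return weight * ret


def hoeffding_lower_bound(samples, b, delta):
    """(1 - delta)-confidence lower bound on E[X] for i.i.d. samples in [0, b]."""
    n = len(samples)
    mean = sum(samples) / n
    return mean - b * math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def safety_test(data, pi_e, pi_b, j_b, b, delta=0.05):
    """Approve pi_e only if its performance is >= j_b with confidence 1 - delta."""
    estimates = [is_estimate(traj, pi_e, pi_b) for traj in data]
    return hoeffding_lower_bound(estimates, b, delta) >= j_b


if __name__ == "__main__":
    # One-step toy problem: action 0 pays 1, action 1 pays 0.
    pi_b, pi_e = [[0.5, 0.5]], [[0.9, 0.1]]
    data = [[(0, 0, 1.0)], [(0, 1, 0.0)]] * 1000
    # Largest possible estimate: weight 0.9 / 0.5 = 1.8 times return 1.0.
    print(safety_test(data, pi_e, pi_b, j_b=0.5, b=1.8))
```

Hoeffding's bound holds for any bounded i.i.d. estimates but is loose here, since `b` must cover the largest possible importance weight times the largest return.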
Work
- 2024.02 - Present NLP Engineer
Pinduoduo
Design production-level LLM applications.
- Implement an unsupervised extraction pipeline for mass product properties.
- Construct a keyword and feature generation process for billions of products.
- Build an end-to-end system for query-product and product-product relevance learning.
- Optimize model inference efficiency via model distillation and quantization.
- 2022.07 - 2023.03 Software Engineer
Amazon Robotics
Develop the next-generation Amazon Grocery Automation System.
- Designed and implemented AWS cloud infrastructure and algorithms to enable ultra-fast 1-2 hour grocery delivery to Amazon customers as part of a new automation pilot program.
- Led the design of inventory and order language models.
- Designed an event-driven MQTT workflow management system using AWS IoT.
- Led the test infrastructure design, enabling end-to-end testing from order ingestion to automated fulfillment.
- 2018.07 - 2018.09 Research Intern
Alibaba DAMO Academy
Research RL-based recommendation systems for Xianyu, Alibaba's online second-hand marketplace.
- Built the homepage Merchant Feed in Xianyu, formulating it as a classic contextual bandit problem.
- Implemented a recommendation system based on a modified Linear UCB (LinUCB) algorithm that uses both browsing features and click features.
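A minimal sketch of the standard disjoint LinUCB algorithm (one ridge-regression model per arm plus an upper-confidence bonus), assuming the context is simply the concatenation of a browsing-feature vector and a click-feature vector. The modified variant used in the internship is not reproduced here; `make_context` and the feature split are hypothetical.

```python
import numpy as np


def make_context(browse_feats, click_feats):
    """Hypothetical feature builder: concatenate browsing and click features."""
    return np.concatenate([browse_feats, click_feats]).astype(float)


class LinUCB:
    """Standard disjoint LinUCB: a ridge-regression model per arm plus a UCB bonus."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]    # regularized design matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # reward-weighted feature sums

    def select(self, x):
        """Return the arm with the highest upper confidence bound for context x."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                            # ridge estimate of arm weights
            bonus = self.alpha * np.sqrt(x @ A_inv @ x)  # width of the confidence ellipsoid
            scores.append(theta @ x + bonus)
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Fold the observed (context, reward) pair into the chosen arm's statistics."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

Each arm is scored as `theta_a . x + alpha * sqrt(x^T A_a^-1 x)`: the first term exploits the current reward estimate, while the bonus stays large for arms whose past contexts have not yet covered the direction of the current `x`. A larger `alpha` buys more exploration.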
Education
- 2020.09 - 2022.05 MA, United States
- 2016.09 - 2020.06 CA, United States
University of California, Irvine
Bachelor of Science
Computer Science & Engineering, minor in Statistics
Awards
- 2017.09
First Prize
Jamming With Ubuntu 2017 in Rugao
- 2017.06
Best TIPPERS program
UCI IoT Tippers Hackathon
- 2017.04
Best Project and Development Practices
CSULB BeachHacks
Languages
Chinese Mandarin | Native speaker
English | Fluent
Spanish | Beginner