Rakuten Group, Inc. is the largest e-commerce company in Japan and the third-largest e-commerce marketplace company worldwide, with over 1.5 billion registered users. The Rakuten brand is recognized worldwide for its leadership and innovation, and provides a variety of consumer and business-focused services including e-commerce, e-reading, travel, banking, securities, credit card, e-money, portal and media, online marketing and professional sports. The company is expanding globally and currently has operations throughout Asia, Western Europe, and the Americas.
Rakuten Viki is a premier global entertainment streaming site where millions of people discover and consume primetime shows and movies subtitled in more than 200 languages by our community of fans. With billions of videos viewed and more than 1 billion words translated, Viki brings global entertainment to fans everywhere!
Based in Singapore, this Senior Engineer, Data role reports to the Engineering Manager and will play a critical role in building the pioneering Data Engineering Team at Viki!
About the Data Engineering Team
Viki is establishing a Data Engineering team from the ground up to address the business’s growing data needs. This team will be responsible for designing and implementing a data architecture that provides reliable data systems and clean data for various stakeholders across Viki, including but not limited to:
- Data Analysts, who spend much of their time finding insights in the data and building reports to track business performance against OKRs,
- Product Managers, who need to understand our customers’ behavior, their journeys on our platform, and customer funnels,
- Marketing teams, who need to build customer segments for marketing campaigns,
- Content Operations, who track the performance of our shows across various markets and customer segments,
- The CRM team, who need to understand our customers and manage our relationships with them, and so on
Building this overall data architecture includes designing and building ingestion systems for different data formats (files, databases, events); designing processing pipelines that scale with data volume; defining data management strategies (Data Lake, Data Warehouse) that are optimal for long-term storage and for reporting and visualization queries; and building APIs and ML models on top of the data, as well as sharing data with third-party applications, for both batch and streaming data. Alongside this, the team will set up proper data governance practices and policies for data retention, compliance, PII handling, and GDPR/PDPA/CCPA handling, among other things.
In addition, in the longer term the team is expected to build abstractions and data models that support future needs, such as building systems for content recommendations and search recommendations, and building and operationalizing machine learning models for subtitle translation, recommendations, churn prediction and so on.
Key Responsibilities:
- Translating the pipelines into reusable and scalable data pipelines and frameworks for ingestion, processing, storage and consumption
- Performing root cause analysis on internal and external data systems to answer specific business questions, and identifying and calling out data and system issues and improvements in a timely manner
- Improving and maintaining the correctness, performance and SLAs of existing applications and workflows, as well as the integrity of the architecture
- Upholding sound data engineering practices while building the data systems and pipelines, such as proper automated testing, CI/CD, logging, monitoring and alerting, while highlighting areas for improvement
- Contributing to POCs that evaluate SaaS or PaaS vendors that can solve specific problems in our architecture
- Identifying patterns in code and refactoring them into modules that are easy to extend and reuse
- Performing code reviews of the team’s PRs and ensuring high standards of code quality, in addition to ensuring that development guidelines are followed
- Guiding junior members of the team on technically complex aspects of the system, or wherever necessary
Requirements:
- Bachelor’s or Master’s degree in Computer Science or a related field, or strong prior work experience in building software systems or products
- 4-8 years of experience in developing production-critical software, including 3-4 years working on data-related systems
- Strong knowledge of software concepts, design patterns, refactoring and automated testing
- Good judgment and diligence in knowing which patterns to use, when and where, and the ability to confidently hold constructive conversations about them with the team
- Strong communication skills and the ability to explain technical and non-technical concepts to junior members of the team, as well as to peers and managers
- Good hands-on experience building APIs using:
  - Java, Scala, Golang and/or Python, or willingness to pick up one of them
  - Relational and/or NoSQL databases (PostgreSQL, MySQL, MongoDB or equivalent)
  - Caching technologies like Redis or Memcached
- Very strong SQL knowledge and experience working on query optimization and data modeling
- Good experience working with one or more of the following:
  - Data warehousing technologies such as Redshift, BigQuery, Snowflake, or other big data stores like CockroachDB, Cloud Spanner, Bigtable, etc.
  - Data processing frameworks and technologies such as Spark, Apache Beam, Dataflow, EMR, AWS Glue
  - Messaging systems such as Kafka or Pub/Sub, and stream processing
  - Open file formats such as Parquet, ORC, etc.
  - Building and operating data applications in cloud environments (AWS or GCP)
- Experience with third-party solutions and technologies such as Fivetran, Snowplow, Segment, or similar
- Knowledge of data infrastructure management and Infrastructure as Code (IaC) is an added advantage
Rakuten provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type. Rakuten considers applicants for employment without regard to race, color, religion, age, sex, national origin, disability status, genetic information, protected veteran status, sexual orientation, gender, gender identity or expression, or any other characteristic protected by federal, state, provincial or local laws.