Rakuten Group, Inc. is the largest e-commerce company in Japan and the third-largest e-commerce marketplace company worldwide, with over 1.5 billion registered users. The Rakuten brand is recognized globally for its leadership and innovation, and the company provides a variety of consumer- and business-focused services, including e-commerce, e-reading, travel, banking, securities, credit cards, e-money, portal and media, online marketing, and professional sports. The company is expanding globally and currently operates throughout Asia, Western Europe, and the Americas.
Rakuten Viki is a premier global entertainment streaming site where millions of people discover and consume primetime shows and movies subtitled in more than 200 languages by our community of fans. With billions of videos viewed and more than 1 billion words translated, Viki brings global entertainment to fans everywhere!
Based in Singapore, this Staff Engineer, Data role reports to the Engineering Manager and will play a critical role in building the pioneering Data Engineering team at Viki!
About the Data Engineering Team
Viki is establishing a Data Engineering team from the ground up to address the business’s growing data needs. The team will be responsible for designing and implementing a data architecture that provides reliable data systems and clean data for stakeholders across Viki, including but not limited to:
- Data Analysts, who derive insights from the data and build reports to track business performance against OKRs
- Product Managers, who need to understand our customers’ behavior, their journey on our platform, and customer funnels
- Marketing teams, who build customer segments for marketing campaigns
- Content Operations, who track the performance of our shows across markets and customer segments
- The CRM team, who understand our customers and manage our relationships with them
Building this overall data architecture includes: designing and building ingestion systems for different data formats (files, databases, events); designing processing pipelines that can scale with data volume; choosing data management strategies (Data Lake, Data Warehouse) that are optimal for long-term storage and for reporting and visualization queries; and building APIs and ML models on top of the data, as well as sharing data with third-party applications, for both batch and streaming workloads. Along the way, the team will set up proper data governance practices and policies for data retention, compliance, and PII handling, including GDPR, PDPA, and CCPA requirements.
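To make the ingestion work concrete, here is a minimal sketch of one such batch step: a PySpark job that reads raw JSON event files and lands them as date-partitioned Parquet in a data lake. The bucket paths, schema, and column names are hypothetical assumptions for illustration, not details from the role description.

```python
# Minimal sketch: batch ingestion of raw JSON events into a Parquet data lake.
# Paths and column names (user_id, event_type, event_ts) are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("events-ingest").getOrCreate()

# Read raw event files from a hypothetical landing area.
raw = spark.read.json("s3://example-landing/events/2024-01-01/*.json")

# Light cleaning: drop rows missing key fields and normalize timestamps.
events = (
    raw.dropna(subset=["user_id", "event_type"])
    .withColumn("event_ts", F.to_timestamp("event_ts"))  # string -> timestamp
    .withColumn("event_date", F.to_date("event_ts"))     # partition key
)

# Write to the lake in an open columnar format, partitioned by date,
# so downstream warehouse loads and ad hoc queries stay cheap.
events.write.mode("append").partitionBy("event_date").parquet("s3://example-lake/events/")
```

A streaming variant of the same step would typically swap the file read for a Kafka or Pub/Sub source and use Spark Structured Streaming or Apache Beam, in line with the requirements listed below.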
In addition, over the longer term, the team is expected to build abstractions and data models that enable future needs, such as systems for content and search recommendations, and building and operationalizing machine learning models for subtitle translation, recommendations, churn prediction, and so on.
Key Responsibilities:
- Working with product and analytics teams and the Engineering Manager to understand the business and technical direction, and making system and architecture decisions that support the longer-term needs and extensibility of the data architecture
- Evaluating SaaS and PaaS platforms that can solve parts of the data architecture, and integrating them into it
- Providing subject-matter expertise in designing pipelines that are efficient and scale with the three Vs of data (volume, velocity, and variety), as well as in data modeling
- Building and operating the data platform service, including defining and tracking its SLA
- Contributing to and conducting system design reviews for systems and pipelines that are being designed and implemented for various business use cases
- Working with the Engineering Manager to establish the right data engineering practices and ensure they are followed, including automated testing, CI/CD, logging, monitoring, and alerting
- Identifying patterns in code and refactoring them into modules that are easy to extend and reuse
- Creating guidelines that uphold the quality of the codebase and systems, ensuring the team adheres to them, and leading the way in doing so
- Making calls on when to take on tech debt versus paying it off, and ensuring that we maintain a healthy level of debt
- Creating reusable and extensible automated test suites that make it easy for the team to add to and maintain robust test coverage for all services and pipelines
- Performing code reviews of the team’s PRs, ensuring high standards of code quality and that development guidelines are followed
- Guiding less experienced members of the team on technically complex aspects of the system, and coaching them on systems thinking and architecture
- Making sure the overall integrity of the architecture is preserved and system documentation is kept up to date
Requirements:
- B.S. or M.S. in Computer Science or a related field
- 8-12 years of experience developing professional-grade software
- 6-8 years of experience in data engineering
- Strong knowledge of software concepts, design patterns, refactoring and automated testing
- Great judgment and diligence in knowing which patterns to use, when, and where, and the ability to confidently hold constructive conversations about them with the team and peers
- Strong communication skills, with the ability to explain technical and non-technical concepts to less experienced members of the team as well as to peers and managers
- Strong hands-on experience with:
  - Building APIs in Java, Scala, Golang, and/or Python, or willingness to pick one of them up
  - Relational and/or NoSQL databases (PostgreSQL, MySQL, MongoDB, or equivalent)
  - Caching technologies such as Redis or Memcached
- Advanced SQL knowledge and experience with query optimization and data modeling in Data Lake and Data Warehouse architectures
- Advanced knowledge of and experience with building and scaling both batch and streaming pipelines, and the challenges that come with them
- Strong experience working with or using one or more of the following:
  - Data warehousing technologies such as Redshift, BigQuery, or Snowflake, or other big data stores like CockroachDB, Cloud Spanner, or Bigtable
  - Data processing frameworks and technologies such as Spark, Apache Beam, Dataflow, EMR, or AWS Glue
  - Messaging systems such as Kafka or Pub/Sub, and stream processing
  - Open file formats such as Parquet and ORC
  - Building and operating data applications in cloud environments (AWS or GCP)
  - Third-party solutions such as Fivetran, Snowplow, Segment, or the like
- Experience establishing Data Governance policies and practices
- Added advantage: experience working with Delta Lake, streaming-only architectures, Data Marts, and operationalizing ML models
Rakuten is an equal opportunity employer. We do not discriminate based on race, color, ethnicity, ancestry, national origin, religion, sex, gender, gender identity, gender expression, sexual orientation, age, disability, veteran status, genetic information, marital status, or any legally protected status. Women, minorities, individuals with disabilities, and protected veterans are encouraged to apply.