- Strong desire to grow a career as a Data Scientist in highly automated industrial manufacturing, performing analysis and machine learning on terabytes to petabytes of diverse data.
- Experience in the following areas: statistical modeling, feature extraction and analysis, and supervised, unsupervised, and semi-supervised learning. Exposure to the semiconductor industry is a plus but not a requirement.
- Ability to extract data from different databases via SQL and other query languages, and to apply data cleansing, outlier identification, and missing-data techniques.
- Strong software development skills.
- Strong verbal and written communication skills.
- Experience with or desire to learn:
  - Machine learning and other advanced analytical methods
  - Python and/or R (fluency preferred)
  - PySpark, SparkR, and/or sparklyr
  - Hadoop (Hive, Spark, HBase)
  - Teradata and/or other SQL databases
  - TensorFlow and/or other statistical software, including scripting capability for automating analyses
  - SSIS, ETL
  - JavaScript, AngularJS 2.0, Tableau
- Experience working with time-series data, images, semi-supervised learning, and data whose distributions change frequently is a plus
- Experience working with Manufacturing Execution Systems (MES) is a plus
- Published papers at CVPR, NIPS, ICML, KDD, and other key conferences are a plus, but this is not a research position