Jr. Data Scientist

Thomson Reuters
Knowledge & Skills
• Extremely strong skills in machine learning, NLP, or information retrieval.
• Proficiency in machine learning algorithms such as multi-class classification, entity extraction, and clustering
• Experience with NLP libraries such as CoreNLP, MALLET, Deeplearning4j, etc.
• Fluency in deep learning frameworks: TensorFlow / Keras
• Extremely strong technical skills in one or more coding frameworks, database work, distributed
computing, and product prototyping, such as the following: a JVM-hosted language like Java, parallelization
strategies, and a scripting language such as Python.
• Solid relational database skills
• Solid understanding of RESTful web services
• Experience with Git, Bitbucket, and the Agile Scrum development process
• Solid understanding of statistics
• Able to handle large quantities of data without drowning in it
• The ability to tell a story about data, particularly through visualization.
• Strong algorithmic sense: can think rigorously, anticipate how to handle common problems such as
missing data, and build reusable modules of code that peers can understand.
• Strong written communication and presentation skills. Able to respond and present work to peers,
answer in-depth questions, accept constructive feedback, and modify work product accordingly.
Roles
• Define, manipulate, aggregate and use both structured and unstructured “big data” in order to
support descriptive and predictive analytics across the businesses.
• Collaborate with scientists, product groups and content groups to perform “big data” aggregations,
symbology mapping, and manipulations of important datasets.
• Perform statistical (and machine learning) analyses on data to serve business purposes.
• Narrate stories (sometimes to a non-technical audience) about our content and processes through data
analysis and visualization.
• Define and develop software for the analysis and manipulation of large and very large datasets.
• Guide the architecture of “big-data” business processes with an eye towards robustness, parsimony
and reproducibility (at senior levels).
Responsibilities
• Becomes an expert in certain domains of knowledge and in particular tools/techniques.
• Uses multiple approaches to get unstuck when research progress slows down.
• Quickly spots and corrects subtle errors in data and analyses.
• Ensures that results are reproducible, correct and actionable.
• Spots data quality issues as well as data, pipeline and model problems.
• Learns how to deal with typical challenges: finding data sources, lining them up, dealing with
any entitlement issues, keeping product managers on board, working closely with developers to
speed up the development and quality assurance process, and writing coherent and clear whitepapers
and research notes.
• Develops relationships with peers in analytics across the company.
Must Have Skills
1) NLP, text classification and entity extraction
2) Clustering
3) TensorFlow
4) Machine learning
5) Deep learning, Neural networks
6) Java
7) Data structures
8) RESTful web services
9) Excellent problem-solving skills with a history
of superb delivery against assigned tasks
Good to Have Skills
1) JUnit
2) Exposure to Agile and Scrum software development
processes
3) “Self-starter” attitude and ability to make decisions
independently
Education
MS or (ideally) PhD in Mathematics, Statistics, Computer
Science, or another quantitative social or hard
science, with a “data-science” orientation
to coursework and focus.
To apply for this job please visit jobs.thomsonreuters.com.