Mangesh
[email protected] | 201-489-3500 Ext. 303
Summary
• Over 16 years of diversified experience in software design, development, and administration, including experience as an Azure Data Engineer working with Hadoop ecosystems.
• Expertise in designing and developing backend applications.
• Experience managing Azure Data Lake (ADLS) and Data Lake Analytics, with an understanding of how they integrate with other Azure services.
• Implemented large Lambda architectures using Azure data platform capabilities such as Azure Data Lake, Azure Data Factory, HDInsight, Azure SQL Server, and Tableau.
• Demonstrated expert-level technical capability in Azure batch and interactive solutions; developed and deployed data pipelines in the Azure cloud.
• Excellent T-SQL writing skills.
• Experience developing enterprise-level Hadoop solutions using components such as Apache Spark, MapReduce, HDFS, Sqoop, Kafka, Oozie, YARN, Pig, Hive, ZooKeeper, and Flume.
• Worked with multiple data formats on HDFS using Scala and PySpark.
• Good working experience with Spark (Spark Streaming, Spark SQL) using Scala and Kafka.
• Excellent understanding of job workflow scheduling and locking tools/services such as Oozie and Airflow.
• Experience developing ETL pipelines into and out of data warehouses using Python on Azure.
• Used windowing and analytic functions, partitioning, and bucketing in Hive to improve query performance (a brief sketch follows this Summary).
• In-depth understanding of Hadoop and its components: HDFS, JobTracker, TaskTracker, NameNode, DataNode, ResourceManager (YARN), ApplicationMaster, and NodeManager.
• Experience developing Java UDFs for use in Pig and Hive queries as needed.
• Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, and other services in the AWS family.
• Designed and architected scalable data processing and analytics solutions, covering technical feasibility, integration, and development for big data storage, processing, and consumption on Azure: analytics, big data (Hadoop, Spark), business intelligence (Tableau), NoSQL, HDInsight, Stream Analytics, Data Factory, and Event Hubs.
• Experienced in query optimization and performance tuning; skilled in designing training plans and conducting training sessions on MS SQL Server.
• Excellent analytical and communication skills to work with various teams, technical and business leadership.
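A minimal PySpark sketch of the Hive windowing, partitioning, and bucketing pattern referenced above. All table and column names (sales_raw, sales_part, customer_id, sale_date, amount) are hypothetical, for illustration only:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.window import Window

    spark = (SparkSession.builder
             .appName("hive-window-demo")
             .enableHiveSupport()   # needed to read/write Hive-managed tables
             .getOrCreate())

    # Partitioning prunes whole date directories at query time; bucketing on
    # customer_id pre-shuffles data so joins/aggregations on that key are cheaper.
    (spark.table("sales_raw")                  # hypothetical source table
     .write
     .partitionBy("sale_date")
     .bucketBy(16, "customer_id")
     .sortBy("customer_id")
     .mode("overwrite")
     .saveAsTable("sales_part"))

    # Analytic window function: running total of amount per customer by date.
    w = (Window.partitionBy("customer_id")
         .orderBy("sale_date")
         .rowsBetween(Window.unboundedPreceding, Window.currentRow))

    spark.table("sales_part").withColumn("running_total", F.sum("amount").over(w)).show()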
Technical Skills:
Programming Languages: T-SQL, PL/SQL, Python, Shell scripting (basic), Scala
Data Engineering: Python, HDFS, Data Warehousing, ETL, Azure Data Lake, Azure Blob Storage (Gen2), Azure Data Factory, IoT, Azure Databricks
Cloud Services: Azure, ADF, Azure Blob Storage; hands-on AWS
Big Data Frameworks: Hadoop, Spark
Hadoop Ecosystem: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Flume, Oozie, Airflow, YARN, Kafka, ZooKeeper, Cloudera, Hortonworks
Spark Ecosystem: Spark Core, Spark SQL, MLlib, Spark Streaming
Databases: Oracle, MySQL, MS SQL Server, Spark SQL; hands-on MongoDB and PostgreSQL
ETL: SSIS, PySpark
Visualization Tools: Tableau; hands-on SSRS and Power BI
Operating Systems: Windows, macOS, Linux
Version Control / CI: Git, Bitbucket, Jenkins, VS
Professional Experience:
Role: Sr. Data Engineer Apr 2020 – Present
Client: NBA | Secaucus, NJ
Responsibilities:
• Working on data processing and file-handling scripts using Unix shell scripting; wrote Python scripts to push data to HDFS directories.
• Designing and developing complex data pipelines using Azure, Sqoop, Spark, and databases for data ingestion, analysis, and transformation.
• Performing data cleansing, transformation, and manipulation; importing data from various databases.
• Led the Hadoop implementation, integrating multiple legacy applications into the data lake.
• Planning the replacement of the old DW & OBI stack with a Yellowfin/Tableau reporting solution for all subject areas.
• Designed and developed a data migration plan to move all third-party data to Azure and executed it successfully.
• Developed PySpark code for Azure jobs and integrated it with Azure Data Lake and other cloud services.
• Created Python scripts to load data from Hive tables into an Oracle database, to which Power BI connects for visualizations. Applied different HDFS file formats and structures to speed up analytics.
• Created partitioned and bucketed Hive tables in HDFS and performed data validation between ETL output and Hive tables (see the sketch after this list).
• Created tables to hold metadata for all flat files and developed pass-through mappings to extract data from various sources and load it into staging tables.
• Developed shell scripts for migration and deployment to production servers.
• Designed and developed ETL data-flow architecture for a couple of source systems.
• Designed and developed pipelines pulling data from heterogeneous data sources and APIs.
• Collaborated on ETL (Extract Transform Load) tasks, maintaining data integrity and verifying pipeline stability.
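A minimal sketch of the ETL-to-Hive load and row-count validation pattern described above; the path and table names (landing_dir, stg_events, event_id) are hypothetical:

    from pyspark.sql import SparkSession, functions as F

    spark = (SparkSession.builder
             .appName("etl-validation-demo")
             .enableHiveSupport()
             .getOrCreate())

    landing_dir = "/data/landing/events/"   # flat files dropped by upstream ETL

    # Light cleansing: trim the business key and drop rows missing it.
    raw = spark.read.option("header", True).csv(landing_dir)
    clean = (raw
             .withColumn("event_id", F.trim(F.col("event_id")))
             .dropna(subset=["event_id"]))

    clean.write.mode("overwrite").saveAsTable("stg_events")

    # Validation: the cleansed ETL output must match what landed in Hive.
    src_count = clean.count()
    tgt_count = spark.table("stg_events").count()
    if src_count != tgt_count:
        raise ValueError(f"Row-count mismatch: source={src_count}, hive={tgt_count}")
    print(f"Validation passed: {src_count} rows")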
Role: Data Engineer Jan 2018 – Mar 2020
Client: Wolters Kluwer – New York
Responsibilities:
• Leading the migration of a legacy system to a Microsoft Azure cloud-based solution, redesigning legacy applications with minimal changes to run on the cloud platform.
• Built data pipelines to load data from the legacy SQL Server into Azure databases using Data Factory, API gateway services, SSIS packages, Talend jobs, and custom .NET and Python code.
• Built Azure WebJobs and Functions for Product Management teams to connect to different APIs and sources, extract data, and load it into Azure SQL Data Warehouse.
• Built various pipelines integrating Azure with AWS S3 to bring data into Azure databases. Set up Hadoop and Spark clusters for various POCs, specifically for cookie-level data loads and real-time streaming, integrating with ecosystems such as Hive, HBase, Spark, and HDFS/Data Lake/Blob Storage.
• Set up a Spark cluster to process more than 2 TB of data and load it into SQL Server, and built various Spark jobs running data transformations and actions (a brief sketch follows this list).
• Wrote various APIs to connect to media data feeds such as Prisma, DoubleClick, Twitter, Facebook, Instagram, and Amnet, using Azure WebJobs and Functions integrated with Cosmos DB.
• Built a trigger-based mechanism using Azure Logic Apps and Functions to reduce the cost of resources such as WebJobs and Data Factory.
• Worked extensively with relational databases, including MS SQL Server and Oracle.
• Migrated SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlled and granted database access; and migrated on-premises databases to Azure Data Lake Store using Azure Data Factory.
• Analyzed, designed, and built modern data solutions using Azure PaaS services to support data visualization; assessed the current production state of applications and determined the impact of new implementations on existing business processes.
• Extracted, transformed, and loaded data from source systems into Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics); ingested data into Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed it in Azure Databricks.
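A brief sketch of a Spark job mixing lazy transformations with actions and bulk-writing the result to SQL Server over JDBC, as described above; the input path, JDBC URL, and credentials are placeholders:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("spark-to-sqlserver-demo").getOrCreate()

    # Transformations are lazy: nothing executes until an action is called.
    events = spark.read.parquet("/mnt/datalake/cookie_events/")   # hypothetical path
    daily = (events
             .filter(F.col("event_type") == "pageview")
             .groupBy("event_date", "site_id")
             .agg(F.count("*").alias("pageviews")))

    # Action 1: materialize a count for logging.
    print(f"Aggregated rows: {daily.count()}")

    # Action 2: bulk-write the aggregate to SQL Server over JDBC
    # (requires the mssql-jdbc driver on the cluster classpath).
    (daily.write
     .format("jdbc")
     .option("url", "jdbc:sqlserver://myserver.example.com:1433;databaseName=analytics")
     .option("dbtable", "dbo.daily_pageviews")
     .option("user", "etl_user")          # placeholder credential
     .option("password", "***")
     .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
     .mode("append")
     .save())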
Role: Database Engineer June 2015 – Dec 2017
Client: Wolters Kluwer – New York
Responsibilities:
• Involved in understanding business processes and identifying the grain, dimensions, and measures for OLAP applications.
• Worked with business users to provide data in the form of complex reports and spreadsheets for analysis.
• Involved in the design of scalable, reusable, and low-maintenance ETL templates.
• Involved in performance tuning to increase throughput.
• Performed code migration, testing, debugging, and documentation, and maintained programs according to client standards.
Other Projects: March 2007 – May 2015
Role: Database Administrator/Developer | Digital Group
• Case Vault and Case Vantage (CV & CVG)
Client: Access Data – New York
• Archiving and Purging, NRAI
Client: CT Corporation
• DPE
Client: DPE Fiji
• Easy Roommate (ERM)
Client: DM Services – London UK
Education:
• Diploma in Automobile Engineering from Mumbai Technical Board
• Bachelor of Computer Applications (BCA) from Savitribai Phule University (Pune University), India
• Master of Business Administration (MBA) from Savitribai Phule University (Pune University), India
Certifications:
Microsoft Azure
• DP-200 Implementing an Azure Data Solution
• DP-203 Data Engineering on Microsoft Azure (Azure Data Engineer Associate)
Oracle 11g Certified Professional (OCP)
• 1Z0-055 Oracle Database 11g: New Features for 9i OCPs
Oracle 9i Certified Professional (OCP)
• 1Z0-007 SQL
• 1Z0-031 Architecture and Administration
• 1Z0-032 Backup and Recovery
• 1Z0-033 Performance and Tuning