Semi-structured and unstructured data do not need to be converted to structured data. Each kind has its own storage and analytics methods.
- Structured (retail, financial, bioinformatics, geodata)
- Semi-structured (web logs, email, documents): has some structure, but it does not conform to existing data models such as RDB or OODB.
- Unstructured (images, video, sensor data, web pages)
source: https://www.mongodb.com/hadoop-and-mongodb
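To make the three categories concrete, here is a minimal sketch in Python. All records, field names, and values are hypothetical and exist only for illustration; they do not come from the source above.

```python
import json

# Structured: every record has the same fixed columns (hypothetical retail orders).
ORDER_COLUMNS = ("order_id", "customer", "amount")
orders = [("o-1", "alice", 19.99), ("o-2", "bob", 5.00)]

# Semi-structured: records describe their own fields, and the fields may differ
# from one record to the next (hypothetical web log lines in JSON).
log_lines = [
    '{"ts": "2024-01-15T10:00:00Z", "path": "/home", "status": 200}',
    '{"ts": "2024-01-15T10:00:01Z", "path": "/login", "status": 302, "user": "alice"}',
]
events = [json.loads(line) for line in log_lines]

# Unstructured: opaque bytes with no field layout the storage layer can use
# (here, the first four bytes of a PNG file signature).
image_bytes = bytes([0x89, 0x50, 0x4E, 0x47])

# Structured data can be queried by column position.
total = sum(row[ORDER_COLUMNS.index("amount")] for row in orders)

# Semi-structured data must tolerate missing fields.
users = [event.get("user") for event in events]
```

The point of the sketch is only that each category is read in a different way: by column, by optional key, or as raw bytes handed to a specialized decoder.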
A database shard is a horizontal partition of data in a database or search engine. Each individual partition is referred to as a shard or database shard. Each shard is held on a separate database server instance, to spread load.
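A rough sketch of how keys might be routed to shards, assuming hash-based sharding (one common strategy; range-based and directory-based sharding also exist). The names `shard_for` and `ShardedStore` are hypothetical, and in-memory dicts stand in for separate database server instances.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy key-value store; each dict stands in for one database server."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        # Each row lives on exactly one shard (horizontal partitioning).
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        # Reads go to the same shard the write was routed to.
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=4)
for i in range(100):
    store.put(f"user:{i}", {"id": i})
```

A stable hash such as MD5 is used instead of Python's built-in `hash()` because the built-in is randomized per process for strings, so routing would change between runs. Note that with plain modulo routing, changing the number of shards remaps most keys; real systems often use consistent hashing or a lookup table to limit that.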
Big data warehouse software: Apache Tajo (http://tajo.apache.org)