How you get data into a BigData system depends on the source you want to collect it from, and several options are available today.
First, let us look at the sources from which data can be collected.
- Data collected by different sensors
- Log files generated by web applications
- Structured / Unstructured data in different external databases
- Data generated by mobile devices
- Data generated by social media (Facebook, Twitter, Instagram, etc.)
- And so on. The list is long, and only a few sources are mentioned here.
As we can see, the list of sources is long and no single solution fits all of them. Depending on the source of the data, we can use different tools to get that data onto the BigData servers. Here is a list of some of those options:
- Flume: Can be used to collect log file data from different servers
- SQL Query: Can be used to pull selected data from external databases by querying them directly
- Sqoop: Can be used to transfer data in bulk from external relational databases into Hadoop
- Files: Data can also be collected with basic OS file copy operations. This is time-consuming and has other disadvantages, so it is suitable only for very small-scale operations
- REST APIs: Can be used to collect data from mobile devices, social media, etc.
- Streaming: Apache Kafka can be used to collect real-time data from social media, server logs, credit card transactions, etc.
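To make the SQL Query option more concrete, here is a minimal sketch in Python that runs a query against a source database and writes the result as CSV, a format that can then be copied to the BigData servers. Everything specific in it is an assumption for illustration: the in-memory SQLite database stands in for a real external database, and the `orders` table, its columns, and the helper `export_query_to_csv` are hypothetical names.

```python
import csv
import io
import sqlite3


def export_query_to_csv(conn, query, out):
    """Run `query` on `conn`, write the rows to `out` as CSV, return the row count."""
    cursor = conn.execute(query)
    writer = csv.writer(out)
    # Header row taken from the column names reported by the cursor.
    writer.writerow([col[0] for col in cursor.description])
    count = 0
    for row in cursor:
        writer.writerow(row)
        count += 1
    return count


# Hypothetical source: an in-memory SQLite table standing in for an external database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 12.5), ("bob", 7.0), ("carol", 30.25)],
)

# A text buffer keeps the sketch self-contained; in practice this would be a file.
buffer = io.StringIO()
rows_written = export_query_to_csv(
    conn, "SELECT id, customer, amount FROM orders ORDER BY id", buffer
)
conn.close()
```

For a real source you would swap the SQLite connection for that database's own driver and write to a file instead of a buffer. For large tables, a bulk-transfer tool such as Sqoop is usually a better fit than a hand-written export like this one.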