Questions and answers on big data on Stack Overflow


Stack Overflow is a question and answer site. It is free to use, and no registration is required.


Big Data is a blanket term for any collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis and visualization.


Data sets grow in size in part because they are increasingly being gathered by ubiquitous information-sensing mobile devices, aerial sensory technologies, software logs, cameras, microphones, radio-frequency identification (RFID) readers, and wireless sensor networks. What is considered "big data" varies depending on the capabilities of the organization managing the set, and on the capabilities of the applications that are traditionally used to process and analyze the data set in its domain (from Wikipedia, the free encyclopedia).


when you use delimiter arguments with file format other than text file

Sun, 18 November 2018 07:07 +0000 GMT

Spark max amount of event time windows

Sat, 17 November 2018 10:14 +0000 GMT

how to practice large SQL Server index or partition issues for interview?

Fri, 16 November 2018 21:09 +0000 GMT

Find same values in two huge datasets

Fri, 16 November 2018 16:28 +0000 GMT

Best Practices for Keras, Databases and Big Data [on hold]

Fri, 16 November 2018 12:56 +0000 GMT

Best NLP tool for Apache Spark

Fri, 16 November 2018 11:56 +0000 GMT

Difference between dates in PySpark SQL

Fri, 16 November 2018 02:27 +0000 GMT

Does enabling CPU scheduling in YARN really improve parallel processing in Spark?

Thu, 15 November 2018 22:20 +0000 GMT

How to achieve HA for spark streaming job which is adopting Direct stream approach

Thu, 15 November 2018 13:31 +0000 GMT

dict of dict of list tensor/pandas-dataframe/numpy-array

Thu, 15 November 2018 10:18 +0000 GMT

Is there a good way to display map tiles dynamically and in real time?

Thu, 15 November 2018 09:03 +0000 GMT

how to keep elasticsearch's indexes while updating from version 1.5 to 5.6.1?

Wed, 14 November 2018 21:34 +0000 GMT

How to efficiently Join Two Large Tables [on hold]

Wed, 14 November 2018 19:03 +0000 GMT

Big data - Sketching algorithm

Wed, 14 November 2018 16:45 +0000 GMT

Writing data in hive warehouse directory in two separate tables using flume

Wed, 14 November 2018 06:48 +0000 GMT

Oracle PGX on Yarn - 404 on WebService

Tue, 13 November 2018 16:39 +0000 GMT

Hive is closing whenever I fire any query

Tue, 13 November 2018 14:11 +0000 GMT

get the current date and set it to variable in order to use it as table name in HIVE

Tue, 13 November 2018 14:02 +0000 GMT

Read first 1000 lines from very big JSON Lines file (R)

Tue, 13 November 2018 13:51 +0000 GMT

Case study in Informatica Big Data Edition

Tue, 13 November 2018 13:29 +0000 GMT

Run multiple spark queries in parallel in a multi-user environment on a static dataset

Tue, 13 November 2018 07:26 +0000 GMT

Sqoop Syntax error, unexpected tIdentifier

Tue, 13 November 2018 06:51 +0000 GMT

An efficient way of aggregating data from repeated measurements [duplicate]

Sun, 11 November 2018 12:41 +0000 GMT

Plot the learning curve (X-axis=train_size (observations), Y-axis= error count) for a xgboost model using R [closed]

Sun, 11 November 2018 02:11 +0000 GMT

Big Text file(even a python list can't hold it at all) containing integers as string, can we sort them in Python [duplicate]

Sat, 10 November 2018 20:35 +0000 GMT

Suggestion about Minhash implementation with n permutation

Sat, 10 November 2018 15:43 +0000 GMT

Streaming command is getting failed in BigData in Hadoop 1.2.1

Sat, 10 November 2018 15:43 +0000 GMT

How can I get the first registry with a type A and the first registry with a type B and move it to a single row

Sat, 10 November 2018 13:12 +0000 GMT

Hive query failing with the below exception [closed]

Fri, 09 November 2018 19:36 +0000 GMT

Data mining for Big data, Data mining with Big data, Datamining++ [closed]

Fri, 09 November 2018 19:06 +0000 GMT