Questions and answers on big data on Stack Overflow


Stack Overflow is a question and answer site. It is free to use, and no registration is required.


Big Data is a blanket term for any collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis and visualization.


Data sets grow in size in part because they are increasingly being gathered by ubiquitous information-sensing mobile devices, aerial sensory technologies, software logs, cameras, microphones, radio-frequency identification (RFID) readers, and wireless sensor networks. What is considered "big data" varies depending on the capabilities of the organization managing the set, and on the capabilities of the applications that are traditionally used to process and analyze the data set in its domain (from Wikipedia, the free encyclopedia).


How to serialise a large HashMap without getting a StackOverflowException

Tue, 22 January 2019 10:51 +0000 GMT

Is there a big data platform in R for running large set of isochrones? [on hold]

Mon, 21 January 2019 23:37 +0000 GMT

Allocate a mesos service using more than 1 node resources

Mon, 21 January 2019 23:02 +0000 GMT

Pig job fails with "org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many counters: 121 max=120"

Mon, 21 January 2019 16:51 +0000 GMT

PySpark connectivity with MySQL

Mon, 21 January 2019 11:02 +0000 GMT

Compiling and Parsing Millions of files

Mon, 21 January 2019 09:52 +0000 GMT

Nifi receive full text tweet

Mon, 21 January 2019 09:50 +0000 GMT

the replacement of converted columns after downcasting doesn't end

Sun, 20 January 2019 20:43 +0000 GMT

Spark process on cluster is a java process

Sun, 20 January 2019 20:28 +0000 GMT

Advice on Machine Learning Project where training speed is of primary concern

Sun, 20 January 2019 14:45 +0000 GMT

How to fix empty output for the textfilestream code

Sat, 19 January 2019 18:11 +0000 GMT

How do I upsert into HDFS with spark?

Fri, 18 January 2019 21:03 +0000 GMT

Create five percent of sample database from a large database

Fri, 18 January 2019 18:58 +0000 GMT

How to unmerge/split a file made of multiple merged Avro files (each a set of schema and records)?

Fri, 18 January 2019 15:24 +0000 GMT

Memory efficient way to print summary of dataset

Fri, 18 January 2019 13:10 +0000 GMT

Ambari installation with the rest of hadoop ecosystems

Fri, 18 January 2019 05:22 +0000 GMT

How to aggregate data by its attributes(large data set)?

Thu, 17 January 2019 23:15 +0000 GMT

Big Data Load in Pandas Data Frame

Thu, 17 January 2019 09:33 +0000 GMT

Re-installing DCOS Masters without destroying the cluster

Thu, 17 January 2019 09:10 +0000 GMT

A Compiler Generated Query Plan Creation

Wed, 16 January 2019 20:11 +0000 GMT

Why does folding dataframes cause a NullPointerException? [duplicate]

Wed, 16 January 2019 10:12 +0000 GMT

numpy.memmap not able to handle very big data

Wed, 16 January 2019 04:08 +0000 GMT

How to write half billion entries to neo4j in a reasonable time (less than 1 day)?

Wed, 16 January 2019 03:39 +0000 GMT

How to label points in a large scatterplot (~280k points)

Wed, 16 January 2019 01:55 +0000 GMT

How to simplify pipeline with aggregation and accumulation?

Wed, 16 January 2019 01:06 +0000 GMT

Cassandra Range queries on Map values using timestamp

Tue, 15 January 2019 16:17 +0000 GMT

How to transpose datetime columns into date rows in Impala?

Tue, 15 January 2019 12:01 +0000 GMT

How can I version my data with Cassandra?

Tue, 15 January 2019 10:16 +0000 GMT

How is Big Data related to Data Science?

Sat, 12 January 2019 06:27 +0000 GMT

How to proceed with sentiment analysis using word embeddings like word2vec?

Thu, 10 January 2019 14:15 +0000 GMT