| sample {SparkR} | R Documentation |
Return a sampled subset of this SparkDataFrame using a random seed. Note: this is not guaranteed to return exactly the specified fraction of the total row count of the given SparkDataFrame.
sample(x, withReplacement, fraction, seed)

sample_frac(x, withReplacement, fraction, seed)

## S4 method for signature 'SparkDataFrame,logical,numeric'
sample(x, withReplacement, fraction, seed)

## S4 method for signature 'SparkDataFrame,logical,numeric'
sample_frac(x, withReplacement, fraction, seed)
| x | A SparkDataFrame |
| withReplacement | Sampling with replacement or not |
| fraction | The (rough) sample target fraction |
| seed | Randomness seed value |
sample since 1.4.0
sample_frac since 1.4.0
Other SparkDataFrame functions: SparkDataFrame-class,
agg, arrange,
as.data.frame,
attach,SparkDataFrame-method,
cache, checkpoint,
coalesce, collect,
colnames, coltypes,
createOrReplaceTempView,
crossJoin, dapplyCollect,
dapply, describe,
dim, distinct,
dropDuplicates, dropna,
drop, dtypes,
except, explain,
filter, first,
gapplyCollect, gapply,
getNumPartitions, group_by,
head, hint,
histogram, insertInto,
intersect, isLocal,
isStreaming, join,
limit, merge,
mutate, ncol,
nrow, persist,
printSchema, randomSplit,
rbind, registerTempTable,
rename, repartition,
saveAsTable, schema,
selectExpr, select,
showDF, show,
storageLevel, str,
subset, take,
toJSON, union,
unpersist, withColumn,
with, write.df,
write.jdbc, write.json,
write.orc, write.parquet,
write.stream, write.text
## Not run:
sparkR.session()
path <- "path/to/file.json"
df <- read.json(path)
# Sample roughly half of the rows without replacement
collect(sample(df, FALSE, 0.5))
# Sample with replacement; a row may appear more than once
collect(sample(df, TRUE, 0.5))
## End(Not run)
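
The note that the result is not guaranteed to contain exactly the requested fraction can be illustrated without a Spark session. The following is a minimal base R sketch, not SparkR code. It assumes, purely for illustration, that sampling without replacement keeps each row independently with probability fraction; this page does not state how Spark implements the sampling. The helper bernoulli_sample is hypothetical and is not part of SparkR.

```r
# Base R sketch (no Spark needed).
# Assumption: each row is kept independently with probability `fraction`,
# so the size of the result varies around n_rows * fraction.
# `bernoulli_sample` is a hypothetical helper, not a SparkR function.
bernoulli_sample <- function(n_rows, fraction, seed) {
  set.seed(seed)                      # fix the random stream for this call
  keep <- runif(n_rows) < fraction    # one independent draw per row
  which(keep)                         # indices of the rows that were kept
}

n <- 1000
a <- bernoulli_sample(n, 0.5, seed = 1)
b <- bernoulli_sample(n, 0.5, seed = 2)
a_again <- bernoulli_sample(n, 0.5, seed = 1)

length(a)              # near 500, but not guaranteed to be exactly 500
length(b)              # another seed generally yields another sample
identical(a, a_again)  # TRUE: the same seed reproduces the same sample
```

Under this assumption, fraction sets the expected size of the sample rather than its exact size, and passing the same seed is what makes a sample repeatable.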