Fancy DSA Fancy DSA
数据结构
算法
LeetCode
  • 关于
  • 导航 (opens new window)
  • 分类
  • 标签
  • 归档
设计模式 (opens new window)
博客 (opens new window)
GitHub (opens new window)

Jonsam NG

想的更多,也要想的更远
数据结构
算法
LeetCode
  • 关于
  • 导航 (opens new window)
  • 分类
  • 标签
  • 归档
设计模式 (opens new window)
博客 (opens new window)
GitHub (opens new window)
  • 开始上手
  • Plan 计划
  • Roadmap 路线
  • 算法简介
  • Sort 排序

  • Search 搜索

  • Recursive 递归

  • Graph 图

  • Tree 树

  • Math 数学

  • Hash 哈希

  • String 字符串

  • BitManipulation 位操纵

  • Backtracking 回溯

  • DynamicProgramming 动态规划

    • ClimbingStairs [爬楼梯]
    • CoinChange [钱币兑换]
    • EditDistance [编辑距离]
    • FibonacciNumber [斐波那契数]
    • FindMonthCalendar [月历]
    • KadaneAlgo [最大连续子数组和之Kadane算法]
    • LevenshteinDistance [莱文斯坦距离]
    • LongestCommonSubsequence [最长公共子序列]
    • LongestIncreasingSubsequence [最长递增子序列]
    • LongestPalindromicSubsequence [最长回文子序列]
    • LongestValidParentheses [最长合法括号]
    • MaxNonAdjacentSum [最大非连接子集和]
    • MaxProductOfThree [最大三数积]
    • MinimumCostPath [最小代价路径]
    • NumberOfSubsetEqualToGivenSum [等和子集]
    • RodCutting [棒材切割问题]
    • Shuf [随机样本]
      • 介绍
      • 实现
      • 参考
    • SieveOfEratosthenes [埃拉托斯特尼筛法]
    • SlidingWindow [滑窗]
    • SudokuSolver [数独]
    • TrappingRainWater [接住雨水]
    • TribonacciNumber [翠波那契数]
    • ZeroOneKnapsack [零一背包]
  • Cache 缓存

  • Array 数组

  • Ciphers 密码学

  • Conversions 转换

  • ProjectEuler 欧拉计划

  • 其他

  • 算法
  • DynamicProgramming 动态规划
jonsam
2022-09-26
目录

Shuf [随机样本]

# 介绍

给定一个未知大小的数据集,按随机顺序获得一个随机样本。这通常被用于数据分析,作为一种从数据池或仓库或 CSV 文件中获取小的随机样本的方法。

# 实现

# JavaScript

/*
Given a data set of an unknown size, Get a random sample in a random order
It's used in data analytics, often as a way to get a small random sample from a data lake or warehouse, or from a large CSV file
*/
function shuf (datasetSource, sampleSize) {
  const output = fillBaseSample(datasetSource, sampleSize)
  return randomizeOutputFromDataset(datasetSource, output)
}

/**
 * Fills the output if possible, with the minimum number of values
 * @param {Iterable.<T>} datasetSource The iterable source of data
 * @param {number} sampleSize The size of the sample to extract from the dataset
 * @returns {Array.<T>} The random sample, as an array
 * @template T
 */
function fillBaseSample (datasetSource, sampleSize) {
  let filledIndexes = []
  let output = new Array(sampleSize)

  // Spread data out filling the array
  while (true) {
    const iterator = datasetSource.next()
    if (iterator.done) break

    let insertTo = Math.floor(Math.random() * output.length)
    while (filledIndexes.includes(insertTo)) {
      insertTo++
      if (insertTo === output.length) {
        insertTo = 0
      }
    }
    output[insertTo] = {
      value: iterator.value
    }

    filledIndexes = [...filledIndexes, insertTo]

    if (filledIndexes.length === sampleSize) {
      break
    }
  }

  if (filledIndexes.length < output.length) {
    // Not a large enough dataset to fill the sample - trim empty values
    output = output.filter((_, i) => filledIndexes.includes(i))
  }

  return output.map((o) => o.value)
}

/**
 * Replaces values in the output randomly with new ones from the dataset
 * @param {Iterable.<T>} datasetSource The iterable source of data
 * @param {Array.<T>} output The output so far, filled with data
 * @returns {Array.<T>} The random sample, as an array
 * @template T
 */
function randomizeOutputFromDataset (datasetSource, output) {
  const newOutput = [...output]
  let readSoFar = output.length

  while (true) {
    const iterator = datasetSource.next()
    if (iterator.done) break
    readSoFar++

    const insertTo = Math.floor(Math.random() * readSoFar)
    if (insertTo < newOutput.length) {
      newOutput[insertTo] = iterator.value
    }
  }

  return newOutput
}

// Example

/**
 * Generates a random range of data, with values between 0 and 2^31 - 1
 * @param {number} length The number of data items to generate
 * @returns {Iterable<number>} Random iterable data
*/
function * generateRandomData (length) {
  const maxValue = Math.pow(2, 31) - 1
  for (let i = 0; i < length; i++) {
    yield Math.floor(Math.random() * maxValue)
  }
}

// const source = generateRandomData(1000)
// const result = shuf(source, 10)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92

# 参考

  • Sampling from a population of unknown size | by Andrea C. | Towards Data Science (opens new window)
编辑 (opens new window)
上次更新: 2022/10/27, 20:28:55
RodCutting [棒材切割问题]
SieveOfEratosthenes [埃拉托斯特尼筛法]

← RodCutting [棒材切割问题] SieveOfEratosthenes [埃拉托斯特尼筛法]→

最近更新
01
0-20题解
10-31
02
本章导读
10-31
03
算法与转换:Part1
10-28
更多文章>
Theme by Vdoing | Copyright © 2022-2022 Fancy DSA | Made by Jonsam by ❤
  • 跟随系统
  • 浅色模式
  • 深色模式
  • 阅读模式