文章详情页

java - 算法问题，2个数组，数组a保存1000W条手机号，数组b保存5000W条，找出两个数组相同的手机号，执行时间需要 <2s

浏览：174日期：2024-01-17 18:49:46

问题描述

有人说用归并算法，但是执行下来时间远远不止2s。在下实在想不出还有什么好办法，希望各位能给个提示或者解法，谢谢。

下面是我的测试代码：

public class TestA { public static void main(String[] args) {long[] a = new long[50000000];long num = 13000000000l;for (int i = 0; i < a.length; i++) { a[i] = (num + i);}long[] b = new long[10000000];long num2 = 14000000000l;for (int i = 0; i < b.length - 2; i++) { b[i] = (num2 + i);}b[9999999] = 13000000000l;b[9999998] = 13000000001l;long[] c = new long[a.length+b.length];long start = System.currentTimeMillis();for (int i = 0; i < a.length; i++) { c[i] = a[i];}for (int i = 0; i < b.length; i++) { c[i + a.length] = b[i];}System.out.println('start');sort(c, 0, c.length-1);long end = System.nanoTime();System.out.println(System.currentTimeMillis() - start);for (int i = 0; i < c.length; i++) {System.out.println(c[i]);} } public static void sort(long[] data, int left, int right) {if (left < right) { // 找出中间索引 int center = (left + right) / 2; // 对左边数组进行递归 sort(data, left, center); // 对右边数组进行递归 sort(data, center + 1, right); // 合并 merge(data, left, center, right);} } public static void merge(long[] data, int left, int center, int right) {long[] tmpArr = new long[data.length];int mid = center + 1;// third记录中间数组的索引int third = left;int tmp = left;while (left <= center && mid <= right) { // 从两个数组中取出最小的放入中间数组 if (data[left] <= data[mid]) {if(data[left] == data[mid]){ System.out.println(data[left]);}tmpArr[third++] = data[left++]; } else {tmpArr[third++] = data[mid++]; }}// 剩余部分依次放入中间数组while (mid <= right) { tmpArr[third++] = data[mid++];}while (left <= center) { tmpArr[third++] = data[left++];}// 将中间数组中的内容复制回原数组while (tmp <= right) { data[tmp] = tmpArr[tmp++];} }}

问题解答

回答1：

提供一个思路，我们要找的是两个数组里相同的电话号码。那么我们把第一个数组的电话号码建立一颗 字典树。在第二个的时候直接在字典树里查找即可。总体是一个 O(N * 11) = O(N) 的复杂度。每个电话号码 11 位的话。总的只是一个 O(N) 的复杂度。参考知乎：https://zhuanlan.zhihu.com/p/...

public class TrieTree { private class Node {char ch;TreeMap<Character, Node> node;int count;public Node(char ch) { ch = this.ch; node = new TreeMap<>(); count = 0;} } public class StringCount implements Comparable{public String str;public int count;public StringCount(String str, int count) { this.str = str; this.count = count;}@Overridepublic int compareTo(Object o) { StringCount t = (StringCount) o; if(this.count > t.count){return -1; } if(this.count < t.count){return 1; } return 0;} } private int indexStringCount; private StringCount[] stringCounts; private Node root; private int size; private static boolean DEBUG = true;public TrieTree() {root = new Node(’$’);size = 0; } public int insert(String str) {int res = 0;Node temp = root;for (int i = 0; i < str.length(); i++) { char ch = str.charAt(i); if (temp.node.get(ch) == null) temp.node.put(ch, new Node(ch)); temp = temp.node.get(ch);}if(temp.count == 0) size ++;res = ++temp.count;return res; }public StringCount[] getStringCount(){indexStringCount = 0;stringCounts = new StringCount[this.size];ergodic(root, '');Arrays.sort(stringCounts);{ for(StringCount s : stringCounts){System.out.println(s.str + ' ' + s.count); }}return stringCounts; } private void ergodic(Node foo, String str){if(foo.count != 0){ stringCounts[indexStringCount ++] = new StringCount(str, foo.count);}for(Character ch:foo.node.keySet()){ ergodic(foo.node.get(ch), str + ch);} }private void show(Node foo, String str) {if (foo.count != 0) { System.out.println(str + ' : ' + foo.count);}for(Character ch:foo.node.keySet()){ show(foo.node.get(ch), str + ch);} } public void showALL() {show(root, ''); } public int size(){return size; }public static void main(String[] args) {TrieTree tree = new TrieTree();String[] strs = { 'a', 'aa', 'a', 'b', 'aab', 'bba' };for (int i = 0; i < strs.length; i++) { tree.insert(strs[i]);}tree.showALL();System.out.println(tree.size);tree.getStringCount(); }}回答2：

刚刚找到一种方法，执行时间大概在2s左右：

public class TestB { static long count; public static void main(String[] args) {long[] a = new long[50000000];long num = 13000000000l;for (int i = 0; i < a.length; i++) { a[i] = num + i;}long[] b = new long[10000000];long num2 = 14000000000l;for (int i = 0; i < b.length - 3; i++) { b[i] = num2 + i;}b[9999999] = 13000000000l;b[9999998] = 13000000002l;b[9999997] = 13000000002l;long start = System.currentTimeMillis(); Arrays.sort(a);int flag = -1;for (int i = 0; i < b.length; i++) { count = b[i]; flag = Arrays.binarySearch(a, count); if (flag <= 50000000 && flag >= 0) {System.out.println(count + ' ' +flag); }}System.out.println(System.currentTimeMillis() - start); }}

这个方法主要思想是先排序，再使用 Arrays.binarySearch()方法进行二分法查询，返回匹配的数组下标。

修改了一下获取数据源的方法，发现如果使用随机数据源，耗费的时间是8s左右，误差时间主要消耗在sort()排序方法上，数据源的规律还是影响排序算法的时间复杂度的。代码修改如下：

public class TestB {

static long count;public static void main(String[] args) { System.out.println(random()); long[] a = new long[50000000]; for (int i = 0; i < a.length; i++) {a[i] = random(); } long[] b = new long[10000000]; for (int i = 0; i < b.length; i++) {b[i] = random(); } long start = System.currentTimeMillis(); Arrays.sort(b); Arrays.sort(a); int flag = -1; int cc =0; for (int i = 0; i < b.length; i++) {count = b[i];flag = Arrays.binarySearch(a, count);if (flag <= 50000000 && flag >= 0) { System.out.println(count + ' ' + flag); cc++;} } System.out.println('相同数据的数量：'+cc); System.out.println(System.currentTimeMillis() - start);}public static long random() { return Math.round((new Random()).nextDouble()*10000000000L+10000000000L);}

}

回答3：

考虑bitmap, 参考https://github.com/RoaringBit...RoaringBitmap aBM = new RoaringBitmap()for (int i = 0; i < a.length; i++) {

aBM.add(a[i])

}...RoaringBitmap interactionBM = RoaringBitmap.and(aBM, bBM)for (int item: interactionBM) {

System.out.println(item)

}

回答4：

long start = System.currentTimeMillis();HashSet<Long> alongs = new HashSet<>();for (long l : b) { alongs.add(l);}ArrayList<Long> sames = new ArrayList<>();for (long l : a) { if (alongs.contains(l)) {sames.add(l); }}long end = System.currentTimeMillis();System.out.println(end - start);

使用上述代码，在我的机器上，是8s

回答5：

http://tieba.baidu.com/p/3866...

回答6：

C#, 本地运行，release，611ms

long[] a = new long[50000000];long num = 13000000000L;for (int i = 0; i < a.Length; i++){ a[i] = (num + i);}long[] b = new long[10000000];long num2 = 14000000000L;for (int i = 0; i < b.Length - 2; i++){ b[i] = (num2 + i);}b[9999999] = 13000000000L;b[9999998] = 13000000001L;var hashSetB = new HashSet<long>(b);var matches = new List<long>();var timer = new System.Diagnostics.Stopwatch();Console.WriteLine('Starts...');timer.Start();for (var i = 0; i < a.Length; i++){ if (hashSetB.Contains(a[i])) {matches.Add(a[i]); }}timer.Stop();Console.WriteLine(timer.ElapsedMilliseconds);Console.WriteLine('Found match: ' + string.Join(', ', matches));Console.ReadLine();回答7：

redis SINTER(返回一个集合的全部成员，该集合是所有给定集合的交集。)

回答8：

如果说这个操作只能在数组中进行的话，没什么取巧的办法，至少要遍历较小的那个数组，甚至排序都是免不了的。而如果可以将数组内容导出到其他数据结构的话，又貌似有违题目初衷的嫌疑。出题者是不是想考验数组操作呢？

回答9：

来一种更简单的方法，在MBP上只要200ms左右。普通的Pentium G2020也只要250ms额，这种算法完全不行，回答10：

这题目其实算法是关键。建议大家看一下编程珠玑的第一章。会提供很好的思路。首先问题必须细化一下，就是手机号必须只有中国的手机号吗。否则数量会多很多。我简单说一下编程珠玑里是怎样存储电话号码的。他是只使用一个二进制的数字来存储所有的手机号码。一个二进制的数位可以很长很长。长度就等于最大的可能的那个电话号码。比如说13999999999，其实13可以省掉，我们的这个数字就是999999999位的一个二进制数。在每一位上，如果有这个电话号码，就标记为1，如果没有就标记为0.为了简单起见，我就假设手机号的范围是0-9，我们先准备一个10位的二进制数0000000000.假设第一个数组有3个电话号码，分别是1，5，7，那么存储这个数组的二进制数就是0010100010，这个数字的1，5，7位上是1，其他位是0。假设第二个数组有6个电话号码，分别是1，2，3，4，7，8那么存储这个数组的二进制数就是0110011110，这个数字的1，2，3，4，7，8位上是1，其他位是0。那么如何找出这两个数组中相同的电话呢？只需要做一下位运算中“按位与”（AND）的操作即可，同一位上两个都是1的，还是1，只要有一个是0的，就是0。0010100010 & 0110011110 = 0010000010

一下就找出来是第1位和第7位上是1的一个二进制数。

java

上一条：无法启动tomcat并报错：java.lang.Error:...跪求大神指点下一条：java - eclipse 生成WebService Client，部分正常，部分报错，什么原因？

排行榜

					
					android自带时钟应用的这个效果是怎么做的
大兄弟们，你们都用什么框架开发 web app
node.js - node --harmony不起效果 --harmony_destructuring却可以?
java - 我设置了cookie的max age，但是cookie依然在关闭游览器后消失了
angular.js - angular双向绑定机制异常
iOS webview播放全屏视频禁止缩小播放
多维数组遍历，求大佬解答？？？
python方法调用
Eclipse / Java-导入java。（namespace）。*是否有害？
Android 按钮clickable问题。
python3.x - python连oanda的模拟交易api获取json问题第五问
				

热门标签