大数据学习之路
Hadoop
- http://hadoop.apache.org/
- Apache Hadoop API REFERENCE: https://hadoop.apache.org/docs/current/api/index.html
- Hadoop: Setting up a Single Node Cluster: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html
- Hadoop Cluster Setup: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html
- HDFS Users Guide: http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html
- Hadoop Commands Guide: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CommandsManual.html
- DistCp: https://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html
- Hadoop Archives Guide: https://hadoop.apache.org/docs/current/hadoop-archives/HadoopArchives.html
- Hadoop: Capacity Scheduler: https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
- HDFS Architecture: http://apache.github.io/hadoop/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
- HDFS Permissions Guide: http://apache.github.io/hadoop/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html
- HDFS Quotas Guide: http://apache.github.io/hadoop/hadoop-project-dist/hadoop-hdfs/HdfsQuotaAdminGuide.html
- File System Shell: http://apache.github.io/hadoop/hadoop-project-dist/hadoop-common/FileSystemShell.html
- HDFS Users Guide: http://apache.github.io/hadoop/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html
- MapReduce Tutorial: http://apache.github.io/hadoop/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html
- Native Libraries Guide: http://apache.github.io/hadoop/hadoop-project-dist/hadoop-common/NativeLibraries.html
-
Hadoop Streaming: https://hadoop.apache.org/docs/stable/hadoop-streaming/HadoopStreaming.html
- Hadoop 1.0.0集群安装: http://blog.sina.com.cn/s/blog_540c640b01010fo3.html
- 一步步教你Hadoop多节点集群安装配置: http://www.cnblogs.com/lanxuezaipiao/p/3525554.html
-
Hadoop JobHistory: http://www.cnblogs.com/luogankun/p/4019303.html
- 国内第一篇详细讲解hadoop2的automatic HA+Federation+Yarn配置的教程:http://www.cnblogs.com/meiyuanbao/p/3545929.html
- 如何编译Apache Hadoop2.2.0源代码:https://www.cnblogs.com/jiangxinnju/p/6286849.html
- ZooKeeper的分布模式安装:http://www.superwu.cn/2013/08/10/413/
- Hadoop HDFS和KFS (CloudStore)的比较: http://blog.csdn.net/Cloudeep/article/details/4467238
- performance-benchmark-cgl-mapreduce-mpi-and-hadoop: http://salsahpc.indiana.edu/content/performance-benchmark-cgl-mapreduce-mpi-and-hadoop
- 汇总运行在Hadoop YARN上的开源系统: http://dongxicheng.org/mapreduce-nextgen/run-systems-on-hadoop-yarn/
- 利用Hadoop实现超大矩阵相乘之我见(一): http://www.cnblogs.com/eczhou/p/3340731.html
-
使用mapreduce计算大矩阵相乘: http://f.dataguru.cn/forum.php?mod=viewthread&tid=133912&extra=page%3D1&ordertype=1
- Hadoop Journal Node 作用:https://my.oschina.net/u/189445/blog/661561
- ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: RECEIVED SIGNAL 15: SIGTERM: http://blog.csdn.net/caiandyong/article/details/50913268
- Hadoop在master查看live nodes为0解决方案: http://blog.csdn.net/shenlan211314/article/details/7414728
-
VMware虚拟机中Hadoop服务的端口无法访问的问题: http://www.cnblogs.com/sdjnzqr/p/3865477.html
- 虾皮工作室(讲解大数据相关知识,如《细细品味Hadoop系列文章》):http://www.cnblogs.com/xia520pi/
- [Hadoop系列]Hadoop的MapReduce中多文件输出: http://blog.csdn.net/inkfish/article/details/5156651
- Hadoop源码分析——TaskAttemptContext类和TaskAttemptID类: http://blog.sina.com.cn/s/blog_61ef49250100vcps.html
- hadoop代理用户 -超级用户代理其它用户: http://www.aboutyun.com/thread-16507-1-1.html
Ambari
- Ambari: http://ambari.apache.org/
- Ambari——大数据平台的搭建利器: https://www.ibm.com/developerworks/cn/opensource/os-cn-bigdata-ambari/
Spark
- http://spark.apache.org/
- http://spark.apache.org/docs/latest/api/java/index.html
- mmicky 的博客: http://blog.csdn.net/book_mmicky/article/category/2604051 http://blog.csdn.net/book_mmicky/article/category/2261687
- Spark性能优化指南——基础篇: http://tech.meituan.com/spark-tuning-basic.html
- Spark性能优化指南——高级篇: http://tech.meituan.com/spark-tuning-pro.html
- RDD中cache和persist的区别: http://www.cnblogs.com/luogankun/p/3801062.html
- 每次进步一点点——Spark 中的宽依赖和窄依赖: http://blog.csdn.net/houmou/article/details/52531205
- Spark中的错误处理: http://blog.csdn.net/zrc199021/article/details/52711593
- Why does Spark RDD partition has 2GB limit for HDFS? https://stackoverflow.com/questions/29689719/why-does-spark-rdd-partition-has-2gb-limit-for-hdfs
- Spark 架构: http://www.cnblogs.com/gaoxing/p/5041806.html
- Spark(一): 基本架构及原理: http://www.cnblogs.com/tgzhu/p/5818374.html
Storm
Mahout
Apache Mahout是基于Hadoop生态圈的一个机器学习库。
- Mahout: https://mahout.apache.org/
neo4j
知识图谱由于其数据包含实体、属性、关系等,常见的关系型数据库诸如MySQL之类不能很好的体现数据的这些特点,因此知识图谱数据的存储一般是采用图数据库(Graph Databases)。而Neo4j是其中最为常见的图数据库。
- neo4j: https://neo4j.com/
Apache Phoenix
- Apache Phoenix: https://phoenix.apache.org/index.html
MPI
- Open MPI(Open Source High Performance Computing): https://www.open-mpi.org/
- Open MPI Frequently Asked Questions: https://www.open-mpi.org/faq/
- The “vader” shared memory transport in Open MPI: Now featuring 3 flavors of zero copy! https://blogs.cisco.com/performance/the-vader-shared-memory-transport-in-open-mpi-now-featuring-3-flavors-of-zero-copy
-
openMPI小集群安装: https://www.cnblogs.com/awy-blog/p/3402949.html
- MPICH: https://www.mpich.org/
- Setting Up an MPICH2 Cluster in Ubuntu: https://help.ubuntu.com/community/MpichCluster
- MPI Forum: https://www.mpi-forum.org/
-
DataMPI: http://datampi.org/index.html
- MapReduce-MPI Library: https://sjplimp.github.io/mapreduce/
- Hadoop 与MPI: https://blog.csdn.net/zhuliting/article/details/6824829
- mrmpi package in Ubuntu: https://launchpad.net/ubuntu/+source/mrmpi
Comments