大数据学习之路

1 minute read

本篇文章汇总了大数据学习的相关资源，包括大数据的基本概念、安装配置、编程语言特性等内容，适合大数据开发者参考。

Hadoop

http://hadoop.apache.org/
Apache Hadoop API REFERENCE: https://hadoop.apache.org/docs/current/api/index.html
Hadoop: Setting up a Single Node Cluster: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html
Hadoop Cluster Setup: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html
HDFS Users Guide: http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html
Hadoop Commands Guide: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CommandsManual.html
DistCp: https://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html
Hadoop Archives Guide: https://hadoop.apache.org/docs/current/hadoop-archives/HadoopArchives.html
Hadoop: Capacity Scheduler: https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
HDFS Architecture: http://apache.github.io/hadoop/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
HDFS Permissions Guide: http://apache.github.io/hadoop/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html
HDFS Quotas Guide: http://apache.github.io/hadoop/hadoop-project-dist/hadoop-hdfs/HdfsQuotaAdminGuide.html
File System Shell: http://apache.github.io/hadoop/hadoop-project-dist/hadoop-common/FileSystemShell.html
HDFS Users Guide: http://apache.github.io/hadoop/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html
MapReduce Tutorial: http://apache.github.io/hadoop/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html
Native Libraries Guide: http://apache.github.io/hadoop/hadoop-project-dist/hadoop-common/NativeLibraries.html
Hadoop Streaming: https://hadoop.apache.org/docs/stable/hadoop-streaming/HadoopStreaming.html
Hadoop 1.0.0集群安装: http://blog.sina.com.cn/s/blog_540c640b01010fo3.html
一步步教你Hadoop多节点集群安装配置: http://www.cnblogs.com/lanxuezaipiao/p/3525554.html
Hadoop JobHistory: http://www.cnblogs.com/luogankun/p/4019303.html
国内第一篇详细讲解hadoop2的automatic HA+Federation+Yarn配置的教程：http://www.cnblogs.com/meiyuanbao/p/3545929.html
如何编译Apache Hadoop2.2.0源代码：https://www.cnblogs.com/jiangxinnju/p/6286849.html
ZooKeeper的分布模式安装：http://www.superwu.cn/2013/08/10/413/
Hadoop HDFS和KFS (CloudStore)的比较: http://blog.csdn.net/Cloudeep/article/details/4467238
performance-benchmark-cgl-mapreduce-mpi-and-hadoop: http://salsahpc.indiana.edu/content/performance-benchmark-cgl-mapreduce-mpi-and-hadoop
汇总运行在Hadoop YARN上的开源系统: http://dongxicheng.org/mapreduce-nextgen/run-systems-on-hadoop-yarn/
利用Hadoop实现超大矩阵相乘之我见（一）: http://www.cnblogs.com/eczhou/p/3340731.html
使用mapreduce计算大矩阵相乘: http://f.dataguru.cn/forum.php?mod=viewthread&tid=133912&extra=page%3D1&ordertype=1
Hadoop Journal Node 作用：https://my.oschina.net/u/189445/blog/661561
ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: RECEIVED SIGNAL 15: SIGTERM: http://blog.csdn.net/caiandyong/article/details/50913268
Hadoop在master查看live nodes为0解决方案: http://blog.csdn.net/shenlan211314/article/details/7414728
VMware虚拟机中Hadoop服务的端口无法访问的问题: http://www.cnblogs.com/sdjnzqr/p/3865477.html
虾皮工作室(讲解大数据相关知识，如《细细品味Hadoop系列文章》)：http://www.cnblogs.com/xia520pi/
[Hadoop系列]Hadoop的MapReduce中多文件输出: http://blog.csdn.net/inkfish/article/details/5156651
Hadoop源码分析——TaskAttemptContext类和TaskAttemptID类: http://blog.sina.com.cn/s/blog_61ef49250100vcps.html
hadoop代理用户 -超级用户代理其它用户: http://www.aboutyun.com/thread-16507-1-1.html

Ambari

Ambari: http://ambari.apache.org/
Ambari——大数据平台的搭建利器: https://www.ibm.com/developerworks/cn/opensource/os-cn-bigdata-ambari/

Spark

http://spark.apache.org/
http://spark.apache.org/docs/latest/api/java/index.html
mmicky 的博客: http://blog.csdn.net/book_mmicky/article/category/2604051 http://blog.csdn.net/book_mmicky/article/category/2261687
Spark性能优化指南——基础篇: http://tech.meituan.com/spark-tuning-basic.html
Spark性能优化指南——高级篇: http://tech.meituan.com/spark-tuning-pro.html
RDD中cache和persist的区别: http://www.cnblogs.com/luogankun/p/3801062.html
每次进步一点点——Spark 中的宽依赖和窄依赖: http://blog.csdn.net/houmou/article/details/52531205
Spark中的错误处理: http://blog.csdn.net/zrc199021/article/details/52711593
Why does Spark RDD partition has 2GB limit for HDFS? https://stackoverflow.com/questions/29689719/why-does-spark-rdd-partition-has-2gb-limit-for-hdfs
Spark 架构: http://www.cnblogs.com/gaoxing/p/5041806.html
Spark(一): 基本架构及原理: http://www.cnblogs.com/tgzhu/p/5818374.html

Storm

Mahout

Apache Mahout是基于Hadoop生态圈的一个机器学习库。

Mahout: https://mahout.apache.org/

neo4j

知识图谱由于其数据包含实体、属性、关系等，常见的关系型数据库诸如MySQL之类不能很好的体现数据的这些特点，因此知识图谱数据的存储一般是采用图数据库（Graph Databases）。而Neo4j是其中最为常见的图数据库。

neo4j: https://neo4j.com/

Apache Phoenix

Apache Phoenix: https://phoenix.apache.org/index.html

MPI

Open MPI(Open Source High Performance Computing): https://www.open-mpi.org/
Open MPI Frequently Asked Questions: https://www.open-mpi.org/faq/
The “vader” shared memory transport in Open MPI: Now featuring 3 flavors of zero copy! https://blogs.cisco.com/performance/the-vader-shared-memory-transport-in-open-mpi-now-featuring-3-flavors-of-zero-copy
openMPI小集群安装: https://www.cnblogs.com/awy-blog/p/3402949.html
MPICH: https://www.mpich.org/
Setting Up an MPICH2 Cluster in Ubuntu: https://help.ubuntu.com/community/MpichCluster
MPI Forum: https://www.mpi-forum.org/
DataMPI: http://datampi.org/index.html
MapReduce-MPI Library: https://sjplimp.github.io/mapreduce/
Hadoop 与MPI: https://blog.csdn.net/zhuliting/article/details/6824829
mrmpi package in Ubuntu: https://launchpad.net/ubuntu/+source/mrmpi

Share on

X Facebook LinkedIn Bluesky

Aloys

大数据学习之路

Hadoop

Ambari

Spark

Storm

Mahout

neo4j

Apache Phoenix

MPI

Share on

Comments

You May Also Enjoy

虚拟化与容器化学习之路

Jetpack/Compose/KMP

Awesome ADB

测试学习之路