This article shows how to write multiple output files with MultipleOutputs on cdh3u3 (hadoop 0.20.2). Many people run into this requirement in practice, so the steps below walk through a complete example.
1. Create a file named multest.txt with the following content:
11111,username,password,22,河北師范大學,軟件學院,2008
11112,username,password,22,河北師范大學,計算機學院,2008
11113,username,password,22,xx大學,軟件學院,2008
11114,username,password,22,xxx大學,計算機學院,2008
11115,username,password,23,2008
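2. Create a directory on HDFS: hadoop dfs -mkdir multest
3. Upload the new text file to the multest directory: hadoop dfs -put /home/wjk/hadoop/multest.txt multest
4. Create a Map/Reduce project. Records with an invalid format (anything other than 7 comma-separated fields) are saved to dirtydata; records that do not belong to 河北師范大學 (Hebei Normal University) 軟件學院 (School of Software) are saved to otherschool; records that do belong to 河北師范大學 軟件學院 are saved to the default output file.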
package com.wjk.test;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class Multest {

    public static class MultestMapper extends Mapper<Object, Text, Text, NullWritable> {
        private Text outkey = new Text("");
        private MultipleOutputs<Text, NullWritable> mos;

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            mos = new MultipleOutputs<Text, NullWritable>(context);
            super.setup(context);
        }

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            String[] details = line.split(",");
            outkey.set(line);
            if (details.length != 7) {
                // Malformed record: written directly to the "dirtydata" named output.
                mos.write("dirtydata", outkey, NullWritable.get());
            } else {
                String school = details[4];
                String college = details[5];
                if (school.equals("河北師范大學") && college.equals("軟件學院")) {
                    // Target school and college: goes to the default output via the reducer.
                    context.write(outkey, NullWritable.get());
                } else {
                    // Every other well-formed record goes to the "otherschool" named output.
                    mos.write("otherschool", outkey, NullWritable.get());
                }
            }
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            // Close MultipleOutputs so the named output files are flushed.
            mos.close();
            super.cleanup(context);
        }
    }

    public static class MultestReducer extends Reducer<Text, NullWritable, Text, NullWritable> {
        @Override
        protected void reduce(Text key, Iterable<NullWritable> values, Context context)
                throws IOException, InterruptedException {
            context.write(key, NullWritable.get());
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "multest");
        job.setJarByClass(Multest.class);
        job.setMapperClass(MultestMapper.class);
        job.setReducerClass(MultestReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        // Register the two named outputs used by the mapper.
        MultipleOutputs.addNamedOutput(job, "dirtydata", TextOutputFormat.class,
                Text.class, NullWritable.class);
        MultipleOutputs.addNamedOutput(job, "otherschool", TextOutputFormat.class,
                Text.class, NullWritable.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
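The routing rule from step 4 can be checked without a cluster. The following is a minimal sketch in plain Java that mirrors the branch logic of the mapper; the class name RoutingSketch, the method route, and the label "default" for the default output are hypothetical names introduced only for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the original job: it repeats the mapper's
// branch logic in plain Java so it can run without Hadoop.
final class RoutingSketch {
    static final String SCHOOL = "河北師范大學";
    static final String COLLEGE = "軟件學院";

    /** Returns "dirtydata", "otherschool", or "default" for one input line. */
    static String route(String line) {
        String[] details = line.split(",");
        if (details.length != 7) {
            return "dirtydata";          // wrong number of fields
        }
        if (SCHOOL.equals(details[4]) && COLLEGE.equals(details[5])) {
            return "default";            // target school and college
        }
        return "otherschool";            // well-formed, but another school or college
    }

    public static void main(String[] args) {
        // The five sample records from multest.txt.
        String[] sample = {
            "11111,username,password,22,河北師范大學,軟件學院,2008",
            "11112,username,password,22,河北師范大學,計算機學院,2008",
            "11113,username,password,22,xx大學,軟件學院,2008",
            "11114,username,password,22,xxx大學,計算機學院,2008",
            "11115,username,password,23,2008"
        };
        // Group the records by the output they would be routed to.
        Map<String, List<String>> buckets = new LinkedHashMap<>();
        for (String line : sample) {
            buckets.computeIfAbsent(route(line), k -> new ArrayList<>()).add(line);
        }
        buckets.forEach((name, lines) ->
            System.out.println(name + ": " + lines.size() + " record(s)"));
    }
}
```

For the sample file, record 11111 goes to the default output, records 11112 to 11114 go to otherschool, and record 11115 (only five fields) goes to dirtydata.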
5. Compile, export the jar, and run: hadoop jar ./../multest.jar com.wjk.test.Multest multest multestout
6. Screenshot of the run: [screenshot omitted]
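=======Note==========================
Drawback: when the job runs on a cluster, the output is spread across many separate files.
Addendum: written as above, the job produces many files that are hard to merge. You can instead specify an output directory for each named output; merging then becomes easy, because getmerge can be run per directory. The main change is in the mos.write call. The relevant code from the official MultipleOutputs source is: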
public <K, V> void write(String namedOutput, K key, V value)
        throws IOException, InterruptedException {
    write(namedOutput, key, value, namedOutput);
}

public <K, V> void write(String namedOutput, K key, V value, String baseOutputPath)
        throws IOException, InterruptedException {
    checkNamedOutputName(this.context, namedOutput, false);
    checkBaseOutputPath(baseOutputPath);
    if (!(this.namedOutputs.contains(namedOutput))) {
        throw new IllegalArgumentException("Undefined named output '" + namedOutput + "'");
    }
    TaskAttemptContext taskContext = getContext(namedOutput);
    getRecordWriter(taskContext, baseOutputPath).write(key, value);
}
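Based on the four-argument overload above, one way to get a directory per named output is to pass a baseOutputPath that contains a subdirectory, for example dirtydata/part instead of the default dirtydata. The sketch below only builds such a path and the file name it would probably lead to. The helper names are hypothetical, and the "-m-"/"-r-" suffix with a five-digit partition number is an assumption about how FileOutputFormat names its files, not something stated in this article.

```java
import java.util.Locale;

// Hypothetical helper, not taken from the original post or from Hadoop itself.
final class BaseOutputPathSketch {

    /** Builds a base output path that puts a named output in its own subdirectory. */
    static String baseOutputPath(String namedOutput) {
        return namedOutput + "/part";
    }

    /**
     * Assumption: the output format appends "-m-" (map task) or "-r-" (reduce task)
     * and a zero-padded five-digit partition number to the base output path.
     */
    static String expectedFileName(String namedOutput, boolean mapTask, int partition) {
        return String.format(Locale.ROOT, "%s-%s-%05d",
                baseOutputPath(namedOutput), mapTask ? "m" : "r", partition);
    }

    public static void main(String[] args) {
        // In the mapper, the call would then look like:
        //   mos.write("dirtydata", outkey, NullWritable.get(), baseOutputPath("dirtydata"));
        System.out.println(expectedFileName("dirtydata", true, 0));
        System.out.println(expectedFileName("otherschool", true, 3));
    }
}
```

Under that naming assumption, all dirtydata files would end up under multestout/dirtydata and all otherschool files under multestout/otherschool, so each directory could be merged with a single getmerge call, as the note above suggests.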